[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-20 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59822350 LGTM provided it passes tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-20 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-59866772 There might've been some Jenkins issues recently; going to restart it.

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-20 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-59866801 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-20 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-59866791 BTW for the style, you can do "sbt/sbt scalastyle" locally if you want. Not sure there's a command in Maven.

[GitHub] spark pull request: [SPARK-4050][SQL] Fix caching of temporary tab...

2014-10-23 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2912#issuecomment-60315168 Interesting question. For uncaching, I think it makes the most sense to refer to the tables by name even if two cached tables might be the same RDD. Thus you should not
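
To illustrate the naming point, here is a minimal sketch of a cache registry keyed by table name, where two names may point at the same underlying dataset and uncaching one name leaves the other name cached. `Dataset` and `NamedCache` are hypothetical stand-ins, not Spark SQL's actual cache manager, and the comment above is truncated, so the full intended policy is not known.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a cached dataset (e.g. an RDD); identity matters, not contents.
final class Dataset(val id: Int)

// Hypothetical registry: cache entries are keyed by table name, not by dataset identity.
class NamedCache {
  private val entries = mutable.Map.empty[String, Dataset]

  def cache(name: String, data: Dataset): Unit = entries(name) = data

  // Uncaching refers to the name only; other names sharing the same dataset are untouched.
  def uncache(name: String): Unit = entries.remove(name)

  def isCached(name: String): Boolean = entries.contains(name)
}
```

For example, caching the same dataset as `a` and `b` and then uncaching `a` leaves `b` cached.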

[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-10-23 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2126#discussion_r19316607 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala --- @@ -19,19 +19,16 @@ package

[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-10-23 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-60327161 This looks good to me based on my understanding of Mesos. @tnachen will this still work okay if Mesos is not running as root (and can't switch user)?

[GitHub] spark pull request: mesos executor ids now consist of the slave id...

2014-10-23 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1358#issuecomment-60327322 @tsliwowicz your fix seems good -- thanks for getting to the bottom of this!

[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-10-23 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-60341385 Jenkins, test this please

[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-10-24 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-60354910 Jenkins, test this please

[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

2014-10-24 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-60410309 This might be some issue that snuck into master. @marmbrus have you seen this test failure?

[GitHub] spark pull request: [SPARK-4102] Remove unused ShuffleReader.stop(...

2014-10-27 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2966#issuecomment-60690678 IMO we should keep this because it will be needed for other shuffle managers that launch threads. Because this is a developer API (where other people may plug in their

[GitHub] spark pull request: [SPARK-4102] Remove unused ShuffleReader.stop(...

2014-10-27 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2966#issuecomment-60690706 BTW the ??? should indeed be changed to {}, it's weird that it says not implemented.
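
In Scala, `???` (defined in `Predef`) throws `scala.NotImplementedError` when evaluated, so a `stop()` written as `???` fails for any caller, whereas `{}` is a harmless no-op. The sketch below uses a hypothetical `Reader` trait (not Spark's actual `ShuffleReader`) to show the no-op default next to an override for a reader that launches a thread, which is the case the earlier comment gives for keeping `stop()`.

```scala
// Hypothetical reader interface; stop() is kept so implementations that start threads can clean up.
trait Reader {
  def read(): Iterator[Int]
  // No-op default: safe to call even when the reader holds no resources.
  def stop(): Unit = {}
}

// Mirrors the issue in the patch: `???` throws scala.NotImplementedError when called.
class UnimplementedStopReader extends Reader {
  def read(): Iterator[Int] = Iterator(1, 2, 3)
  override def stop(): Unit = ???
}

// A reader that launches a background thread overrides stop() to shut it down.
class ThreadedReader extends Reader {
  private val worker = new Thread(() => {
    try Thread.sleep(60000) catch { case _: InterruptedException => () }
  })
  worker.setDaemon(true)
  worker.start()

  def read(): Iterator[Int] = Iterator(4, 5)

  override def stop(): Unit = {
    worker.interrupt() // wake the thread so it can exit
    worker.join()      // wait until it has actually finished
  }

  def isRunning: Boolean = worker.isAlive
}
```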

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2819#issuecomment-60693263 So regarding the interface, as I mentioned to Joseph, I would like the interfaces to be the same so that people can easily copy code between the languages. Many people

[GitHub] spark pull request: [SPARK-3961] [MLlib] [PySpark] Python API for ...

2014-10-27 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2819#issuecomment-60693309 BTW we can also leave out the default args for now and add them later, if we want to take more time to decide this. But the Python API should definitely include all the

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-27 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1658#discussion_r19453138 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala --- @@ -220,6 +227,83 @@ class JavaSparkContext(val sc: SparkContext) extends

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-27 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-60709302 Thanks for the update, Kevin. Note that there are still a few comments from me on https://github.com/apache/spark/pull/1658/files, do you mind dealing with those

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-27 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1658#discussion_r19453170 --- Diff: core/src/main/scala/org/apache/spark/rdd/BinaryFileRDD.scala --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19498392 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -435,23 +446,67 @@ private[parquet] class

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19498480 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -423,10 +436,8 @@ private[parquet] class

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19498615 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala --- @@ -255,6 +256,10 @@ private[sql] object ParquetTestData

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19498603 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -460,29 +515,85 @@ private[parquet] class

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2841#issuecomment-60820609 Hey @saucam, I took a look at this too because I had tried upgrading to Parquet 1.6 in a different branch to use decimals. Made a few comments above. Apart that

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-28 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19498939 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetFilters.scala --- @@ -209,25 +221,25 @@ private[sql] object ParquetFilters

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] [WIP] Support fixed-...

2014-10-28 Thread mateiz
GitHub user mateiz opened a pull request: https://github.com/apache/spark/pull/2983 [SPARK-3930] [SPARK-3933] [WIP] Support fixed-precision decimal in SQL, and some optimizations - Adds optional precision and scale to Spark SQL's decimal type, which behave similarly to tho
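
As a rough illustration of what "precision and scale" mean for a decimal type: precision is the total number of digits and scale is the number of digits after the decimal point. The sketch below uses `java.math.BigDecimal` and a hypothetical `toFixed` helper; rounding half-up and returning `None` on overflow are assumptions of this sketch, not necessarily the semantics chosen in the PR.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object FixedDecimal {
  // Hypothetical helper: coerce a value to DECIMAL(precision, scale).
  // Rounds half-up to `scale` digits and returns None if the result needs more
  // than `precision` total digits (i.e. more than precision - scale integer digits).
  def toFixed(value: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] = {
    require(scale >= 0 && scale <= precision, "scale must be between 0 and precision")
    val rounded = value.setScale(scale, RoundingMode.HALF_UP)
    if (rounded.precision() <= precision) Some(rounded) else None
  }
}
```

For example, `123.456` coerced to DECIMAL(5, 2) becomes `123.46`, while `1234.5` does not fit.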

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-60859445 I've marked this as not WIP anymore, because the main TODOs left are in the Hive support. I intend to send that as a separate patch, though I can also add it here.

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-60862637 Jenkins, test this please

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-60884047 Jenkins, test this please

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-60979360 @kmader btw if you don't have time to deal with these comments, let me know; I might be able to take the patch from where it is and implement them.

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-60979400 Jenkins, test this please

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61006272 IMO we should have a limit by default, but just make it large (say 1 GB). Otherwise nobody will know to configure this.
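
A minimal sketch of a limit that is on by default but large, as suggested above. `ResultSizeLimit` and `withinLimit` are hypothetical names, and treating a limit of 0 as "unlimited" is an assumption of this sketch rather than something stated in the comment.

```scala
object ResultSizeLimit {
  // On by default, but large: 1 GB, as suggested in the discussion.
  val DefaultMaxResultSize: Long = 1L << 30

  // Hypothetical check: does adding `next` bytes to `total` stay within `limit`?
  // A limit of 0 is treated as "unlimited" (an assumption of this sketch).
  def withinLimit(total: Long, next: Long, limit: Long = DefaultMaxResultSize): Boolean =
    limit == 0 || total + next <= limit
}
```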

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-29 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19572924 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -524,7 +529,14 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-29 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19572955 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -68,6 +68,10 @@ private[spark] class TaskSetManager( val

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-29 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19573001 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -68,6 +68,10 @@ private[spark] class TaskSetManager( val

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61008300 Jenkins, test this please

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61016830 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61020192 Updated this to also read / write decimal precision data from the Hive 0.13 metastore, and fixed a compile error introduced by other recent patches.
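
Hive 0.13 writes decimal column types with precision and scale, e.g. `decimal(10,2)`, while a bare `decimal` carries no explicit precision. A hedged sketch of round-tripping such type strings follows; `HiveDecimalType`, `Fixed` and `Unlimited` are hypothetical names, and mapping a bare `decimal` to `Unlimited` is an assumption of this sketch, not the PR's actual mapping.

```scala
object HiveDecimalType {
  sealed trait DecimalSpec
  case object Unlimited extends DecimalSpec
  final case class Fixed(precision: Int, scale: Int) extends DecimalSpec

  // Whole-string, case-insensitive patterns such as "decimal(10,2)" or "decimal".
  private val WithPrecision = """(?i)decimal\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)""".r
  private val Bare = """(?i)decimal""".r

  // Reading: None means the string is not a decimal type at all.
  def parse(typeString: String): Option[DecimalSpec] = typeString.trim match {
    case WithPrecision(p, s) => Some(Fixed(p.toInt, s.toInt))
    case Bare()              => Some(Unlimited)
    case _                   => None
  }

  // Writing: render precision and scale back in the same format.
  def render(spec: DecimalSpec): String = spec match {
    case Fixed(p, s) => s"decimal($p,$s)"
    case Unlimited   => "decimal"
  }
}
```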

[GitHub] spark pull request: [SPARK-4102] Remove unused ShuffleReader.stop(...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2966#issuecomment-61025155 LGTM

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-29 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19580900 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -460,29 +515,85 @@ private[parquet] class

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-29 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2841#discussion_r19580913 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -460,29 +515,85 @@ private[parquet] class

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2841#issuecomment-61025530 Alright, thanks for adding the tests. Let's get Michael's feedback on the metadata thing, I don't fully understand it. I guess it allows tasks to query di

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61025739 Hey Davies, one other thing: in Executor, you should just not send back a result if the single result exceeds maxResultSize. That way you avoid killing the driver with
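
The executor-side check suggested above can be sketched as follows: if a single serialized result already exceeds maxResultSize, the executor sends back only a small marker carrying the size, so the oversized payload never reaches the driver. The message types here are hypothetical and are not Spark's actual task result classes; treating a limit of 0 as "unlimited" is also an assumption.

```scala
object ExecutorResultCheck {
  // Hypothetical messages an executor could send back to the driver.
  sealed trait TaskReply
  final case class Result(bytes: Array[Byte]) extends TaskReply
  // Carries only the size, so the oversized payload is never transferred.
  final case class ResultTooLarge(size: Long, limit: Long) extends TaskReply

  // Decide on the executor side, before sending anything (0 = no limit, an assumption).
  def reply(serialized: Array[Byte], maxResultSize: Long): TaskReply = {
    val size = serialized.length.toLong
    if (maxResultSize > 0 && size > maxResultSize) ResultTooLarge(size, maxResultSize)
    else Result(serialized)
  }
}
```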

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61025858 Also, can we make TaskResultGetter avoid *fetching* more than maxResultSize for indirect task results? Right now it will fetch them (maybe in parallel) and it's
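
One way to avoid *fetching* too much is to reserve each indirect result's size against the budget before starting the fetch, and to abort instead of fetching when the reservation fails. This sketch assumes the size of an indirect result is known before it is fetched (the comment above is truncated, so that is an assumption), and `FetchBudget` is a hypothetical class. It uses an `AtomicLong` because results may be fetched in parallel.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical tracker: reserves space for an indirect result *before* fetching it.
class FetchBudget(maxResultSize: Long) {
  private val reserved = new AtomicLong(0L)

  // Returns true if the fetch may proceed; false means abort instead of fetching.
  @annotation.tailrec
  final def tryReserve(size: Long): Boolean = {
    val current = reserved.get()
    val next = current + size
    if (next > maxResultSize) false
    else if (reserved.compareAndSet(current, next)) true
    else tryReserve(size) // another fetch thread updated the total first; retry
  }

  def total: Long = reserved.get()
}
```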

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-61031930 Thanks for the update, Kevin. Looks like Jenkins had some issues with git, will retry it.

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-10-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-61031939 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19620795 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -210,25 +213,26 @@ private[spark] class Executor( val resultSize

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19620876 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -563,6 +563,31 @@ class TaskSetManagerSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19620932 --- Diff: docs/configuration.md --- @@ -112,6 +112,16 @@ of the most common options to set are: + spark.driver.maxResultSize + 1g

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19620969 --- Diff: docs/configuration.md --- @@ -112,6 +112,16 @@ of the most common options to set are: + spark.driver.maxResultSize + 1g
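
The documented default for `spark.driver.maxResultSize` is written as a size string (`1g`). A hedged sketch of turning such strings into bytes follows; `SizeString.toBytes` is a hypothetical helper, not Spark's own parser, and it assumes binary multiples (k = 1024) with a bare number meaning bytes.

```scala
object SizeString {
  // Whole-string, case-insensitive: a number, an optional unit, an optional trailing "b".
  private val Pattern = """(?i)\s*(\d+)\s*([kmgt]?)b?\s*""".r

  // Hypothetical parser for values such as "1g", "512m" or "2048" (bytes).
  // Units are binary multiples (k = 1024), which is an assumption of this sketch.
  def toBytes(s: String): Long = s match {
    case Pattern(num, unit) =>
      val shift = unit.toLowerCase match {
        case ""  => 0
        case "k" => 10
        case "m" => 20
        case "g" => 30
        case "t" => 40
      }
      num.toLong << shift
    case _ => throw new IllegalArgumentException(s"Invalid size string: $s")
  }
}
```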

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61133561 Looks good to me modulo the way the error is propagated from the executor. @kayousterhout do you want to take a look at the TaskResultGetter code too?

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19621262 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -563,6 +563,31 @@ class TaskSetManagerSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19621398 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -563,6 +563,31 @@ class TaskSetManagerSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-10-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61193359 Wait, is maxResultSize across collects? That doesn't make sense, it should be enforced for each pending operation. I agree that it would be good to send this size t
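
The scoping point above can be sketched by keeping one running total per pending operation, so that results from a finished collect never count against a later one. `PerOperationLimit` and the integer operation IDs are hypothetical; in Spark the natural scope would be something like a task set or job, which is an assumption here.

```scala
import scala.collection.mutable

// Hypothetical: one running total per pending operation (e.g. per collect).
class PerOperationLimit(maxResultSize: Long) {
  private val totals = mutable.Map.empty[Int, Long]

  // Adds a task result's size to its operation's total; false means that operation exceeded the limit.
  def add(operationId: Int, size: Long): Boolean = synchronized {
    val updated = totals.getOrElse(operationId, 0L) + size
    totals(operationId) = updated
    updated <= maxResultSize
  }

  // Drop the counter once the operation completes, so it cannot affect later operations.
  def finish(operationId: Int): Unit = synchronized { totals.remove(operationId) }
}
```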

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2841#issuecomment-61193438 Cool, that makes sense. Anyway if this looks good to you, Michael, you should merge it.

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19645705 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/package.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19645761 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/package.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19645814 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/package.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19645907 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/package.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19645938 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/package.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19646622 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/package.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19646688 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/package.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19650167 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/package.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19650306 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/package.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-31 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61296754 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-31 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61305070 Thanks @daveis for the comments; I've now fixed those.

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-31 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2983#discussion_r19697736 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala --- @@ -90,11 +91,17 @@ object DataType { | "Lon

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-10-31 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2983#discussion_r19697753 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala --- @@ -869,4 +871,35 @@ class ParquetQuerySuite extends QueryTest with

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-31 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2475#discussion_r19701683 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/filters.scala --- @@ -0,0 +1,22 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-3247][SQL] An API for adding data sourc...

2014-10-31 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2475#issuecomment-61358186 The new API for sources looks good to me, thanks for making the changes. It will be easy to plug in a lot of neat data sources here.

[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...

2014-10-31 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2841#issuecomment-61358979 Thanks, closed it and assigned it to you.

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

2014-11-01 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-61379142 Thanks @kmader, I merged this now. I manually amended the patch a bit to fix style issues (there were still a bunch of commas without spaces, etc), and I also changed the

[GitHub] spark pull request: SPARK-4040. Update documentation to exemplify ...

2014-11-01 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2964#discussion_r19707119 --- Diff: docs/configuration.md --- @@ -21,16 +21,22 @@ application. These properties can be set directly on a [SparkConf](api/scala/index.html

[GitHub] spark pull request: SPARK-4040. Update documentation to exemplify ...

2014-11-01 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2964#discussion_r19707122 --- Diff: docs/streaming-programming-guide.md --- @@ -586,11 +588,13 @@ Every input DStream (except file stream) is associated with a single [Receiver

[GitHub] spark pull request: SPARK-4040. Update documentation to exemplify ...

2014-11-01 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2964#issuecomment-61379835 Thanks for adding these clarifications, it's a good idea.

[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...

2014-11-01 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2983#discussion_r19707994 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala --- @@ -70,16 +70,29 @@ case class GeneratedAggregate

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-11-01 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19708474 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -522,6 +526,24 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-11-01 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19708494 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -210,25 +213,27 @@ private[spark] class Executor( val resultSize

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-11-01 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19708503 --- Diff: docs/configuration.md --- @@ -112,6 +112,18 @@ of the most common options to set are: + spark.driver.maxResultSize + 1g

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-11-01 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3003#discussion_r19708506 --- Diff: docs/configuration.md --- @@ -112,6 +112,18 @@ of the most common options to set are: + spark.driver.maxResultSize + 1g

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-11-01 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61387167 Hey @davies, this looks good to me. Made a few comments on wording and the name of one method.

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-11-01 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61393735 Looks like this has a compile error after the change

[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...

2014-11-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61396591 Thanks, merged this in.

[GitHub] spark pull request: use readFully in FixedLengthBinaryRecordReader

2014-11-04 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3093#issuecomment-61740573 Good catch, thanks! Can you check that this is the only version of read() in that code?
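
Why `readFully` matters here: `InputStream.read(buf, off, len)` may return fewer bytes than requested, while `DataInputStream.readFully` keeps reading until the buffer is full or throws `EOFException`. The sketch below simulates a stream that returns at most a few bytes per call; `ChunkedInputStream` and `ReadFullyDemo` are hypothetical test helpers, not the actual `FixedLengthBinaryRecordReader` code.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, InputStream}

// Test stream that returns at most `chunk` bytes per read call, as remote streams may.
class ChunkedInputStream(data: Array[Byte], chunk: Int) extends InputStream {
  private val underlying = new ByteArrayInputStream(data)
  override def read(): Int = underlying.read()
  override def read(b: Array[Byte], off: Int, len: Int): Int =
    underlying.read(b, off, math.min(len, chunk))
}

object ReadFullyDemo {
  // A single read() call: may fill only part of the record buffer; returns bytes read.
  def singleRead(in: InputStream, recordLength: Int): Int = {
    val buf = new Array[Byte](recordLength)
    in.read(buf, 0, recordLength)
  }

  // readFully loops internally until the whole record is read (or throws EOFException).
  def readRecord(in: InputStream, recordLength: Int): Array[Byte] = {
    val buf = new Array[Byte](recordLength)
    new DataInputStream(in).readFully(buf)
    buf
  }
}
```

With 3-byte chunks, a single `read` of an 8-byte record returns only 3 bytes, whereas `readRecord` returns all 8.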

[GitHub] spark pull request: use readFully in FixedLengthBinaryRecordReader

2014-11-04 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3093#issuecomment-61740633 BTW it would be good to open a JIRA issue for this on https://issues.apache.org/jira/browse/SPARK but unfortunately ASF JIRA seems to be down at the moment.

[GitHub] spark pull request: use readFully in FixedLengthBinaryRecordReader

2014-11-04 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3093#issuecomment-61740644 Jenkins, this is ok to test

[GitHub] spark pull request: SPARK-4222 [CORE] use readFully in FixedLength...

2014-11-05 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3093#issuecomment-61902564 Cool, thanks. Will merge this soon.

[GitHub] spark pull request: [SPARK-4186] add binaryFiles and binaryRecords...

2014-11-05 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/3078#discussion_r19916205 --- Diff: python/pyspark/context.py --- @@ -396,6 +396,34 @@ def wholeTextFiles(self, path, minPartitions=None, use_unicode=True): return RDD

[GitHub] spark pull request: [SPARK-4186] add binaryFiles and binaryRecords...

2014-11-05 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3078#issuecomment-61903062 Looks good, I just noticed one weird thing in the docs (probably an issue in the Java/Scala docs but we might as well fix those too).

[GitHub] spark pull request: SPARK-4040. Update documentation to exemplify ...

2014-11-05 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2964#issuecomment-61903163 This looks fine to merge into 1.2; will do so. Thanks!

[GitHub] spark pull request: [SPARK-4186] add binaryFiles and binaryRecords...

2014-11-06 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3078#issuecomment-61941161 Looks good, thanks!

[GitHub] spark pull request: [SPARK-2373]RDD add span function (split an RD...

2014-08-28 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1306#issuecomment-53838250 IMO this is too specialized to include. It's small enough that applications can do it themselves, but also fairly confusing unless your RDD is already sorted in som

[GitHub] spark pull request: [SPARK-3309] [PySpark] Put all public API in _...

2014-08-29 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2205#discussion_r16922689 --- Diff: python/pyspark/__init__.py --- @@ -61,13 +61,16 @@ from pyspark.conf import SparkConf from pyspark.context import SparkContext

[GitHub] spark pull request: spark-729: predictable closure capture

2014-08-29 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1322#issuecomment-53948235 Alright, feel free to describe this on the JIRA too if you'd like input.

[GitHub] spark pull request: [SPARK-3094] [PySpark] compatitable with PyPy

2014-08-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2144#issuecomment-53970492 @davies just curious, do all the unit tests run if you do `run-tests` with `pypy`? We should make sure they do, and add a command in there to test this in Jenkins (ask

[GitHub] spark pull request: [SPARK-2889] Create Hadoop config objects cons...

2014-08-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1843#issuecomment-53971505 Thanks Marcelo! I've merged this in.

[GitHub] spark pull request: [SPARK-1919] Fix Windows spark-shell --jars

2014-08-30 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/2211#discussion_r16932123 --- Diff: repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala --- @@ -965,11 +966,9 @@ class SparkILoop(in0: Option[BufferedReader], protected val out

[GitHub] spark pull request: [SPARK-3010] fix redundant conditional

2014-08-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1992#issuecomment-53972938 Jenkins, test this please

[GitHub] spark pull request: [SPARK-3010] fix redundant conditional

2014-08-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1992#issuecomment-53972940 Looks good to me pending tests

[GitHub] spark pull request: Check if margin > 0, not if prob > 0.5

2014-08-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1057#issuecomment-53972967 Hey @naftaliharris, mind closing this pull request now that this has been fixed in other PRs?

[GitHub] spark pull request: [WIP] SPARK-1192: The document for most of the...

2014-08-30 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/85#issuecomment-53973039 @CodingCat are you still working on this patch? The doc page changed significantly in 1.0, so maybe a lot of this info is still in, but it would be good to look over it and
