[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13371#discussion_r65303008 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -578,62 +583,6 @@ private[sql]

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13371 It is a good idea to add it if parquet supports it (I have an impression that parquet does not support it. But maybe I am wrong). I think having benchmark results is a good practice, so we can

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13371#discussion_r65302925 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -344,6 +344,11 @@ private[sql] class

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13371#discussion_r65302899 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -344,6 +344,11 @@ private[sql] class

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/13371 BTW, I can't see any reason not to add a row-group level filter for parquet. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/13371 @yhuai As you can see, this is not to fix a bug/problem. So I think it might be hard to provide a test case for it. I will try to do the benchmark.

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13371#discussion_r65301812 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -578,62 +583,6 @@ private[sql] object

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13371 Can you provide a test case that shows the problem? Also, can you provide benchmark results of the performance benefit?

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13371#discussion_r65301654 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -344,6 +344,11 @@ private[sql] class

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at RowGroups ...

2016-05-31 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13371#discussion_r65301661 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -578,62 +583,6 @@ private[sql] object

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-29 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-222408752 also cc @yhuai

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-29 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-222408282 cc @nongli @liancheng

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-93505 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-93503 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-93479 **[Test build #59550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59550/consoleFull)** for PR 13371 at commit

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-91012 **[Test build #59550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59550/consoleFull)** for PR 13371 at commit

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-27 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-90971 retest this please.

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-90965 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-90964 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-90957 **[Test build #59549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59549/consoleFull)** for PR 13371 at commit

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13371#issuecomment-90740 **[Test build #59549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59549/consoleFull)** for PR 13371 at commit

[GitHub] spark pull request: [SPARK-15639][SQL] Try to push down filter at ...

2016-05-27 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/13371 [SPARK-15639][SQL] Try to push down filter at RowGroups level for parquet reader ## What changes were proposed in this pull request? When we use vectorized parquet reader, although the
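
For readers unfamiliar with the idea named in the PR title, the following is a minimal conceptual sketch of row-group level filter pushdown. It is not the code from this pull request and does not use the Parquet or Spark APIs. The `RowGroup` class, `might_match_greater_than`, and `scan_greater_than` are hypothetical names invented for illustration, and the sketch assumes each row group carries min/max statistics for a single integer column.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a row group: values plus min/max statistics."""
    min_value: int
    max_value: int
    rows: List[int]


def might_match_greater_than(group: RowGroup, threshold: int) -> bool:
    """Return False only when the statistics prove no row can satisfy value > threshold."""
    return group.max_value > threshold


def scan_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Scan groups for value > threshold, skipping groups ruled out by statistics.

    Returns the matching values and the number of row groups actually read.
    """
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        if not might_match_greater_than(group, threshold):
            continue  # whole row group skipped without touching its rows
        groups_read += 1
        # Statistics are only a coarse filter, so rows are still checked individually.
        matches.extend(v for v in group.rows if v > threshold)
    return matches, groups_read


# Illustrative data: three row groups covering 0-9, 10-19 and 20-29.
groups = [
    RowGroup(0, 9, list(range(0, 10))),
    RowGroup(10, 19, list(range(10, 20))),
    RowGroup(20, 29, list(range(20, 30))),
]
matches, groups_read = scan_greater_than(groups, 24)
```

In this toy setup only the last row group is read, which is the kind of saving yhuai's request for benchmark results would aim to quantify on real data.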