[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-17 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165379694 @gatorsmile Sorry for the late reply and thanks for the nice catch! The `In` predicate push down issue had been tracked by SPARK-11164, and done as part of

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165320679 Yeah, you can say that. For example, the original filter is ```not (a = 2 and b in ('1', '2'))```. However, Spark 1.5.2 only pushes down ```not (a = 2)```.
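To make the effect of that partial pushdown concrete, here is a minimal sketch in plain Python (not Spark or Parquet code). The three rows and the helper names are hypothetical and chosen only for illustration. It compares the original filter with the filter that reportedly gets pushed down, treating the pushed-down filter as a pre-filter applied at scan time.

```python
# Hypothetical rows standing in for a Parquet file with columns a and b.
rows = [
    {"a": 1, "b": "1"},
    {"a": 2, "b": "0"},
    {"a": 2, "b": "1"},
]


def original_filter(row):
    # not (a = 2 and b in ('1', '2'))
    return not (row["a"] == 2 and row["b"] in ("1", "2"))


def pushed_down_filter(row):
    # not (a = 2): what remains if the IN part is dropped before negation
    return not (row["a"] == 2)


# Rows the query should return.
expected = [r for r in rows if original_filter(r)]

# A pushed-down filter discards rows at the scan; the original filter
# is then evaluated only on the rows that survive.
survivors = [r for r in rows if pushed_down_filter(r)]
actual = [r for r in survivors if original_filter(r)]

# Rows that should have been returned but were filtered out too early.
lost = [r for r in expected if r not in actual]
print(lost)  # [{'a': 2, 'b': '0'}]
```

The row with `a = 2` and `b = '0'` satisfies the original predicate but is removed by `not (a = 2)`, so dropping one side of an `and` underneath a `not` makes the pushed-down filter stricter than the query rather than looser.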

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165334450 https://github.com/apache/spark/pull/10344 shows that the test fails with out 1.5.

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165339904 @gatorsmile @liancheng Looks like we only push a part of the predicate down if we do not understand other parts. Is there any other kind of combinations that can trigger
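One conservative way to avoid pushing only part of a predicate is sketched below in plain Python. This is an illustration of the idea, not the code in `ParquetFilters.scala`: the classes `Eq`, `In`, `And`, `Not` and the function `convert` are hypothetical, and `In` is assumed to be unsupported purely for the sake of the example. The converter returns `None` whenever any part of a predicate cannot be translated, so a `Not` never wraps a half-converted `And`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Eq:
    column: str
    value: object


@dataclass(frozen=True)
class In:
    column: str
    values: tuple


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Not:
    child: object


def convert(f) -> Optional[object]:
    """Return a filter that is safe to push down, or None if there is none."""
    if isinstance(f, Eq):
        return f
    if isinstance(f, In):
        # Assumption for this sketch: IN cannot be translated.
        return None
    if isinstance(f, And):
        left, right = convert(f.left), convert(f.right)
        # Convert the conjunction only if BOTH sides are understood.
        if left is not None and right is not None:
            return And(left, right)
        return None
    if isinstance(f, Not):
        child = convert(f.child)
        return Not(child) if child is not None else None
    return None


predicate = Not(And(Eq("a", 2), In("b", ("1", "2"))))
print(convert(predicate))         # None: nothing is pushed down
print(convert(Not(Eq("a", 2))))   # Not(child=Eq(column='a', value=2))
```

Returning `None` means the data source receives no filter for that predicate and the full predicate is still evaluated after the scan, which costs some pruning but cannot drop rows that the query should return.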

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165340920 @yhuai Based on my understanding, if including the fix of `IN` in this PR, we have covered all the filters. The only exceptions are the ones explained in

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165297020 Yeah, it works without https://github.com/apache/spark/pull/5700. However, I still hope we can backport https://github.com/apache/spark/pull/5700. Without

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165298276 Sure, will do it tonight. Thanks!

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165301819 @gatorsmile So, the problem is that Spark SQL generates a wrong Parquet filter?

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165296448 @gatorsmile how about we also create a jira against 1.5? So, we can use that to test the fix (later when we merge PR, we can merge this one if there is no conflict.

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-165297940 @gatorsmile Can you create a PR for 1.5? We can do this: the first commit just has your test case. Then, our Jenkins should fail. Finally, we add your fix and

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/10278#discussion_r47877766 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -265,7 +268,10 @@ private[sql] object

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-14 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164604885 @liancheng can you look at this? Seems pretty serious if we are returning wrong answers. /cc @yhuai

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164265237 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164265233 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164264969 **[Test build #47623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47623/consoleFull)** for PR 10278 at commit

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164255495 **[Test build #47623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47623/consoleFull)** for PR 10278 at commit

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164232814 After reading the source code, it does not make sense that we do not push down `IN` to Parquet in the above example: ```"not (a = 2 and b in ('1', '2'))"```.

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/10278 [SPARK-12218] [SQL] Fixed the Parquet's filter generation rule when `Not` is included in Parquet filter pushdown When applying the operator `Not`, the current generation rule for Parquet

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164190262 **[Test build #47616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47616/consoleFull)** for PR 10278 at commit

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164175763 **[Test build #47615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47615/consoleFull)** for PR 10278 at commit

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164178387 After reading the other push-down PR, I think it also needs a review from @liancheng. Any comments are welcome! Thanks!

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164188245 **[Test build #47615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47615/consoleFull)** for PR 10278 at commit

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164188283 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164188282 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164209122 **[Test build #47618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47618/consoleFull)** for PR 10278 at commit

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164213638 **[Test build #47618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47618/consoleFull)** for PR 10278 at commit

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164213704 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164213703 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164203719 It's fine if the test only fails on 1.5.

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164204488 Great! : ) Let me also post the test case I did in the latest 1.5. Without my fix, the first call of show() did not return the row (2, 0).

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164204557 I might have found another bug in Parquet pushdown. Will submit another PR later when I can confirm it.

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164202075 Do you have a test case that actually shows a wrong answer being computed?

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164202142 This only happens in 1.5. Do you need me to write a test case for 1.5?

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164202611 Any bug fix should have a regression test. We could always change the optimizer in a way that does not hide this bug anymore.

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164202727 Ok, will try to force it. Thanks!

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164198466 **[Test build #47616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47616/consoleFull)** for PR 10278 at commit

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164198540 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10278#issuecomment-164198539 Merged build finished. Test PASSed.