[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-17 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165379694
  
@gatorsmile Sorry for the late reply and thanks for the nice catch!

The `In` predicate push-down issue had been tracked by SPARK-11164 and was 
addressed as part of PR #8956. Unfortunately, we didn't merge that PR due to 
other problems in it. Could you please add SPARK-11164 to your PR title?

For the `Not` push-down rule:

1. I'm for adding it to branch-1.5 since it's a pretty safe one.
2. I think we might also want to add a more general [CNF][1] conversion rule 
to master, which should be done in a separate PR, of course.

Since we don't have existential / universal quantifiers in our predicates, I 
think CNF conversion in Spark SQL can be as simple as repeatedly pushing `Not` and 
`Or` inward (or downward) using De Morgan's laws and the distributive law:

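```scala
import org.apache.spark.sql.catalyst.expressions.{And, Not, Or}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

object CNFConversion extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case filter: Filter =>
      // Brings the DSL operators `!`, `&&` and `||` on Catalyst expressions into scope.
      import org.apache.spark.sql.catalyst.dsl.expressions._

      filter.copy(condition = filter.condition.transform {
        // De Morgan's laws: push `Not` inward.
        case Not(x Or y) => !x && !y
        case Not(x And y) => !x || !y
        // Distributive law: push `Or` below `And`.
        case (x And y) Or z => (x || z) && (y || z)
        case x Or (y And z) => (x || y) && (x || z)
      })
  }
}
```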

(Notice that this version doesn't handle common expression elimination.)
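As a sanity check on the rewrite rules above, here is a minimal standalone sketch that does not use Catalyst at all. `Pred`, `Atom`, `PNot`, `PAnd` and `POr` are hypothetical stand-ins for Catalyst's `Not`/`And`/`Or` expressions, and `rewriteOnce` is our own approximation of a single top-down `transform` pass (apply the rules at a node, then recurse into the children of the result). It also surfaces one assumption worth double-checking: a single pass does not always reach CNF, so such a rule would presumably need to run to a fixed point (Spark's optimizer typically groups rules into batches run with a `FixedPoint` strategy).

```scala
object CnfSketch {
  // Hypothetical mini predicate language standing in for Catalyst's Not/And/Or.
  sealed trait Pred
  final case class Atom(name: String) extends Pred
  final case class PNot(child: Pred) extends Pred
  final case class PAnd(left: Pred, right: Pred) extends Pred
  final case class POr(left: Pred, right: Pred) extends Pred

  // One top-down pass: try the four rewrites at this node, then recurse into
  // the children of the (possibly rewritten) node.
  def rewriteOnce(p: Pred): Pred = {
    val q = p match {
      case PNot(POr(x, y))    => PAnd(PNot(x), PNot(y))      // De Morgan
      case PNot(PAnd(x, y))   => POr(PNot(x), PNot(y))       // De Morgan
      case POr(PAnd(x, y), z) => PAnd(POr(x, z), POr(y, z))  // distributive law
      case POr(x, PAnd(y, z)) => PAnd(POr(x, y), POr(x, z))  // distributive law
      case other              => other
    }
    q match {
      case a: Atom    => a
      case PNot(c)    => PNot(rewriteOnce(c))
      case PAnd(l, r) => PAnd(rewriteOnce(l), rewriteOnce(r))
      case POr(l, r)  => POr(rewriteOnce(l), rewriteOnce(r))
    }
  }

  // Repeats passes until nothing changes, i.e. until a fixed point is reached.
  @annotation.tailrec
  def toCnf(p: Pred): Pred = {
    val next = rewriteOnce(p)
    if (next == p) p else toCnf(next)
  }

  def eval(p: Pred, env: Map[String, Boolean]): Boolean = p match {
    case Atom(n)    => env(n)
    case PNot(c)    => !eval(c, env)
    case PAnd(l, r) => eval(l, env) && eval(r, env)
    case POr(l, r)  => eval(l, env) || eval(r, env)
  }

  def atoms(p: Pred): Set[String] = p match {
    case Atom(n)    => Set(n)
    case PNot(c)    => atoms(c)
    case PAnd(l, r) => atoms(l) ++ atoms(r)
    case POr(l, r)  => atoms(l) ++ atoms(r)
  }

  // Brute-force equivalence check over every truth assignment of the atoms.
  def equivalent(p: Pred, q: Pred): Boolean = {
    val names = (atoms(p) ++ atoms(q)).toList
    (0 until (1 << names.size)).forall { bits =>
      val env = names.zipWithIndex.map { case (n, i) => n -> (((bits >> i) & 1) == 1) }.toMap
      eval(p, env) == eval(q, env)
    }
  }

  // not (a or b) or c
  val example: Pred = POr(PNot(POr(Atom("a"), Atom("b"))), Atom("c"))
}
```

For `example`, one pass only yields `POr(PAnd(PNot(a), PNot(b)), c)`, because the root `Or` was visited before its child was rewritten into an `And`; a second pass distributes it into `PAnd(POr(PNot(a), c), POr(PNot(b), c))`, and `equivalent` confirms both results agree with the input on all eight assignments.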

That said, the `Not` push-down rule is actually a subset of CNF conversion. 
There was once a PR that aimed to add CNF conversion for data source filter 
push-down only, but it wasn't merged (see SPARK-6624 and PR #6713). As @marmbrus 
commented there, CNF conversion might be worth adding to the optimizer.

@rxin @marmbrus I'm not super confident about the CNF conversion conclusion 
above; please correct me if I'm wrong.

[1]: https://en.wikipedia.org/wiki/Conjunctive_normal_form



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165320679
  
Yeah, you can say that. 

For example, the original filter is `not (a = 2 and b in ('1', '2'))`, 
but Spark 1.5.2 only pushes down `not (a = 2)`. As a result, the data 
returned from Parquet is incomplete and data loss happens.
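A minimal, Spark-free sketch of why this loses data. It evaluates the predicate from this comment over the same rows as the 1.5 test case posted later in this thread, `(a, b) = (i, (i % 2).toString)` for `i` in 1 to 5. `PushdownLossSketch` and its members are hypothetical names, and the sketch assumes Spark still evaluates the full predicate on top of whatever Parquet returns, so a pushed-down filter is only safe if it never rejects a row the original predicate accepts.

```scala
object PushdownLossSketch {
  final case class Row(a: Int, b: String)

  // Same rows as the 1.5 test case: (1,"1"), (2,"0"), (3,"1"), (4,"0"), (5,"1").
  val rows: Seq[Row] = (1 to 5).map(i => Row(i, (i % 2).toString))

  // The user's predicate: not (a = 2 and b in ('1', '2'))
  def original(r: Row): Boolean = !(r.a == 2 && Set("1", "2").contains(r.b))

  // What gets pushed down once the unconvertible `In` is dropped: not (a = 2)
  def pushed(r: Row): Boolean = r.a != 2

  // Assumption: Spark re-applies `original` to the rows Parquet returns.
  def expected: Seq[Row] = rows.filter(original)
  def actual: Seq[Row] = rows.filter(pushed).filter(original)
  def lost: Seq[Row] = expected.diff(actual)
}
```

Here `lost` contains only `Row(2, "0")`: it satisfies the original predicate (`b` is `"0"`, which is not in `('1', '2')`), but Parquet never returns it because it fails `not (a = 2)`, so re-evaluating the full predicate in Spark cannot bring it back.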



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165334450
  
https://github.com/apache/spark/pull/10344 shows that the test fails on 
1.5.



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165339904
  
@gatorsmile @liancheng It looks like we push down only part of the predicate 
when we do not understand the other parts. Are there any other kinds of 
combinations that can trigger this issue? 



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165340920
  
@yhuai Based on my understanding, if we include the `IN` fix in this PR, 
we have covered all the filters. The only exceptions are the ones explained in 
https://issues.apache.org/jira/browse/SPARK-11153

Since 1.6 already has the fix (https://github.com/apache/spark/pull/5700) 
that pushes the `Not` operator down to the innermost level, we can say 1.6 is not 
affected by the bug even if some filters are not pushed down. 

@liancheng, please correct me if anything is inaccurate. Thank you! 



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165297020
  
Yeah, it works without https://github.com/apache/spark/pull/5700. 

However, I still hope we can backport 
https://github.com/apache/spark/pull/5700. Without it, these filters will not 
be pushed down to Parquet, which will have a negative performance impact.  

If needed, I can also create another JIRA for backporting 
https://github.com/apache/spark/pull/5700

Please let me know your opinions. Thanks!



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165298276
  
Sure, will do it tonight. Thanks!



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165301819
  
@gatorsmile So the problem is that Spark SQL generates a wrong Parquet filter?



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165296448
  
@gatorsmile How about we also create a JIRA against 1.5, so that we can use 
it to test the fix? Later, when we merge, we can just merge this one if there is 
no conflict; otherwise, we merge that one into 1.5 and this one into 1.6 and 
master.

Also, do we need to backport #5700 to 1.5? Your fix also works without it, 
right?



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-165297940
  
@gatorsmile Can you create a PR for 1.5? Here is what we can do: the first 
commit contains only your test case, so Jenkins should fail. Then we add your 
fix, and Jenkins should pass.



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-16 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/10278#discussion_r47877766
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -265,7 +268,10 @@ private[sql] object ParquetFilters {
   rhsFilter <- createFilter(schema, rhs)
 } yield FilterApi.or(lhsFilter, rhsFilter)
 
-  case sources.Not(pred) =>
+  // Here, we assume the Optimizer's rule BooleanSimplification has pushed `Not` operator
+  // to the inner most level.
+  case sources.Not(pred)
+    if !pred.isInstanceOf[sources.And] && !pred.isInstanceOf[sources.Or] =>
--- End diff --

Nit: The following version might be clearer:

```scala
  // (Copy your comment here)
  case sources.Not(_: sources.And) | sources.Not(_: sources.Or) =>
    None

  case sources.Not(pred) =>
    createFilter(schema, pred).map(FilterApi.not)
```



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-14 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164604885
  
@liancheng can you look at this?  Seems pretty serious if we are returning 
wrong answers.

/cc @yhuai 



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164265237
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47623/



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164265233
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164264969
  
**[Test build #47623 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47623/consoleFull)**
 for PR 10278 at commit 
[`50733c6`](https://github.com/apache/spark/commit/50733c6239b721ecb1f0691bb3d4680235c15a18).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164255495
  
**[Test build #47623 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47623/consoleFull)**
 for PR 10278 at commit 
[`50733c6`](https://github.com/apache/spark/commit/50733c6239b721ecb1f0691bb3d4680235c15a18).



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164232814
  
After reading the source code, I see no reason why we do not push down 
`IN` to Parquet in the above example: 
`not (a = 2 and b in ('1', '2'))`. 

We should fix these two issues in both 1.5.x and 1.6.x.



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/10278

[SPARK-12218] [SQL] Fixed the Parquet's filter generation rule when `Not` 
is included in Parquet filter pushdown

When applying the operator `Not`, the current generation rule for Parquet 
filters simply applies `Not` to whatever part of the underlying filter could be 
converted. 

For example, when the filter is `not (a = 2 and b in ('1', '2'))`, 
the generated filter is `not (a = 2)`, because the `In` part cannot be converted 
and is dropped from the `And`. When we push down this filter to Parquet, it 
removes every row with `a = 2`, including the eligible rows that also satisfy 
`not (b in ('1', '2'))` and therefore should have been returned.
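The conversion problem can be reproduced with a small standalone sketch. The filter classes below are hypothetical stand-ins for the `org.apache.spark.sql.sources` filters, `convert` renders the pushed-down predicate as a string instead of building a Parquet `FilterPredicate`, and `In` is treated as unconvertible to match the 1.5 behavior described above. The `And` case keeps whichever side converts, which is our reading of how 1.5 combined conjuncts; `guardNot = true` models the fix of refusing to negate a compound child.

```scala
object NotPushdownSketch {
  // Hypothetical stand-ins for org.apache.spark.sql.sources filters (not the real classes).
  sealed trait Filter
  final case class EqualTo(attr: String, value: Any) extends Filter
  final case class In(attr: String, values: Seq[Any]) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter
  final case class Or(left: Filter, right: Filter) extends Filter
  final case class Not(child: Filter) extends Filter

  // Renders the pushed-down predicate as a string; None means "cannot push down".
  // `In` is treated as unconvertible, matching the 1.5 example.
  def convert(f: Filter, guardNot: Boolean): Option[String] = f match {
    case EqualTo(a, v) => Some(s"$a = $v")
    case In(_, _)      => None
    // Keeps whichever conjuncts convert. Dropping a conjunct only weakens the
    // filter, which is safe as long as the result is not negated afterwards.
    case And(l, r) =>
      (convert(l, guardNot) ++ convert(r, guardNot)).reduceOption((x, y) => s"($x and $y)")
    // A disjunction is pushed only if both sides convert.
    case Or(l, r) =>
      for (x <- convert(l, guardNot); y <- convert(r, guardNot)) yield s"($x or $y)"
    // The fix: do not negate a compound child, whose conversion may be partial.
    case Not(_: And) | Not(_: Or) if guardNot => None
    case Not(c) => convert(c, guardNot).map(x => s"not($x)")
  }

  // not (a = 2 and b in ('1', '2'))
  val example: Filter = Not(And(EqualTo("a", 2), In("b", Seq("1", "2"))))
  // The same predicate after pushing `Not` inward with De Morgan's law.
  val exampleNnf: Filter = Or(Not(EqualTo("a", 2)), Not(In("b", Seq("1", "2"))))
}
```

Without the guard, `example` converts to `not(a = 2)`, the over-restrictive filter described above; with the guard, nothing is pushed down. The De Morgan form `exampleNnf` also yields `None`, because an `Or` is only pushed when both of its sides convert, so pushing `Not` inward first keeps the result correct.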

In the current 1.6, PR https://github.com/apache/spark/pull/5700 added the 
following new rules to the Optimizer's rule BooleanSimplification (BTW, should 
we move this to the analyzer?):
```
not(A and B) => not(A) or not(B)
not(A or B) => not(A) and not(B)
```
I do not think we should redo this rewrite in the Parquet filter generation. 
Thus, I just added a condition to avoid incorrect results in case the Optimizer 
is unable to handle all the cases. 

**Question**: how can we include PR 
https://github.com/apache/spark/pull/5700 in 1.5? Do you need me to submit a 
new PR for 1.5, or can you do it? This is a critical PR because the result will 
be incorrect without the fix.

CC the original reviewers of https://github.com/apache/spark/pull/5700: 
@marmbrus @cloud-fan 

Thanks!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark parquetFilterNot

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10278


commit 79be2c3581551ab24273f3da472269814d0d736e
Author: gatorsmile 
Date:   2015-12-12T18:10:16Z

added a condition for `Not` operator in ParquetFilter.





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164190262
  
**[Test build #47616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47616/consoleFull)**
 for PR 10278 at commit 
[`2ff70bf`](https://github.com/apache/spark/commit/2ff70bfac2c9be9e75cc7840dd3844854f565325).



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164175763
  
**[Test build #47615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47615/consoleFull)**
 for PR 10278 at commit 
[`79be2c3`](https://github.com/apache/spark/commit/79be2c3581551ab24273f3da472269814d0d736e).



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164178387
  
After reading the other push-down PR, I think this one also needs a review 
from @liancheng. Any comments are welcome! Thanks!



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164188245
  
**[Test build #47615 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47615/consoleFull)**
 for PR 10278 at commit 
[`79be2c3`](https://github.com/apache/spark/commit/79be2c3581551ab24273f3da472269814d0d736e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164188283
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47615/



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164188282
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164209122
  
**[Test build #47618 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47618/consoleFull)**
 for PR 10278 at commit 
[`c9af771`](https://github.com/apache/spark/commit/c9af771adb998b54c8bfcbdf64ac4fc1b82d14ad).



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164213638
  
**[Test build #47618 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47618/consoleFull)**
 for PR 10278 at commit 
[`c9af771`](https://github.com/apache/spark/commit/c9af771adb998b54c8bfcbdf64ac4fc1b82d14ad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164213704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47618/



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164213703
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164203719
  
It's fine if the test only fails on 1.5.



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164204488
  
Great! : )

Let me also post the test case I ran against the latest 1.5. Without my fix, 
the first call to show() did not return the row (2, 0). 

```scala
withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true") {
  withTempPath { dir =>
    val path = s"${dir.getCanonicalPath}/table1"
    (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b").write.parquet(path)

    val df = sqlContext.read.parquet(path).where("not (a = 2 and b in ('1'))")
    df.show()

    val df1 = sqlContext.read.parquet(path).where("not (a = 2) or not(b in ('1'))")
    df1.show()
  }
}
```



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164204557
  
I might have found another bug in Parquet push-down. I will submit another PR 
once I can confirm it. 



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164202075
  
Do you have a test case that actually shows a wrong answer being computed?



[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164202142
  
This only happens in 1.5. Do you need me to write a test case for 1.5?





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164202611
  
Any bug fix should have a regression test. We could always change the optimizer in a way that no longer hides this bug.





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164202727
  
OK, I will try to force it. Thanks!





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164198466
  
**[Test build #47616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47616/consoleFull)** for PR 10278 at commit [`2ff70bf`](https://github.com/apache/spark/commit/2ff70bfac2c9be9e75cc7840dd3844854f565325).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164198540
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47616/





[GitHub] spark pull request: [SPARK-12218] [SQL] Fixed the Parquet's filter...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10278#issuecomment-164198539
  
Merged build finished. Test PASSed.

