Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/10278#issuecomment-165379694
@gatorsmile Sorry for the late reply and thanks for the nice catch!
The `In` predicate push-down issue had been tracked by SPARK-11164 and was
addressed as part of PR #8956. Unfortunately, we didn't merge that PR due to
other problems in it. Could you please add SPARK-11164 to your PR title?
For the `Not` push-down rule:
1. I'm for adding it to branch-1.5 since it's a pretty safe one.
2. I think we might also want to add a more general [CNF][1] conversion rule
to master, which should be done in a separate PR, of course.
Since we don't have existential / universal quantifiers in our predicates, I
think CNF conversion in Spark SQL can be as simple as repeatedly pushing `Not` and
`Or` inward (or downward) using De Morgan's laws and the distributive law:
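```scala
import org.apache.spark.sql.catalyst.expressions.{And, Not, Or}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

object CNFConversion extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case filter: Filter =>
      // Brings in the `!`, `&&` and `||` operators on Catalyst expressions.
      import org.apache.spark.sql.catalyst.dsl.expressions._
      // `transform` is a single top-down pass, so reaching CNF in general
      // requires running this rule in a fixed-point batch.
      filter.copy(condition = filter.condition.transform {
        // De Morgan's laws: push `Not` below `Or` / `And`.
        case Not(x Or y) => !x && !y
        case Not(x And y) => !x || !y
        // Distributive law: push `Or` below `And`.
        case (x And y) Or z => (x || z) && (y || z)
        case x Or (y And z) => (x || y) && (x || z)
      })
  }
}
```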
(Note that this version doesn't handle common subexpression elimination.)
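To play with the rewrite outside of Catalyst, here is a minimal standalone sketch over a hypothetical toy predicate type (`CNFSketch`, `Pred`, `Atom`, `onePass` and `toCNF` are made up for illustration and are not Spark classes). It applies the same four rewrites, plus double-negation elimination (an assumption added here, not part of the rule above), and it repeats top-down passes until the predicate stops changing, because a single pass can leave an `Or` over an `And` higher up in the tree.

```scala
// Hypothetical standalone sketch: a toy predicate ADT, not Catalyst's classes.
object CNFSketch {
  sealed trait Pred
  final case class Atom(name: String) extends Pred
  final case class Not(child: Pred) extends Pred
  final case class And(left: Pred, right: Pred) extends Pred
  final case class Or(left: Pred, right: Pred) extends Pred

  // One top-down pass: rewrite the current node with the first matching law,
  // then recurse into the children of the (possibly rewritten) node.
  def onePass(p: Pred): Pred = {
    val rewritten = p match {
      case Not(Not(x))      => x                       // double negation (added here)
      case Not(Or(x, y))    => And(Not(x), Not(y))     // De Morgan
      case Not(And(x, y))   => Or(Not(x), Not(y))      // De Morgan
      case Or(And(x, y), z) => And(Or(x, z), Or(y, z)) // distributive law
      case Or(x, And(y, z)) => And(Or(x, y), Or(x, z)) // distributive law
      case other            => other
    }
    rewritten match {
      case Not(x)    => Not(onePass(x))
      case And(l, r) => And(onePass(l), onePass(r))
      case Or(l, r)  => Or(onePass(l), onePass(r))
      case atom      => atom
    }
  }

  // Repeat passes until the predicate stops changing (a fixed point), because
  // a pass can create an Or-over-And above a node it has already visited.
  @scala.annotation.tailrec
  def toCNF(p: Pred): Pred = {
    val next = onePass(p)
    if (next == p) p else toCNF(next)
  }
}
```

For example, with `a` to `d` being `Atom("a")` to `Atom("d")`, `Or(a, Or(And(b, c), d))` needs two passes: the first only distributes the inner `Or`, giving `Or(a, And(Or(b, d), Or(c, d)))`, and the second distributes the outer one, giving `And(Or(a, Or(b, d)), Or(a, Or(c, d)))`. Like the rule above, the sketch does no common subexpression elimination, and distribution can make the result exponentially larger than the input in the worst case.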
That said, the `Not` push-down rule is actually a subset of CNF conversion.
There was once a PR that aimed to add CNF conversion for data source filter
push-down only, but it wasn't merged (see SPARK-6624 and PR #6713). As @marmbrus
commented there, CNF conversion might be worth adding to the optimizer.
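A small worked example of why `Not` push-down is only a subset of CNF conversion (the predicate over hypothetical atoms $a$, $b$, $c$ is made up for illustration): De Morgan's laws alone stop at the second line, where an `Or` still sits above an `And`, and only the distributive law takes it the rest of the way to CNF.

```math
\begin{aligned}
\neg\big((a \lor b) \land c\big)
  &\equiv \neg(a \lor b) \lor \neg c && \text{De Morgan} \\
  &\equiv (\neg a \land \neg b) \lor \neg c && \text{De Morgan (Not push-down stops here)} \\
  &\equiv (\neg a \lor \neg c) \land (\neg b \lor \neg c) && \text{distributive law (CNF)}
\end{aligned}
```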
@rxin @marmbrus I'm not super confident about the CNF conversion conclusion
above; please correct me if I'm wrong.
[1]: https://en.wikipedia.org/wiki/Conjunctive_normal_form