Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19776#discussion_r152156397
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
---
@@ -497,7 +497,19 @@ object DataSourceStrategy {
Some(sources.IsNotNull(a.name))
case expressions.And(left, right) =>
- (translateFilter(left) ++
translateFilter(right)).reduceOption(sources.And)
+ // See SPARK-12218 for detailed discussion
+ // It is not safe to just convert one side if we do not understand
the
+ // other side. Here is an example used to explain the reason.
+ // Let's say we have (a = 2 AND trim(b) = 'blah') OR (c > 0)
+ // and we do not understand how to convert trim(b) = 'blah'.
+ // If we only convert a = 2, we will end up with
+ // (a = 2) OR (c > 0), which will generate wrong results.
+ // Pushing one leg of AND down is only safe to do at the top level.
+ // You can see ParquetFilters' createFilter for more details.
+ for {
+ leftFilter <- translateFilter(left)
+ rightFilter <- translateFilter(right)
+ } yield sources.And(leftFilter, rightFilter)
--- End diff --
We do not need to clean up the codes in this PR. Let us minimize the code
changes and it can simplify the backport.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]