Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19776#discussion_r152260557

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -497,7 +497,19 @@ object DataSourceStrategy {
             Some(sources.IsNotNull(a.name))

           case expressions.And(left, right) =>
    -        (translateFilter(left) ++ translateFilter(right)).reduceOption(sources.And)
    +        // See SPARK-12218 for detailed discussion
    +        // It is not safe to just convert one side if we do not understand the
    +        // other side. Here is an example used to explain the reason.
    +        // Let's say we have (a = 2 AND trim(b) = 'blah') OR (c > 0)
    +        // and we do not understand how to convert trim(b) = 'blah'.
    +        // If we only convert a = 2, we will end up with
    +        // (a = 2) OR (c > 0), which will generate wrong results.
    +        // Pushing one leg of AND down is only safe to do at the top level.
    +        // You can see ParquetFilters' createFilter for more details.
    +        for {
    +          leftFilter <- translateFilter(left)
    +          rightFilter <- translateFilter(right)
    +        } yield sources.And(leftFilter, rightFilter)
    --- End diff --

    Although all Catalyst predicate expressions are converted to `sources.Filter` when we try to push them down, not all convertible filters can be handled by Parquet and ORC. So I think we can still face the case where only one sub-filter of an `AND` can be pushed down by the file format.
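To make the wrong-results scenario from the diff comment concrete, here is a minimal, self-contained Scala sketch. It is not Spark code; `Row`, `original`, and `unsound` are illustrative names standing in for a table row, the full predicate, and the predicate after an untranslatable conjunct has been dropped from one leg of the `AND` under an `OR`.

```scala
// Demonstrates why dropping an untranslatable conjunct under OR is unsafe.
case class Row(a: Int, b: String, c: Int)

// Original predicate: (a = 2 AND trim(b) = 'blah') OR (c > 0)
val original: Row => Boolean =
  r => (r.a == 2 && r.b.trim == "blah") || r.c > 0

// Unsound rewrite after dropping trim(b) = 'blah': (a = 2) OR (c > 0)
val unsound: Row => Boolean =
  r => r.a == 2 || r.c > 0

// A row that the original predicate rejects but the rewrite accepts:
val row = Row(a = 2, b = "other", c = -1)
println(original(row)) // false: trim(b) != "blah" and c <= 0
println(unsound(row))  // true: a == 2 alone satisfies the rewrite
```

Because the rewritten predicate accepts rows the original rejects, it cannot be used as a pushed-down filter there; that is why the `for`-comprehension in the diff only emits a `sources.And` when both sides translate.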