Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/22574#discussion_r221153414
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -488,26 +494,25 @@ private[parquet] class ParquetFilters(
           .map(_(nameToParquetField(name).fieldName, value))

       case sources.And(lhs, rhs) =>
-        // At here, it is not safe to just convert one side if we do not understand the
-        // other side. Here is an example used to explain the reason.
-        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
-        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
-        // NOT(a = 2), which will generate wrong results.
-        // Pushing one side of AND down is only safe to do at the top level.
-        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
-        for {
-          lhsFilter <- createFilter(schema, lhs)
-          rhsFilter <- createFilter(schema, rhs)
-        } yield FilterApi.and(lhsFilter, rhsFilter)
+        // If the unsupported predicate is in the top level `And` condition or in the child
+        // `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
+        (createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
--- End diff ---
Addressed. Thanks.
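The removed comment in the diff above explains why converting only one side of an `And` is unsafe under a `Not`. A minimal standalone sketch (hypothetical predicates, not Spark's actual filter API) makes the point concrete: dropping the unconvertible `b = '1'` side before negation turns `NOT(a = 2 AND b = '1')` into `NOT(a = 2)`, which filters out rows it should keep.

```scala
// Sketch: why NOT(a = 2 AND b = '1') must not degrade to NOT(a = 2).
// These predicates and rows are illustrative only, not Spark code.
object AndPushdownSketch {
  // Sample rows as (a, b) pairs.
  val rows = Seq((2, "1"), (2, "2"), (3, "1"))

  // The original predicate: NOT(a = 2 AND b = "1").
  def original(a: Int, b: String): Boolean = !(a == 2 && b == "1")

  // The wrongly pushed predicate: NOT(a = 2), produced by dropping the
  // unconvertible `b = "1"` conjunct before applying the negation.
  def wronglyPushed(a: Int, b: String): Boolean = !(a == 2)

  def main(args: Array[String]): Unit = {
    val keptOriginal = rows.filter { case (a, b) => original(a, b) }
    val keptWrong    = rows.filter { case (a, b) => wronglyPushed(a, b) }
    // (2, "2") satisfies the original predicate but is dropped by NOT(a = 2):
    println(keptOriginal) // List((2,2), (3,1))
    println(keptWrong)    // List((3,1))
  }
}
```

This is why removing an unsupported conjunct is only safe while still inside a run of `And`s that has not passed under a `Not` or `Or`, which is exactly the condition the new `canRemoveOneSideInAnd` flag tracks.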
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]