Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/22574#discussion_r221153414
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -488,26 +494,25 @@ private[parquet] class ParquetFilters(
           .map(_(nameToParquetField(name).fieldName, value))

       case sources.And(lhs, rhs) =>
-        // At here, it is not safe to just convert one side if we do not understand the
-        // other side. Here is an example used to explain the reason.
-        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
-        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
-        // NOT(a = 2), which will generate wrong results.
-        // Pushing one side of AND down is only safe to do at the top level.
-        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
-        for {
-          lhsFilter <- createFilter(schema, lhs)
-          rhsFilter <- createFilter(schema, rhs)
-        } yield FilterApi.and(lhsFilter, rhsFilter)
+        // If the unsupported predicate is in the top level `And` condition or in the child
+        // `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
+        (createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
--- End diff ---
Addressed. Thanks.
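The removed comment in the diff above explains why converting only one side of an `And` is unsafe under a `Not`. A minimal standalone sketch (hypothetical predicates, not Spark's actual filter API) makes the point concrete: dropping the unconvertible `b = '1'` side before negation turns `NOT(a = 2 AND b = '1')` into `NOT(a = 2)`, which filters out rows it should keep.

```scala
// Sketch: why NOT(a = 2 AND b = '1') must not degrade to NOT(a = 2).
// These predicates and rows are illustrative only, not Spark code.
object AndPushdownSketch {
  // Sample rows as (a, b) pairs.
  val rows = Seq((2, "1"), (2, "2"), (3, "1"))

  // The original predicate: NOT(a = 2 AND b = "1").
  def original(a: Int, b: String): Boolean = !(a == 2 && b == "1")

  // The wrongly pushed predicate: NOT(a = 2), produced by dropping the
  // unconvertible `b = "1"` conjunct before applying the negation.
  def wronglyPushed(a: Int, b: String): Boolean = !(a == 2)

  def main(args: Array[String]): Unit = {
    val keptOriginal = rows.filter { case (a, b) => original(a, b) }
    val keptWrong    = rows.filter { case (a, b) => wronglyPushed(a, b) }
    // (2, "2") satisfies the original predicate but is dropped by NOT(a = 2):
    println(keptOriginal) // List((2,2), (3,1))
    println(keptWrong)    // List((3,1))
  }
}
```

This is why removing an unsupported conjunct is only safe while still inside a run of `And`s that has not passed under a `Not` or `Or`, which is exactly the condition the new `canRemoveOneSideInAnd` flag tracks.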
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]