HyukjinKwon commented on a change in pull request #23855: [SPARK-26930][SQL]
Tests in ParquetFilterSuite don't verify filter class
URL: https://github.com/apache/spark/pull/23855#discussion_r258885021
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
##########
@@ -109,13 +114,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
DataSourceStrategy.selectFilters(maybeRelation.get, maybeAnalyzedPredicate.toSeq)
assert(selectedFilters.nonEmpty, "No filter is pushed down")
- selectedFilters.foreach { pred =>
+ val pushedParquetFilters = selectedFilters.map { pred =>
val maybeFilter = parquetFilters.createFilter(
new SparkToParquetSchemaConverter(conf).convert(df.schema), pred)
assert(maybeFilter.isDefined, s"Couldn't generate filter predicate for $pred")
- // Doesn't bother checking type parameters here (e.g. `Eq[Integer]`)
- maybeFilter.exists(_.getClass === filterClass)
Review comment:
@cloud-fan, in short:
https://github.com/apache/spark/commit/ef77003178eb5cdcb4fe519fc540917656c5d577
omitted the `assert` here, so the result of the class comparison was silently discarded and the filter class was never actually verified. The pushed-down filters also changed shape:
before: `a == 1`
after: `a IS NOT NULL && a == 1`
These are converted into Parquet filters, and this test checks the resulting filter's class. Traversing the whole filter tree (or maintaining some whitelist) seemed like overkill and would add more code, so the test now simply checks the root filter after disabling the constraint filter.
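A minimal, dependency-free sketch of the point (hypothetical case classes standing in for parquet-mr's `Operators.Eq`/`Operators.And`/`Operators.NotEq`; this is not Spark's actual test code): once the inferred `IS NOT NULL` constraint is conjoined, the pushed filter's root becomes an `And`, so a naive root-class check against e.g. `Eq` no longer matches unless constraint inference is disabled first.

```scala
// Hypothetical stand-ins for Parquet filter predicate classes (illustration only).
sealed trait FilterPredicate
case class Eq(column: String, value: Any) extends FilterPredicate
case class NotEq(column: String, value: Any) extends FilterPredicate
case class And(left: FilterPredicate, right: FilterPredicate) extends FilterPredicate

object RootFilterCheck {
  def main(args: Array[String]): Unit = {
    // Before constraint inference: `a == 1` pushes down as a bare Eq.
    val before: FilterPredicate = Eq("a", 1)
    // After: `a IS NOT NULL && a == 1` pushes down as a conjunction,
    // where IS NOT NULL is modeled here as NotEq(column, null).
    val after: FilterPredicate = And(NotEq("a", null), Eq("a", 1))

    // A root-class check matches `before` but not `after` ...
    assert(before.getClass == classOf[Eq])
    assert(after.getClass == classOf[And])
    // ... which is why the suite disables the constraint filter before
    // comparing the root filter against the expected filter class.
  }
}
```

The alternative would be to walk the `And` tree and assert that the expected class appears somewhere inside it, which is the "traverse tree and/or whitelisting" approach the comment rejects as overkill.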