Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/15049#discussion_r150754360
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -327,6 +327,13 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val PARQUET_RECORD_FILTER_ENABLED = buildConf("spark.sql.parquet.recordLevelFilter.enabled")
+ .doc("If true, enables Parquet's native record-level filtering using the pushed down " +
+ "filters. This configuration only has an effect when 'spark.sql.parquet.filterPushdown' " +
+ "is enabled.")
+ .booleanConf
+ .createWithDefault(true)
--- End diff --
From the benchmark numbers, it looks like Spark-side filtering is always faster. Changing this default should not change the final query results either, so a default value of `false` makes sense.
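Why the default is safe to flip can be illustrated with a toy Scala model. The names `RecordFilterModel`, `readRows` and `query` are hypothetical, and this is not Spark's or Parquet's code. It assumes that data filters pushed down to Parquet are still re-evaluated by Spark after the scan, so record-level filtering in the reader can only reduce how many rows reach Spark and cannot change the result:

```scala
// Toy model only: hypothetical types and names, not Spark's or Parquet's implementation.
object RecordFilterModel {
  final case class Row(id: Int)

  // Hypothetical reader: with record-level filtering on, it drops rows that fail the
  // pushed-down predicate; with it off, it returns every row it scanned.
  def readRows(rows: Seq[Row], pushed: Row => Boolean, recordLevelFilter: Boolean): Seq[Row] =
    if (recordLevelFilter) rows.filter(pushed) else rows

  // Spark-side filtering (assumed): the same predicate is applied again to the reader's output.
  def query(rows: Seq[Row], predicate: Row => Boolean, recordLevelFilter: Boolean): Seq[Row] =
    readRows(rows, predicate, recordLevelFilter).filter(predicate)

  def main(args: Array[String]): Unit = {
    val data = (1 to 10).map(Row(_))
    val pred: Row => Boolean = _.id > 7

    val withNative = query(data, pred, recordLevelFilter = true)
    val sparkOnly = query(data, pred, recordLevelFilter = false)

    println(withNative == sparkOnly)           // prints "true"
    println(sparkOnly.map(_.id).mkString(",")) // prints "8,9,10"
  }
}
```

In this model the flag only decides where rows are discarded. Which setting is faster is exactly what the benchmark numbers measure.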