Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15049#discussion_r147554720
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -258,6 +258,11 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val PARQUET_RECORD_FILTER_ENABLED = buildConf("spark.sql.parquet.recordFilter")
+ .doc("Whether to allow the record-level filtering in Parquet for Spark to filter them.")
+ .booleanConf
+ .createWithDefault(true)
--- End diff --
It does show a similar pattern. However, to my knowledge, ORC's filter
pushdown does not filter record by record; it only skips whole blocks
(stripes). I am aware of the bloom filter in ORC too. My rough, untested
guess is that it is faster than Spark-side filtering.
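
To make the distinction concrete, here is a minimal sketch of block-level skipping versus record-level filtering. It is a toy model only: `Block`, `skipBlocks`, and `filterRecords` are hypothetical names made up for illustration, not ORC, Parquet, or Spark APIs, and the predicate is fixed to `value > threshold`.

```scala
// Toy model of a block (an ORC stripe or a Parquet row group) carrying
// min/max statistics. Hypothetical type, not part of any real reader.
final case class Block(min: Int, max: Int, rows: Vector[Int])

object PushdownSketch {
  // Block-level skipping: drop a block only when its statistics prove
  // that no row in it can satisfy `value > threshold`.
  def skipBlocks(blocks: Seq[Block], threshold: Int): Seq[Block] =
    blocks.filter(_.max > threshold)

  // Record-level filtering: evaluate the predicate on every row of the
  // blocks that were not skipped.
  def filterRecords(blocks: Seq[Block], threshold: Int): Seq[Int] =
    blocks.flatMap(_.rows).filter(_ > threshold)

  def main(args: Array[String]): Unit = {
    val blocks = Seq(
      Block(min = 1, max = 5, rows = Vector(1, 3, 5)),
      Block(min = 4, max = 9, rows = Vector(4, 7, 9)))

    // The first block is skipped (max 5 is not > 6); the second survives.
    val survivors = skipBlocks(blocks, threshold = 6)

    // The surviving block still contains the non-matching row 4, so
    // someone has to filter it: the reader or Spark itself.
    println(survivors.flatMap(_.rows))
    println(filterRecords(survivors, threshold = 6))
  }
}
```

With skipping alone, non-matching rows inside a surviving block still reach the caller, which is why Spark applies its own filter on top. The flag in this diff only decides whether the Parquet reader does that per-row step as well.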