[GitHub] spark pull request #15049: [SPARK-17310][SQL] Add an option to disable recor...

HyukjinKwon Sat, 28 Oct 2017 06:44:25 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15049#discussion_r147554720
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -258,6 +258,11 @@ object SQLConf {
         .booleanConf
         .createWithDefault(false)
     
    +  val PARQUET_RECORD_FILTER_ENABLED = buildConf("spark.sql.parquet.recordFilter")
    +    .doc("Whether to allow the record-level filtering in Parquet for Spark to filter them.")
    +    .booleanConf
    +    .createWithDefault(true)
    --- End diff --
    
    It does show a similar pattern. However, to my knowledge, ORC's filter
    pushdown does not support record-by-record filtering; it only skips whole
    blocks (stripes). I am aware of the bloom filter in ORC too. My rough,
    untested guess is that ORC's pushdown is faster than Spark-side filtering.
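
To make the distinction concrete, here is a minimal, self-contained sketch of block-level skipping versus record-level filtering. It is not ORC's or Spark's actual code: `Stripe`, `skipStripes`, and `filterRecords` are hypothetical names, the predicate is fixed to `value > threshold`, and every stripe is assumed to be non-empty.

```scala
// Hypothetical model: a stripe is a non-empty block of rows that carries
// min/max statistics, loosely mirroring what a columnar format stores.
final case class Stripe(rows: Vector[Int]) {
  val min: Int = rows.min
  val max: Int = rows.max
}

object PushdownSketch {
  // Block-level skipping: drop a stripe only when its statistics prove that
  // no row can satisfy `value > threshold`. Surviving stripes are returned
  // whole, so rows that do not match the predicate can still come through.
  def skipStripes(stripes: Seq[Stripe], threshold: Int): Seq[Int] =
    stripes.filter(_.max > threshold).flatMap(_.rows)

  // Record-level filtering: evaluate the predicate on every single row.
  def filterRecords(rows: Seq[Int], threshold: Int): Seq[Int] =
    rows.filter(_ > threshold)

  def main(args: Array[String]): Unit = {
    val stripes = Seq(
      Stripe(Vector(1, 2, 3)),
      Stripe(Vector(4, 9, 5)),
      Stripe(Vector(10, 11, 12))
    )
    val afterSkip = skipStripes(stripes, 8)   // first stripe is skipped
    val exact = filterRecords(afterSkip, 8)   // remaining rows checked one by one
    println(afterSkip)
    println(exact)
  }
}
```

In this sketch the second stripe survives skipping because its maximum exceeds the threshold, even though two of its three rows do not match. That is why a row-by-row filter is still needed after block-level skipping to get an exact result.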



