GitHub user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21224#discussion_r185876764
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -342,6 +342,7 @@ class ParquetFileFormat
           sparkSession.sessionState.conf.parquetFilterPushDown
         // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
         val returningBatch = supportBatch(sparkSession, resultSchema)
    +    val pushDownDate = sqlConf.parquetFilterPushDownDate
    --- End diff ---
    
    Can we pass `pushed` instead of declaring a new `pushDownDate`? The following can be handled at line 345 here:
    
    ```scala
        // Try to push down filters when filter push-down is enabled.
        val pushed = if (enableParquetFilterPushDown) {
          filters
            // Collects all converted Parquet filter predicates. Notice that not all
            // predicates can be converted (`ParquetFilters.createFilter` returns an
            // `Option`). That's why a `flatMap` is used here.
            .flatMap(new ParquetFilters(pushDownDate).createFilter(requiredSchema, _))
            .reduceOption(FilterApi.and)
        } else {
          None
        }
    ```
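
    For readers outside the Spark code, here is a minimal, self-contained sketch of the
    pattern being suggested: build the pushed-down predicate once, where the conf flag is
    in scope, and let the per-file reader capture only the resulting `Option`. The names
    below (`Filter`, `Predicate`, `createFilter`, `readFile`) are hypothetical stand-ins
    for illustration, not Spark's actual API.

    ```scala
    object PushedFilterSketch {
      // Stand-in for Spark's source filters.
      sealed trait Filter
      final case class EqualTo(attr: String, value: Any) extends Filter
      final case class IsNull(attr: String) extends Filter

      // Stand-in for Parquet's FilterPredicate.
      final case class Predicate(repr: String) {
        def and(other: Predicate): Predicate = Predicate(s"($repr AND ${other.repr})")
      }

      // Stand-in for ParquetFilters.createFilter: not every source filter is
      // convertible, hence the Option result (and the flatMap at the call site).
      def createFilter(f: Filter): Option[Predicate] = f match {
        case EqualTo(a, v) => Some(Predicate(s"$a = $v"))
        case _             => None // e.g. an unsupported predicate type
      }

      def main(args: Array[String]): Unit = {
        val enableParquetFilterPushDown = true
        val filters: Seq[Filter] = Seq(EqualTo("id", 1), IsNull("name"), EqualTo("x", 2))

        // Computed once, up front; only the convertible predicates survive.
        val pushed: Option[Predicate] =
          if (enableParquetFilterPushDown) {
            filters.flatMap(createFilter).reduceOption(_ and _)
          } else {
            None
          }

        // The per-file reader closure now captures only `pushed`, not the conf
        // flags that were needed to build it.
        val readFile: String => Unit = { file =>
          pushed.foreach(p => println(s"reading $file with pushed predicate: ${p.repr}"))
        }
        readFile("part-00000.parquet")
        // prints: reading part-00000.parquet with pushed predicate: (id = 1 AND x = 2)
      }
    }
    ```

    The design point is that the closure built per file captures the already-constructed
    predicate rather than the configuration flags used to build it, so adding a new flag
    like `pushDownDate` does not widen the reader's signature.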

