Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21623#discussion_r198553569
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -22,16 +22,23 @@ import java.sql.Date
     import org.apache.parquet.filter2.predicate._
     import org.apache.parquet.filter2.predicate.FilterApi._
     import org.apache.parquet.io.api.Binary
    +import org.apache.parquet.schema.PrimitiveComparator
     
     import org.apache.spark.sql.catalyst.util.DateTimeUtils
     import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLDate
    +import org.apache.spark.sql.internal.SQLConf
     import org.apache.spark.sql.sources
     import org.apache.spark.sql.types._
    +import org.apache.spark.unsafe.types.UTF8String
     
     /**
      * Some utility function to convert Spark data source filters to Parquet filters.
      */
    -private[parquet] class ParquetFilters(pushDownDate: Boolean) {
    +private[parquet] class ParquetFilters() {
    +
    +  val sqlConf: SQLConf = SQLConf.get
    --- End diff --
    
    This should pass in `pushDownDate` and `pushDownStartWith` like the previous version did with just the date setting.
    
    The SQLConf is already available in ParquetFileFormat and it *would* be better to pass it in. The problem is that this class is instantiated in the function (`(file: PartitionedFile) => { ... }`) that gets serialized and sent to executors. That means we don't want SQLConf and its references in the function's closure. The way we got around this before was to put boolean config vals in the closure instead. I think you should go with that approach.
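
    For illustration, here's a minimal sketch of that approach. The wrapper name `buildReaderFunc`, the config key, and the `getConfString` lookup for the new starts-with setting are assumptions for this example, not the exact PR code:

    ```scala
    import org.apache.spark.sql.execution.datasources.PartitionedFile
    import org.apache.spark.sql.internal.SQLConf

    // Driver side: pull the settings out of SQLConf up front, so the serialized
    // closure captures two booleans instead of SQLConf and everything it
    // references.
    def buildReaderFunc(sqlConf: SQLConf): PartitionedFile => Unit = {
      val pushDownDate = sqlConf.parquetFilterPushDownDate
      // Hypothetical lookup for the new starts-with setting this PR adds:
      val pushDownStartWith =
        sqlConf.getConfString("spark.sql.parquet.pushdown.startsWith", "true").toBoolean

      (file: PartitionedFile) => {
        // Executor side: construct ParquetFilters from the captured primitives
        // rather than reaching for SQLConf.get here.
        val parquetFilters = new ParquetFilters(pushDownDate, pushDownStartWith)
        // ... convert the data source filters with parquetFilters ...
      }
    }
    ```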
    
    I'm not sure what `SQLConf.get` is for or what a correct use would be. @gatorsmile, can you comment on the use of `SQLConf.get`?

