GitHub user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21623#discussion_r198553569
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -22,16 +22,23 @@ import java.sql.Date
import org.apache.parquet.filter2.predicate._
import org.apache.parquet.filter2.predicate.FilterApi._
import org.apache.parquet.io.api.Binary
+import org.apache.parquet.schema.PrimitiveComparator
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLDate
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.sources
import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
/**
 * Some utility function to convert Spark data source filters to Parquet filters.
 */
-private[parquet] class ParquetFilters(pushDownDate: Boolean) {
+private[parquet] class ParquetFilters() {
+
+ val sqlConf: SQLConf = SQLConf.get
--- End diff --
This should pass in `pushDownDate` and `pushDownStartWith`, just as the
previous version passed in the date setting alone.
The SQLConf is already available in ParquetFileFormat and it *would* be
better to pass it in. The problem is that this class is instantiated in the
function (`(file: PartitionedFile) => { ... }`) that gets serialized and sent
to executors, so we don't want `SQLConf` and everything it references captured
in that function's closure. The way we got around this before was to put plain
boolean config vals in the closure instead. I think you should go with that
approach.
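To make that concrete, here's a rough sketch of the pattern, not the actual
`ParquetFileFormat` code. I'm assuming the accessor this PR adds to `SQLConf`
is named `parquetFilterPushDownStringStartWith`; the point is just that the
driver reads the booleans before building the closure:

```scala
import org.apache.spark.sql.execution.datasources.PartitionedFile
import org.apache.spark.sql.internal.SQLConf

// Stand-in for the real class; only the constructor signature matters here.
class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: Boolean)

object ClosureCaptureSketch {
  // Simplified stand-in for ParquetFileFormat.buildReaderWithPartitionValues.
  def buildReader(conf: SQLConf): PartitionedFile => Unit = {
    // Read the booleans here, on the driver, so the closure below captures
    // two primitives instead of SQLConf and everything it references.
    val pushDownDate = conf.parquetFilterPushDownDate
    val pushDownStartWith = conf.parquetFilterPushDownStringStartWith // assumed accessor name

    (file: PartitionedFile) => {
      // Runs on executors; the captured booleans serialize cheaply.
      val parquetFilters = new ParquetFilters(pushDownDate, pushDownStartWith)
      // ... use parquetFilters to convert the data source filters ...
    }
  }
}
```

With that shape, the serialized closure carries only the two booleans and the
executor side never touches `SQLConf`.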
I'm not sure what `SQLConf.get` is for or what a correct use would be.
@gatorsmile, can you comment on the use of `SQLConf.get`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]