Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15049#discussion_r150401639
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
---
@@ -370,13 +372,11 @@ class ParquetFileFormat
} else {
logDebug(s"Falling back to parquet-mr")
// ParquetRecordReader returns UnsafeRow
- val reader = pushed match {
- case Some(filter) =>
- new ParquetRecordReader[UnsafeRow](
- new ParquetReadSupport,
- FilterCompat.get(filter, null))
- case _ =>
- new ParquetRecordReader[UnsafeRow](new ParquetReadSupport)
+ val reader = if (pushed.isDefined && enableRecordFilter) {
--- End diff --
Hm, I am active here. Could you first share what problem you anticipate with
this solution, so we can discuss it?
I think there is no point in disabling row group filtering. @jiangxb1987
asked whether this change also disables row group filtering, which could
degrade performance; the current change does not do that.
I added a test for this concern -
https://github.com/apache/spark/pull/15049#discussion_r147573054.
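For context, the shape of the change under discussion can be sketched as below. This is a hedged, self-contained sketch: `Reader`, `FilteredReader`, and `buildReader` are stand-ins I made up, not the actual `ParquetRecordReader`/`FilterCompat` API. The point is only that the record-level filter is attached when both a filter was pushed down and record-level filtering is enabled, while row group filtering is driven separately by the pushed filter in the read configuration, so this branch does not disable it.

```scala
// Hedged sketch of the decision logic; the types below are illustrative
// stand-ins, not the real Parquet classes.
object RecordFilterSketch {
  sealed trait Reader
  // Reader that also re-checks the filter against every record.
  case class FilteredReader(filter: String) extends Reader
  // Reader that relies on row group filtering alone.
  case object PlainReader extends Reader

  // Row group filtering is configured independently (via the pushed filter
  // in the read conf); this only decides whether to ALSO filter per record
  // inside each surviving row group.
  def buildReader(pushed: Option[String], enableRecordFilter: Boolean): Reader =
    if (pushed.isDefined && enableRecordFilter) FilteredReader(pushed.get)
    else PlainReader

  def main(args: Array[String]): Unit = {
    println(buildReader(Some("a > 1"), enableRecordFilter = true))
    println(buildReader(Some("a > 1"), enableRecordFilter = false))
    println(buildReader(None, enableRecordFilter = true))
  }
}
```

Under this reading, turning record filtering off only skips the per-record re-check; row groups are still pruned, which is what the linked test exercises.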
---