Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15049#discussion_r150401639
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
---
@@ -370,13 +372,11 @@ class ParquetFileFormat
} else {
logDebug(s"Falling back to parquet-mr")
// ParquetRecordReader returns UnsafeRow
- val reader = pushed match {
- case Some(filter) =>
- new ParquetRecordReader[UnsafeRow](
- new ParquetReadSupport,
- FilterCompat.get(filter, null))
- case _ =>
- new ParquetRecordReader[UnsafeRow](new ParquetReadSupport)
+ val reader = if (pushed.isDefined && enableRecordFilter) {
--- End diff --
Hm, I am active here. Could you first share what problem you anticipate with
this solution, so we can discuss it?
I think there is no point in disabling row group filtering. @jiangxb1987
asked whether this change also disables row group filtering, which could
degrade performance; the current change does not do that.
I added a test for this concern -
https://github.com/apache/spark/pull/15049#discussion_r147573054.
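For context, the shape of the change under discussion can be sketched as below. This is a hedged, self-contained sketch: `Reader`, `FilteredReader`, and `buildReader` are stand-ins I made up, not the actual `ParquetRecordReader`/`FilterCompat` API. The point is only that the record-level filter is attached when both a filter was pushed down and record-level filtering is enabled, while row group filtering is driven separately by the pushed filter in the read configuration, so this branch does not disable it.

```scala
// Hedged sketch of the decision logic; the types below are illustrative
// stand-ins, not the real Parquet classes.
object RecordFilterSketch {
  sealed trait Reader
  // Reader that also re-checks the filter against every record.
  case class FilteredReader(filter: String) extends Reader
  // Reader that relies on row group filtering alone.
  case object PlainReader extends Reader

  // Row group filtering is configured independently (via the pushed filter
  // in the read conf); this only decides whether to ALSO filter per record
  // inside each surviving row group.
  def buildReader(pushed: Option[String], enableRecordFilter: Boolean): Reader =
    if (pushed.isDefined && enableRecordFilter) FilteredReader(pushed.get)
    else PlainReader

  def main(args: Array[String]): Unit = {
    println(buildReader(Some("a > 1"), enableRecordFilter = true))
    println(buildReader(Some("a > 1"), enableRecordFilter = false))
    println(buildReader(None, enableRecordFilter = true))
  }
}
```

Under this reading, turning record filtering off only skips the per-record re-check; row groups are still pruned, which is what the linked test exercises.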
---