Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/19907#discussion_r155312307
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
---
@@ -167,8 +169,10 @@ class OrcFileFormat
val iter = new RecordReaderIterator[OrcStruct](orcRecordReader)
Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ =>
iter.close()))
- val unsafeProjection = UnsafeProjection.create(requiredSchema)
- val deserializer = new OrcDeserializer(dataSchema, requiredSchema, requestedColIds)
+ val colIds = requestedColIds ++ List.fill(partitionSchema.length)(-1).toArray[Int]
+ val unsafeProjection = UnsafeProjection.create(resultSchema)
--- End diff --
Parquet vectorization works like the following:
```
// UnsafeRowParquetRecordReader appends the columns internally to avoid another copy.
if (parquetReader.isInstanceOf[VectorizedParquetRecordReader] && enableVectorizedReader) {
  iter.asInstanceOf[Iterator[InternalRow]]
}
```
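For contrast, the ORC path in the diff above pads `requestedColIds` instead of appending columns inside the reader. A minimal standalone sketch of that padding idea (not Spark's actual code; `ColIdPadding` and `padColIds` are hypothetical names): partition columns get column id `-1` so the deserializer yields no value for them, and the projection over `resultSchema` (data plus partition fields) fills them in afterwards.

```scala
// Hypothetical helper illustrating the colIds padding from the diff.
object ColIdPadding {
  // requestedColIds: physical ORC column ids for the required data columns.
  // numPartitionCols: how many partition columns are appended; each gets -1,
  // marking it as "no physical column, filled by the projection later".
  def padColIds(requestedColIds: Array[Int], numPartitionCols: Int): Array[Int] =
    requestedColIds ++ Array.fill(numPartitionCols)(-1)

  def main(args: Array[String]): Unit = {
    // Two required data columns (ids 0 and 2) plus two partition columns.
    println(padColIds(Array(0, 2), 2).mkString(","))  // prints "0,2,-1,-1"
  }
}
```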
---