GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19907#discussion_r155213150
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
    @@ -167,8 +169,10 @@ class OrcFileFormat
             val iter = new RecordReaderIterator[OrcStruct](orcRecordReader)
             Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))
     
    -        val unsafeProjection = UnsafeProjection.create(requiredSchema)
    -        val deserializer = new OrcDeserializer(dataSchema, requiredSchema, requestedColIds)
    +        val colIds = requestedColIds ++ List.fill(partitionSchema.length)(-1).toArray[Int]
    +        val unsafeProjection = UnsafeProjection.create(resultSchema)
    --- End diff ---
    
    Can we follow Parquet and just join the data row and the partition row, and do a final unsafe projection? It's much easier and there is no performance difference.
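    
    A minimal sketch of that approach, loosely mirroring ParquetFileFormat (names such as `iter`, `deserializer`, `requiredSchema`, `partitionSchema`, and `file.partitionValues` are assumed to be in scope as in the surrounding reader code; this is illustrative, not the merged implementation):
    
        // Assumed imports:
        // import org.apache.spark.sql.catalyst.expressions.JoinedRow
        // import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
    
        // Build one projection over the combined (data ++ partition) schema,
        // instead of padding requestedColIds with -1 for the partition columns.
        val fullSchema = requiredSchema.toAttributes ++ partitionSchema.toAttributes
        val unsafeProjection = GenerateUnsafeProjection.generate(fullSchema, fullSchema)
    
        if (partitionSchema.length == 0) {
          // No partition columns: project the deserialized data row directly.
          iter.map(value => unsafeProjection(deserializer.deserialize(value)))
        } else {
          // Join each deserialized data row with the constant partition values,
          // then run the single final unsafe projection.
          val joinedRow = new JoinedRow()
          iter.map(value =>
            unsafeProjection(joinedRow(deserializer.deserialize(value), file.partitionValues)))
        }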


---
