Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19907#discussion_r155213150
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala ---
@@ -167,8 +169,10 @@ class OrcFileFormat
val iter = new RecordReaderIterator[OrcStruct](orcRecordReader)
Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))
- val unsafeProjection = UnsafeProjection.create(requiredSchema)
- val deserializer = new OrcDeserializer(dataSchema, requiredSchema, requestedColIds)
+ val colIds = requestedColIds ++ List.fill(partitionSchema.length)(-1).toArray[Int]
+ val unsafeProjection = UnsafeProjection.create(resultSchema)
--- End diff ---
Can we follow Parquet and just join the data row with the partition row, then do a
final unsafe projection? It's much easier and there is no performance difference.
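
To make the suggestion concrete, a rough sketch of what that could look like inside
`buildReader`, mirroring what `ParquetFileFormat` does. This is only an illustration:
`iter`, `file`, `dataSchema`, `requiredSchema`, `partitionSchema` and `requestedColIds`
are assumed to be the values already in scope in the surrounding method, and the
`OrcDeserializer` constructor arguments are taken from the removed line above.

```scala
import org.apache.spark.sql.catalyst.expressions.JoinedRow
import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection

// Project (data columns ++ partition columns) into a single UnsafeRow,
// the same way ParquetFileFormat appends partition values.
val fullSchema = requiredSchema.toAttributes ++ partitionSchema.toAttributes
val unsafeProjection = GenerateUnsafeProjection.generate(fullSchema, fullSchema)
val deserializer = new OrcDeserializer(dataSchema, requiredSchema, requestedColIds)

if (partitionSchema.length == 0) {
  // No partition columns: project the deserialized ORC row directly.
  iter.map(value => unsafeProjection(deserializer.deserialize(value)))
} else {
  // Join the data row with the partition values row, then apply the
  // final unsafe projection once per output row.
  val joinedRow = new JoinedRow()
  iter.map { value =>
    unsafeProjection(joinedRow(deserializer.deserialize(value), file.partitionValues))
  }
}
```

This avoids having to thread padded column ids for partition columns through the
deserializer; the deserializer only handles data columns and the partition values
are appended afterwards.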
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]