Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/19907#discussion_r155312307
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
---
@@ -167,8 +169,10 @@ class OrcFileFormat
val iter = new RecordReaderIterator[OrcStruct](orcRecordReader)
Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ =>
iter.close()))
- val unsafeProjection = UnsafeProjection.create(requiredSchema)
- val deserializer = new OrcDeserializer(dataSchema, requiredSchema, requestedColIds)
+ val colIds = requestedColIds ++ List.fill(partitionSchema.length)(-1).toArray[Int]
+ val unsafeProjection = UnsafeProjection.create(resultSchema)
--- End diff --
Parquet vectorization works like the following:
```
// UnsafeRowParquetRecordReader appends the columns internally to avoid another copy.
if (parquetReader.isInstanceOf[VectorizedParquetRecordReader] && enableVectorizedReader) {
  iter.asInstanceOf[Iterator[InternalRow]]
}
```
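For contrast, the ORC path in the diff above pads `requestedColIds` instead of appending columns inside the reader. A minimal standalone sketch of that padding idea (not Spark's actual code; `ColIdPadding` and `padColIds` are hypothetical names): partition columns get column id `-1` so the deserializer yields no value for them, and the projection over `resultSchema` (data plus partition fields) fills them in afterwards.

```scala
// Hypothetical helper illustrating the colIds padding from the diff.
object ColIdPadding {
  // requestedColIds: physical ORC column ids for the required data columns.
  // numPartitionCols: how many partition columns are appended; each gets -1,
  // marking it as "no physical column, filled by the projection later".
  def padColIds(requestedColIds: Array[Int], numPartitionCols: Int): Array[Int] =
    requestedColIds ++ Array.fill(numPartitionCols)(-1)

  def main(args: Array[String]): Unit = {
    // Two required data columns (ids 0 and 2) plus two partition columns.
    println(padColIds(Array(0, 2), 2).mkString(","))  // prints "0,2,-1,-1"
  }
}
```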
---