GitHub user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/20619#discussion_r168711619
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -395,16 +395,19 @@ class ParquetFileFormat
         ParquetInputFormat.setFilterPredicate(hadoopAttemptContext.getConfiguration, pushed.get)
       }
       val taskContext = Option(TaskContext.get())
-      val parquetReader = if (enableVectorizedReader) {
+      val iter = if (enableVectorizedReader) {
         val vectorizedReader = new VectorizedParquetRecordReader(
           convertTz.orNull, enableOffHeapColumnVector && taskContext.isDefined, capacity)
+        val recordReaderIterator = new RecordReaderIterator(vectorizedReader)
+        // Register a task completion listener before `initialization`.
--- End diff ---
Those constructors didn't look heavy to me.
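
For readers without the surrounding file, a minimal sketch of the ordering the quoted diff is about: the task-completion callback that closes the reader is registered before the reader's initialize call, so a failure during initialization cannot leak the reader. The names SimpleReader, ReaderIterator, and FakeTaskContext below are stand-ins invented for this sketch, not Spark's internal classes (the real counterparts in the diff are VectorizedParquetRecordReader, RecordReaderIterator, and TaskContext).

    // Sketch only: demonstrates "register cleanup before initialize", under the
    // assumption that initialize() can throw and that task completion always runs.
    object CompletionListenerOrderingSketch {

      trait SimpleReader extends AutoCloseable {
        def initialize(): Unit // may throw, e.g. on a corrupt or missing file
      }

      // Wraps a reader and closes it when the iterator itself is closed.
      final class ReaderIterator(reader: SimpleReader) extends AutoCloseable {
        override def close(): Unit = reader.close()
      }

      // Stand-in for a task context: registered callbacks run when the task ends,
      // whether it succeeded or failed.
      final class FakeTaskContext {
        private var callbacks: List[() => Unit] = Nil
        def addTaskCompletionListener(cb: () => Unit): Unit = callbacks ::= cb
        def markTaskCompleted(): Unit = callbacks.foreach(cb => cb())
      }

      def openReader(taskContext: Option[FakeTaskContext], reader: SimpleReader): ReaderIterator = {
        val iter = new ReaderIterator(reader)
        // Register the cleanup callback first; if initialize() throws below, the
        // task's completion hook still closes the reader, so nothing leaks.
        taskContext.foreach(_.addTaskCompletionListener(() => iter.close()))
        reader.initialize()
        iter
      }
    }

The extra ReaderIterator allocation before initialize is the cost the comment above weighs; the sketch shows why that ordering is wanted even though the constructors themselves are cheap.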
---