viirya commented on a change in pull request #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#discussion_r363451729
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
##########
@@ -318,10 +318,33 @@ private[parquet] class ParquetRowConverter(
        new ParquetMapConverter(parquetType.asGroupType(), t, updater)
      case t: StructType =>
+       val wrappedUpdater = {
+         // SPARK-30338: avoid unnecessary InternalRow copying for nested structs:
+         if (updater.isInstanceOf[RowUpdater]) {
+           // `updater` is a RowUpdater, implying that the parent container is a struct.
+           // We do NOT need to perform defensive copying here because either:
+           //
+           // 1. The path from the schema root to this field consists only of nested
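As a rough sketch of the copy-avoidance pattern under discussion (it reuses the `updater`, `RowUpdater`, `ParentContainerUpdater`, and `InternalRow` names from the surrounding ParquetRowConverter.scala context and is not the PR's exact code):

```scala
// Sketch only: copy the finished row only when the parent container is an
// array or map, where the same mutable row may be reused across elements.
val wrappedUpdater =
  if (updater.isInstanceOf[RowUpdater]) {
    // Parent container is a struct: per the comment in the diff above, no
    // defensive copy is needed here.
    updater
  } else {
    // Parent container is an array or map: deep-copy the row before handing
    // it to the parent, because this converter can fire multiple times per
    // Parquet record.
    new ParentContainerUpdater {
      override def set(value: Any): Unit =
        updater.set(value.asInstanceOf[InternalRow].copy())
    }
  }
```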
Review comment:
When we have a deeply nested struct inside an array, does it fall under the first case here?
I think it is fine, because the element converter for the top-level struct inside the array
element will do the defensive copy. The nested struct converter then sees a RowUpdater from
its parent struct, so it doesn't need a defensive copy either.
It might be good to also mention this case in the comment.
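For illustration, here is a hypothetical schema (names made up, not from the PR) with a struct nested inside a struct that is itself an array element:

```scala
import org.apache.spark.sql.types._

// Hypothetical schema: array<struct<inner: struct<x: int>>>.
// The converter for the array element (the outer struct) is not driven by a
// RowUpdater, so it performs the defensive copy; the converter for `inner`
// sees a RowUpdater from the outer struct and skips its own copy, relying on
// the element-level copy, which is the reasoning above.
val schema = ArrayType(
  StructType(Seq(
    StructField("inner", StructType(Seq(
      StructField("x", IntegerType)
    )))
  ))
)
```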