JoshRosen commented on a change in pull request #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#discussion_r362738310
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
##########
@@ -318,10 +318,31 @@ private[parquet] class ParquetRowConverter(
new ParquetMapConverter(parquetType.asGroupType(), t, updater)
case t: StructType =>
+        val wrappedUpdater = {
+          if (updater.isInstanceOf[RowUpdater]) {
+            // `updater` is a RowUpdater, implying that the parent container is a struct.
+            // We do NOT need to perform defensive copying here because either:
+            //
+            // 1. The path from the schema root to this field consists only of nested
+            //    structs, so this converter will only be invoked once per record and
+            //    we don't need to copy because copying will be done in the final
+            //    UnsafeProjection, or
+            // 2. The path from the schema root to this field contains a map or array,
+            //    in which case we will perform a recursive defensive copy via the
Review comment:
**Update**: in #27089 I'm removing these other unnecessary ArrayBuffer
copies.
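The defensive-copy reasoning in the diff above comes down to a general aliasing hazard: a row converter reuses one mutable buffer per record, so storing that buffer into a growable container (such as an ArrayBuffer) without calling `copy()` makes every stored element alias the same object. The following is a minimal standalone sketch of that hazard; `MutableRow` and `ReuseDemo` are hypothetical names for illustration, not Spark classes.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a reused mutable row buffer (not Spark's InternalRow).
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object ReuseDemo {
  def main(args: Array[String]): Unit = {
    val reused  = new MutableRow(0)          // one buffer, overwritten per element
    val aliased = ArrayBuffer[MutableRow]()
    val copied  = ArrayBuffer[MutableRow]()
    for (v <- Seq(1, 2, 3)) {
      reused.value = v                       // converter mutates the shared buffer
      aliased += reused                      // no copy: every slot aliases `reused`
      copied  += reused.copy()               // defensive copy: each slot is stable
    }
    println(aliased.map(_.value))            // every element reflects the last write
    println(copied.map(_.value))             // elements preserved as 1, 2, 3
  }
}
```

When the path from the schema root contains only structs, the buffer is consumed once per record before being overwritten, which is why (per the comment in the diff) the copy can be deferred to the final UnsafeProjection.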