JoshRosen commented on a change in pull request #27089: [SPARK-30414][SQL] 
ParquetRowConverter optimizations: arrays, maps, plus misc. constant factors
URL: https://github.com/apache/spark/pull/27089#discussion_r363558759
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
 ##########
 @@ -614,12 +615,12 @@ private[parquet] class ParquetRowConverter(
   }
 
   private trait RepeatedConverter {
-    private var currentArray: ArrayBuffer[Any] = _
+    private[this] val currentArray = new java.util.ArrayList[Any]()
 
 Review comment:
   From prior experience, I've found `ArrayList` to be slightly faster than 
`ArrayBuffer`. I ran some quick-and-dirty non-Spark microbenchmarks and that 
is still the case, but the gain is small compared to other factors.
   
   In the interest of code simplicity and clarity, I've backed out that part 
of the change: the code now uses a `mutable.ArrayBuffer` and calls `clear()` 
on it: 6d16f596ef6af9fd8946a062f79d0eeace9e1959
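
   For readers following along, here is a minimal sketch of the general pattern being described: one `mutable.ArrayBuffer` is kept and `clear()`ed between uses rather than allocating a new buffer each time. It is not the code from the PR. `ReusableElementBuffer`, `start`, `add`, and `end` are hypothetical names chosen for illustration, and whether `clear()` keeps the underlying storage depends on the Scala version's `ArrayBuffer` implementation.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a repeated-field converter; NOT Spark's actual class.
final class ReusableElementBuffer {
  // Allocated once and reused for every group of elements.
  private[this] val currentArray = mutable.ArrayBuffer.empty[Any]

  // Start of a new group: drop the old elements but keep the same buffer object.
  def start(): Unit = currentArray.clear()

  // Append one decoded element to the current group.
  def add(value: Any): Unit = currentArray += value

  // End of the group: copy the elements out so the buffer can be reused safely.
  def end(): Array[Any] = currentArray.toArray
}

object ReusableElementBufferDemo {
  def main(args: Array[String]): Unit = {
    val buf = new ReusableElementBuffer

    buf.start(); buf.add(1); buf.add(2)
    val first = buf.end()

    buf.start(); buf.add("a")
    val second = buf.end()

    println(first.mkString(","))  // prints "1,2"
    println(second.mkString(",")) // prints "a"
  }
}
```

   Copying in `end()` matters in this sketch: handing out the buffer itself would let the next `start()` wipe a result the caller still holds.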

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
