Github user mallman commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22880#discussion_r231249401

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala ---
    @@ -202,11 +204,15 @@ private[parquet] class ParquetRowConverter(

         override def start(): Unit = {
           var i = 0
    -      while (i < currentRow.numFields) {
    +      while (i < fieldConverters.length) {
             fieldConverters(i).updater.start()
             currentRow.setNullAt(i)
    --- End diff --

> I'm going to push a new commit keeping the current code but with a brief explanatory comment.

On further careful consideration, I believe that separating the calls to `currentRow.setNullAt(i)` into their own loop won't incur any significant performance degradation, if any at all. The performance of the `start()` method is dominated by the calls to `fieldConverters(i).updater.start()` and `currentRow.setNullAt(i)`. Putting the latter calls into their own loop won't change the count of those method calls, just the order.

@viirya LMK if you disagree with my analysis. I will push a new commit with separate while loops. I won't use the more elegant `(0 until currentRow.numFields).foreach(currentRow.setNullAt)` because that's not a loop, and I doubt either the Spark or HotSpot optimizer can turn that into a loop.
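For illustration, here is a minimal self-contained sketch of the two-loop version being proposed. The names (`fieldConverters`, `currentRow`, `start`, `setNullAt`) mirror the diff, but the classes below are simplified stand-ins, not Spark's actual `ParquetRowConverter` internals; in particular, `fieldConverters.length` may be smaller than `currentRow.numFields`, which is the mismatch motivating the change:

```scala
// Hypothetical sketch: split the original single loop into two while loops.
// These types are toy stand-ins for Spark's converter and row classes.
object SplitLoopSketch {
  class Updater { var started = false; def start(): Unit = { started = true } }
  class FieldConverter { val updater = new Updater }

  class MutableRow(val numFields: Int) {
    private val nulls = Array.fill(numFields)(false)
    def setNullAt(i: Int): Unit = nulls(i) = true
    def isNullAt(i: Int): Boolean = nulls(i)
  }

  def start(fieldConverters: Array[FieldConverter], currentRow: MutableRow): Unit = {
    // First loop: bounded by the number of converters.
    var i = 0
    while (i < fieldConverters.length) {
      fieldConverters(i).updater.start()
      i += 1
    }
    // Second loop: bounded by the row width, which may exceed the
    // converter count. Same total number of calls, different order.
    i = 0
    while (i < currentRow.numFields) {
      currentRow.setNullAt(i)
      i += 1
    }
  }
}
```

The point of the analysis in the comment is that both versions perform exactly `fieldConverters.length` calls to `updater.start()` and `currentRow.numFields` calls to `setNullAt`, so splitting the loop only reorders the work.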