Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/22880#discussion_r229738879

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala ---
@@ -202,11 +204,15 @@ private[parquet] class ParquetRowConverter(

   override def start(): Unit = {
     var i = 0
-    while (i < currentRow.numFields) {
+    while (i < fieldConverters.length) {
       fieldConverters(i).updater.start()
       currentRow.setNullAt(i)
--- End diff --

I can see how this is confusing. As part of the `start` method, all columns in the current row must be set to null. Some of those columns are set to null in

https://github.com/apache/spark/blob/6b19f579e5424b5a8c16d6817c5a59b9828efec2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L207-L211

The rest of them are set to null in

https://github.com/apache/spark/blob/6b19f579e5424b5a8c16d6817c5a59b9828efec2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L212-L215

This is equivalent to

```scala
var i = 0
while (i < fieldConverters.length) {
  fieldConverters(i).updater.start()
  i += 1
}
var j = 0
while (j < currentRow.numFields) {
  currentRow.setNullAt(j)
  j += 1
}
```

Is that clearer? Maybe I should rewrite the code that way.
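To make the equivalence concrete, here is a minimal, self-contained sketch that compares the two loop shapes. It does not use the real Spark classes: `StartLoopSketch`, `Trace`, `fused`, `split`, and `sameEffect` are hypothetical names, and boolean arrays stand in for `fieldConverters(i).updater.start()` and `currentRow.setNullAt(i)`. It also assumes that the second loop in the PR continues from the counter left by the first one, and that `fieldConverters.length <= currentRow.numFields`; if the first were larger, the fused form would index past the end of the row.

```scala
// Hypothetical stand-ins for the Spark internals; not the real classes.
object StartLoopSketch {
  // Records which updaters were "started" and which row slots were "nulled".
  final class Trace(numConverters: Int, numFields: Int) {
    val started: Array[Boolean] = new Array[Boolean](numConverters)
    val nulled: Array[Boolean] = new Array[Boolean](numFields)
  }

  // Assumed shape of the code in the PR: one counter shared by two
  // consecutive loops. Requires numConverters <= numFields.
  def fused(numConverters: Int, numFields: Int): Trace = {
    val t = new Trace(numConverters, numFields)
    var i = 0
    while (i < numConverters) {
      t.started(i) = true // stands in for fieldConverters(i).updater.start()
      t.nulled(i) = true  // stands in for currentRow.setNullAt(i)
      i += 1
    }
    // Null out the remaining slots, which have no converter.
    while (i < numFields) {
      t.nulled(i) = true
      i += 1
    }
    t
  }

  // Shape proposed in the comment: two independent loops.
  def split(numConverters: Int, numFields: Int): Trace = {
    val t = new Trace(numConverters, numFields)
    var i = 0
    while (i < numConverters) {
      t.started(i) = true
      i += 1
    }
    var j = 0
    while (j < numFields) {
      t.nulled(j) = true
      j += 1
    }
    t
  }

  // True when both traces record the same starts and the same nulled slots.
  def sameEffect(a: Trace, b: Trace): Boolean =
    java.util.Arrays.equals(a.started, b.started) &&
      java.util.Arrays.equals(a.nulled, b.nulled)
}
```

Calling `StartLoopSketch.sameEffect(StartLoopSketch.fused(2, 5), StartLoopSketch.split(2, 5))` compares a case with two converters and five row fields. The sketch only tracks which indices are touched; it does not model the order of calls or any side effects of the real updaters.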