Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/22880#discussion_r229738879
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
---
@@ -202,11 +204,15 @@ private[parquet] class ParquetRowConverter(
   override def start(): Unit = {
     var i = 0
-    while (i < currentRow.numFields) {
+    while (i < fieldConverters.length) {
       fieldConverters(i).updater.start()
       currentRow.setNullAt(i)
--- End diff --
I can see how this is confusing. As part of the `start` method, all columns
in the current row must be set to null. The first `fieldConverters.length` columns are set to null in
https://github.com/apache/spark/blob/6b19f579e5424b5a8c16d6817c5a59b9828efec2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L207-L211
The remaining columns, up to `currentRow.numFields`, are set to null in
https://github.com/apache/spark/blob/6b19f579e5424b5a8c16d6817c5a59b9828efec2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L212-L215
This is equivalent to
```scala
var i = 0
while (i < fieldConverters.length) {
  fieldConverters(i).updater.start()
  i += 1
}
var j = 0
while (j < currentRow.numFields) {
  currentRow.setNullAt(j)
  j += 1
}
```
Is that clearer? Maybe I should rewrite the code that way.
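For anyone who wants to check the equivalence, here is a minimal standalone sketch. It assumes the second loop in the linked code continues from the same index `i` up to `currentRow.numFields`, and that `fieldConverters.length <= currentRow.numFields`. `FakeUpdater`, `FakeRow`, `startFused` and `startSplit` are hypothetical stand-ins for illustration, not the real Spark classes or methods.

```scala
object StartEquivalence {
  // Hypothetical stand-in for a field updater: only records that start() ran.
  final class FakeUpdater {
    var started: Boolean = false
    def start(): Unit = { started = true }
  }

  // Hypothetical stand-in for the current row: only tracks per-column null flags.
  final class FakeRow(val numFields: Int) {
    private val nulls = Array.fill(numFields)(false)
    def setNullAt(i: Int): Unit = { nulls(i) = true }
    def allNull: Boolean = nulls.forall(identity)
  }

  // Assumed shape of the code in the PR: two consecutive loops sharing index i.
  def startFused(updaters: Array[FakeUpdater], row: FakeRow): Unit = {
    var i = 0
    while (i < updaters.length) {
      updaters(i).start()
      row.setNullAt(i)
      i += 1
    }
    while (i < row.numFields) {
      row.setNullAt(i)
      i += 1
    }
  }

  // The rewrite proposed in the comment: two independent loops.
  def startSplit(updaters: Array[FakeUpdater], row: FakeRow): Unit = {
    var i = 0
    while (i < updaters.length) {
      updaters(i).start()
      i += 1
    }
    var j = 0
    while (j < row.numFields) {
      row.setNullAt(j)
      j += 1
    }
  }

  // Runs one of the two forms on fresh state and reports the resulting state.
  def run(f: (Array[FakeUpdater], FakeRow) => Unit,
          numConverters: Int, numFields: Int): (Boolean, Boolean) = {
    val updaters = Array.fill(numConverters)(new FakeUpdater)
    val row = new FakeRow(numFields)
    f(updaters, row)
    (updaters.forall(_.started), row.allNull)
  }

  def main(args: Array[String]): Unit = {
    // Two converters, five row columns: the case the extra loop exists for.
    val fused = run(startFused, 2, 5)
    val split = run(startSplit, 2, 5)
    assert(fused == split)
    assert(fused == ((true, true)))
    println("both forms start every updater and null every column")
  }
}
```

Both forms leave every updater started and every column null; the split version just makes the two responsibilities visible separately.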