[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

mallman Wed, 31 Oct 2018 08:16:03 -0700

Github user mallman commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22880#discussion_r229738879
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala ---
    @@ -202,11 +204,15 @@ private[parquet] class ParquetRowConverter(
     
       override def start(): Unit = {
         var i = 0
    -    while (i < currentRow.numFields) {
    +    while (i < fieldConverters.length) {
           fieldConverters(i).updater.start()
           currentRow.setNullAt(i)
    --- End diff --
    
    I can see how this is confusing. As part of the `start` method, all columns
    in the current row must be set to null. Some of those columns are set to null in
    
    https://github.com/apache/spark/blob/6b19f579e5424b5a8c16d6817c5a59b9828efec2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L207-L211
    
    The rest of them are set to null in
    
    https://github.com/apache/spark/blob/6b19f579e5424b5a8c16d6817c5a59b9828efec2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L212-L215
    
    This is equivalent to
    
    ```scala
    var i = 0
    while (i < fieldConverters.length) {
      fieldConverters(i).updater.start()
      i += 1
    }
    var j = 0
    while (j < currentRow.numFields) {
      currentRow.setNullAt(j)
      j += 1
    }
    ```
    
    Is that clearer? Maybe I should rewrite the code that way.
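    
    For what it's worth, the equivalence can be checked with a small
    self-contained sketch. `StubUpdater`, `StubRow` and `StartLoopSketch` are
    hypothetical stand-ins written for illustration, not the Spark classes, and
    the two-loop shape of `combined` is my reading of the linked lines rather
    than a verbatim copy. The sketch also assumes
    `fieldConverters.length <= currentRow.numFields`; with more converters than
    fields, the single-index form would index past the end of the row while the
    separated form would not.
    
    ```scala
    // Hypothetical stand-in for a field converter's updater: counts start() calls.
    final class StubUpdater {
      var started: Int = 0
      def start(): Unit = { started += 1 }
    }

    // Hypothetical stand-in for the current row: tracks which columns are null.
    final class StubRow(val numFields: Int) {
      private val isNull = Array.fill(numFields)(false)
      def setNullAt(i: Int): Unit = { isNull(i) = true }
      def nullFlags: Vector[Boolean] = isNull.toVector
    }

    object StartLoopSketch {
      // Assumed shape of the PR code: one index shared by two consecutive loops.
      def combined(updaters: Array[StubUpdater], row: StubRow): Unit = {
        var i = 0
        while (i < updaters.length) {
          updaters(i).start()
          row.setNullAt(i)
          i += 1
        }
        // i continues from updaters.length, covering the remaining columns.
        while (i < row.numFields) {
          row.setNullAt(i)
          i += 1
        }
      }

      // The rewritten form: start every updater, then null every column.
      def separated(updaters: Array[StubUpdater], row: StubRow): Unit = {
        var i = 0
        while (i < updaters.length) {
          updaters(i).start()
          i += 1
        }
        var j = 0
        while (j < row.numFields) {
          row.setNullAt(j)
          j += 1
        }
      }
    }
    ```
    
    Running both methods on a row with, say, 5 fields and 3 updaters should
    leave every column null and every updater started exactly once in either
    case.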


