[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...

HyukjinKwon Sat, 12 May 2018 19:14:30 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21312#discussion_r187787343
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala 
---
    @@ -311,6 +311,7 @@ private[arrow] class ArrayWriter(
       override def reset(): Unit = {
         super.reset()
         elementWriter.reset()
    +    valueVector.clear()
    --- End diff --
    
    Looks @BryanCutler added `reset()` interface in 0.9.0 mentioned in:
    
    
https://github.com/apache/spark/blob/eb386be1ed383323da6e757f63f3b8a7ced38cc4/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala#L132
    
    at 
https://github.com/apache/arrow/commit/4dbce607d50031a405af39d36e08cd03c5ffc764 
and https://issues.apache.org/jira/browse/ARROW-1962
    
    but if we think about backporting, probably I guess we can go this way as a 
bug fix as is? Roughly looks making sense. 
    
    Would it be also safe to do:
    
    ```
        valueVector match {
          case fixedWidthVector: BaseFixedWidthVector => 
fixedWidthVector.reset()
          case variableWidthVector: BaseVariableWidthVector => 
variableWidthVector.reset()
          case repeatedValueVector: BaseRepeatedValueVector => 
repeatedValueVector.clear()
          case _ =>
        }
    ```
    
    ? @icexelloss, @BryanCutler and @viirya?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...

Reply via email to