[GitHub] spark pull request #19842: [SPARK-22643][SQL] ColumnarArray should be an imm...

kiszk Wed, 29 Nov 2017 07:56:53 -0800

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19842#discussion_r153829687
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -175,9 +175,7 @@ public ColumnarRow getStruct(int rowId, int size) {
        * Returns the array at rowid.
        */
       public final ColumnarArray getArray(int rowId) {
    -    resultArray.length = getArrayLength(rowId);
    -    resultArray.offset = getArrayOffset(rowId);
    -    return resultArray;
    +    return new ColumnarArray(arrayData(), getArrayOffset(rowId), getArrayLength(rowId));
    --- End diff --
    
    Would it be better to create the `ColumnarArray` for each `rowId` only once
    (e.g. by caching it)? I am curious whether allocating a new `ColumnarArray` on
    every call adds measurable overhead when accessing elements of a
    multi-dimensional array (e.g. `a[1][2] + a[1][3]`), where the same inner array
    is fetched twice.
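
To make the question concrete, here is a minimal sketch of the two options. It does not use Spark's classes: `ArrayView` and `ToyArrayColumn` are hypothetical stand-ins for `ColumnarArray` and `ColumnVector`, and the `HashMap` cache is only one assumed way to build each view once per row.

```java
import java.util.HashMap;
import java.util.Map;

public class ArrayViewCacheSketch {
    /** Immutable view over a slice of a shared buffer (hypothetical stand-in for ColumnarArray). */
    static final class ArrayView {
        private final int[] data;
        private final int offset;
        private final int length;

        ArrayView(int[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }

        int getInt(int i) { return data[offset + i]; }
        int length() { return length; }
    }

    /** Toy column of int arrays: row r covers data[offsets[r] .. offsets[r] + lengths[r]). */
    static final class ToyArrayColumn {
        private final int[] data;
        private final int[] offsets;
        private final int[] lengths;
        private final Map<Integer, ArrayView> cache = new HashMap<>();

        ToyArrayColumn(int[] data, int[] offsets, int[] lengths) {
            this.data = data;
            this.offsets = offsets;
            this.lengths = lengths;
        }

        /** Option from the diff: allocate a fresh immutable view on every call. */
        ArrayView getArray(int rowId) {
            return new ArrayView(data, offsets[rowId], lengths[rowId]);
        }

        /** Option from the question: build the view once per rowId and reuse it. */
        ArrayView getArrayCached(int rowId) {
            return cache.computeIfAbsent(rowId, this::getArray);
        }
    }

    public static void main(String[] args) {
        // Row 0 is [10, 20]; row 1 is [30, 40, 50].
        ToyArrayColumn a = new ToyArrayColumn(
            new int[] {10, 20, 30, 40, 50}, new int[] {0, 2}, new int[] {2, 3});

        // Analogue of a[1][1] + a[1][2]: two views are allocated without caching.
        System.out.println(a.getArray(1).getInt(1) + a.getArray(1).getInt(2)); // prints 90

        // With caching, both accesses share one view object.
        System.out.println(a.getArrayCached(1) == a.getArrayCached(1)); // prints true
    }
}
```

In this sketch the cached variant trades an allocation for a map lookup and boxing of `rowId`, so it is not obviously faster. A cache would also have to be cleared whenever the underlying column is reset or reused, which is an extra cost the allocate-per-call version avoids.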


