Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18014
  
    @cloud-fan Thank you for your comments. Let me confirm my understanding of your ideas.
    1. Do you want to keep the array contents of `UnsafeArrayData` in [a primitive data array (e.g. `intData[]`)](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java#L43)?
    2. How do you want to update `UnsafeArrayData`?
    
    We can map the current `UnsafeArrayData` into `ColumnVector`. The following 
is the format of `UnsafeArrayData`.
    ```
     [numElements][null bits][values or offset&length][variable length portion]
    ```
    * numElements: store it into `arrayLengths[]`
    * [null bits]: ***needs conversion from the bitvector representation to a byte representation***
    * [values]: store as each data type
    * [offset&length][variable length portion]: store as `ByteType`
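    As a rough sketch of how the regions above could be located (the 8-byte `numElements` header and the word-aligned null-bit region are assumptions modeled on Spark's other unsafe formats; please verify against `UnsafeArrayData` itself):

    ```java
    public class UnsafeArrayLayout {
        // Hypothetical header-size computation for the layout
        // [numElements][null bits][values or offset&length][variable length portion],
        // assuming an 8-byte element count followed by a word-aligned bitvector.
        static long headerSizeInBytes(int numElements) {
            // one bit per element, rounded up to whole 8-byte words
            long nullBitsBytes = ((numElements + 63) / 64) * 8L;
            return 8L + nullBitsBytes; // the values region starts at this offset
        }
    }
    ```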
    
    The issue is the conversion of `null bits`: storing a byte representation in `UnsafeArrayData` instead would avoid the conversion, but would waste more memory. Alternatively, we could update `ColumnVector` to support a bitvector representation for the nullability of each element.
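    The conversion in question can be sketched as follows (plain Java; the LSB-first bit order and the 1-means-null convention are assumptions for illustration, not necessarily `UnsafeArrayData`'s actual encoding):

    ```java
    public class NullBitsConverter {
        // Expand a packed bitvector (1 bit per element) into one byte per
        // element, the representation ColumnVector-style null arrays use.
        static byte[] bitsToBytes(byte[] nullBits, int numElements) {
            byte[] out = new byte[numElements];
            for (int i = 0; i < numElements; i++) {
                // Assumed LSB-first order within each byte; 1 marks a null element.
                out[i] = (byte) ((nullBits[i >> 3] >> (i & 7)) & 1);
            }
            return out;
        }
    }
    ```

    This per-element loop is the kind of put/get conversion cost being weighed here against the extra memory of a byte-per-element representation.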
    
    
    On the other hand, my current approach stores the whole `UnsafeArrayData` as a binary into [`byte[] data`](https://github.com/apache/spark/pull/18014/files#diff-f1e0f2d99a6cdc0113487f8358861fb3R56). The advantage of this approach is that there is no conversion cost at put/get.
    
    What do you think?


