GitHub user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10820#discussion_r50080842
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -17,22 +17,37 @@
     package org.apache.spark.sql.execution.vectorized;
     
     import org.apache.spark.memory.MemoryMode;
    -import org.apache.spark.sql.types.DataType;
    +import org.apache.spark.sql.types.*;
     
     /**
      * This class represents a column of values and provides the main APIs to access the data
      * values. It supports all the types and contains get/put APIs as well as their batched versions.
      * The batched versions are preferable whenever possible.
      *
    - * Most of the APIs take the rowId as a parameter. This is the local 0-based row id for values
    + * To handle nested schemas, ColumnVector has two types: Arrays and Structs. In both cases these
    + * columns have child columns. All of the data is stored in the child columns and the parent column
    + * contains nullability, and in the case of Arrays, the lengths and offsets into the child column.
    --- End diff --
    
    Can you explain how lengths and offsets are stored? Also, is there a single "parent" column that encodes nullability, length, and offset?
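
    To make the question concrete, here is a minimal sketch of one layout the doc comment could be describing. It is an assumption for illustration only, not the code in this PR: the class name `ArrayColumnSketch`, its fields, and its methods are all hypothetical, and it handles only arrays of ints with a fixed row capacity. The parent keeps three per-row arrays (null flag, offset, length), and every element lives in a single contiguous child array.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT Spark's ColumnVector: an array-of-int column
// whose parent holds per-row metadata and whose child holds all elements.
final class ArrayColumnSketch {
    // Parent-level metadata, one entry per row.
    private final boolean[] isNull; // true if the row's array is null
    private final int[] offsets;    // start index of the row's elements in child
    private final int[] lengths;    // number of elements in the row's array

    // Child column: elements of all rows, stored back to back.
    private int[] child = new int[8];
    private int childSize = 0;
    private int numRows = 0;

    ArrayColumnSketch(int rowCapacity) {
        isNull = new boolean[rowCapacity];
        offsets = new int[rowCapacity];
        lengths = new int[rowCapacity];
    }

    // Appends one row. A null argument records a null array of length 0.
    void appendArray(int[] values) {
        int row = numRows++;
        offsets[row] = childSize;
        if (values == null) {
            isNull[row] = true;
            lengths[row] = 0;
            return;
        }
        int needed = childSize + values.length;
        if (needed > child.length) {
            // Grow the child storage; the parent metadata is unaffected.
            child = Arrays.copyOf(child, Math.max(child.length * 2, needed));
        }
        System.arraycopy(values, 0, child, childSize, values.length);
        lengths[row] = values.length;
        childSize = needed;
    }

    boolean isNullAt(int rowId) {
        return isNull[rowId];
    }

    // Copies the row's slice [offset, offset + length) out of the child.
    int[] getArray(int rowId) {
        int start = offsets[rowId];
        return Arrays.copyOfRange(child, start, start + lengths[rowId]);
    }
}
```

    In this sketch the answer to the second question would be "yes": a single parent object carries nullability, length, and offset side by side. Whether the PR does the same, or splits them across separate structures, is exactly what the doc comment should spell out.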



