[GitHub] [spark] sunchao commented on a change in pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

GitBox Thu, 10 Feb 2022 14:59:21 -0800


sunchao commented on a change in pull request #35483:
URL: https://github.com/apache/spark/pull/35483#discussion_r804201836




##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
##########
@@ -557,6 +579,9 @@ protected void reserveInternal(int newCapacity) {
           Platform.reallocateMemory(lengthData, oldCapacity * 4L, newCapacity * 4L);
       this.offsetData =
           Platform.reallocateMemory(offsetData, oldCapacity * 4L, newCapacity * 4L);
+    } else if (isStruct()) {
+      this.structOffsetData =
+        Platform.reallocateMemory(structOffsetData, oldCapacity * 4L, newCapacity * 4L);

Review comment:
       OK

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -374,6 +375,14 @@ public void putBooleans(int rowId, int count, byte src, int srcIndex) {
    */
   public abstract void putArray(int rowId, int offset, int length);
 
+  /**
+   * Puts a new non-null struct at 'rowId' of this vector, which is backed by elements at
+   * 'offset' of child vectors.
+   *
+   * NOTE: this MUST be called after new elements are appended to child vectors of a struct vector.
+   */
+  public abstract void putStruct(int rowId, int offset);

Review comment:
       Hmm, I'd prefer `putStruct`, similar to `putArray` above.

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -457,6 +466,7 @@ public final int appendNull() {
   }
 
   public final int appendNotNull() {
+    assert (!(dataType() instanceof StructType)); // Use appendStruct()

Review comment:
       As mentioned in the PR description, I think it's fine since `WritableColumnVector`
       is a Spark internal API. Also see some
       [previous discussion](https://github.com/apache/spark/pull/34659#discussion_r769131591).

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -703,13 +720,13 @@ public WritableColumnVector arrayData() {
 
   public abstract int getArrayOffset(int rowId);
 
-  @Override
-  public WritableColumnVector getChild(int ordinal) { return childColumns[ordinal]; }
-
   /**
-   * Returns the elements appended.
+   * Returns the offset of a struct element at 'rowId' in the child vectors of this.
    */
-  public final int getElementsAppended() { return elementsAppended; }

Review comment:
       Oops I removed this by accident. Seems it isn't used anywhere though.

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -374,6 +375,14 @@ public void putBooleans(int rowId, int count, byte src, int srcIndex) {
    */
   public abstract void putArray(int rowId, int offset, int length);
 
+  /**
+   * Puts a new non-null struct at 'rowId' of this vector, which is backed by elements at
+   * 'offset' of child vectors.

Review comment:
       Yea, for a non-null struct the offset is the same: struct[rowId] is made up of
       children[i][rowId] for i in [0, struct.len).
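
       To make the relationship between a struct row and its child vectors concrete,
       here is a minimal toy sketch. It is not Spark's `WritableColumnVector`: the class
       `ToyStructVector` and its `appendChild` and `getStruct` methods are hypothetical
       names made up for illustration, and only the `putStruct(rowId, offset)` shape
       mirrors the signature in the diff above. It assumes int-typed children and does
       not model null structs.

```java
import java.util.Arrays;

// Toy model only: NOT Spark's WritableColumnVector. All names are hypothetical.
final class ToyStructVector {
  private final int[][] children;     // children[i][k] = k-th element of child i
  private final int[] childSizes;     // number of elements appended to each child
  private final int[] structOffsets;  // per-row offset into the child vectors

  ToyStructVector(int capacity, int numChildren) {
    children = new int[numChildren][capacity];
    childSizes = new int[numChildren];
    structOffsets = new int[capacity];
  }

  /** Appends one value to child 'ordinal' and returns the index it was stored at. */
  int appendChild(int ordinal, int value) {
    int idx = childSizes[ordinal]++;
    children[ordinal][idx] = value;
    return idx;
  }

  /** Records that the struct at 'rowId' is backed by child elements at 'offset'. */
  void putStruct(int rowId, int offset) {
    // Mirrors the NOTE in the javadoc: children must be appended to first.
    for (int size : childSizes) {
      if (offset >= size) {
        throw new IllegalStateException("append to child vectors before putStruct");
      }
    }
    structOffsets[rowId] = offset;
  }

  /** Gathers children[i][offset] for every child i of the struct at 'rowId'. */
  int[] getStruct(int rowId) {
    int[] fields = new int[children.length];
    for (int i = 0; i < children.length; i++) {
      fields[i] = children[i][structOffsets[rowId]];
    }
    return fields;
  }

  public static void main(String[] args) {
    ToyStructVector v = new ToyStructVector(4, 2);
    for (int row = 0; row < 2; row++) {
      int offset = v.appendChild(0, 10 + row);  // first field of the struct
      v.appendChild(1, 20 + row);               // second field of the struct
      v.putStruct(row, offset);                 // here offset == row
      System.out.println(Arrays.toString(v.getStruct(row)));
    }
  }
}
```

       In this sketch every row is non-null, so the offset passed to `putStruct` always
       equals the row id, which is the case described in the comment above.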



