sunchao commented on a change in pull request #35483:
URL: https://github.com/apache/spark/pull/35483#discussion_r804201836
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
##########
@@ -557,6 +579,9 @@ protected void reserveInternal(int newCapacity) {
Platform.reallocateMemory(lengthData, oldCapacity * 4L, newCapacity * 4L);
this.offsetData =
Platform.reallocateMemory(offsetData, oldCapacity * 4L, newCapacity * 4L);
+ } else if (isStruct()) {
+ this.structOffsetData =
+ Platform.reallocateMemory(structOffsetData, oldCapacity * 4L, newCapacity * 4L);
Review comment:
OK
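
For readers following the hunk above: each offset entry takes 4 bytes, and the sizes are computed with `4L` so the multiplication happens in `long` arithmetic. The sketch below is only an on-heap analogy, assuming an `int[]` in place of the off-heap buffer; `OffsetReserveSketch`, `byteSize`, and `reserve` are hypothetical names, not part of Spark.

```java
import java.util.Arrays;

// Hypothetical on-heap analogy for growing a per-row offset buffer.
// This is not Spark code; it only illustrates the sizing arithmetic.
final class OffsetReserveSketch {

    // Bytes needed for `capacity` int offsets. Widening to long before
    // multiplying avoids int overflow for large capacities.
    static long byteSize(int capacity) {
        return capacity * (long) Integer.BYTES;
    }

    // Grows the offsets array to newCapacity, keeping the existing entries.
    // New slots are zero-filled by Arrays.copyOf.
    static int[] reserve(int[] offsets, int newCapacity) {
        if (newCapacity <= offsets.length) {
            return offsets; // nothing to do
        }
        return Arrays.copyOf(offsets, newCapacity);
    }
}
```

With plain `int` arithmetic, `600_000_000 * 4` would wrap to a negative number, which is why the widening matters once capacities get large.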
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -374,6 +375,14 @@ public void putBooleans(int rowId, int count, byte src, int srcIndex) {
*/
public abstract void putArray(int rowId, int offset, int length);
+ /**
+ * Puts a new non-null struct at 'rowId' of this vector, which is backed by elements at
+ * 'offset' of child vectors.
+ *
+ * NOTE: this MUST be called after new elements are appended to child vectors of a struct vector.
+ */
+ public abstract void putStruct(int rowId, int offset);
Review comment:
Hmm, I'd prefer `putStruct`, similar to `putArray` above.
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -457,6 +466,7 @@ public final int appendNull() {
}
public final int appendNotNull() {
+ assert (!(dataType() instanceof StructType)); // Use appendStruct()
Review comment:
As mentioned in the PR description, I think it's fine since
`WritableColumnVector` is a Spark internal API. Also see some [previous
discussion](https://github.com/apache/spark/pull/34659#discussion_r769131591).
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -703,13 +720,13 @@ public WritableColumnVector arrayData() {
public abstract int getArrayOffset(int rowId);
- @Override
- public WritableColumnVector getChild(int ordinal) { return childColumns[ordinal]; }
-
/**
- * Returns the elements appended.
+ * Returns the offset of a struct element at 'rowId' in the child vectors of this.
*/
- public final int getElementsAppended() { return elementsAppended; }
Review comment:
Oops, I removed this by accident. It doesn't seem to be used anywhere, though.
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
##########
@@ -374,6 +375,14 @@ public void putBooleans(int rowId, int count, byte src, int srcIndex) {
*/
public abstract void putArray(int rowId, int offset, int length);
+ /**
+ * Puts a new non-null struct at 'rowId' of this vector, which is backed by elements at
+ * 'offset' of child vectors.
Review comment:
Yeah, for a non-null struct the offset is the same: struct[rowId]
consists of children[i][rowId] for i in [0, struct.len).
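
To make the offset idea concrete, here is a minimal toy model, not Spark's `WritableColumnVector`. It assumes one offset per row that is shared by every child vector, and it assumes a null struct appends nothing to the children; both are my reading for illustration only. `ToyStructVector` and all of its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a struct column with int-valued fields.
// Hypothetical sketch; names and behavior are assumptions, not Spark's API.
final class ToyStructVector {
    private final List<List<Integer>> children = new ArrayList<>(); // one list per field
    private final List<Integer> structOffsets = new ArrayList<>();  // per-row offset into children
    private final List<Boolean> nulls = new ArrayList<>();          // per-row null flag

    ToyStructVector(int numFields) {
        if (numFields < 1) {
            throw new IllegalArgumentException("need at least one field");
        }
        for (int i = 0; i < numFields; i++) {
            children.add(new ArrayList<>());
        }
    }

    // Appends a non-null struct: the child values are appended first,
    // then the row records the single offset shared by all children.
    int appendStruct(int... fieldValues) {
        if (fieldValues.length != children.size()) {
            throw new IllegalArgumentException("wrong number of fields");
        }
        int offset = children.get(0).size();
        for (int i = 0; i < fieldValues.length; i++) {
            children.get(i).add(fieldValues[i]);
        }
        structOffsets.add(offset);
        nulls.add(false);
        return nulls.size() - 1;
    }

    // Appends a null struct; in this toy model it consumes no child slots.
    int appendNull() {
        structOffsets.add(-1);
        nulls.add(true);
        return nulls.size() - 1;
    }

    boolean isNullAt(int rowId) {
        return nulls.get(rowId);
    }

    int getStructOffset(int rowId) {
        return structOffsets.get(rowId);
    }

    // Reads field `ordinal` of the struct at rowId via the shared offset.
    int getField(int rowId, int ordinal) {
        if (isNullAt(rowId)) {
            throw new IllegalStateException("struct at row " + rowId + " is null");
        }
        return children.get(ordinal).get(structOffsets.get(rowId));
    }
}
```

In this model, appending `(1, 10)`, then a null, then `(2, 20)` gives the third row an offset of 1, and every field of that row is read at index 1 of its child list.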