Github user nongli commented on a diff in the pull request:
https://github.com/apache/spark/pull/10961#discussion_r51305930
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
---
@@ -126,12 +147,25 @@ public ArrayData copy() {
list[i] = data.getLong(offset + i);
}
}
+ } else if (dt instanceof DecimalType) {
+ DecimalType decType = (DecimalType)dt;
+ for (int i = 0; i < length; i++) {
+ if (!data.getIsNull(offset + i)) {
+ list[i] = getDecimal(i, decType.precision(), decType.scale());
+ }
+ }
} else if (dt instanceof StringType) {
for (int i = 0; i < length; i++) {
if (!data.getIsNull(offset + i)) {
list[i] = ColumnVectorUtils.toString(data.getByteArray(offset + i));
}
}
+ } else if (dt instanceof CalendarIntervalType) {
--- End diff --
This exposes a missing implementation. The tests currently never generate
arrays or other complex types, and supporting them is quite a bit more work.
The issue is that to materialize the list of nested types inside the array,
we need a deep copy of everything.
For example, array<struct> means we need to copy the row for each struct.
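
To make the cost concrete, here is a minimal sketch of materializing an array<struct> slice. The classes `StructColumns` and `StructRow` and the method `copyArray` are hypothetical stand-ins for illustration only, not Spark's `ColumnVector` API; the struct shape (an int and a string field) is also an assumption.

```java
final class DeepCopySketch {
    /** Hypothetical columnar storage for struct<id: int, name: string>. */
    static final class StructColumns {
        final int[] ids;
        final String[] names;

        StructColumns(int[] ids, String[] names) {
            this.ids = ids;
            this.names = names;
        }
    }

    /** A struct row that owns its values and no longer reads from the columns. */
    static final class StructRow {
        final int id;
        final String name;

        StructRow(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Materializes the slice [offset, offset + length) by copying every row. */
    static StructRow[] copyArray(StructColumns cols, int offset, int length) {
        StructRow[] list = new StructRow[length];
        for (int i = 0; i < length; i++) {
            // One allocation per element; a deeper nesting level would
            // have to repeat this copy for each of its own children.
            list[i] = new StructRow(cols.ids[offset + i], cols.names[offset + i]);
        }
        return list;
    }
}
```

Because each element is copied out, later writes to the backing columns (for example when a batch buffer is reused) do not affect the returned rows, but the work grows with the array length and nesting depth.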
This patch is big enough that I'd prefer to defer this. Also, the deep copy
is extremely expensive, so we might want to rethink how we do this, for
example by having getArray return an iterator instead of the list.
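
A rough sketch of the iterator idea, under stated assumptions: `LongColumn` and `arrayIterator` are hypothetical names invented here and do not correspond to any existing Spark method. The point is only that elements are read from the backing column on demand, so no list is allocated up front.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class LazyArraySketch {
    /** Hypothetical long column with a per-slot null mask. */
    static final class LongColumn {
        final long[] values;
        final boolean[] isNull;

        LongColumn(long[] values, boolean[] isNull) {
            this.values = values;
            this.isNull = isNull;
        }
    }

    /** Iterates over [offset, offset + length) without copying the slice. */
    static Iterator<Long> arrayIterator(LongColumn data, int offset, int length) {
        return new Iterator<Long>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < length;
            }

            @Override
            public Long next() {
                if (i >= length) {
                    throw new NoSuchElementException();
                }
                int idx = offset + i;
                i++;
                // Null slots are surfaced as null rather than skipped.
                if (data.isNull[idx]) {
                    return null;
                }
                return data.values[idx];
            }
        };
    }
}
```

The trade-off is that the iterator is only valid while the backing column is unchanged, so a caller that needs to retain values past that point would still have to copy them.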