Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19601
This approach also works for nested arrays. I have an implementation of it on
my machine. For ease of review, I committed only the version that supports
primitive-type arrays. If you like it, I can commit the version with nested
array support.
Yes, I created a wrapper `ColumnarArray` corresponding to an `UnsafeArrayData`
for the array at each nesting level. It can **avoid expensive data copy**, which
is very important for arrays and complex types, because these data structures
tend to be large.
If the new design can still avoid expensive data copy by pointing to a part of
one large array rather than copying data out of it (you may think about [such a
format](https://github.com/apache/arrow/blob/master/format/Layout.md#example-layout-listlistbyte)),
I am happy to revisit the format with you. This is an internal implementation
issue. The redesign would not affect the external interfaces
`ColumnVector` and `ColumnarArray`.
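
To make "pointing to a part of one large array" concrete, here is a minimal Python sketch of an offsets-based nested list layout in the spirit of the linked Arrow format. It is not Spark or Arrow code: the names `backing`, `values`, `inner_offsets`, `outer_offsets`, `inner_list`, and `outer_list` are hypothetical, the sample data is made up, and null handling (validity bitmaps) is omitted.

```python
# One flat buffer holds every leaf value of the nested array
# [[[1, 2], [3, 4]], [[5, 6, 7], [], [8]], [[9, 10]]].
backing = bytearray([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
values = memoryview(backing)

# inner_offsets[i] .. inner_offsets[i + 1] delimits inner list i in `values`.
inner_offsets = [0, 2, 4, 7, 7, 8, 10]
# outer_offsets[j] .. outer_offsets[j + 1] delimits the inner lists of outer list j.
outer_offsets = [0, 2, 5, 6]


def inner_list(i):
    """Return a view of inner list i. Slicing a memoryview shares the buffer."""
    return values[inner_offsets[i]:inner_offsets[i + 1]]


def outer_list(j):
    """Return the inner-list views that make up outer list j, without copying values."""
    return [inner_list(i) for i in range(outer_offsets[j], outer_offsets[j + 1])]
```

Each level is described only by a pair of offsets into the level below, so reading `outer_list(1)` creates small view objects but never copies the leaf bytes. A later write to `backing` is visible through a view obtained earlier, which is what shows that the data is shared rather than copied.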