Hi Daniel,

The following method can be used, although it may not be the most efficient
one:

1. Provide an implementation of VectorValueComparator<StructVector> that
compares two rows according to your composite key.
2. Sort your struct vector with org.apache.arrow.algorithm.sort.IndexSorter,
which produces a vector with the positions of the vector elements in
sorted order.
3. Generate the sorted vector from the element positions produced in step 2.
Note that you can use another struct vector as the target and call the
StructVector#copyFrom API for each row.
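
As a rough illustration of the three steps above, here is a minimal sketch
in plain Java. It uses arrays in place of Arrow vectors, so it makes no claim
about exact Arrow signatures. The class name, the method names, and the
dept/age columns are made up for the example. With Arrow, step 1 would be
your VectorValueComparator<StructVector>, step 2 the IndexSorter, and step 3
the copy into a second struct vector.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for a struct with two child columns: dept and age.
public class CompositeKeySortSketch {

    // Step 1: compare two rows by a composite key (dept first, then age).
    static Comparator<Integer> rowComparator(String[] dept, int[] age) {
        return Comparator
            .comparing((Integer row) -> dept[row])
            .thenComparingInt(row -> age[row]);
    }

    // Step 2: compute row positions in sorted order without moving any data.
    static Integer[] sortedPositions(int rowCount, Comparator<Integer> cmp) {
        Integer[] positions = new Integer[rowCount];
        for (int i = 0; i < rowCount; i++) {
            positions[i] = i;
        }
        Arrays.sort(positions, cmp);
        return positions;
    }

    // Step 3: build each sorted column by copying rows in position order.
    static String[] gather(String[] column, Integer[] positions) {
        String[] out = new String[positions.length];
        for (int i = 0; i < positions.length; i++) {
            out[i] = column[positions[i]];
        }
        return out;
    }

    static int[] gather(int[] column, Integer[] positions) {
        int[] out = new int[positions.length];
        for (int i = 0; i < positions.length; i++) {
            out[i] = column[positions[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] dept = {"b", "a", "b", "a"};
        int[] age = {30, 41, 25, 19};
        Integer[] positions = sortedPositions(dept.length, rowComparator(dept, age));
        System.out.println(Arrays.toString(gather(dept, positions)));
        System.out.println(Arrays.toString(gather(age, positions)));
    }
}
```

The positions are computed once and then reused for every column, which is
why the same approach extends to any number of child vectors.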

The above process is an out-of-place sort. An in-place sort is feasible
only if all fields are fixed-width, and you can design an efficient
algorithm based on your specific data properties.
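
To show why fixed-width fields matter for the in-place case, here is a small
sketch, again in plain Java with arrays rather than Arrow vectors. All names
are hypothetical. Because every field has a fixed size, two rows can be
exchanged slot for slot across all columns. The insertion sort is chosen
only for brevity and is not meant as the efficient algorithm itself.

```java
// Hypothetical two-column table with fixed-width fields, sorted in place.
public class InPlaceRowSortSketch {

    // Composite key: key1 first, then key2.
    static int compareRows(int[] key1, long[] key2, int i, int j) {
        int c = Integer.compare(key1[i], key1[j]);
        return c != 0 ? c : Long.compare(key2[i], key2[j]);
    }

    // Exchange rows i and j in every column; possible without extra buffers
    // because each slot has the same fixed width.
    static void swapRows(int[] key1, long[] key2, int i, int j) {
        int t1 = key1[i];
        key1[i] = key1[j];
        key1[j] = t1;
        long t2 = key2[i];
        key2[i] = key2[j];
        key2[j] = t2;
    }

    // Insertion sort over rows, quadratic in the worst case.
    static void sortInPlace(int[] key1, long[] key2) {
        for (int i = 1; i < key1.length; i++) {
            for (int j = i; j > 0 && compareRows(key1, key2, j - 1, j) > 0; j--) {
                swapRows(key1, key2, j - 1, j);
            }
        }
    }

    public static void main(String[] args) {
        int[] key1 = {2, 1, 2, 1};
        long[] key2 = {30L, 41L, 25L, 19L};
        sortInPlace(key1, key2);
        System.out.println(java.util.Arrays.toString(key1));
        System.out.println(java.util.Arrays.toString(key2));
    }
}
```

With a variable-width field, the swap would have to shift the bytes between
the two rows, which is what makes the in-place variant impractical there.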

In our current implementation, we do not have a default
VectorValueComparator for struct vectors, and we have no plan to provide
one. However, we may provide a general out-of-place sorter for vectors of
arbitrary types.

Best,
Liya Fan

On Thu, Aug 26, 2021 at 5:55 AM Daniel Hsu . <[email protected]>
wrote:

> Hi,
>
> My name is Daniel Hsu and I'm an engineer at ByteDance. We're exploring
> Apache Arrow, and have come across a use case that we're not sure about.
>
> We represent our columnar dataset as a StructVector that has one child
> vector per column. We'd like to sort a StructVector by a composite key from
> multiple of the child vectors, but it doesn't seem like this use case is
> supported because:
>
> 1. FixedWidthInPlaceVectorSorter and FixedWidthOutOfPlaceVectorSorter only
> work on fixed-width vectors, and a StructVector is not a fixed-width vector.
> 2. VariableWidthOutOfPlaceVectorSorter only works on
> BaseVariableWidthVector, and a StructVector is not a
> BaseVariableWidthVector.
>
> And while index sorting does work on StructVectors, it isn't able to solve
> our use case.
>
> Is there a recommendation on how to sort a StructVector, or more generally
> how to sort multiple vectors by composite keys? I've attached a simple Java
> file that contains some sample code to demonstrate what I'm referring to.
>
> Note: I sent this email to [email protected] yesterday, but
> on second thought I'm not sure if [email protected] is
> meant for questions.
>
> Best,
> Daniel
>
