Hi Leo,

Can't you call compute.filter on the ListArray itself, using the resulting
bool array from step (2) as the mask?
🤔
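
To make the suggestion concrete, here is a minimal plain-Python sketch. It does not use the Arrow API: the ListArray is modelled as a flat `values` list plus an `offsets` list, and `row_mask` and `filter_rows` are hypothetical helper names invented for illustration. It only shows that the per-row mask from step (2) is enough to produce both the filtered values and the new offsets. In pyarrow the analogous call would presumably be `pyarrow.compute.filter(list_array, mask)`, assuming the filter kernel handles list types in the version in use.

```python
# Plain-Python model of the ListArray from the example below:
# flat child values plus row offsets (row i spans offsets[i]:offsets[i + 1]).
values = ["a", "b", "x", "c", "d", "e", "x", "a", "c", "d", "e"]
offsets = [0, 3, 5, 8, 9, 11]


def row_mask(values, offsets, needle):
    """Steps (1) and (2): one bool per row, True if the row contains needle."""
    equal = [v == needle for v in values]  # step (1): element-wise equality
    # step (2): reduce each row's slice of the equality result with "any"
    return [any(equal[start:end]) for start, end in zip(offsets, offsets[1:])]


def filter_rows(values, offsets, mask):
    """Model of a list-aware filter: keep whole rows where mask is True."""
    new_values, new_offsets = [], [0]
    for keep, start, end in zip(mask, offsets, offsets[1:]):
        if keep:
            new_values.extend(values[start:end])  # copy the row's values
            new_offsets.append(len(new_values))   # close the row
    return new_values, new_offsets


mask = row_mask(values, offsets, "x")
new_values, new_offsets = filter_rows(values, offsets, mask)
print(mask)
print(new_values, new_offsets)
```

With the row mask applied this way, the expansion in step (3) and the offset summation in step (5) happen inside the filter rather than in user code.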

On Fri, Aug 13, 2021, 06:17 Leonhard Gruenschloss <[email protected]>
wrote:

> Hi,
>
> I'd like to filter a ListArray, based on whether a particular value is
> present in each list. Is there a better approach than the one described
> below? Particularly, are there any existing compute functions that I could
> use instead?
>
> Here's a concrete example, with rows consisting of variable-length lists
> of strings:
> ["a", "b", "x"]
> ["c", "d"]
> ["e", "x", "a"]
> ["c"]
> ["d", "e"]
>
> If the element to search for is "x", only the first and third row would be
> retained after filtering:
> ["a", "b", "x"]
> ["e", "x", "a"]
>
> To implement this, the following should work, but is there a better way?
>
> (1) Run the "equal" compute function on the values of the list:
> [false, false, true, false, false, false, true, false, false, false, false]
>
> (2) Linearly scan the result of (1) in lockstep with the list's offsets,
> to keep track of which rows matched:
> [true, false, true, false, false]
>
> (3) Expand the result of (2) by the list lengths:
> [true, true, true, false, false, true, true, true, false, false, false]
>
> (4) Use the "filter" compute function (using the result from (3)) to copy
> only the matching values.
> ["a", "b", "x", "e", "x", "a"]
>
> (5) Using the result of (2), sum up lengths to compute new offsets:
> [0, 3, 6]
>
> (2), (3), and (5) are of course not difficult to implement, but is there
> maybe a trick to use existing compute functions instead? Particularly for
> non-C++ implementations that could make a big performance difference.
>
> Cheers,
> Leo
>
