Hi Leo,

Can't you call compute.filter directly on the ListArray, using the resultant bool array from step (2) as the mask? That would make steps (3) to (5) unnecessary. 🤔

On Fri, Aug 13, 2021, 06:17 Leonhard Gruenschloss <[email protected]> wrote:

> Hi,
>
> I'd like to filter a ListArray, based on whether a particular value is
> present in each list. Is there a better approach than the one described
> below? Particularly, are there any existing compute functions that I could
> use instead?
>
> Here's a concrete example, with rows consisting of variable-length lists
> of strings:
> ["a", "b", "x"]
> ["c", "d"]
> ["e", "x", "a"]
> ["c"]
> ["d", "e"]
>
> If the element to search for is "x", only the first and third row would be
> retained after filtering:
> ["a", "b", "x"]
> ["e", "x", "a"]
>
> To implement this, the following should work, but is there a better way?
>
> (1) Run the "equal" compute function on the values of the list:
> [false, false, true, false, false, false, true, false, false, false, false]
>
> (2) Linearly scan the result of (1) in lockstep with the list's offsets,
> to keep track of which rows matched:
> [true, false, true, false, false]
>
> (3) Expand the result of (2) by the list lengths:
> [true, true, true, false, false, true, true, true, false, false, false]
>
> (4) Use the "filter" compute function (using the result from (3)) to copy
> only the matching values.
> ["a", "b", "x", "e", "x", "a"]
>
> (5) Using the result of (2), sum up lengths to compute new offsets:
> [0, 3, 6]
>
> (2), (3), and (5) are of course not difficult to implement, but is there
> maybe a trick to use existing compute functions instead? Particularly for
> non-C++ implementations that could make a big performance difference.
>
> Cheers,
> Leo
