Should I open a JIRA on this?

On Mon, Nov 29, 2021, 10:52 Alessandro Molina <[email protected]> wrote:
> Oh, oops, sorry, my fault: I understood the question the other way around :D
>
> I think that if we had a compute function that returns the indices of a
> matching value, it could also be applied to masks to retrieve the indices
> of every "true" value, thus also solving your question when combined with
> is_in (or any other predicate at that point). That might be a reasonable
> addition to the compute functions.
>
>
> On Sun, Nov 28, 2021 at 7:00 AM Niranda Perera <[email protected]> wrote:
>
>> Hi guys, sorry for the late reply.
>>
>> Yes, Joris is right. I want the converse (I think 😊 ) of index_in. I
>> was discussing this with Eduardo on Zulip [1].
>>
>> I was hoping that I could do this:
>> ```
>> import pyarrow as pa
>> import pyarrow.compute as pc
>> values = pa.array([1, 2, 2, 3, 4, 1])
>> to_find = pa.array([1, 2, 1])
>> indices = pc.index_in(to_find, value_set=values)  # expected = [0, 5, 1, 2, 0, 5], received = [0, 1, 0]
>> ```
>> So, index_in does not return every index of a duplicated value (I am
>> guessing it builds a hashmap of the values rather than a multimap).
>>
>> One suggestion was to use `aggregations.index`, and I think that might
>> work iteratively, as follows. But I haven't tested this.
>> ```
>> indices = []
>> for f in to_find:
>>     idx = -1
>>     while True:
>>         # pc.index returns an Int64Scalar; .as_py() makes it a plain int
>>         idx = pc.index(values, f, start=idx + 1, end=len(values)).as_py()
>>         if idx == -1:
>>             break
>>         else:
>>             indices.append(idx)
>> ```
>>
>> But I was wondering whether it would make sense to provide a method to
>> find all indices of a value (the inner while loop)?
>>
>> Best
>>
>> [1]
>> https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/Find.20a.20value.20indices.20in.20an.20array/near/262351923
>>
>>
>> On Thu, Nov 25, 2021 at 3:14 PM Joris Van den Bossche <[email protected]> wrote:
>>
>>> I think "index_in" does the lookup the other way around? It gives,
>>> for each value of the array, its index in the set, whereas, if I
>>> understand the question correctly, Niranda is looking for the indices
>>> into the array of the elements that are present in the set.
>>>
>>> Something like that could be achieved by using "is_in", and then
>>> getting the indices of the True values:
>>>
>>> >>> pc.is_in(pa.array([1, 2, 3]), value_set=pa.array([1, 3]))
>>> <pyarrow.lib.BooleanArray object at 0x7fcc96896a00>
>>> [
>>>   true,
>>>   false,
>>>   true
>>> ]
>>>
>>> Getting the locations of the True values is called "nonzero" in
>>> numpy, and we have an open JIRA for adding this as a kernel
>>> (https://issues.apache.org/jira/browse/ARROW-13035)
>>>
>>> On Thu, 25 Nov 2021 at 11:17, Alessandro Molina
>>> <[email protected]> wrote:
>>> >
>>> > I think index_in is what you are looking for
>>> >
>>> > >>> pc.index_in(pa.array([1, 2, 3]), value_set=pa.array([1, 3]))
>>> > <pyarrow.lib.Int32Array object at 0x11e2a6580>
>>> > [
>>> >   0,
>>> >   null,
>>> >   1
>>> > ]
>>> >
>>> > On Sat, Nov 20, 2021 at 4:49 AM Niranda Perera <[email protected]> wrote:
>>> >>
>>> >> Hi all, is there a compute API for finding the indices of a value
>>> >> (or of a set of values) in an Array?
>>> >> ex:
>>> >> ```python
>>> >> a = [1, 2, 2, 3, 4, 1]
>>> >> values = pa.array([1, 2, 1])
>>> >>
>>> >> index = find_index(a, 1)  # = [0, 5]
>>> >> indices = find_indices(a, values)  # = [0, 1, 2, 5]
>>> >> ```
>>> >> I am currently using `compute.is_in` and traversing the true indices
>>> >> of the resulting bitmap. Is there a better way?
>>> >>
>>> >> Best
>>> >> --
>>> >> Niranda Perera
>>> >> https://niranda.dev/
>>> >> @n1r44
>>> >>
>>> >>
>>
>>
>> --
>> Niranda Perera
>> https://niranda.dev/
>> @n1r44 <https://twitter.com/N1R44>
>>
>>
