Should I open a JIRA on this?

On Mon, Nov 29, 2021, 10:52 Alessandro Molina <[email protected]>
wrote:

> Oh, oops, sorry, my fault: I understood the question the other way around :D
>
> I think a compute function that returns the indices of a matching value
> would help here. Applied to a boolean mask, it could return the indices of
> all "true" values, so combined with is_in (or any other predicate, for that
> matter) it would also answer your question. That might be a reasonable
> addition to the compute functions.
>
>
> On Sun, Nov 28, 2021 at 7:00 AM Niranda Perera <[email protected]>
> wrote:
>
>> Hi guys, sorry for the late reply.
>>
>> Yes, Joris is right. I want the converse (I think 😊) of index_in. I was
>> discussing this with Eduardo on Zulip [1].
>>
>> I was hoping that I could do this.
>> ```
>> import pyarrow as pa
>> import pyarrow.compute as pc
>>
>> values = pa.array([1, 2, 2, 3, 4, 1])
>> to_find = pa.array([1, 2, 1])
>> indices = pc.index_in(to_find, value_set=values)
>> # expected = [0, 5, 1, 2, 0, 5], received = [0, 1, 0]
>> ```
>> So, index_in does not handle duplicate values in the value set (I am
>> guessing it builds a hashmap of values rather than a multimap).
>>
>> One suggestion was to use `aggregations.index`. I think that might work
>> iteratively, as follows, but I haven't tested this.
>> ```
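>> import pyarrow as pa
>> import pyarrow.compute as pc
>>
>> values = pa.array([1, 2, 2, 3, 4, 1])
>> to_find = pa.array([1, 2, 1])
>>
>> indices = []
>> for f in to_find:
>>     idx = -1
>>     while True:
>>         # pc.index returns an Int64Scalar (-1 if not found); .as_py() makes it an int
>>         idx = pc.index(values, f, start=idx + 1, end=len(values)).as_py()
>>         if idx == -1:
>>             break
>>         indices.append(idx)
>> # indices == [0, 5, 1, 2, 0, 5]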
>> ```
>>
>> But I was wondering whether it would make sense to provide a method that
>> finds all indices of a value (i.e., the inner while loop).
>>
>> Best
>>
>> [1]
>> https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/Find.20a.20value.20indices.20in.20an.20array/near/262351923
>>
>>
>> On Thu, Nov 25, 2021 at 3:14 PM Joris Van den Bossche <
>> [email protected]> wrote:
>>
>>> I think "index_in" works the other way around? It gives, for each
>>> value of the array, its index in the set. Whereas, if I understand the
>>> question correctly, Niranda is looking for the indices into the array
>>> of the elements that are present in the set.
>>>
>>> Something like that could be achieved by using "is_in", and then
>>> getting the indices of the True values:
>>>
>>> >>> pc.is_in(pa.array([1, 2, 3]), value_set=pa.array([1, 3]))
>>> <pyarrow.lib.BooleanArray object at 0x7fcc96896a00>
>>> [
>>>   true,
>>>   false,
>>>   true
>>> ]
>>>
>>> To get the location of the True values, in numpy this is called
>>> "nonzero", and we have an open JIRA for adding this as a kernel
>>> (https://issues.apache.org/jira/browse/ARROW-13035)
>>>
>>> On Thu, 25 Nov 2021 at 11:17, Alessandro Molina
>>> <[email protected]> wrote:
>>> >
>>> > I think index_in is what you are looking for
>>> >
>>> > >>> pc.index_in(pa.array([1, 2, 3]), value_set=pa.array([1, 3]))
>>> > <pyarrow.lib.Int32Array object at 0x11e2a6580>
>>> > [
>>> >   0,
>>> >   null,
>>> >   1
>>> > ]
>>> >
>>> > On Sat, Nov 20, 2021 at 4:49 AM Niranda Perera <[email protected]> wrote:
>>> >>
>>> >> Hi all, is there a compute API for finding the index of a value (or of a
>>> >> set of values) in an Array?
>>> >> For example:
>>> >> ```python
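>>> >> import pyarrow as pa
>>> >>
>>> >> a = pa.array([1, 2, 2, 3, 4, 1])
>>> >> values = pa.array([1, 2, 1])
>>> >>
>>> >> # find_index / find_indices are the (hypothetical) functions I am looking for
>>> >> index = find_index(a, 1) # = [0, 5]
>>> >> indices = find_indices(a, values) # = [0, 1, 2, 5]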
>>> >> I am currently using `compute.is_in` and then traversing the result
>>> >> bitmap to collect the indices of the true values. Is there a better way?
>>> >>
>>> >> Best
>>> >> --
>>> >> Niranda Perera
>>> >> https://niranda.dev/
>>> >> @n1r44
>>> >>
>>>
>>
>>
>> --
>> Niranda Perera
>> https://niranda.dev/
>> @n1r44 <https://twitter.com/N1R44>
>>
>>
