> > The better approach to step (2) above would be to use
> > arrow::compute::Unique, but instead of producing unique values, produce
> > indexes. This way I could perhaps also set up a function Options that could
> > choose to keep the first duplicate, or keep the last, etc.
>
> A version of compute::Unique that returns indices sounds like a pretty
> useful feature to me.  Even if you don't create this, I'd recommend
> creating a JIRA for it (unless someone knows of one that already
> exists).
>

Sounds good! I'll try to do that.
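To make the idea concrete, here is a minimal standalone sketch in plain C++ (no Arrow dependency) of what "unique, but returning indices" with a keep-first/keep-last option could compute. `KeepPolicy` and `UniqueIndices` are hypothetical names for illustration only; they are not part of Arrow's compute API, and a real kernel would operate on Arrow arrays and handle nulls.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical option, analogous to the "keep first / keep last" choice
// discussed above. Not an existing Arrow FunctionOptions.
enum class KeepPolicy { kFirst, kLast };

// Hypothetical helper: returns the positions in `values` of one
// representative per distinct value, in ascending position order.
template <typename T>
std::vector<int64_t> UniqueIndices(const std::vector<T>& values,
                                   KeepPolicy keep) {
  // Map each distinct value to the index currently selected for it.
  std::unordered_map<T, int64_t> chosen;
  const int64_t n = static_cast<int64_t>(values.size());
  for (int64_t i = 0; i < n; ++i) {
    auto [it, inserted] = chosen.emplace(values[i], i);
    if (!inserted && keep == KeepPolicy::kLast) {
      it->second = i;  // a later duplicate replaces the earlier choice
    }
  }
  // Mark the selected positions, then emit them in original order.
  std::vector<bool> selected(values.size(), false);
  for (const auto& entry : chosen) selected[entry.second] = true;
  std::vector<int64_t> indices;
  for (int64_t i = 0; i < n; ++i) {
    if (selected[i]) indices.push_back(i);
  }
  return indices;
}
```

For example, with `values = {3, 1, 3, 2, 1}`, `KeepPolicy::kFirst` yields `{0, 1, 3}` and `KeepPolicy::kLast` yields `{2, 3, 4}`. Returning indices rather than values means the result could then be passed to something like Arrow's `Take` to pull the corresponding rows from other columns, which is the main advantage over a value-returning `Unique`.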


> > My C++ is not particularly advanced, so I find it hard to know where to
> > start when adapting an existing compute function (also, it is very hard to
> > search for the unique function because of "unique_ptr").
>
> I find it a little confusing too :).  Eduardo Ponce had started work
> on a guide of sorts
> (https://github.com/apache/arrow/pull/10296/files).  I'm not sure what
> its status is.  It might be an ok place to start.
>
> Also, a dirty hack: when searching for compute functions, I add _doc to
> the end of my search string (e.g. unique.*_doc). That finds the
> docstrings for the functions, and I can usually work back from there.
>

Awesome, thanks for the reference and the search tip (totally helpful)!

Thanks for the other advice too!

-Aldrin
