> > The better approach to step (2) above, would be to use
> > arrow::compute::Unique, but instead of producing unique values, produce
> > indexes. This way I could perhaps also setup a function Options that could
> > choose to keep the first duplicate, or keep the last, etc.
>
> A version of compute::Unique that returns indices sounds like a pretty
> useful feature to me. Even if you don't create this I'd recommend
> creating a JIRA for it (unless someone knows of one that already
> exists).
>
Sounds good! I'll try to do that.

> > My C++ is not particularly advanced, so I find it hard to know where to
> > start for adapting an existing compute function (also, it is very hard to
> > search for the unique function because of "unique_ptr").
>
> I find it a little confusing too :). Eduardo Ponce had started work
> on a guide of sorts
> (https://github.com/apache/arrow/pull/10296/files). I'm not sure what
> the status is for this. It might be an ok place to start.
>
> Also, a dirty hack, when searching for compute functions I add _doc to
> the end of my search string (e.g. unique.*_doc) and it finds the
> docstrings for the functions and I can usually work back from there.

Awesome, thanks for the reference and the search tip (totally helpful)!

Thanks for the other advice too!

-Aldrin
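
For reference, here is a rough sketch of the "Unique that returns indices" idea discussed above, with an option to keep the first or the last duplicate. It is not Arrow code: it works on a plain std::vector rather than an Arrow array, and the names UniqueIndices and KeepPolicy are made up for illustration. An actual compute kernel would likely look quite different.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical option (not an Arrow type): which occurrence of a duplicated
// value should survive.
enum class KeepPolicy { kFirst, kLast };

// Returns the positions of the kept occurrences, in ascending order.
// Assumes T is hashable and equality-comparable.
template <typename T>
std::vector<int64_t> UniqueIndices(const std::vector<T>& values,
                                   KeepPolicy keep = KeepPolicy::kFirst) {
  // Maps each distinct value to the index of the occurrence we keep.
  std::unordered_map<T, int64_t> chosen;
  chosen.reserve(values.size());

  for (int64_t i = 0; i < static_cast<int64_t>(values.size()); ++i) {
    auto result = chosen.emplace(values[i], i);
    // emplace does nothing if the value was seen before; for kLast we
    // overwrite the stored index with the later position.
    if (!result.second && keep == KeepPolicy::kLast) {
      result.first->second = i;
    }
  }

  std::vector<int64_t> indices;
  indices.reserve(chosen.size());
  for (const auto& entry : chosen) {
    indices.push_back(entry.second);
  }
  // Hash map iteration order is unspecified, so sort to get row order.
  std::sort(indices.begin(), indices.end());
  return indices;
}
```

With indices like these in hand, the deduplicated rows of a table could presumably be selected with a take-style operation, applying the same index list to every column.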
