It's unlikely that dictionary encoding will bring much benefit. How are you doing the filtering? It sounds like you might have many filters, and `arrow::compute::Filter` will not perform well if you apply many of them. No one has added support for selection vectors yet, but that would probably be the fastest way to apply many filters against the same array. Ideally you would avoid any kind of allocation between filter passes. If the filters are highly selective, though, a list of indices might be preferable to a selection vector. That has not been implemented either.
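To make the two ideas concrete, here is a minimal sketch in plain C++. It does not use Arrow at all: `std::vector` stands in for an Arrow array, one byte per row stands in for a selection bitmap, and the names `NarrowMask`, `NarrowIndices`, and `MaskToIndices` are hypothetical helpers invented for illustration, not existing Arrow or Acero functions. It only handles equality filters, on the assumption that this is roughly what your filters look like.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Selection-vector approach (hypothetical helper, not an Arrow API).
// `mask` has one entry per row: 1 = still selected, 0 = filtered out.
// The caller allocates it once; each filter pass narrows it in place,
// so nothing is allocated between passes.
template <typename T>
void NarrowMask(std::vector<std::uint8_t>& mask,
                const std::vector<T>& column, const T& wanted) {
  assert(mask.size() == column.size());
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i] && !(column[i] == wanted)) mask[i] = 0;
  }
}

// Index-list approach (hypothetical helper, not an Arrow API).
// `indices` holds the surviving row numbers. Each pass only visits those
// rows, which pays off when the filters are highly selective.
template <typename T>
void NarrowIndices(std::vector<std::size_t>& indices,
                   const std::vector<T>& column, const T& wanted) {
  std::size_t out = 0;
  for (std::size_t k = 0; k < indices.size(); ++k) {
    const std::size_t row = indices[k];
    assert(row < column.size());
    if (column[row] == wanted) indices[out++] = row;  // compact in place
  }
  indices.resize(out);  // shrinking does not reallocate
}

// Convert a mask to row indices once, after all passes, so the selected
// rows are materialized a single time instead of once per filter.
std::vector<std::size_t> MaskToIndices(const std::vector<std::uint8_t>& mask) {
  std::vector<std::size_t> rows;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) rows.push_back(i);
  }
  return rows;
}
```

For a filter such as `city == "Oslo" AND year == 2021` you would start from a mask of all ones (or an index list of all rows), call the helper once per column, and copy out the selected rows only at the very end. To run the next filter, reset the mask with `std::fill` and reuse the same buffer.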
On Wed, Apr 12, 2023 at 7:57 PM Surya Kiran Gullapalli <[email protected]> wrote:

> Hi,
> I’m trying to run filter based on multiple columns (decided at run time)
> in a table. There are more than a million of filters I have to run and
> even though I’m getting results it was taking humongous amount of time to
> complete the operation. I’m using Acero filter node to get the filtered
> table
>
> I’d like to know if there’ll be a performance improvement if I use
> dictionary encoded arrays instead of string arrays?
>
> Also, any other pointers to speed up the operation
>
> Thanks,
> Surya
