It's unlikely there will be much benefit in using dictionary encoding.  How
are you doing the filtering?  It sounds like you might have many filters.
`arrow::compute::Filter` will not perform well if you need to apply many
filters, since each call materializes a new output.  Selection vectors
would probably be the fastest way to apply many filters against the same
array, but no one has added support for them yet.  Ideally you would avoid
any allocation between filter passes.  If the filters are highly selective,
a list of indices might work better than a selection vector, but that has
not been implemented either.
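
To make the idea concrete, here is a rough sketch of the two approaches in
plain C++.  It does not use any Arrow API (none exists for this today); the
names `Predicate`, `FillSelection` and `FillIndices` are made up for
illustration, and a `std::vector<std::string>` stands in for a string
column.  The point is only that the caller owns the buffers and reuses them
for every filter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical predicate type: one filter evaluated against one value.
using Predicate = std::function<bool(const std::string&)>;

// Selection-vector variant: one flag per row (1 = row passes the filter).
// The buffer is overwritten in place, so nothing is allocated per filter.
void FillSelection(const std::vector<std::string>& values,
                   const Predicate& keep,
                   std::vector<std::uint8_t>& selection) {
  assert(selection.size() == values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    selection[i] = keep(values[i]) ? 1 : 0;
  }
}

// Index-list variant: stores only the positions of matching rows, which is
// more compact when a filter keeps very few rows.  clear() keeps the
// existing capacity, so a buffer reserved up front is never reallocated.
void FillIndices(const std::vector<std::string>& values,
                 const Predicate& keep,
                 std::vector<std::size_t>& indices) {
  indices.clear();
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (keep(values[i])) {
      indices.push_back(i);
    }
  }
}
```

You would allocate `selection` (sized to the column) or `indices` (reserved
to the column length) once, then call the matching function in a loop over
all of your filters and consume the result before the next pass overwrites
it.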

On Wed, Apr 12, 2023 at 7:57 PM Surya Kiran Gullapalli <
[email protected]> wrote:

> Hi,
> I’m trying to run filters based on multiple columns (decided at run time)
> in a table. There are more than a million filters to run, and even though
> I’m getting results, the operation takes a very long time to complete. I’m
> using the Acero filter node to get the filtered table.
>
> I’d like to know if there’ll be a performance improvement if I use
> dictionary encoded arrays instead of string arrays?
>
> Also, any other pointers to speed up the operation
>
> Thanks,
> Surya
>
>
>
