That works! Thanks. Do you know offhand whether this filter would be used for
predicate pushdown on a Parquet dataset? Or might that be coming in
version 8.0.0?

On Wed, Apr 20, 2022 at 3:49 PM Weston Pace <[email protected]> wrote:

> The second argument to `call_function` should be a list (the args to
> the function).  Since `arr3` is iterable it is interpreting it as a
> list of args and trying to treat each row as an argument to your call
> (this is the reason it thinks you have 3 arguments).  This should
> work:
>
>     pc.call_function("struct_field", [arr3],
>                      pc.StructFieldOptions(indices=[0]))
>
> Unfortunately, that evaluates the function immediately.  If you want
> to create an expression then you need some way to create a call and I
> don't actually know how to do that.  I can do something a little
> hackish:
>
>     table = pa.Table.from_pydict({'values': arr3})
>     dataset = ds.dataset(table)
>     sf_call = ds.field('')._call('struct_field', [ds.field('values')],
>                                  pc.StructFieldOptions(indices=[0]))
>     dataset.to_table(filter=sf_call < 200)
>
> However, I suspect there is probably a better way to create a call
> object than `ds.field('')._call(...)`
>
> On Wed, Apr 20, 2022 at 3:09 AM Partha Dutta <[email protected]> wrote:
> >
> > I'm trying to use the compute function struct_field to create an
> > expression for dataset filtering, but I'm running into an error. This is
> > the code snippet:
> >
> > arr1 = pa.array([100, 200, 300])
> > arr2 = pa.array([400, 500, 600])
> > arr3 = pa.StructArray.from_arrays([arr1, arr2], ["one", "two"])
> > e = pc.call_function("struct_field", arr3,
> >                      pc.StructFieldOptions(indices=[0])) > 200
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "pyarrow/_compute.pyx", line 531, in pyarrow._compute.call_function
> >   File "pyarrow/_compute.pyx", line 330, in pyarrow._compute.Function.call
> >   File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
> >   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> > pyarrow.lib.ArrowInvalid: Function 'struct_field' accepts 1 arguments but attempted to look up kernel(s) with 3
> >
> > If I try to exclude the options, I get
> > pyarrow.lib.ArrowInvalid: Function 'struct_field' cannot be called without options
> >
> > Any advice? I am using pyarrow 7.0.0
> > --
> > Partha Dutta
> > [email protected]
>


-- 
Partha Dutta
[email protected]
