That works! Thanks. Do you know offhand whether this filter would be used for predicate pushdown on a Parquet dataset? Or might that be coming in version 8.0.0?

On Wed, Apr 20, 2022 at 3:49 PM Weston Pace <[email protected]> wrote:

> The second argument to `call_function` should be a list (the args to
> the function). Since `arr3` is iterable it is interpreting it as a
> list of args and trying to treat each row as an argument to your call
> (this is the reason it thinks you have 3 arguments). This should
> work:
>
>     pc.call_function("struct_field", [arr3],
>                      pc.StructFieldOptions(indices=[0]))
>
> Unfortunately, that evaluates the function immediately. If you want
> to create an expression then you need some way to create a call and I
> don't actually know how to do that. I can do something a little
> hackish:
>
>     table = pa.Table.from_pydict({'values': arr3})
>     dataset = ds.dataset(table)
>     sf_call = ds.field('')._call('struct_field', [ds.field('values')],
>                                  pc.StructFieldOptions(indices=[0]))
>     dataset.to_table(filter=sf_call < 200)
>
> However, I suspect there is probably a better way to create a call
> object than `ds.field('')._call(...)`
>
> On Wed, Apr 20, 2022 at 3:09 AM Partha Dutta <[email protected]>
> wrote:
> >
> > I'm trying to use the compute function struct_field in order to create
> > an expression for dataset filtering. But running into an error. This is the
> > code snippet:
> >
> > arr1 = pa.array([100, 200, 300])
> > arr2 = pa.array([400, 500, 600])
> > arr3 = pa.StructArray.from_arrays([arr1, arr2], ["one", "two"])
> > e = pc.call_function("struct_field", arr3,
> >                      pc.StructFieldOptions(indices=[0])) > 200
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "pyarrow/_compute.pyx", line 531, in pyarrow._compute.call_function
> >   File "pyarrow/_compute.pyx", line 330, in pyarrow._compute.Function.call
> >   File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
> >   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> > pyarrow.lib.ArrowInvalid: Function 'struct_field' accepts 1 arguments
> > but attempted to look up kernel(s) with 3
> >
> > If I try to exclude the options, I get
> > pyarrow.lib.ArrowInvalid: Function 'struct_field' cannot be called
> > without options
> >
> > Any advice? I am using pyarrow 7.0.0
> >
> > --
> > Partha Dutta
> > [email protected]
>

--
Partha Dutta
[email protected]
