Hi Spencer, I'm not aware of a helper method that would do this (but I don't have a lot of expertise in this area of the code). From a computational perspective, writing a small helper function in Python to do the computation in a loop does not really lose efficiency, because of Arrow's columnar layout (all the heavy lifting will be pushed down to C++).
Thanks,
Micah

On Thu, May 18, 2023 at 10:08 AM Spencer Nelson <[email protected]> wrote:

> I have a struct array with a few fields. I'd like to compute scalar
> aggregations over several of its fields (like computing the min and max of
> each field) in a single pass. As a simple case, how about like this:
>
> struct_type = pa.struct([("x", pa.float64()), ("y", pa.float64())])
> array = pa.array([
>     {"x": 1, "y": 2},
>     {"x": 3, "y": 4},
>     {"x": 5, "y": 6}
> ], struct_type)
>
> I can compute the min_max of "x" and "y" individually:
>
> >>> pc.min_max(pc.struct_field(array, 0))
> <pyarrow.StructScalar: [('min', 1.0), ('max', 5.0)]>
>
> >>> pc.min_max(pc.struct_field(array, 1))
> <pyarrow.StructScalar: [('min', 2.0), ('max', 6.0)]>
>
> But what I'd really like is some way to apply min_max to the x and y
> columns in one go, resulting in something like
>
> <pyarrow.StructScalar: [('x', {'min': 1.0, 'max': 5.0}), ('y', {'min': 2.0, 'max': 6.0})]>
>
> Is this possible from pyarrow?
