I have a struct array with a few fields. I'd like to compute scalar
aggregations over several of its fields (like computing the min and max of
each field) in a single pass. As a simple case, consider this:
import pyarrow as pa
import pyarrow.compute as pc

struct_type = pa.struct([("x", pa.float64()), ("y", pa.float64())])
array = pa.array([
    {"x": 1, "y": 2},
    {"x": 3, "y": 4},
    {"x": 5, "y": 6},
], struct_type)
I can compute the min_max of "x" and "y" individually:
>>> pc.min_max(pc.struct_field(array, 0))
<pyarrow.StructScalar: [('min', 1.0), ('max', 5.0)]>
>>> pc.min_max(pc.struct_field(array, 1))
<pyarrow.StructScalar: [('min', 2.0), ('max', 6.0)]>
But what I'd really like is some way to apply min_max to the x and y
columns in one go, resulting in something like
<pyarrow.StructScalar: [('x', {'min': 1.0, 'max': 5.0}), ('y', {'min': 2.0, 'max': 6.0})]>
Is this possible from pyarrow?