I have a struct array with a few fields. I'd like to compute scalar
aggregations over several of its fields (like computing the min and max of
each field) in a single pass. As a simple example:

import pyarrow as pa
import pyarrow.compute as pc

struct_type = pa.struct([("x", pa.float64()), ("y", pa.float64())])
array = pa.array(
    [
        {"x": 1, "y": 2},
        {"x": 3, "y": 4},
        {"x": 5, "y": 6},
    ],
    struct_type,
)

I can compute the min_max of "x" and "y" individually:

>>> pc.min_max(pc.struct_field(array, 0))
<pyarrow.StructScalar: [('min', 1.0), ('max', 5.0)]>

>>> pc.min_max(pc.struct_field(array, 1))
<pyarrow.StructScalar: [('min', 2.0), ('max', 6.0)]>

But what I'd really like is some way to apply min_max to the x and y
columns in one go, resulting in something like

<pyarrow.StructScalar: [('x', {'min': 1.0, 'max': 5.0}), ('y', {'min': 2.0,
'max': 6.0})]>

Is this possible from pyarrow?
