I'd like some help calibrating my expectations regarding acero performance.
I'm finding that some pretty naive numpy is about 10x faster than Acero for
my use case.

I'm working with a table of 13,000,000 rows. The values are angular
positions on the sky (RA/Dec) and times. I'd like to filter to a specific one
of the times, and to positions within a calculated great-circle distance of a
reference point on the sky.

I've implemented the Vincenty formula for great-circle distance
(https://en.wikipedia.org/wiki/Great-circle_distance) in terms of
pyarrow.compute calls:

```
import pyarrow.compute as pc


def pc_angular_separation(lon1, lat1, lon2, lat2):
    # Vincenty formula for the angular separation between two points.
    sdlon = pc.sin(pc.subtract(lon2, lon1))
    cdlon = pc.cos(pc.subtract(lon2, lon1))
    slat1 = pc.sin(lat1)
    slat2 = pc.sin(lat2)
    clat1 = pc.cos(lat1)
    clat2 = pc.cos(lat2)

    num1 = pc.multiply(clat2, sdlon)
    num2 = pc.subtract(pc.multiply(slat2, clat1),
                       pc.multiply(pc.multiply(clat2, slat1), cdlon))
    denominator = pc.add(pc.multiply(slat2, slat1),
                         pc.multiply(pc.multiply(clat2, clat1), cdlon))
    hypot = pc.sqrt(pc.add(pc.multiply(num1, num1),
                           pc.multiply(num2, num2)))
    return pc.atan2(hypot, denominator)
```
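
I build the actual filter expression by calling that with Arrow field
references for my table columns and my reference coordinates, roughly like
this (the `target_ra` / `target_dec` names here are just placeholders for the
reference point):

```
# Sketch of how the expression is constructed. target_ra / target_dec are
# placeholder names for the reference coordinates; RA_deg / Dec_deg are the
# column names in my table.
separation = pc_angular_separation(
    target_ra, target_dec,                     # reference point (scalars)
    pc.field("RA_deg"), pc.field("Dec_deg"),   # per-row columns
)
```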

The resulting pyarrow.compute.Expression is fairly monstrous:

```
<pyarrow.compute.Expression atan2(sqrt(add(multiply(multiply(cos(Dec_deg),
sin(subtract(RA_deg, 168.9776949652776))), multiply(cos(Dec_deg),
sin(subtract(RA_deg, 168.9776949652776)))),
multiply(subtract(multiply(sin(Dec_deg), -0.9304510671785976),
multiply(multiply(cos(Dec_deg), 0.3664161726591893), cos(subtract(RA_deg,
168.9776949652776)))), subtract(multiply(sin(Dec_deg),
-0.9304510671785976), multiply(multiply(cos(Dec_deg), 0.3664161726591893),
cos(subtract(RA_deg, 168.9776949652776))))))), add(multiply(sin(Dec_deg),
0.3664161726591893), multiply(multiply(cos(Dec_deg), -0.9304510671785976),
cos(subtract(RA_deg, 168.9776949652776)))))>
```

My Acero graph is then very simple: just a table source node, a filter node
on the timestamp (for an exact match), and another filter node that keeps
rows where that expression is under a threshold.
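
In code, the plan is a short sequence of declarations along these lines (a
sketch; `table`, `target_time`, `threshold`, and the timestamp column name
are placeholders, and `separation` is the expression built above):

```
import pyarrow.acero as acero

# Sketch of the plan: table source -> exact-match filter on the timestamp
# -> filter on the angular-separation expression. Names are placeholders.
plan = acero.Declaration.from_sequence([
    acero.Declaration("table_source", acero.TableSourceNodeOptions(table)),
    acero.Declaration("filter", acero.FilterNodeOptions(
        pc.equal(pc.field("timestamp"), target_time))),
    acero.Declaration("filter", acero.FilterNodeOptions(
        pc.less(separation, threshold))),
])
result = plan.to_table()
```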

For 13 million observations, this takes about 15ms on my laptop using Acero.

But the same computation done with totally naive numpy is about 3ms.

The numpy version has no fanciness, just calling numpy trigonometric
functions and materializing all the intermediate results like you might
imagine, then eventually coming up with a boolean mask over everything and
calling `table.filter(mask)`.
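
Concretely, it's essentially the following kind of thing (again a sketch; the
`target_*` names, `threshold`, and column names are placeholders):

```
import numpy as np

# Naive approach: materialize every intermediate array, build one boolean
# mask over all rows, then filter the table once. Names are placeholders.
ra = table["RA_deg"].to_numpy()
dec = table["Dec_deg"].to_numpy()
times = table["timestamp"].to_numpy()

sdlon = np.sin(ra - target_ra)
cdlon = np.cos(ra - target_ra)
slat1, clat1 = np.sin(target_dec), np.cos(target_dec)
slat2, clat2 = np.sin(dec), np.cos(dec)

num1 = clat2 * sdlon
num2 = slat2 * clat1 - clat2 * slat1 * cdlon
denominator = slat2 * slat1 + clat2 * clat1 * cdlon
separation = np.arctan2(np.hypot(num1, num2), denominator)

mask = (times == target_time) & (separation < threshold)
filtered = table.filter(mask)
```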

So finally, my question: is this about what I should expect? I know Acero has
the advantage that it *would* work if my data were larger than fits in
memory, which is not true of my numpy approach. But I expected that Acero
would only need to visit the column values once, so it should be able to
outpace the numpy approach. Should I instead think of Acero as being mainly
for working on very large datasets?

-Spencer
