Hi Jacek,

The reason for this big difference between using a numpy vs a Python
scalar with `pc.equal` is that pyarrow doesn't do smart casting of
Python scalars based on the other types in the operation. The Python
scalar gets converted to a pyarrow scalar and defaults to int64. As a
result, we are comparing uint8 and int64 types, and the uint8 array
gets cast to the common type, i.e. int64. When passing a numpy scalar,
on the other hand, pyarrow preserves its type and converts it to a
uint8 pyarrow scalar, so the comparison is done between uint8 and
uint8, not requiring any casting.
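
You can check which type pyarrow infers for each kind of scalar (the
output in the comments is from my environment, so the exact repr might
differ slightly with your version):

import numpy as np
import pyarrow as pa

pa.scalar(115).type            # DataType(int64) -> Python int defaults to int64
pa.scalar(np.uint8(115)).type  # DataType(uint8) -> the numpy dtype is preserved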

You can see this difference as well when explicitly creating a pyarrow scalar:

import numpy as np
import pyarrow as pa
import pyarrow.compute as pc
data_np = np.random.randint(0, 100, 10_000_000, dtype="uint8")
data_pa = pa.array(data_np)

In [12]: %timeit pc.equal(data_pa, pa.scalar(115, pa.uint8()))
3.38 ms ± 29.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [13]: %timeit pc.equal(data_pa, pa.scalar(115, pa.int64()))
35.6 ms ± 700 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

(my numbers are generally a bit bigger than yours, but the relative
difference of roughly 10x between the version with casting and the one
without is similar to your timings)
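
As a workaround in the meantime (just a sketch, reusing the same
data_pa as above), you can build the scalar with the array's own type,
so that no cast of the array is needed:

# construct the scalar with the array's type (uint8 here),
# so the comparison stays uint8 vs uint8 and avoids casting the array
pc.equal(data_pa, pa.scalar(115, type=data_pa.type))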

Best,
Joris

On Wed, 8 Nov 2023 at 22:04, Jacek Pliszka <[email protected]> wrote:
>
> Hi!
>
> I got surprising results when comparing numpy and pyarrow performance.
>
> val = np.uint8(115)
>
> numpy has similar speed if using 115 and np.uint8(115):
>
> %timeit np.count_nonzero(data_np == val)
> 591 µs ± 3.56 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
> %timeit np.count_nonzero(data_np == 115)
> 598 µs ± 3.73 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>
> strangely, it is fastest for b"s":
>
> %timeit np.count_nonzero(data_np == b"s")
> 403 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>
> pc.equal is 2.5x slower for np.uint8(115):
>
> %timeit pc.equal(data_pa, val).sum().as_py()
> 1.64 ms ± 8.23 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>
> but much, much slower for 115:
>
> %timeit pc.equal(data_pa, 115).sum().as_py()
> 15.6 ms ± 21.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> And fails for b"s":
>
> %timeit pc.equal(data_pa, b"s").sum().as_py()
> ArrowNotImplementedError: Function 'equal' has no kernel matching
> input types (uint8, binary)
>
> I wrote it down in https://github.com/apache/arrow/issues/38640
>
> Any chance to get performance closer to numpy?
>
> BR,
>
> Jacek
