[ https://issues.apache.org/jira/browse/ARROW-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170112#comment-17170112 ]
Wes McKinney commented on ARROW-9623:
-------------------------------------

My guess is that it's because NumPy does runtime AVX2 dispatch. I made an AVX2 build of pyarrow and I see no performance difference:

{code}
In [5]: arr = np.random.randn(100000000)

In [6]: timeit arr * arr
87.3 ms ± 813 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: pa_arr = pa.array(arr)

In [8]: timeit pc.multiply(pa_arr, pa_arr)
87.5 ms ± 5.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
{code}

> [Python] Performance difference between pc.multiply vs pd.multiply
> ------------------------------------------------------------------
>
>                 Key: ARROW-9623
>                 URL: https://issues.apache.org/jira/browse/ARROW-9623
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 1.0.0
>         Environment: Windows
> Pyarrow 1.0.0
>            Reporter: H G
>            Priority: Minor
>
> Wanted to report the performance difference observed between Pandas and
> Pyarrow.
>
> {code:java}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import pyarrow.compute as pc
> df = pd.DataFrame(np.random.randn(100000000))
> %timeit -n 5 -r 5 df.multiply(df)
> table = pa.Table.from_pandas(df)
> %timeit -n 5 -r 5 pc.multiply(table[0],table[0])
> {code}
> Results:
> {code:java}
> %timeit -n 5 -r 5 df.multiply(df)
> 374 ms ± 15.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
> {code}
>
> {code:java}
> %timeit -n 5 -r 5 pc.multiply(table[0],table[0])
> 698 ms ± 297 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)