Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/15821
Here are some rough benchmarks run locally on a machine with 16 GB of memory
and 8 cores, using the default Spark configuration. Each table summarizes 50
trials of calling `toPandas()` with and without Arrow enabled:
## 1 Million Longs
_ | With Arrow | Without Arrow
--|------------|-------------------
count | 50.000000 | 50.000000
mean | 0.190573 | 2.576587
std | 0.078450 | 0.114455
min | 0.139911 | 2.259916
25% | 0.148212 | 2.516289
50% | 0.163769 | 2.555433
75% | 0.184402 | 2.631316
max | 0.518090 | 2.946415
**13.52x speedup** on average
## 1 Million Doubles
_ | With Arrow | Without Arrow
--|------------|-------------------
count | 50.000000 | 50.000000
mean | 0.259145 | 2.090295
std | 0.069620 | 0.123091
min | 0.196666 | 1.998588
25% | 0.209051 | 2.015083
50% | 0.230751 | 2.032701
75% | 0.270519 | 2.122219
max | 0.439556 | 2.485232
**8.07x speedup** on average
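
For reference, the speedup figures appear to be the ratio of the mean times in each table (without Arrow divided by with Arrow). A quick check of that arithmetic, with the means copied from the tables above; the `speedup` helper is just an illustrative name, not part of the benchmark script:

```python
def speedup(mean_without_arrow, mean_with_arrow):
    """Ratio of mean run times; values above 1 mean Arrow was faster."""
    return mean_without_arrow / mean_with_arrow


# Mean values copied from the two tables above.
longs = speedup(2.576587, 0.190573)
doubles = speedup(2.090295, 0.259145)

print(f"longs:   {longs:.2f}x")    # 13.52x
print(f"doubles: {doubles:.2f}x")  # 8.07x
```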
The script used to generate these results can be found
[here](https://issues.apache.org/jira/secure/attachment/12849193/benchmark.py)
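
The linked script is not reproduced here. As a rough idea of the approach, the following is a minimal sketch of a timing harness that produces the same summary rows as the tables above. It is an assumption about the general method, not the actual script: `time_trials` and `summarize` are hypothetical names, and the workload is a stand-in callable where a real run would pass something like `df.toPandas`.

```python
import statistics
import time


def time_trials(fn, trials=50):
    """Call fn() `trials` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the same rows shown in the tables above: count, mean, std, min, quartiles, max."""
    # "inclusive" interpolates linearly between observed values.
    q1, median, q3 = statistics.quantiles(durations, n=4, method="inclusive")
    return {
        "count": len(durations),
        "mean": statistics.mean(durations),
        "std": statistics.stdev(durations),  # sample standard deviation
        "min": min(durations),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(durations),
    }


# Stand-in workload; a real benchmark would time df.toPandas instead.
summary = summarize(time_trials(lambda: sum(range(100_000)), trials=50))
for name, value in summary.items():
    print(f"{name:>5} | {value:.6f}")
```

Running the harness once with Arrow enabled and once without, then dividing the two means, would give a speedup figure comparable to the ones quoted above.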