GitHub user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21546
Hey @HyukjinKwon, after going through the previous benchmarks, it seems
out-of-order batches had more of an effect on performance than I thought with
`toPandas`. The current revision of this PR (which buffers out-of-order batches
in the driver JVM) has about a 1.06x - 1.09x speedup, which seems a bit
underwhelming after getting ~1.25x when sending out-of-order batches. I still
want to try to verify the old numbers and will hopefully get to that tomorrow.
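
For anyone following along, here is a rough sketch of what "sending out-of-order batches" would imply on the receiving side: batches arrive in whatever order they finish and have to be put back in order before the final result is assembled. This is plain Python for illustration only, not the code in this PR; `reorder_batches` and the `(index, payload)` pairs are hypothetical, and it assumes each batch is tagged with its position when it is sent.

```python
def reorder_batches(arrived):
    """Return payloads in index order.

    `arrived` is a list of (index, payload) pairs in arrival order,
    which may differ from index order. Hypothetical helper, not a Spark API.
    """
    return [payload for _, payload in sorted(arrived, key=lambda pair: pair[0])]


# Batches 2, 0, 1 arrive in that order; the receiver restores 0, 1, 2.
arrived = [(2, "batch-c"), (0, "batch-a"), (1, "batch-b")]
ordered = reorder_batches(arrived)
```

The alternative in the current revision does the equivalent buffering in the driver JVM, so the receiver only ever sees batches already in order.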