[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...

BryanCutler Mon, 27 Aug 2018 14:23:06 -0700

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/21546
  
    Hey @HyukjinKwon, after going through the previous benchmarks, it seems 
out-of-order batches had more of an effect on `toPandas` performance than I 
thought. The current revision of this PR (which buffers out-of-order batches 
in the driver JVM) gives about a 1.06x - 1.09x speedup, which seems a bit 
underwhelming compared to the ~1.25x I got when sending out-of-order batches. 
I still want to verify the old numbers and will hopefully get to that tomorrow.
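A minimal sketch of the buffering idea, assuming each partition's Arrow batch reaches the driver as a `(partition_index, batch)` pair in whatever order tasks finish, with indices `0..n-1` and no gaps or duplicates. This is not the PR's actual Scala code; the function name `in_partition_order` is hypothetical and Python is used only for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def in_partition_order(arrivals: Iterable[Tuple[int, T]]) -> Iterator[T]:
    """Yield batches in partition-index order, buffering any that arrive early.

    Hypothetical helper: `arrivals` yields (partition_index, batch) pairs in
    completion order; indices are assumed to be 0..n-1 with no duplicates.
    """
    buffered: Dict[int, T] = {}
    next_index = 0
    for index, batch in arrivals:
        if index == next_index:
            # This is the batch we were waiting for: emit it immediately.
            yield batch
            next_index += 1
            # Flush any buffered batches that are now next in order.
            while next_index in buffered:
                yield buffered.pop(next_index)
                next_index += 1
        else:
            # Arrived ahead of an earlier partition: hold it until its turn.
            buffered[index] = batch
    if buffered:
        # Something is still buffered, so partition `next_index` never arrived.
        raise ValueError(f"missing partition before index {next_index}")


print(list(in_partition_order([(2, "c"), (0, "a"), (1, "b")])))
```

With this approach a batch that finishes early sits in driver memory until every lower-indexed partition has arrived, so writing to Python can stall behind one slow task. The alternative the comment compares against sends each batch as soon as it arrives and leaves the reordering to the receiving side.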



