[GitHub] [spark] BryanCutler commented on issue #24965: [WIP][SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs

GitBox Wed, 26 Jun 2019 15:49:27 -0700

BryanCutler commented on issue #24965: [WIP][SPARK-27463][PYTHON] Support 
Dataframe Cogroup via Pandas UDFs
URL: https://github.com/apache/spark/pull/24965#issuecomment-506073096
 
 
   > a completely separate arrow stream for every group. In this case we would
   > only have to hold 2 batches in memory at any one time, albeit at the cost of
   > paying the stream overhead (schema etc) for every group.
   
   Yes, this is what I was getting at. Each complete stream is one group, so
there wouldn't be more than the 2 groups required in memory at a time. Btw,
the reason I prefer to stick to the stream protocol is that Arrow already tests
it thoroughly. Sending messages piecemeal will probably work fine, but that
path is just not actively being tested between Java and C++/Python.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

