BryanCutler commented on issue #24965: [WIP][SPARK-27463][PYTHON] Support 
Dataframe Cogroup via Pandas UDFs
URL: https://github.com/apache/spark/pull/24965#issuecomment-506029253
 
 
   Thanks for working on this, @d80tb7. I have been busy this week but will try 
to take a look soon. I would really prefer to stick with the Arrow stream 
format if at all possible. Could the Scala side send two complete Arrow streams 
to Python sequentially for each group, one per side of the cogroup? The worker 
would then convert each stream into a Pandas DataFrame and evaluate the cogroup 
UDF on the pair. This would add some overhead because more streams are sent, 
but I think it would be minimal. WDYT?

