BryanCutler commented on issue #24965: [WIP][SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs
URL: https://github.com/apache/spark/pull/24965#issuecomment-506029253

Thanks for working on this, @d80tb7. I have been busy this week but will try to take a look soon. I would really prefer to stick with the Arrow stream format if at all possible. Could the Scala side send two complete Arrow streams to Python sequentially for each group? The worker would then convert each stream into a Pandas DataFrame to evaluate the cogroup UDF. This would add some overhead, since more streams are sent, but I think it will be minimal. WDYT?
