BryanCutler commented on issue #24981: [WIP][SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs- Arrow Stream Impl
URL: https://github.com/apache/spark/pull/24981#issuecomment-512979848

@d80tb7 thanks for running the benchmarks. It's good to see that we can use the Arrow stream format without any significant penalty. It would be best to stick with this PR if you can since, as @icexelloss noted, there is already a lot of good discussion here. As for the API, I prefer `df1.groupby('id').cogroup(df2.groupby('id')).apply(func)` a little more, but not too strongly. I agree that we should come to a consensus on one API, though, and not also introduce an alternate form.
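
For readers unfamiliar with the term, here is a rough illustration of what "cogroup then apply" means. It is a plain-Python sketch, not the PR's implementation and not Spark or pandas code: rows are dicts standing in for DataFrame rows, and `cogroup_apply` and `summarize` are hypothetical names invented for this example. The only point is that `func` is called once per key with the matching rows from both sides, and that a key present on only one side still gets a call with an empty group on the other.

```python
from collections import defaultdict


def cogroup_apply(left_rows, right_rows, key, func):
    """Hypothetical helper: call func once per key with both sides' rows."""
    left_groups = defaultdict(list)
    right_groups = defaultdict(list)
    for row in left_rows:
        left_groups[row[key]].append(row)
    for row in right_rows:
        right_groups[row[key]].append(row)

    out = []
    # Union of keys: a key seen on one side only is paired with an empty group.
    for k in sorted(set(left_groups) | set(right_groups)):
        out.extend(func(k, left_groups.get(k, []), right_groups.get(k, [])))
    return out


def summarize(k, left, right):
    """Example user function: one output row per key."""
    return [{
        "id": k,
        "v1_sum": sum((r["v1"] for r in left), 0.0),
        "v2_sum": sum((r["v2"] for r in right), 0.0),
    }]


df1 = [{"id": 1, "v1": 1.0}, {"id": 1, "v1": 2.0}, {"id": 2, "v1": 3.0}]
df2 = [{"id": 1, "v2": 10.0}, {"id": 2, "v2": 20.0},
       {"id": 2, "v2": 30.0}, {"id": 3, "v2": 40.0}]

result = cogroup_apply(df1, df2, "id", summarize)
for row in result:
    print(row)
```

In the API form discussed above, the two grouped inputs and `func` play the roles of `left_rows`, `right_rows` and `summarize` here. Whether the user function also receives the key is an assumption of this sketch.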
