zhengruifeng commented on PR #48038: URL: https://github.com/apache/spark/pull/48038#issuecomment-2505211865
@Kimahriman Thanks for working on this. I personally prefer a PR containing both Python changes and Scala changes, to make sure the new feature works. I think this work could be split into several PRs in this way:

- `grouped.applyInArrow` in Spark Classic;
- `grouped.applyInArrow` in Spark Connect;
- `grouped.applyInPandas` in Spark Classic;
- `grouped.applyInPandas` in Spark Connect;
- `cogrouped.applyInArrow` in Spark Classic;
- `cogrouped.applyInArrow` in Spark Connect;
- `cogrouped.applyInPandas` in Spark Classic;
- `cogrouped.applyInPandas` in Spark Connect.

And regarding `cogrouped.applyInArrow` with the new iterator interface, I am still not sure how the RecordBatches of both sides should be split. Maybe we can focus on `grouped.applyInXXX` first. WDYT? @HyukjinKwon @xinrong-meng
