Re: [PR] [SPARK-49547][SQL][PYTHON] Support returning iterator of RecordBatches in applyInArrow [spark]

via GitHub Wed, 27 Nov 2024 19:52:04 -0800


zhengruifeng commented on PR #48038:
URL: https://github.com/apache/spark/pull/48038#issuecomment-2505211865


   @Kimahriman Thanks for working on this. I personally prefer a PR that 
contains both the Python and the Scala changes, to make sure the new feature works.
   I think this work could be split into several PRs, one per API, in this way:
   
   - `grouped.applyInArrow` in Spark Classic;
   - `grouped.applyInArrow` in Spark Connect;
   - `grouped.applyInPandas` in Spark Classic;
   - `grouped.applyInPandas` in Spark Connect;
   - `cogrouped.applyInArrow` in Spark Classic;
   - `cogrouped.applyInArrow` in Spark Connect;
   - `cogrouped.applyInPandas` in Spark Classic;
   - `cogrouped.applyInPandas` in Spark Connect;
   
   Regarding `cogrouped.applyInArrow` with the new iterator interface, I am 
still not sure how the RecordBatches of both sides should be split. Maybe we 
can focus on `grouped.applyInXXX` first.
   
   WDYT? @HyukjinKwon @xinrong-meng 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


