[GitHub] [spark] igorghi commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

via GitHub Tue, 22 Aug 2023 06:14:25 -0700


igorghi commented on PR #38624:
URL: https://github.com/apache/spark/pull/38624#issuecomment-1688166890


   @HyukjinKwon I may be misunderstanding the inner workings, but for the 
`repartition(grouping_cols).mapInArrow` workaround, wouldn't the 
[batch size](https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html#setting-arrow-batch-size)
present a problem? Depending on the batch size parameter, the full group might 
not be available in a single Arrow RecordBatch, for example with the default 
batch size of 10K when the data has more than 10K rows in any partition.
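
To make the concern concrete, here is a minimal pure-Python sketch of the suspected behaviour. It does not call Spark or Arrow; `iter_batches` is a hypothetical stand-in for how a partition might be cut into batches of at most `max_records_per_batch` rows, and the sizes (25 rows, batch size 10) are scaled-down placeholders for the 10K default:

```python
from itertools import islice


def iter_batches(rows, max_records_per_batch):
    """Hypothetical stand-in: yield consecutive chunks of one partition,
    each holding at most max_records_per_batch rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, max_records_per_batch))
        if not batch:
            return
        yield batch


# One partition that contains a single group "a" with 25 rows.
partition = [("a", i) for i in range(25)]

# Assumed batch size of 10 (scaled down from the 10K default).
batches = list(iter_batches(partition, 10))

batch_sizes = [len(b) for b in batches]
groups_per_batch = [sorted({key for key, _ in b}) for b in batches]

print(batch_sizes)        # [10, 10, 5]
print(groups_per_batch)   # [['a'], ['a'], ['a']]
```

If the batching works like this, a function that processes one batch at a time would see group `a` three times and never as a whole, which is the situation I am asking about.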


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
