[GitHub] [spark] ion-elgreco commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

via GitHub Sun, 20 Aug 2023 22:46:01 -0700


ion-elgreco commented on PR #38624:
URL: https://github.com/apache/spark/pull/38624#issuecomment-1685688494


   > I get that `cogroup` might not be possible, though. But we can just convert pandas back to Arrow batches easily. Is this really required for some scenario? IIRC this is only useful for addressing nested types.
   
   Even with pandas 2.0+, converting between pandas and Arrow is not fully zero-copy, so the round trip adds latency.
   
   Pandas also has quirks: it upcasts types (for example, an integer column containing nulls becomes float64), does not preserve the original data types, handles nulls differently, and, as you say, handles nested dtypes poorly.


