ion-elgreco commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1685688494
> I get that `cogroup` might not be possible tho. But we can just convert pandas back to arrow batches easily. Is this really required for some scenario? IIRC this is only useful for addressing nested types.

Even with pandas 2.0+, converting between pandas and Arrow is not fully zero-copy, so this will add some latency. pandas also has quirks with upcasting, not respecting the original data types, null handling, and, as you say, nested dtypes.
