ion-elgreco commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1686654192
> Arrow was considered as an internal format initially, and that's the whole reason why pandas came up first. In fact, the number of pandas users is (much) higher according to some stats I was given, and pandas is informally considered the standard TBH. It's too late to deprecate/remove the pandas API and switch the standard to Arrow in any event.

That's fine, the pandas UDF shouldn't be deprecated, but there should at least be an alternative that uses only Arrow, especially since the Arrow ecosystem is growing and slowly becoming the de facto standard for transferring data between different interfaces.

> I am open if you'd like to raise a discussion in the dev mailing list, and we can discuss there to reach a consensus - I don't object.

Where is this dev mailing list and how do I raise a discussion there?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
