HyukjinKwon edited a comment on issue #26783: [SPARK-30153][PYTHON][WIP] Extend data exchange options for vectorized UDF functions with vanilla Arrow serialization
URL: https://github.com/apache/spark/pull/26783#issuecomment-562911318

A pandas DataFrame can easily be converted to an Arrow table. If Arrow manages to get rid of the overhead you guys observed (which the project is targeting), the problem is solved, no? It looks like we're adding a workaround for an issue on the Arrow side.

Also, all Spark types are currently mapped (except MapType, but there's already a PR to fix that). So there's no missing type mapping that we could take advantage of by switching to Arrow directly, either.
