[GitHub] [spark] HyukjinKwon edited a comment on issue #26783: [SPARK-30153][PYTHON][WIP] Extend data exchange options for vectorized UDF functions with vanilla Arrow serialization

GitBox Sat, 07 Dec 2019 20:16:31 -0800

HyukjinKwon edited a comment on issue #26783: [SPARK-30153][PYTHON][WIP] Extend 
data exchange options for vectorized UDF functions with vanilla Arrow 
serialization
URL: https://github.com/apache/spark/pull/26783#issuecomment-562911318
 
 
   A pandas DataFrame can easily be converted to an Arrow table. If Arrow manages 
to get rid of the overhead you guys observed (which the project is targeting), 
the problem is solved, no?
   
   It looks like we're adding a workaround for an issue on the Arrow side. Also, 
all Spark types are currently mapped (except MapType, but there's already a 
PR to fix that). So there's no missing type mapping that we could take advantage 
of by switching to Arrow directly, either.

