[GitHub] [spark] HyukjinKwon edited a comment on pull request #26783: [SPARK-30153][PYTHON][WIP] Extend data exchange options for vectorized UDF functions with vanilla Arrow serialization

GitBox Mon, 25 Oct 2021 19:42:58 -0700


HyukjinKwon edited a comment on pull request #26783:
URL: https://github.com/apache/spark/pull/26783#issuecomment-951504965



   > For the second one, I guess there might be more requirements than a 
map-style API?
   
   Yeah, I worry about this too. I thought people would at least be able to 
implement it themselves, though (given that the Python RDD APIs are built on 
top of a single `RDD.mapPartitions`). 
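
   As a rough illustration of that layering (a pure-Python sketch, not Spark's 
actual implementation; `ToyRDD` and its method names are hypothetical 
stand-ins), element-wise operations can be expressed through one 
partition-level primitive:

```python
from typing import Any, Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD: a list of partitions, each a list."""

    def __init__(self, partitions: Iterable[Iterable[Any]]) -> None:
        self.partitions: List[List[Any]] = [list(p) for p in partitions]

    def map_partitions(
        self, f: Callable[[Iterator[Any]], Iterable[Any]]
    ) -> "ToyRDD":
        # The single primitive: f receives an iterator over one partition
        # and may yield any number of output elements.
        return ToyRDD([list(f(iter(p))) for p in self.partitions])

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # map is map_partitions with a per-element transform.
        return self.map_partitions(lambda it: (f(x) for x in it))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        # filter is map_partitions that drops elements.
        return self.map_partitions(lambda it: (x for x in it if pred(x)))

    def collect(self) -> List[Any]:
        # Flatten all partitions into one list, preserving order.
        return [x for p in self.partitions for x in p]
```

   For example, `ToyRDD([[1, 2], [3, 4]]).map(lambda x: x * 10).collect()` 
goes through `map_partitions` once per call and keeps the two partitions 
separate until `collect`.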
   
   To support all cases naturally we should probably expose it as a UDF, but I 
was hesitant to add it as `arrow_udf` because we would then have to handle the 
other restrictions and variants (aggregation, window, etc.) all together, and I 
thought that might not be worthwhile. I was initially skeptical about this API 
because I considered Arrow an internal format rather than a user-facing one.
   
   That is why I proposed a single (developer) API that doesn't need to account 
for those restrictions (e.g., a scalar UDF in `select` must return output of 
the same length as its input) or variants.
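
   To make the difference concrete, here is a minimal pure-Python sketch. It 
uses plain lists as stand-ins for Arrow batches, and `apply_scalar`, 
`apply_map_style`, and `keep_even` are hypothetical names, not Spark APIs. The 
scalar-style wrapper has to enforce the length contract, while the map-style 
wrapper passes the iterator of batches straight through:

```python
from typing import Callable, Iterable, Iterator, List

Batch = List[int]  # assumption: a plain list stands in for an Arrow batch


def apply_scalar(
    func: Callable[[Batch], Batch], batches: Iterable[Batch]
) -> Iterator[Batch]:
    """Scalar-style contract: each output batch must match its input length."""
    for batch in batches:
        out = func(batch)
        if len(out) != len(batch):
            raise ValueError(
                f"expected {len(batch)} output rows, got {len(out)}"
            )
        yield out


def apply_map_style(
    func: Callable[[Iterator[Batch]], Iterable[Batch]],
    batches: Iterable[Batch],
) -> Iterator[Batch]:
    """Map-style contract: func sees all batches and may emit any rows."""
    return iter(func(iter(batches)))


def keep_even(batches: Iterator[Batch]) -> Iterator[Batch]:
    # Filtering changes the row count, which only the map-style API allows.
    for batch in batches:
        yield [x for x in batch if x % 2 == 0]
```

   Running `keep_even` through `apply_map_style` works, whereas wrapping the 
same filtering logic in `apply_scalar` raises `ValueError` on the first batch 
whose length changes.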
   
   I tend to think it might be worthwhile to have this one generalized version, 
given that it has been requested several times and the reasoning seems to make 
sense, but I still don't have a very strong opinion. I am checking w/ other 
people here :-).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


