goodwanghan commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1685705080
> Thanks for the clarification. Repartitioning plus a pre-sort should indeed work; technically this is very doable, and the performance should also be decent. But I think this is an essential programming interface that official PySpark should have (and given that `mapInArrow` already exists, `applyInArrow` seems like a natural expectation from users). It also matters because the semantics are entirely independent of pandas. The underlying implementation of pandas UDFs is all Arrow-based anyway, so I'd argue it isn't even necessary to call them pandas UDFs (I don't expect the names to change, just sharing my opinion).
