[GitHub] [spark] zhengruifeng commented on pull request #40896: [SPARK-43229][ML][PYTHON][CONNECT] Introduce Barrier Python UDF

via GitHub Wed, 10 May 2023 17:51:18 -0700


zhengruifeng commented on PR #40896:
URL: https://github.com/apache/spark/pull/40896#issuecomment-1542989294


   > If we're going to have this standalone API, this should work together with 
other similar API like groupby().applyInPandas.
   
   I don't think we will support other Pandas APIs such as `groupby().applyInPandas`:
   
   - it is non-trivial to support due to the limitations of `RDDBarrier`;
   - the ML side doesn't need them for now.
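
For context, here is a minimal sketch of what `RDDBarrier` offers in PySpark. It exposes only partition-wise mapping (`mapPartitions` and `mapPartitionsWithIndex`), which is one likely reading of the limitation mentioned above: a grouped API needs a shuffle by key, and that does not map directly onto a per-partition barrier stage. The names `partition_sum`, `barrier_task` and `main`, the `local[2]` master, and the sample data are illustrative assumptions, not code from this PR.

```python
from typing import Iterable, Iterator


def partition_sum(rows: Iterable[int]) -> Iterator[int]:
    """Pure per-partition logic: emit one value, the sum of the partition."""
    yield sum(rows)


def barrier_task(rows: Iterable[int]) -> Iterator[int]:
    """Body of one barrier task: wait for all peer tasks, then compute."""
    # Imported lazily so partition_sum stays importable without Spark.
    from pyspark import BarrierTaskContext

    # Global sync point: blocks until every task in the stage reaches it.
    BarrierTaskContext.get().barrier()
    yield from partition_sum(rows)


def main() -> None:
    from pyspark.sql import SparkSession

    # Two local slots so that both barrier tasks can be launched together.
    spark = SparkSession.builder.master("local[2]").getOrCreate()
    try:
        rdd = spark.sparkContext.parallelize(range(8), 2)
        # RDDBarrier maps over whole partitions; it has no grouped variant.
        print(rdd.barrier().mapPartitions(barrier_task).collect())
    finally:
        spark.stop()
```

Calling `main()` requires a working PySpark installation; `partition_sum` on its own is plain Python and can be checked without Spark.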
   
   
   > It seems to me that functions under python/pyspark/sql/pandas/utils.py are 
used internally - in the existing Spark source code only.
   
   > A developer API "should" still be called externally, by developers though, 
e.g. semanticHash.
   
   That is a good point. What about keeping `barrier` in `pandas/utils.py` and 
using it only internally, like the other helper functions?
   
   @HyukjinKwon @WeichenXu123 @xinrong-meng 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

