zhengruifeng commented on PR #40896: URL: https://github.com/apache/spark/pull/40896#issuecomment-1542989294
> If we're going to have this standalone API, this should work together with other similar API like groupby().applyInPandas.

I think we won't support other Pandas APIs such as `groupby().applyInPandas`:
- it is non-trivial to support due to the limitations of `RDDBarrier`;
- the ML side doesn't need them for now.

> It seems to me that functions under python/pyspark/sql/pandas/utils.py are used internally - in the existing Spark source code only.
> A developer API "should" still be called externally, by developers though, e.g. semanticHash.

That is a good point. What about keeping `barrier` in `pandas/utils.py` and only using it internally, like the other helper functions?

@HyukjinKwon @WeichenXu123 @xinrong-meng

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
