WeichenXu123 commented on code in PR #40896:
URL: https://github.com/apache/spark/pull/40896#discussion_r1184721358
##########
python/pyspark/sql/udf.py:
##########
@@ -249,6 +259,38 @@ def __init__(
self.evalType = evalType
self.deterministic = deterministic
+ # since 3.5.0, we introduce an internal optional function attribute
'_is_barrier',
+ # which is dedicated for integration with external ML training
frameworks including
+ # PyTorch and XGBoost.
+ # It indicates whether this UDF will be executed on barrier mode, and
is only accepted
+ # in methods 'mapInPandas' and 'mapInArrow'.
+ # For example:
+ #
+ # df = spark.createDataFrame([(1, 21), (2, 30)], ("id", "age"))
+ #
+ # def filter_func(iterator):
+ # for pdf in iterator:
+ # yield pdf[pdf.id == 1]
+ #
+ # filter_func._is_barrier = True # Mark this UDF is barrier
Review Comment:
Define a `@barrier` decorator looks better, and we can document `barrier`
doc saying this is a developer API and we should keep it stable
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]