HyukjinKwon commented on a change in pull request #28395:
URL: https://github.com/apache/spark/pull/28395#discussion_r417672569
##########
File path: python/pyspark/rdd.py
##########
@@ -877,6 +877,19 @@ def collect(self):
         sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
         return list(_load_from_socket(sock_info, self._jrdd_deserializer))
+    def collectWithJobGroup(self, groupId, description, interruptOnCancel=False):
+ """
+ .. note:: Experimental
+
+ When collect rdd, use this method to specify job group.
+
+ .. versionadded:: 3.0.0
Review comment:
I don't think we should ignore the existing definitions of the API stability
tags. It might be best to stick to what's documented for those tags ...
Of course we should take the amended semver into account, but I think that's
rather a rubric for deciding when we can break things.
I rather strongly think we should deprecate/remove this one specifically later,
when we're able, so that we stick to the standard approach. In the meantime, I also
agree with having these APIs as a step to migrate.
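For anyone following along, here is a minimal usage sketch of how the proposed API
compares with the existing `setJobGroup` + `collect` pattern it is meant to bridge.
The group id and description strings are purely illustrative, and
`collectWithJobGroup` is assumed to behave as shown in the diff above:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
rdd = sc.parallelize(range(100))

# Existing pattern: set the job group on the SparkContext, then collect.
sc.setJobGroup("query-42", "collect via setJobGroup", interruptOnCancel=True)
old_result = rdd.collect()

# API added in this PR: pass the job group with the collect itself, so the
# group id, description, and cancel flag travel with the action that runs
# the job rather than relying on state set separately on the context.
new_result = rdd.collectWithJobGroup("query-42", "collect with job group",
                                     interruptOnCancel=True)
```

Either way, the running job group can then be cancelled with
`sc.cancelJobGroup("query-42")`.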
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]