WeichenXu123 opened a new pull request #28395:
URL: https://github.com/apache/spark/pull/28395


   ### What changes were proposed in this pull request?
   This PR adds a new API to the PySpark `RDD` class:
   
   `def collectWithJobGroup(self, groupId, description, interruptOnCancel=False)`
   
   This API does the same thing as `rdd.collect`, but it lets the caller specify the job group to use for the collect.
   The purpose of adding this API is that, if we use:
   
   ```
   sc.setJobGroup("group-id...")
   rdd.collect()
   ```
   The `setJobGroup` API in PySpark does not work correctly. This is related to a bug discussed in 
   https://issues.apache.org/jira/browse/SPARK-31549
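
As a rough sketch of how the new method might be used for cancellation, the snippet below runs the collect in a background thread so that the caller can cancel it by group ID. The helper name `start_collect_in_group` and the result-holder dict are illustrative assumptions, not part of this PR; only `collectWithJobGroup` (with the signature proposed here) and the existing `SparkContext.cancelJobGroup` are real API names.

```python
import threading


def start_collect_in_group(rdd, group_id, description):
    """Hypothetical helper: run the collect in a background thread.

    `rdd` is assumed to expose the `collectWithJobGroup` method proposed
    in this PR. Returns the worker thread and a dict that will hold
    either a "result" or an "error" entry once the thread finishes.
    """
    holder = {}

    def run():
        try:
            # The job group is attached by the collect call itself,
            # not by a separate sc.setJobGroup(...) call beforehand.
            holder["result"] = rdd.collectWithJobGroup(
                group_id, description, interruptOnCancel=True
            )
        except Exception as exc:  # e.g. the job was cancelled
            holder["error"] = exc

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker, holder
```

With a real `SparkContext` named `sc`, the caller could then invoke `sc.cancelJobGroup(group_id)` from the main thread and `join` the worker, after which `holder` would contain either the collected rows or the exception raised by the cancelled job.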
   
   ### Why are the changes needed?
   To fix the bug described above: `setJobGroup` does not work correctly in PySpark (SPARK-31549).
   
   
   ### Does this PR introduce any user-facing change?
   Yes. It adds a developer API in PySpark: `pyspark.RDD.collectWithJobGroup`
   
   
   ### How was this patch tested?
   With a unit test.
   

