Weichen Xu created SPARK-31549:
----------------------------------

             Summary: Pyspark SparkContext.cancelJobGroup do not work correctly
                 Key: SPARK-31549
                 URL: https://issues.apache.org/jira/browse/SPARK-31549
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 2.4.5, 3.0.0
            Reporter: Weichen Xu


PySpark SparkContext.cancelJobGroup does not work correctly. This is a 
long-standing issue. It occurs because PySpark threads are not pinned to JVM 
threads when invoking Java-side methods, so every PySpark API that relies on 
Java thread-local variables misbehaves (including `sc.setLocalProperty`, 
`sc.cancelJobGroup`, `sc.setJobDescription`, and so on).

This is a serious issue. Spark 3.0 adds an experimental PySpark 'PIN_THREAD' 
mode that addresses it, but that mode has two problems:
* It is disabled by default; an additional environment variable must be set to 
enable it.
* It has a memory leak that has not yet been addressed.

A number of projects, such as hyperopt-spark and spark-joblib, rely on the 
`sc.cancelJobGroup` API (they use it to stop running jobs), so it is critical 
to fix this issue, and we hope the fix works in the default PySpark mode. One 
optional approach is to implement methods like `rdd.setGroupAndCollect`.
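To illustrate the failure mode without Spark: the JVM keeps the job group in a 
ThreadLocal, so if successive PySpark calls can be served by different JVM 
threads, a property set by one call is invisible to the next. The sketch below 
mimics this with plain Python `threading.local()`; it is an illustrative 
analogy, not actual Spark code.

```python
import threading

jvm_local = threading.local()  # stands in for the JVM's ThreadLocal

def set_job_group(group):
    # analogous to sc.setJobGroup: stores the group on the *current* thread
    jvm_local.group = group

def get_job_group():
    # analogous to the JVM reading the thread-local property later
    return getattr(jvm_local, "group", None)

results = {}

# Pinned mode: the same thread sets and later reads the property -> works.
def pinned():
    set_job_group("g1")
    results["pinned"] = get_job_group()

t = threading.Thread(target=pinned)
t.start(); t.join()

# Unpinned (default) mode: the set and the read land on different threads,
# so the thread-local property is lost.
t1 = threading.Thread(target=set_job_group, args=("g1",))
t1.start(); t1.join()
t2 = threading.Thread(target=lambda: results.update(unpinned=get_job_group()))
t2.start(); t2.join()

print(results)  # {'pinned': 'g1', 'unpinned': None}
```

This is why `sc.cancelJobGroup` silently does nothing under the default mode: 
the group set from the Python thread never reaches the JVM thread that later 
checks it.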





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
