Xiao Li created SPARK-28485:
-------------------------------

             Summary: PythonBroadcast may delete the broadcast file while a Python worker still needs it
                 Key: SPARK-28485
                 URL: https://issues.apache.org/jira/browse/SPARK-28485
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.0.0
            Reporter: Xiao Li

How to reproduce:

Run "bin/pyspark --master local[1,1] --conf spark.memory.fraction=0.0001" to start PySpark, then run the following code:

{code:java}
b = sc.broadcast([100])
sc.parallelize([0],1).map(lambda x: 0 if x == 0 else b.value[0]).collect()
sc._jvm.java.lang.System.gc()
import time
time.sleep(5)
sc._jvm.java.lang.System.gc()
time.sleep(5)
sc.parallelize([1],1).map(lambda x: 0 if x == 0 else b.value[0]).collect()
{code}
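
The failure mode named in the summary can be modelled without Spark. The sketch below is only an illustration under assumptions: LazyFileBroadcast, run_job and reproduce are hypothetical names, not PySpark classes or functions, and the explicit cleanup() call stands in for whatever cleanup the two System.gc() calls trigger in the reproduction. It assumes the broadcast value lives in a file that is read only on first access, so the first job (x == 0) never opens the file and the second job (x == 1) looks for it after it has been deleted.

```python
import os
import pickle
import tempfile


class LazyFileBroadcast:
    """Hypothetical stand-in for a file-backed broadcast; not Spark's class."""

    def __init__(self, value):
        # Write the value to a temporary file, as a file-backed broadcast would.
        fd, self.path = tempfile.mkstemp(suffix=".broadcast")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(value, f)
        self._value = None
        self._loaded = False

    @property
    def value(self):
        # Assumption: the file is read only on first access.
        if not self._loaded:
            with open(self.path, "rb") as f:
                self._value = pickle.load(f)
            self._loaded = True
        return self._value

    def cleanup(self):
        # Models a cleaner removing the backing file.
        if os.path.exists(self.path):
            os.remove(self.path)


def run_job(data, broadcast):
    # Same shape as the lambda in the reproduction:
    # the broadcast is touched only when x != 0.
    return [0 if x == 0 else broadcast.value[0] for x in data]


def reproduce():
    b = LazyFileBroadcast([100])
    first = run_job([0], b)   # never reads b.value, so the file stays unopened
    b.cleanup()               # stands in for the GC-triggered deletion
    try:
        second = run_job([1], b)  # first real read happens after deletion
    except FileNotFoundError:
        second = "broadcast file missing"
    return first, second
```

In this model the first job succeeds because it never needs the file, and the second job fails only because the deletion happened before the first read. Whether Spark's actual code path matches this model is not established by the report itself.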