I ran into the two cases below when unpersisting or destroying broadcast variables in PySpark, while the same operations work fine in the Spark Scala shell. Any clue why this happens? Is it a bug in PySpark?
***Case 1:***

>>> b1 = sc.broadcast([1,2,3])
>>> b1.value
[1, 2, 3]
>>> b1.destroy()
>>> b1.value
[1, 2, 3]

I can still access the value in the driver after destroy().

***Case 2:***

>>> b = sc.broadcast([1,2,3])
>>> b.destroy()
>>> b.value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sdh/Downloads/spark-2.2.1-bin-hadoop2.7/python/pyspark/broadcast.py", line 109, in value
    self._value = self.load(self._path)
  File "/home/sdh/Downloads/spark-2.2.1-bin-hadoop2.7/python/pyspark/broadcast.py", line 95, in load
    with open(path, 'rb', 1 << 20) as f:
IOError: [Errno 2] No such file or directory: u'/tmp/spark-eef352c0-6470-4b89-999f-923493a27bc4/pyspark-17d3a9a3-b5c1-4331-b408-8447f078789e/tmpzq4kv0'

Instead, I would expect a message along the lines of "Attempted to use a broadcast variable after it was destroyed".
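My reading of the traceback (the caching explanation is my guess from the Spark 2.2.1 source, not an authoritative statement): in pyspark/broadcast.py the value property lazily loads the pickled value from a driver-local temp file on first access and caches it as self._value on the Broadcast object. So in Case 1 the value was cached before destroy() and survives it, while in Case 2 the first access happens after destroy() has removed the temp file, hence the raw IOError instead of a "broadcast was destroyed" error. A minimal standalone reproduction, assuming a local Spark 2.2.x install:

from pyspark import SparkContext

sc = SparkContext("local[2]", "broadcast-destroy-repro")

# Case 1: .value is accessed before destroy(), so the loaded object is
# cached on the Broadcast instance and is still returned afterwards.
b1 = sc.broadcast([1, 2, 3])
print(b1.value)   # [1, 2, 3] -- loads from the temp file and caches it
b1.destroy()
print(b1.value)   # still [1, 2, 3], served from the driver-side cache

# Case 2: .value is first accessed after destroy(), so the lazy load
# hits the already-deleted temp file and raises IOError.
b2 = sc.broadcast([1, 2, 3])
b2.destroy()
try:
    b2.value
except IOError as e:
    print("IOError as reported above:", e)

sc.stop()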