[ https://issues.apache.org/jira/browse/SPARK-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen resolved SPARK-3992.
-------------------------------
    Resolution: Won't Fix

Resolving as "Won't Fix" since this issue is really old and is targeted at Spark 1.1.0. Please re-open if this is still an issue.

> Spark 1.1.0 python binding cannot use any collections but list as Accumulators
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-3992
>                 URL: https://issues.apache.org/jira/browse/SPARK-3992
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.1.0
>         Environment: 3.13.0-36-generic #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Walter Bogorad
>
> A dictionary accumulator defined as a global variable is not visible inside a function called by "foreach()".
> Here is the minimal code snippet:
> {noformat}
> from collections import defaultdict
> from pyspark import SparkContext
> from pyspark.accumulators import AccumulatorParam
>
> class DictAccumParam(AccumulatorParam):
>     def zero(self, value):
>         value.clear()
>     def addInPlace(self, val1, val2):
>         return val1
>
> sc = SparkContext("local", "Dict Accumulator Bug")
> va = sc.accumulator(defaultdict(int), DictAccumParam())
>
> def foo(x):
>     global va
>     print "va is:", va
>
> rdd = sc.parallelize([1, 2, 3]).foreach(foo)
> {noformat}
> When run, the code snippet produced the following output:
> ...
> va is: None
> va is: None
> va is: None
> ...
> I have verified that global variables are visible inside foo() called by foreach only if they are scalars or lists, as in the API doc at http://spark.apache.org/docs/latest/api/python/
> The problem exists with standard dictionaries and other collections.
> I also verified that if foo() is called directly, i.e. outside foreach, then the global variables are visible as expected.
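For reference, the "va is: None" output in the snippet above is most likely a consequence of the custom AccumulatorParam rather than a visibility problem: zero() calls value.clear(), which returns None, and the worker-side copy of an accumulator is initialized from zero(); addInPlace() also discards val2. Below is a minimal corrected sketch against the standard PySpark AccumulatorParam API (zero/addInPlace, sc.accumulator, Accumulator.add/value); the count() helper and the per-element counting are illustrative assumptions, not part of the original report.

{noformat}
from collections import defaultdict

from pyspark import SparkContext
from pyspark.accumulators import AccumulatorParam


class DictAccumParam(AccumulatorParam):
    def zero(self, value):
        # Return a fresh zero-valued collection; returning None here (as
        # value.clear() effectively does) is what tasks end up seeing.
        return defaultdict(int)

    def addInPlace(self, val1, val2):
        # Merge val2 into val1 and return the merged dict.
        for k, v in val2.items():
            val1[k] += v
        return val1


sc = SparkContext("local", "Dict Accumulator")
va = sc.accumulator(defaultdict(int), DictAccumParam())


def count(x):
    # Hypothetical helper: tasks may only add to an accumulator.
    va.add({x: 1})


sc.parallelize([1, 2, 3]).foreach(count)
print(va.value)  # the merged dict is read back on the driver
{noformat}

Note that tasks can only add to an accumulator; its value is only readable on the driver after the action completes.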