Python pickles instances, not class definitions: the pickle stores only
a reference to the class (its module and name) plus the instance state.
That means SimpleClass has to be importable on every worker node for
deserialization to succeed. Typically you keep class definitions in a
separate module and distribute it to the workers, for example with
--py-files.
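
To illustrate, here is a minimal sketch that uses plain pickle and no
Spark at all. The module name simple_class is hypothetical, and the
module is built in memory only so that the example is self-contained;
in practice it would be a separate file. It shows that the pickled
bytes name the module and class, and that loading fails with an
AttributeError when the class cannot be found where the pickle says it
lives.

```python
import pickle
import sys
import types

# Stand-in for a separate file; "simple_class" is a hypothetical module name.
SOURCE = '''
class SimpleClass:
    def __init__(self):
        self.val = 5

    def get(self):
        return self.val
'''

module = types.ModuleType("simple_class")
exec(SOURCE, module.__dict__)
sys.modules["simple_class"] = module  # the module is now importable

# Driver side: pickle an instance, roughly as a broadcast would.
payload = pickle.dumps(module.SimpleClass())
# The payload names the module and class and carries only instance state.
names_recorded = b"simple_class" in payload and b"SimpleClass" in payload

# Worker side without the class definition: the lookup fails.
cls = module.__dict__.pop("SimpleClass")
try:
    pickle.loads(payload)
    failure = None
except AttributeError as exc:
    failure = exc

# Worker side with the definition available: the lookup succeeds.
module.SimpleClass = cls
restored = pickle.loads(payload)
print(names_recorded, type(failure).__name__, restored.get())
```

With the class in its own file (again assuming the name
simple_class.py), the script would import it from there and the job
would be submitted with something like
spark-submit --py-files simple_class.py test_spark.py.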
On 01/19/2016 12:34 AM, efwalkermit wrote:
> Should I be able to broadcast a fairly simple user-defined class? I'm having
> no success in 1.6.0 (or 1.5.2):
>
> $ cat test_spark.py
> import pyspark
>
>
> class SimpleClass:
>     def __init__(self):
>         self.val = 5
>     def get(self):
>         return self.val
>
>
> def main():
>     sc = pyspark.SparkContext()
>     b = sc.broadcast(SimpleClass())
>     results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x + b.value.get()).collect()
>
> if __name__ == '__main__':
>     main()
>
>
> $ spark-submit --master local[1] test_spark.py
> [snip]
> File "/Users/ed/src/mrspark/examples/fortyler/test_spark.py", line 14, in
>
> results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x +
> b.value.get()).collect()
> File
> "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py",
> line 97, in value
> self._value = self.load(self._path)
> File
> "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py",
> line 88, in load
> return pickle.load(f)
> AttributeError: 'module' object has no attribute 'SimpleClass'
> [snip]