Re: PySpark Broadcast of User Defined Class No Work?

2016-01-18 Thread Maciej Szymkiewicz
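Python pickles instances, not class definitions: the pickled data records
only the module and name of the class. It means that SimpleClass has to be
importable on every worker node to enable correct deserialization. In your
script the class lives in __main__, which on a worker is not test_spark.py,
hence the AttributeError. Typically the fix is to keep class definitions in
a separate module and distribute it using, for example, --py-files.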
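
To make the point concrete, here is a minimal sketch using only the standard
library, no Spark involved. The module name simple_module and the helper
make_module are made up for the illustration; the in-memory module merely
stands in for a file that would be shipped to the workers.

```python
import pickle
import sys
import types

# Source of the class, as it would appear in a separate module file.
SOURCE = """
class SimpleClass:
    def __init__(self):
        self.val = 5
    def get(self):
        return self.val
"""


def make_module():
    """Build an in-memory module named simple_module (hypothetical name)."""
    module = types.ModuleType("simple_module")
    exec(SOURCE, module.__dict__)
    return module


# "Driver": the module is importable, so an instance can be pickled.
driver_module = make_module()
sys.modules["simple_module"] = driver_module
payload = pickle.dumps(driver_module.SimpleClass())

# The payload names the module and the class; it does not carry the class body.
names_only = b"simple_module" in payload and b"SimpleClass" in payload

# "Worker" where a module of that name exists but lacks the class,
# comparable to __main__ on a Spark worker.
sys.modules["simple_module"] = types.ModuleType("simple_module")
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc

# "Worker" where the module was distributed: unpickling succeeds.
sys.modules["simple_module"] = make_module()
restored = pickle.loads(payload)
result = restored.get()
```

For the script in this thread that would presumably mean moving SimpleClass
into its own file (say simple_class.py, a name picked here for illustration),
importing it from test_spark.py, and submitting with something like
spark-submit --master local[1] --py-files simple_class.py test_spark.py.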


On 01/19/2016 12:34 AM, efwalkermit wrote:
> Should I be able to broadcast a fairly simple user-defined class?  I'm having
> no success in 1.6.0 (or 1.5.2):
>
> $ cat test_spark.py
> import pyspark
>
>
> class SimpleClass:
>     def __init__(self):
>         self.val = 5
>     def get(self):
>         return self.val
>
>
> def main():
>     sc = pyspark.SparkContext()
>     b = sc.broadcast(SimpleClass())
>     results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x + b.value.get()).collect()
>
> if __name__ == '__main__':
>     main()
>
>
> $ spark-submit --master local[1] test_spark.py
> [snip]
>   File "/Users/ed/src/mrspark/examples/fortyler/test_spark.py", line 14, in <lambda>
>     results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x + b.value.get()).collect()
>   File "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py", line 97, in value
>     self._value = self.load(self._path)
>   File "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py", line 88, in load
>     return pickle.load(f)
> AttributeError: 'module' object has no attribute 'SimpleClass'
> [snip]
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-Broadcast-of-User-Defined-Class-No-Work-tp26000.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>






PySpark Broadcast of User Defined Class No Work?

2016-01-18 Thread efwalkermit
Should I be able to broadcast a fairly simple user-defined class?  I'm having
no success in 1.6.0 (or 1.5.2):

$ cat test_spark.py
import pyspark


class SimpleClass:
    def __init__(self):
        self.val = 5
    def get(self):
        return self.val


def main():
    sc = pyspark.SparkContext()
    b = sc.broadcast(SimpleClass())
    results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x + b.value.get()).collect()

if __name__ == '__main__':
    main()


$ spark-submit --master local[1] test_spark.py
[snip]
  File "/Users/ed/src/mrspark/examples/fortyler/test_spark.py", line 14, in <lambda>
    results = sc.parallelize([ x for x in range(10) ]).map(lambda x: x + b.value.get()).collect()
  File "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py", line 97, in value
    self._value = self.load(self._path)
  File "/Users/ed/.spark/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/broadcast.py", line 88, in load
    return pickle.load(f)
AttributeError: 'module' object has no attribute 'SimpleClass'
[snip]


