When the data size is huge, you are better off using the TorrentBroadcastFactory.
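A minimal PySpark sketch of what that could look like, assuming Spark 1.0.x
config keys (the app name, matrix, and frame-size value below are only
placeholders, not tested settings):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("large-broadcast")  # placeholder app name
        # Switch from the HTTP-based default in Spark 1.0.x to the
        # torrent-based broadcast implementation.
        .set("spark.broadcast.factory",
             "org.apache.spark.broadcast.TorrentBroadcastFactory")
        # Alternatively, raise the Akka frame size (in MB) that the
        # error message below refers to.
        .set("spark.akka.frameSize", "64"))
sc = SparkContext(conf=conf)

# Broadcast the large value once; tasks then read it via .value instead
# of the driver shipping a copy inside every serialized task.
big_matrix = [[float(i * j) for j in range(1000)] for i in range(1000)]
bcast = sc.broadcast(big_matrix)

row_sums = sc.parallelize(range(1000)) \
             .map(lambda i: sum(bcast.value[i])) \
             .collect()

With the data behind a broadcast variable, the closure shipped to the
executors stays small, which is what the frameSize error is complaining
about.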

Thanks
Best Regards

On Sun, Sep 14, 2014 at 2:54 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:

> Specifically, the error I see when I try to operate on an RDD created by
> the sc.parallelize method:
> org.apache.spark.SparkException: Job aborted due to stage failure:
> Serialized task 12:12 was 12062263 bytes which exceeds spark.akka.frameSize
> (10485760 bytes). Consider using broadcast variables for large values.
>
> On Sun, Sep 14, 2014 at 2:20 AM, Chengi Liu <chengi.liu...@gmail.com>
> wrote:
>
>> Hi,
>>    I am trying to create an RDD out of a large matrix. The error from
>> sc.parallelize suggests using broadcast variables.
>> But when I do
>>
>> sc.broadcast(data)
>> I get this error:
>>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/common/usg/spark/1.0.2/python/pyspark/context.py", line 370,
>> in broadcast
>>     pickled = pickleSer.dumps(value)
>>   File "/usr/common/usg/spark/1.0.2/python/pyspark/serializers.py", line
>> 279, in dumps
>>     def dumps(self, obj): return cPickle.dumps(obj, 2)
>> SystemError: error return without exception set
>> Help?
>>
>>
>
