How? An example would help, please.
Also, if I am running this in the PySpark shell, how do I configure
spark.akka.frameSize?
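
Is it something like the following? This is just a guess from skimming the
configuration docs; the property value and where it has to be set are my
assumptions, and I know the shell creates sc for me, so I'm not sure a
SparkConf built inside the session would even take effect:

    # Rough, untested sketch: set spark.akka.frameSize (value is in MB) on a
    # SparkConf *before* the SparkContext exists, e.g. in a standalone script.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("large-matrix")
            .set("spark.akka.frameSize", "128"))  # raise the 10 MB default
    sc = SparkContext(conf=conf)

For the interactive shell I assume the equivalent is to put the same setting
into conf/spark-defaults.conf before launching pyspark, since the already
running context can't be reconfigured. Is that right? (My rough guess at the
broadcast-variable approach is at the bottom of this mail, below the quoted
thread.)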


On Sun, Sep 14, 2014 at 7:43 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> When the data size is huge, you are better off using the TorrentBroadcastFactory.
>
> Thanks
> Best Regards
>
> On Sun, Sep 14, 2014 at 2:54 PM, Chengi Liu <chengi.liu...@gmail.com>
> wrote:
>
>> Specifically, the error I see when I try to operate on an RDD created by
>> the sc.parallelize method is:
>> org.apache.spark.SparkException: Job aborted due to stage failure:
>> Serialized task 12:12 was 12062263 bytes which exceeds spark.akka.frameSize
>> (10485760 bytes). Consider using broadcast variables for large values.
>>
>> On Sun, Sep 14, 2014 at 2:20 AM, Chengi Liu <chengi.liu...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>    I am trying to create an RDD out of a large matrix... sc.parallelize
>>> suggests using a broadcast variable.
>>> But when I do
>>>
>>> sc.broadcast(data)
>>> I get this error:
>>>
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "/usr/common/usg/spark/1.0.2/python/pyspark/context.py", line
>>> 370, in broadcast
>>>     pickled = pickleSer.dumps(value)
>>>   File "/usr/common/usg/spark/1.0.2/python/pyspark/serializers.py", line
>>> 279, in dumps
>>>     def dumps(self, obj): return cPickle.dumps(obj, 2)
>>> SystemError: error return without exception set
>>> Help?
>>>
>>>
>>
>
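
For what it's worth, here is my rough understanding of the broadcast-variable
route that the error message (and Akhil) are suggesting. It is only an
untested sketch; the spark.broadcast.factory setting and the parallelize over
row indices are my guesses, so please correct me if I'm holding it wrong:

    # Untested sketch: ship the large matrix to the executors once as a
    # broadcast variable instead of serializing it inside every task, and
    # switch the broadcast implementation to the torrent factory as suggested.
    import numpy as np
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("large-matrix-broadcast")
            .set("spark.broadcast.factory",
                 "org.apache.spark.broadcast.TorrentBroadcastFactory"))
    sc = SparkContext(conf=conf)

    matrix = np.random.rand(10000, 1000)  # stand-in for my real data

    bc = sc.broadcast(matrix)  # sent to the cluster once

    # Parallelize only the (tiny) row indices; each task reads its row from
    # the broadcast value, so the serialized tasks stay far below frameSize.
    row_sums = (sc.parallelize(range(matrix.shape[0]))
                  .map(lambda i: float(bc.value[i].sum()))
                  .collect())

That said, in my case sc.broadcast(data) itself blows up inside cPickle
(traceback above), which makes me wonder whether the matrix is simply too
large to pickle in one shot and has to be broadcast or parallelized in
smaller pieces.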
