----------------------------------------------------------------------
import csv
import StringIO  # Python 2; on Python 3 this would be io.StringIO

def parse_record(x):
    # Flatten a (key, value) pair into a single flat list of fields.
    formatted = list(x[0])
    if not isinstance(x[1], (list, tuple)):
        formatted.append(x[1])
    else:
        formatted.extend(list(x[1]))
    return formatted

def writeRecords(records):
    # Render all records of a partition as one CSV string.
    output = StringIO.StringIO()
    writer = csv.writer(output, delimiter=',')
    for record in records:
        writer.writerow(record)
    return [output.getvalue()]

validation_formatted = validation_res.map(lambda x: parse_record(x))
validation_partition = validation_formatted.repartition(80)
validation_partition.mapPartitions(writeRecords).saveAsTextFile(broadcast_path + "para_predictions")
----------------------------------------------------------------------
The error happens during saveAsTextFile. I changed the repartition number to 200 and then to 300, but I still get the same error.
Thanks for the help.
Date: Thu, 8 Oct 2015 13:55:32 -0700
Subject: Re: ValueError: can not serialize object larger than 2G
From: yuzhih...@gmail.com
To: zxd_ci...@hotmail.com
CC: user@spark.apache.org

To fix the problem, consider increasing the number of partitions for your job.
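For instance, something along these lines, where rdd stands in for your own RDD (just a sketch; the right partition count depends on your data size):

# Use more, smaller partitions so each serialized chunk handed to the Python
# worker stays well below the 2 GB frame limit.
rdd = rdd.repartition(800)  # 800 is only an example value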
Showing a code snippet would help us understand your use case better.
Cheers
On Thu, Oct 8, 2015 at 1:39 PM, Ted Yu <yuzhih...@gmail.com> wrote:
See the comment of FramedSerializer() in serializers.py:

    Serializer that writes objects as a stream of (length, data) pairs,
    where C{length} is a 32-bit integer and data is C{length} bytes.

Hence the limit on the size of an object.
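In other words, each object is written as a 4-byte signed length followed by the payload, so no single object can exceed 2**31 - 1 bytes. A rough illustration of that framing (a sketch, not the actual Spark code):

import struct

def write_with_length(obj_bytes, stream):
    # Frame format: 32-bit signed big-endian length, then the payload bytes.
    # A signed 32-bit length cannot describe more than 2**31 - 1 bytes (~2 GB),
    # so anything larger has to be rejected before framing.
    if len(obj_bytes) > (1 << 31) - 1:
        raise ValueError("can not serialize object larger than 2G")
    stream.write(struct.pack("!i", len(obj_bytes)))
    stream.write(obj_bytes)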
On Thu, Oct 8, 2015 at 12:56 PM, XIANDI <zxd_ci...@hotmail.com> wrote:
  File "/home/hadoop/spark/python/pyspark/worker.py", line 101, in main

    process()

  File "/home/hadoop/spark/python/pyspark/worker.py", line 96, in process

    serializer.dump_stream(func(split_index, iterator), outfile)

  File "/home/hadoop/spark/python/pyspark/serializers.py", line 126, in

dump_stream

    self._write_with_length(obj, stream)

  File "/home/hadoop/spark/python/pyspark/serializers.py", line 140, in

_write_with_length

    raise ValueError("can not serialize object larger than 2G")

ValueError: can not serialize object larger than 2G



Does anyone know how this happens?

Thanks!
--

View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ValueError-can-not-serialize-object-larger-than-2G-tp24984.html

Sent from the Apache Spark User List mailing list archive at Nabble.com.



---------------------------------------------------------------------

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org

For additional commands, e-mail: user-h...@spark.apache.org