Hi Yifan,

You could also try increasing *spark.kryoserializer.buffer.max.mb* (64 MB by
default): useful if your serialized data exceeds the default 64 MB buffer.
Per the docs: "Maximum allowable size of Kryo serialization buffer. This must
be larger than any object you attempt to serialize. Increase this if you get
a 'buffer limit exceeded' exception inside Kryo."

-Todd

On Fri, Oct 23, 2015 at 6:51 AM, Yifan LI <iamyifa...@gmail.com> wrote:

> Thanks for your advice, Jem. :)
>
> I will increase the partitioning and see if it helps.
>
> Best,
> Yifan LI
>
> On 23 Oct 2015, at 12:48, Jem Tucker <jem.tuc...@gmail.com> wrote:
>
> Hi Yifan,
>
> I think this is a result of Kryo trying to serialize something too large.
> Have you tried increasing your partitioning?
>
> Cheers,
>
> Jem
>
> On Fri, Oct 23, 2015 at 11:24 AM Yifan LI <iamyifa...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a big sorted RDD, sRdd (~962 million elements), and need to scan
>> its elements in order (using sRdd.toLocalIterator).
>>
>> But the process failed after around 893 million elements had been
>> scanned, returning the following exception:
>>
>> Anyone have an idea? Thanks!
>>
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due to stage failure: Task 0 in stage 421752.0 failed 128 times, most
>> recent failure: Lost task 0.127 in stage 421752.0 (TID 17304,
>> small15-tap1.common.lip6.fr): java.lang.NegativeArraySizeException
>>   at com.esotericsoftware.kryo.util.IdentityObjectIntMap.resize(IdentityObjectIntMap.java:409)
>>   at com.esotericsoftware.kryo.util.IdentityObjectIntMap.putStash(IdentityObjectIntMap.java:227)
>>   at com.esotericsoftware.kryo.util.IdentityObjectIntMap.push(IdentityObjectIntMap.java:221)
>>   at com.esotericsoftware.kryo.util.IdentityObjectIntMap.put(IdentityObjectIntMap.java:117)
>>   at com.esotericsoftware.kryo.util.IdentityObjectIntMap.putStash(IdentityObjectIntMap.java:228)
>>   at com.esotericsoftware.kryo.util.IdentityObjectIntMap.push(IdentityObjectIntMap.java:221)
>>   at com.esotericsoftware.kryo.util.IdentityObjectIntMap.put(IdentityObjectIntMap.java:117)
>>   at com.esotericsoftware.kryo.util.MapReferenceResolver.addWrittenObject(MapReferenceResolver.java:23)
>>   at com.esotericsoftware.kryo.Kryo.writeReferenceOrNull(Kryo.java:598)
>>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:566)
>>   at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:36)
>>   at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
>>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
>>   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
>>   at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
>>   at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
>>   at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:250)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:236)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> Driver stacktrace:
>>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>>   at scala.Option.foreach(Option.scala:236)
>>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
>>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>
>> Best,
>> Yifan LI
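Todd's suggestion above amounts to raising the Kryo buffer ceiling before the SparkContext is created. A minimal configuration sketch, assuming Scala and a hypothetical app name; the 1024 MB value is an assumption for illustration, not a tuned recommendation (and on Spark 1.4+ the un-suffixed key `spark.kryoserializer.buffer.max` with a size suffix like `1024m` is the preferred spelling):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Raise the Kryo buffer ceiling; it must be larger than any single
// object that gets serialized. 1024 MB is an assumed value here.
val conf = new SparkConf()
  .setAppName("sorted-scan") // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max.mb", "1024")
val sc = new SparkContext(conf)
```

The same setting can be passed at launch time instead, e.g. `spark-submit --conf spark.kryoserializer.buffer.max.mb=1024 ...`, which avoids hard-coding it in the application.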
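Jem's partitioning suggestion can be sketched as follows. `toLocalIterator` fetches one partition at a time to the driver, so each partition must serialize within Kryo's limits; re-sorting into more (hence smaller) partitions keeps each fetch small. This is a sketch under assumptions: that sRdd is a key-value RDD produced by `sortByKey`, and the partition count 4096 is illustrative, not tuned. Note that a plain `repartition()` would destroy the sort order, which is why the sketch re-sorts instead:

```scala
// Sketch, assuming sRdd: RDD[(K, V)] was sorted with sortByKey.
// Ask for more partitions so each one stays small enough to serialize.
val resorted = sRdd.sortByKey(ascending = true, numPartitions = 4096)

// Scan in order on the driver, one (small) partition at a time.
resorted.toLocalIterator.foreach { elem =>
  // process elem ...
}
```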