A search shows several historical threads for similar Kryo issues, but none seem to have a definitive solution. Currently using Spark 1.2.0.
While collecting/broadcasting/grouping moderately sized data sets (~500MB - 1GB), I regularly see exceptions such as the one below. I’ve tried increasing the spark.kryoserializer.buffer.max.mb value to 10000MB, which I’m certain far exceeds the size of the data, but it doesn’t seem to make any difference. Reducing it to 1MB also doesn’t seem to have an effect... I’ve been able to work around these exceptions in a few cases by reducing the memory consumption of some classes, but it does seem like Spark should be able to handle stuff like this. Any ideas, or suggestions for how to go about debugging this? Cheers Tristan ---- 2015-01-28 20:39:42 INFO DAGScheduler:59 - Stopping DAGScheduler 2015-01-28 20:39:43 INFO MapOutputTrackerMasterActor:59 - MapOutputTrackerActor stopped! 2015-01-28 20:39:43 ERROR Executor:96 - Exception in task 82.0 in stage 8.0 (TID 462) com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 3 Serialization trace: a_field (org.some.class) another_field (org.some.class) otherElements (org.apache.spark.util.collection.CompactBuffer) at com.esotericsoftware.kryo.io.Output.require(Output.java:138) at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446) at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306) at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153) at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146) at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:570) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293) at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501) at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564) at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at com.twitter.chill.JavaIterableWrapperSerializer.write(JavaIterableWrapperSerializer.scala:21) at com.twitter.chill.JavaIterableWrapperSerializer.write(JavaIterableWrapperSerializer.scala:13) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:37) at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568) at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:165) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-01-28 20:39:44 ERROR TaskSchedulerImpl:96 - Exception in statusUpdate java.util.concurrent.RejectedExecutionException: Task org.apache.spark.scheduler.TaskResultGetter$$anon$3@58ec279b rejected from java.util.concurrent.ThreadPoolExecutor@70747cc1[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 542] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) at org.apache.spark.scheduler.TaskResultGetter.enqueueFailedTask(TaskResultGetter.scala:97) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$liftedTree2$1$1.apply(TaskSchedulerImpl.scala:304) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$liftedTree2$1$1.apply(TaskSchedulerImpl.scala:298) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.TaskSchedulerImpl.liftedTree2$1(TaskSchedulerImpl.scala:298) at org.apache.spark.scheduler.TaskSchedulerImpl.statusUpdate(TaskSchedulerImpl.scala:283) at org.apache.spark.scheduler.local.LocalActor$$anonfun$receiveWithLogging$1.applyOrElse(LocalBackend.scala:61) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53) at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2015-01-28 20:39:44 INFO MemoryStore:59 - MemoryStore cleared 2015-01-28 20:39:44 INFO BlockManager:59 - BlockManager stopped