You probably shouldn't be calling collect() on a ~10 GB dataset: collect() ships the entire dataset back to the driver node ...
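For reference, a minimal sketch of the usual alternatives to collect() — write results out in parallel from the executors, or bring back only a small sample to the driver. The master URL, app name, and HDFS paths below are placeholders, not anything from this thread:

```scala
import org.apache.spark.SparkContext

object AvoidCollect {
  def main(args: Array[String]): Unit = {
    // Placeholders: master URL, app name, and paths are hypothetical.
    val sc = new SparkContext("spark://master:7077", "MyApp")
    val big = sc.textFile("hdfs:///data/input")

    // Instead of big.collect() -- which serializes every record and sends
    // it to the driver -- write the results out from the executors:
    big.saveAsTextFile("hdfs:///data/output")

    // Or pull back only a small prefix for inspection on the driver:
    val sample = big.take(100)
    println(sample.length)

    sc.stop()
  }
}
```

Either way the bulk of the data never has to fit through the driver (or through an Akka frame).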
On Fri, Oct 4, 2013 at 6:53 PM, Ryan Compton <[email protected]> wrote:
> Some hints: I'm doing collect() on a large (~10g??) dataset. If I
> shrink that down, I have no problems. I've tried
>
>     System.setProperty("spark.akka.frameSize", "15420")
>
> But then I get:
>
> 13/10/04 18:49:33 ERROR client.Client$ClientActor: Failed to connect to master
> org.jboss.netty.channel.ChannelPipelineException: Failed to initialize a pipeline.
>     at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:209)
>     at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:183)
>     at akka.remote.netty.ActiveRemoteClient$$anonfun$connect$1.apply$mcV$sp(Client.scala:173)
>     at akka.util.Switch.liftedTree1$1(LockUtil.scala:33)
>     at akka.util.Switch.transcend(LockUtil.scala:32)
>     at akka.util.Switch.switchOn(LockUtil.scala:55)
>     at akka.remote.netty.ActiveRemoteClient.connect(Client.scala:158)
>     at akka.remote.netty.NettyRemoteTransport.send(NettyRemoteSupport.scala:153)
>     at akka.remote.RemoteActorRef.$bang(RemoteActorRefProvider.scala:247)
>     at org.apache.spark.deploy.client.Client$ClientActor.preStart(Client.scala:61)
>     at akka.actor.ActorCell.create$1(ActorCell.scala:508)
>     at akka.actor.ActorCell.systemInvoke(ActorCell.scala:600)
>     at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:209)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:178)
>     at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
>     at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
>     at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
>     at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
>     at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
> Caused by: java.lang.IllegalArgumentException: maxFrameLength must be a positive integer: -1010827264
>     at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.<init>(LengthFieldBasedFrameDecoder.java:270)
>     at org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder.<init>(LengthFieldBasedFrameDecoder.java:236)
>     at akka.remote.netty.ActiveRemoteClientPipelineFactory.getPipeline(Client.scala:340)
>     at org.jboss.netty.bootstrap.ClientBootstrap.connect(ClientBootstrap.java:207)
>     ... 18 more
> 13/10/04 18:49:33 ERROR cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster!
> 13/10/04 18:49:33 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster
>
> On Fri, Oct 4, 2013 at 5:31 PM, Ryan Compton <[email protected]> wrote:
> > When I turn on Kryo serialization in 0.8 my jobs fail with these
> > errors and I don't understand what's going wrong. Any ideas?
> >
> > I've got these properties:
> >
> >     //my usual spark props
> >     System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> >     System.setProperty("spark.kryo.registrator", classOf[OSIKryoRegistrator].getName)
> >     System.setProperty("spark.cores.max", "532")
> >     System.setProperty("spark.executor.memory", "92g")
> >     System.setProperty("spark.default.parallelism", "256")
> >     System.setProperty("spark.akka.frameSize", "1024")
> >     System.setProperty("spark.kryoserializer.buffer.mb", "24")
> >
> > And these errors:
> >
> > 13/10/04 17:21:56 INFO cluster.ClusterTaskSetManager: Loss was due to com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 6, required: 8
> > Serialization trace:
> > longitude (com.hrl.issl.osi.geometry.Location) [duplicate 2]
> > 13/10/04 17:21:56 INFO cluster.ClusterTaskSetManager: Starting task 2.0:728 as TID 1490 on executor 17: node25 (PROCESS_LOCAL)
> > 13/10/04 17:21:56 INFO cluster.ClusterTaskSetManager: Serialized task 2.0:728 as 1892 bytes in 0 ms
> > 13/10/04 17:21:58 INFO cluster.ClusterTaskSetManager: Lost TID 1486 (task 2.0:730)
> > 13/10/04 17:21:58 INFO cluster.ClusterTaskSetManager: Loss was due to com.esotericsoftware.kryo.KryoException
> > com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 6, required: 8
> > Serialization trace:
> > latitude (com.hrl.issl.osi.geometry.Location)
> >     at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
> >     at com.esotericsoftware.kryo.io.Output.writeLong(Output.java:477)
> >     at com.esotericsoftware.kryo.io.Output.writeDouble(Output.java:596)
> >     at com.esotericsoftware.kryo.serializers.DefaultSerializers$DoubleSerializer.write(DefaultSerializers.java:137)
> >     at com.esotericsoftware.kryo.serializers.DefaultSerializers$DoubleSerializer.write(DefaultSerializers.java:131)
> >     at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:576)
> >     at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
> >     at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> >     at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:38)
> >     at com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:34)
> >     at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> >     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
> >     at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
> >     at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> >     at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:126)
> >     at org.apache.spark.scheduler.TaskResult.writeExternal(TaskResult.scala:40)
> >     at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1429)
> >     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1398)
> >     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
> >     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
> >     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:27)
> >     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:47)
> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:171)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> > 13/10/04 17:21:58 ERROR cluster.ClusterTaskSetManager: Task 2.0:730 failed more than 4 times; aborting job
> > 13/10/04 17:21:58 INFO cluster.ClusterScheduler: Remove TaskSet 2.0 from pool
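A side note on the "maxFrameLength must be a positive integer: -1010827264" failure quoted above: spark.akka.frameSize is specified in megabytes, and 15420 MB converted to bytes no longer fits in a signed 32-bit Int. I'm assuming (from the LengthFieldBasedFrameDecoder constructor in the trace) that the MB-to-bytes conversion is done in Int arithmetic, but the wrap-around reproduces the logged value exactly:

```scala
object FrameSizeOverflow {
  def main(args: Array[String]): Unit = {
    // The frameSize setting that triggered the error, in MB.
    val frameSizeMb = 15420

    // Converting MB to bytes in 32-bit Int arithmetic wraps around:
    // 15420 * 1024 * 1024 = 16,169,041,920, which exceeds Int.MaxValue.
    val maxFrameLength = frameSizeMb * 1024 * 1024
    println(maxFrameLength) // -1010827264, exactly the value in the log

    // The largest frameSize (in MB) whose byte count still fits in an Int:
    val largestSafeMb = Int.MaxValue / (1024 * 1024)
    println(largestSafeMb) // 2047
  }
}
```

So any frameSize above 2047 MB would produce a negative maxFrameLength and fail the same way — though, as noted, a frame that large is a sign the data shouldn't be coming back through collect() in the first place.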
