com.google.protobuf out of memory

2014-05-25 Thread Zuhair Khayyat
Dear all,

I am getting an OutOfMemoryError in class ByteString.java from package
com.google.protobuf when processing very large data using Spark 0.9. Would
increasing spark.shuffle.memoryFraction help, or should I add more memory
to my workers? Below is the error I get during execution.

14/05/25 07:26:05 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@cloud21:47985
14/05/25 07:26:05 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@cloud5:46977
14/05/25 07:26:05 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@cloud14:51948
14/05/25 07:26:05 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@cloud12:45368
14/05/25 07:26:05 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@cloud9:50926
14/05/25 07:26:05 INFO MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@cloud10:50690
14/05/25 07:26:12 ERROR ActorSystemImpl: Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-5] shutting down ActorSystem [spark]
java.lang.OutOfMemoryError: Java heap space
    at com.google.protobuf_spark.ByteString.copyFrom(ByteString.java:90)
    at com.google.protobuf_spark.ByteString.copyFrom(ByteString.java:99)
    at akka.remote.MessageSerializer$.serialize(MessageSerializer.scala:36)
    at akka.remote.EndpointWriter$$anonfun$akka$remote$EndpointWriter$$serializeMessage$1.apply(Endpoint.scala:672)
    at akka.remote.EndpointWriter$$anonfun$akka$remote$EndpointWriter$$serializeMessage$1.apply(Endpoint.scala:672)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at akka.remote.EndpointWriter.akka$remote$EndpointWriter$$serializeMessage(Endpoint.scala:671)
    at akka.remote.EndpointWriter$$anonfun$7.applyOrElse(Endpoint.scala:559)
    at akka.remote.EndpointWriter$$anonfun$7.applyOrElse(Endpoint.scala:544)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at akka.actor.FSM$class.processEvent(FSM.scala:595)
    at akka.remote.EndpointWriter.processEvent(Endpoint.scala:443)
    at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:589)
    at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:583)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:385)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Thank you,


Re: com.google.protobuf out of memory

2014-05-25 Thread Hao Wang
Hi, Zuhair

In my experience, you could try the following steps to avoid Spark OOM
errors:

   1. Increase the JVM heap size by adding export SPARK_JAVA_OPTS=-Xmx2g
   2. Use .persist(storage.StorageLevel.MEMORY_AND_DISK) instead of .cache()
   3. Have you set the spark.executor.memory value? It is 512m by default.
   4. Add more memory to the workers.
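
Steps 2 and 3 could look roughly like the sketch below. It assumes the
SparkConf API available in Spark 0.9; the object name, the "2g" value, and
the two command-line arguments (master URL and input path) are placeholders
chosen for illustration, not values taken from Zuhair's job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical driver program: args(0) is the master URL, args(1) an input path.
object PersistSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster(args(0))
      .setAppName("persist-sketch")
      // Step 3: raise the executor heap above the 512m default (value is illustrative).
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)

    // Step 2: MEMORY_AND_DISK lets partitions that do not fit in memory go to
    // disk, whereas .cache() keeps them in memory only.
    val words = sc.textFile(args(1))
      .flatMap(_.split("\\s+"))
      .persist(StorageLevel.MEMORY_AND_DISK)

    println(words.count())
    sc.stop()
  }
}
```

Whether "2g" is reasonable depends on how much memory each worker machine
actually has.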

I haven't tried modifying the spark.shuffle.memoryFraction value, but as I
understand it, it sets the threshold at which shuffle contents are spilled
to disk. You could try decreasing this value to relieve memory pressure.
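
To get a feel for what that threshold means in bytes, here is a small
back-of-the-envelope sketch. It assumes the budget is simply the JVM heap
size multiplied by the fraction; Spark may apply further safety margins on
top of that, so treat the result as a rough upper bound. The object and
function names are made up for this example.

```scala
object ShuffleBudgetSketch {
  // Rough upper bound on the heap available to in-memory shuffle data
  // before spilling starts, assuming budget = heap * fraction.
  def shuffleBudgetBytes(heapBytes: Long, memoryFraction: Double): Long =
    (heapBytes * memoryFraction).toLong

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    val heap = 512L * mb // the default executor memory mentioned above
    for (fraction <- Seq(0.3, 0.2, 0.1)) {
      val budget = shuffleBudgetBytes(heap, fraction)
      println(f"fraction=$fraction%.1f budget=${budget / mb} MB")
    }
  }
}
```

A smaller fraction means spilling starts earlier, which leaves more of the
heap for everything else.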

Regards,
Wang Hao(王灏)

CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
Email:wh.s...@gmail.com

