Thanks, Ted, for the input. I was able to get it working with the pyspark
shell, but the same job submitted via 'spark-submit' in client or cluster
deploy mode ends up with these errors:

~~~~~
java.lang.OutOfMemoryError: Java heap space
at java.lang.Object.clone(Native Method)
at akka.util.CompactByteString$.apply(ByteString.scala:410)
at akka.util.ByteString$.apply(ByteString.scala:22)
at akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
at akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
at akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
at akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:179)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

ERROR Utils: Uncaught exception in thread task-result-getter-3
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2219)
at java.util.ArrayList.grow(ArrayList.java:242)
at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
at java.util.ArrayList.add(ArrayList.java:440)
at com.esotericsoftware.kryo.util.MapReferenceResolver.nextReadId(MapReferenceResolver.java:33)
at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:766)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:727)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:275)
at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:97)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:60)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
~~~~~

When using pyspark, I'm passing the options on the command line:

pyspark --master yarn --deploy-mode client --driver-memory 6g --executor-memory 8g --executor-cores 4

while with spark-submit I'm setting the same options via conf.set() in the
script.
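
For reference, the script's setup looks roughly like this (a simplified
sketch, not my exact code; the app name is a placeholder and the conf values
are the ones from my earlier mail):

~~~~~
# Simplified sketch of the setup in the script passed to spark-submit.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("motif-job")  # placeholder app name
conf.set("spark.executor.memory", "8192m")
conf.set("spark.executor.cores", "4")
conf.set("spark.driver.memory", "10240m")
conf.set("spark.driver.maxResultSize", "2g")
conf.set("spark.kryoserializer.buffer.max", "1024mb")
sc = SparkContext(conf=conf)
~~~~~

submitted with something like `spark-submit --master yarn --deploy-mode
client myjob.py` (myjob.py being a placeholder for the actual script name).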

I do notice the options being picked up correctly under the Environment tab,
but it's not clear why pyspark succeeds while spark-submit fails.
Is there any difference in how these two ways of running the job behave?
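
In case it helps narrow things down, here's a quick sanity check I can run
in both setups (a debugging sketch; sc._jvm is PySpark's internal py4j
gateway, not a public API):

~~~~~
# Debugging sketch: compare what the conf reports against the heap the
# driver JVM actually got. sc._jvm is an internal py4j handle, so this
# is a throwaway diagnostic, not production code.
print(sc.getConf().get("spark.driver.memory"))
jvm_max = sc._jvm.java.lang.Runtime.getRuntime().maxMemory()
print("driver JVM max heap: %d MB" % (jvm_max // (1024 * 1024)))
~~~~~

If the conf says 6g but the JVM max heap comes back much smaller, that would
point at the driver memory setting not taking effect.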

Thanks!


On Sun, Apr 10, 2016 at 4:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Looks like the exception occurred on driver.
>
> Consider increasing the values for the following config:
>
> conf.set("spark.driver.memory", "10240m")
> conf.set("spark.driver.maxResultSize", "2g")
>
> Cheers
>
> On Sat, Apr 9, 2016 at 9:02 PM, Buntu Dev <buntu...@gmail.com> wrote:
>
>> I'm running it via pyspark against YARN in client deploy mode. I do
>> notice all the options I've set in the Spark web UI under the Environment
>> tab, so I'm guessing they are accepted.
>>
>> On Sat, Apr 9, 2016 at 5:52 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> (I haven't played with GraphFrames)
>>>
>>> What's your `sc.master`? How do you run your application --
>>> spark-submit, java -jar, sbt run, or...? The reason I'm asking is
>>> that a few options might not be in use at all, e.g.
>>> spark.driver.memory and spark.executor.memory in local mode.
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>>
>>> On Sat, Apr 9, 2016 at 7:51 PM, Buntu Dev <buntu...@gmail.com> wrote:
>>> > I'm running this motif pattern against 1.5M vertices (5.5 MB) and 10M
>>> > (60 MB) edges:
>>> >
>>> >  tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
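>>> >
>>> > Roughly the full setup (a sketch; the DataFrame names v and e are
>>> > placeholders for my actual vertex and edge DataFrames):
>>> >
>>> > ~~~~~
>>> > # Sketch only: v and e are placeholder vertex/edge DataFrames.
>>> > from graphframes import GraphFrame
>>> >
>>> > tgraph = GraphFrame(v, e)  # ~1.5M vertices, ~10M edges
>>> > motifs = tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
>>> > motifs.count()  # an action like count() triggers the job
>>> > ~~~~~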
>>> >
>>> > I keep running into Java heap space errors:
>>> >
>>> > ~~~~~
>>> >
>>> > ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-33] shutting down ActorSystem [sparkDriver]
>>> > java.lang.OutOfMemoryError: Java heap space
>>> > at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:90)
>>> > at scala.reflect.ManifestFactory$$anon$6.newArray(Manifest.scala:88)
>>> > at scala.Array$.ofDim(Array.scala:218)
>>> > at akka.util.ByteIterator.toArray(ByteIterator.scala:462)
>>> > at akka.util.ByteString.toArray(ByteString.scala:321)
>>> > at akka.remote.transport.AkkaPduProtobufCodec$.decodePdu(AkkaPduCodec.scala:168)
>>> > at akka.remote.transport.ProtocolStateActor.akka$remote$transport$ProtocolStateActor$$decodePdu(AkkaProtocolTransport.scala:513)
>>> > at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:357)
>>> > at akka.remote.transport.ProtocolStateActor$$anonfun$5.applyOrElse(AkkaProtocolTransport.scala:352)
>>> > at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>>> > at akka.actor.FSM$class.processEvent(FSM.scala:595)
>>> > at akka.remote.transport.ProtocolStateActor.processEvent(AkkaProtocolTransport.scala:220)
>>> > at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:589)
>>> > at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:583)
>>> > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> > at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> > at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> > at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> > at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> > at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> > at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> >
>>> > ~~~~~
>>> >
>>> >
>>> > Here is my config:
>>> >
>>> > conf.set("spark.executor.memory", "8192m")
>>> > conf.set("spark.executor.cores", 4)
>>> > conf.set("spark.driver.memory", "10240m")
>>> > conf.set("spark.driver.maxResultSize", "2g")
>>> > conf.set("spark.kryoserializer.buffer.max", "1024mb")
>>> >
>>> >
>>> > Wanted to know if there are any other configs to tweak.
>>> >
>>> >
>>> > Thanks!
>>>
>>
>>
>
