Try yarn-client mode; it is much more like running on a standalone Spark 
cluster. I successfully got the NetworkWordCount example to work with it. We 
haven't tried to make Spark Streaming run under yarn-standalone mode; when I 
tried it a month or so ago, it didn't work.
 

Tom



On Friday, January 10, 2014 7:24 AM, Mike Percy <[email protected]> wrote:
 
OK, I got a bit farther, but still no dice. I hacked up the HdfsWordCount.scala 
file to add a bunch of print statements that write output to a file. Apologies 
for the messy code; like I said, I'm new to Scala. The file is here:

https://gist.github.com/mpercy/8351573


Apparently it's executing and getting somewhat far into the script, so I 
believe I'm actually passing the args correctly. This is the output I am 
getting from the above in /tmp/thing.txt (great name, I know):

hello world
still here: yarn-standalone, hdfs:///user/systest/hdfswordcount-test2
still here again ok: org.apache.spark.streaming.StreamingContext@7a237188
created stream: org.apache.spark.streaming.dstream.MappedDStream@6f4b06be
printing word counts: 
org.apache.spark.streaming.dstream.ShuffledDStream@5442e1da

So it looks like it hangs or gets killed when executing wordCounts.print() ... 
but that's all the info I've been able to glean so far.

I'm not sure I'm catching the Throwable properly, if there is one (I'm trying 
to get a stack trace). I wonder if it's getting kill -9ed by somebody... but 
I'd only expect that from YARN if there was an OutOfMemoryError, and if that 
happened I'd expect to catch the OOME in my try block and be able to print it, 
unless it threw again inside my catch.
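
For context, the wrapper pattern I'm going for is roughly this (a simplified 
sketch, not the exact code from the gist; the log-file path is just the one I 
happened to pick):

import java.io.{FileWriter, PrintWriter}

object CatchAndLog {
  // Append a line to a scratch file, so progress markers survive even if
  // YARN cleans up the container's stdout/stderr.
  def log(msg: String) {
    val out = new PrintWriter(new FileWriter("/tmp/thing.txt", true))
    try { out.println(msg) } finally { out.close() }
  }

  def main(args: Array[String]) {
    try {
      log("hello world")
      // ... build the StreamingContext, wire up wordCounts.print(),
      // and call ssc.start() here ...
    } catch {
      case t: Throwable =>
        log("caught: " + t)
        t.getStackTrace.foreach(e => log("  at " + e))
        throw t
    }
  }
}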

Any suggestions on where to go from here?

Thanks,
Mike



On Fri, Jan 10, 2014 at 3:04 AM, Mike Percy <[email protected]> wrote:

Thanks Tathagata. That helped a lot, but I am having some trouble under YARN 
with the HdfsWordCount example.
>
>
>I was able to get the example to work locally, and was also able to submit the 
>job to the YARN cluster, but it looks like it is crashing under YARN. The 
>streaming job stops after about 30 seconds, right after it starts running and 
>before I'm able to put anything new into the input directory. This is the 
>command I am running on the command line:
>
>
>export HADOOP_CONF_DIR=/etc/hadoop/conf
>SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar \
>  ./spark-class org.apache.spark.deploy.yarn.Client \
>  --jar examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar \
>  --class org.apache.spark.streaming.examples.HdfsWordCount \
>  --args yarn-standalone \
>  --args hdfs:///user/mpercy/hdfswordcount-test2 \
>  --num-workers 3 --master-memory 4g --worker-memory 2g --worker-cores 1
>
>
>This is the kind of output I am getting in the YARN NodeManager log file:
>
>
>2014-01-09 20:13:29,249 INFO org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to driver: akka://spark@sparktest-01:58117/user/CoarseGrainedScheduler
>2014-01-09 20:13:29,358 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 8 cluster_timestamp: 1389304540039 } attemptId: 1 } id: 4 } state: C_RUNNING diagnostics: "" exit_status: -1000
>2014-01-09 20:13:29,476 ERROR org.apache.spark.executor.CoarseGrainedExecutorBackend: Driver terminated or disconnected! Shutting down.
>2014-01-09 20:13:29,825 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1389304540039_0008_01_000004 is : 1
>2014-01-09 20:13:29,825 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1389304540039_0008_01_000004 and exit code: 1
>org.apache.hadoop.util.Shell$ExitCodeException: 
>        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>        at org.apache.hadoop.util.Shell.run(Shell.java:379)
>        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:724)
>2014-01-09 20:13:29,825 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 
>2014-01-09 20:13:29,825 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
>2014-01-09 20:13:29,826 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1389304540039_0008_01_000004 transitioned from RUNNING to EXITED_WITH_FAILURE
>
>
>It was difficult to get the logs from YARN before it deleted them during job 
>cleanup, but I finally did, and all I got was this from stderr (the stdout 
>file was empty):
>
>
>SLF4J: Class path contains multiple SLF4J bindings.
>SLF4J: Found binding in [jar:file:/media/ephemeral0/yarn/nm/usercache/mpercy/filecache/53/spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>SLF4J: Found binding in [jar:file:/media/ephemeral0/yarn/nm/usercache/mpercy/filecache/52/spark-examples-assembly-0.8.1-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>
>
>Not super useful AFAICT, since I'm pretty sure SLF4J just picks the first 
>binding, so I doubt that was the cause of the crash. Any suggestions on how to 
>proceed?
>
>
>One guess is that I am passing the args wrong. I'm new to Scala, so I'm not 
>sure whether I'm reading the ClientArguments code right, but based on the 
>comments in one of the files, I think passing --args multiple times is the 
>right way to do it.
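>
>In case it helps show what I mean: my reading is that each --args value just 
>gets appended to a buffer of user args, along these lines (a toy sketch of the 
>mechanism as I understand it, not the actual Spark parsing code):
>
>import scala.collection.mutable.ArrayBuffer
>
>// Toy version of repeated-flag parsing: every "--args <value>" pair
>// appends <value> to the user-args buffer.
>object ArgsSketch {
>  def main(argv: Array[String]) {
>    val userArgs = new ArrayBuffer[String]()
>    var rest = argv.toList
>    while (!rest.isEmpty) {
>      rest = rest match {
>        case "--args" :: value :: tail => userArgs += value; tail
>        case _ :: tail => tail
>        case Nil => Nil
>      }
>    }
>    // "--args yarn-standalone --args hdfs:///dir" prints both values.
>    println(userArgs.mkString(", "))
>  }
>}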
>
>
>And just for good measure, this is what is being executed by YARN's 
>launch_container.sh script:
>
>
>exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx4096m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class org.apache.spark.streaming.examples.HdfsWordCount --jar examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar --args 'yarn-standalone' --args 'hdfs:///user/mpercy/hdfswordcount-test2' --worker-memory 2048 --worker-cores 1 --num-workers 3 1> /var/log/hadoop-yarn/container/application_1389304540039_0052/container_1389304540039_0052_01_000001/stdout 2> /var/log/hadoop-yarn/container/application_1389304540039_0052/container_1389304540039_0052_01_000001/stderr
>
>
>
>Would love to hear any suggestions for how to debug this further!
>
>
>Thanks,
>Mike
>
>
>
>
>
>
>On Thu, Jan 9, 2014 at 5:44 PM, Tathagata Das <[email protected]> wrote:
>
>If you have been able to run the Spark Pi example on YARN, then you should be 
>able to run the streaming example HdfsWordCount as well. Even though the 
>instructions in the example say to run it on the local machine, you can run it 
>on YARN in the same way as Spark Pi. You would just have to give the 
>appropriate Spark master URL and use an HDFS directory as the 2nd parameter. 
>Then any text file written to that HDFS directory will get "word counted". 
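>
>For reference, the heart of that example is roughly the following (paraphrased 
>from memory, so check the actual source; args(0) is the master URL and args(1) 
>is the directory to watch):
>
>import org.apache.spark.streaming.{Seconds, StreamingContext}
>import org.apache.spark.streaming.StreamingContext._
>
>object HdfsWordCount {
>  def main(args: Array[String]) {
>    // args(0) = Spark master (e.g. yarn-standalone), args(1) = HDFS dir.
>    val ssc = new StreamingContext(args(0), "HdfsWordCount", Seconds(2),
>      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))
>    // Word-count any new files that appear in the watched directory.
>    val lines = ssc.textFileStream(args(1))
>    val wordCounts = lines.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
>    wordCounts.print()
>    ssc.start()
>  }
>}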
>>
>>
>>Note that you should write a file to that HDFS directory by moving it there 
>>from some other directory. For example, if the HDFS directory that you want 
>>to use to run the example is hdfs://myhdfs:9000/mydir/, then you can first 
>>copy a local file (say new_file) to "hdfs://myhdfs:9000/temp_location/new_file" 
>>and then move it to "hdfs://myhdfs:9000/mydir/new_file". 
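>>
>>Programmatically, that copy-then-move amounts to something like this (a 
>>minimal sketch using the Hadoop FileSystem API, with the same example paths; 
>>the rename is what makes the file appear in the watched directory all at 
>>once):
>>
>>import java.net.URI
>>import org.apache.hadoop.conf.Configuration
>>import org.apache.hadoop.fs.{FileSystem, Path}
>>
>>object MoveIntoWatchedDir {
>>  def main(args: Array[String]) {
>>    // Copy into a staging directory first, then rename into the watched
>>    // directory. rename() is atomic in HDFS, so the streaming job never
>>    // sees a half-written file.
>>    val fs = FileSystem.get(new URI("hdfs://myhdfs:9000"), new Configuration())
>>    fs.copyFromLocalFile(new Path("new_file"), new Path("/temp_location/new_file"))
>>    fs.rename(new Path("/temp_location/new_file"), new Path("/mydir/new_file"))
>>  }
>>}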
>>
>>
>>On Thu, Jan 9, 2014 at 5:29 PM, Mike Percy <[email protected]> wrote:
>>
>>After looking through the docs, grepping the commit logs, and searching the 
>>list archives, I have been unable to find any indication or example of Spark 
>>Streaming working on YARN. Is this possible yet? So far, I've gotten at least 
>>the Spark Pi example to run on YARN with CDH5 beta 1.
>>>
>>>
>>>I am about to dig into the code and try to figure out how the batch YARN 
>>>client works, to see how much work it would be to set up an AM to run an 
>>>InputDStream, but I thought I'd make it easy on myself and ask here before I 
>>>got started.
>>>
>>>
>>>Thanks in advance for any pointers,
>>>Mike
>>>
>>>
>>>
>>
>
