Re: Spark streaming on YARN?

Tathagata Das Thu, 09 Jan 2014 17:46:20 -0800

If you have been able to run Spark Pi on YARN, then you should be able to
run the streaming example
HdfsWordCount<https://github.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/HdfsWordCount.scala>
as well. Even though the instructions in the example say to run it on a
local machine, you can run the example on YARN in the same way as Spark
Pi. You would just have to give the appropriate Spark master URL and use an
HDFS directory as the 2nd parameter. Then any text file written to that
HDFS directory will get "word counted".

Note that you should write a file to that HDFS directory by moving it there
from some other directory, so that the file shows up in the monitored
directory all at once rather than while it is still being written. For
example, if the HDFS directory that you want to use to run the example is
*hdfs://myhdfs:9000/mydir/*, then you can first copy a local file (say
new_file) to "*hdfs://myhdfs:9000/temp_location/new_file*" and then move
it to "*hdfs://myhdfs:9000/mydir/new_file*".
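
As a rough sketch of the copy-then-move step described above, the helper
below builds the two commands and can optionally run them. It assumes the
`hadoop` command-line client is on the PATH; the function names
(staged_upload_commands, staged_upload) are hypothetical and made up for
illustration, and the paths are simply the ones from the example above.

```python
import posixpath
import subprocess


def staged_upload_commands(local_file, staging_dir, watched_dir):
    """Build the two `hadoop fs` commands: copy to staging, then move into place.

    Hypothetical helper for illustration; it is not part of Spark or Hadoop.
    """
    name = posixpath.basename(local_file)
    staged = posixpath.join(staging_dir, name)
    final = posixpath.join(watched_dir, name)
    return [
        # Step 1: upload the local file to a directory the stream is NOT watching.
        ["hadoop", "fs", "-put", local_file, staged],
        # Step 2: move the finished file into the watched directory.
        ["hadoop", "fs", "-mv", staged, final],
    ]


def staged_upload(local_file, staging_dir, watched_dir):
    """Run both commands in order; stop if either one fails."""
    for cmd in staged_upload_commands(local_file, staging_dir, watched_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here, so the sketch runs without a cluster.
    for cmd in staged_upload_commands(
        "new_file",
        "hdfs://myhdfs:9000/temp_location/",
        "hdfs://myhdfs:9000/mydir/",
    ):
        print(" ".join(cmd))
```

Calling staged_upload("new_file", "hdfs://myhdfs:9000/temp_location/",
"hdfs://myhdfs:9000/mydir/") would then perform the copy followed by the
move, assuming both directories already exist on the same HDFS instance.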

On Thu, Jan 9, 2014 at 5:29 PM, Mike Percy <[email protected]> wrote:

> After looking through the docs, grepping the commit logs and looking on
> the list archives, I have been unable to see an indication or example of
> Spark streaming working on YARN. Is this possible yet? So far, I've gotten
> at least the Spark Pi example to run on YARN with CDH5 beta 1.
>
> I am about to dig into the code and try to figure out how the batch Yarn
> client works, to see how much work it would be to set up an AM to run an
> InputDStream, but thought I'd make it easy on myself and ask here first before
> I got started.
>
> Thanks in advance for any pointers,
> Mike
>
>
