thanks!

On Wed, Oct 29, 2014 at 2:38 PM, Kevin <[email protected]> wrote:
> You can accomplish this by using the DistributedShell application that
> comes with YARN.
>
> If you copy all your archives to HDFS, then inside your shell script you
> could copy those archives to your YARN container and then execute whatever
> you want, provided all the other system dependencies exist in the container
> (correct Java version, Python, C++ libraries, etc.).
>
> For example, in myscript.sh I wrote the following:
>
> #!/usr/bin/env bash
> echo "This is my script running!"
> echo "Present working directory:"
> pwd
> echo "Current directory listing: (nothing exciting yet)"
> ls
> echo "Copying file from HDFS to container"
> hadoop fs -get /path/to/some/data/on/hdfs .
> echo "Current directory listing: (the file should now be here)"
> ls
> echo "Cat ExecScript.sh (this is the script created by the DistributedShell application)"
> cat ExecScript.sh
>
> Run the DistributedShell application with the hadoop (or yarn) command:
>
> hadoop org.apache.hadoop.yarn.applications.distributedshell.Client \
>   -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.3.0-cdh5.1.3.jar \
>   -num_containers 1 -shell_script myscript.sh
>
> If you have the YARN log aggregation property set, then you can pipe the
> container's logs to your client console using the yarn command:
>
> yarn logs -applicationId application_1414160538995_0035
>
> (replace the application id with yours)
>
> Here is a quick reference that should help get you going:
>
> http://books.google.com/books?id=heoXAwAAQBAJ&pg=PA227&lpg=PA227&dq=hadoop+yarn+distributed+shell+application&source=bl&ots=psGuJYlY1Y&sig=khp3b3hgzsZLZWFfz7GOe2yhgyY&hl=en&sa=X&ei=0U5RVKzDLeTK8gGgoYGoDQ&ved=0CFcQ6AEwCA#v=onepage&q&f=false
>
> Hopefully this helps,
> Kevin
>
> On Mon Oct 27 2014 at 2:21:18 AM Yang <[email protected]> wrote:
>
>> I happened to run into this interesting scenario:
>>
>> I had some mahout seq2sparse jobs; originally I ran them in parallel
>> using the distributed mode.
>> But because the input files are so small, running them locally is
>> actually much faster, so I turned them to local mode.
>>
>> But I run 10 of these jobs in parallel, and when 10 mahout jobs are run
>> together, every one of them becomes very slow.
>>
>> Is there existing code that takes a desired shell script, and possibly
>> some archive files (which could contain the jar file, or C++-generated
>> executable code)? I understand that I could use the YARN API to code such
>> a thing, but it would be nice if I could just take it and run it from the
>> shell.
>>
>> Thanks
>> Yang
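Kevin's launch command above is easy to mistype, so a small wrapper can help. A minimal sketch, assuming the jar path/version from the thread (these differ per Hadoop distribution) and a hypothetical helper name `build_dshell_cmd`; it only prints the command so it can be inspected before being run against a real cluster:

```shell
#!/usr/bin/env bash
# Sketch: assemble the DistributedShell client invocation from the thread.
# The jar path/version are taken from the example above and are an
# assumption -- adjust them for your distribution. build_dshell_cmd is a
# hypothetical helper name, not part of Hadoop.
set -euo pipefail

build_dshell_cmd() {
  # Args: <distributedshell-jar> <shell-script> <num-containers>
  local jar="$1" script="$2" containers="$3"
  printf '%s' "hadoop org.apache.hadoop.yarn.applications.distributedshell.Client"
  printf ' %s' -jar "$jar" -num_containers "$containers" -shell_script "$script"
  printf '\n'
}

# Print the command for inspection; pipe to bash (or eval it) to actually
# submit the application to a cluster.
build_dshell_cmd \
  "/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.3.0-cdh5.1.3.jar" \
  myscript.sh 1
```

Printing rather than executing keeps the sketch safe to run without a cluster; once the output looks right, the same line can be executed directly.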
