thanks!

On Wed, Oct 29, 2014 at 2:38 PM, Kevin <[email protected]> wrote:
> You can accomplish this by using the DistributedShell application that
> comes with YARN.
>
> If you copy all your archives to HDFS, then inside your shell script you
> could copy those archives to your YARN container and then execute whatever
> you want, provided all the other system dependencies exist in the container
> (correct Java version, Python, C++ libraries, etc.).
>
> For example, in myscript.sh I wrote the following:
>
> #!/usr/bin/env bash
> echo "This is my script running!"
> echo "Present working directory:"
> pwd
> echo "Current directory listing: (nothing exciting yet)"
> ls
> echo "Copying file from HDFS to container"
> hadoop fs -get /path/to/some/data/on/hdfs .
> echo "Current directory listing: (the file should now be here)"
> ls
> echo "Cat ExecScript.sh (this is the script created by the DistributedShell application)"
> cat ExecScript.sh
>
> Run the DistributedShell application with the hadoop (or yarn) command:
>
> hadoop org.apache.hadoop.yarn.applications.distributedshell.Client \
>   -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.3.0-cdh5.1.3.jar \
>   -num_containers 1 -shell_script myscript.sh
>
> If you have the YARN log aggregation property set, then you can pipe the
> container's logs to your client console using the yarn command:
>
> yarn logs -applicationId application_1414160538995_0035
>
> (replace the application id with yours)
>
> Here is a quick reference that should help get you going:
>
> http://books.google.com/books?id=heoXAwAAQBAJ&pg=PA227&lpg=PA227&dq=hadoop+yarn+distributed+shell+application&source=bl&ots=psGuJYlY1Y&sig=khp3b3hgzsZLZWFfz7GOe2yhgyY&hl=en&sa=X&ei=0U5RVKzDLeTK8gGgoYGoDQ&ved=0CFcQ6AEwCA#v=onepage&q&f=false
>
> Hopefully this helps,
> Kevin
>
> On Mon Oct 27 2014 at 2:21:18 AM Yang <[email protected]> wrote:
>
>> I happened to run into this interesting scenario:
>>
>> I had some mahout seq2sparse jobs; originally I ran them in parallel
>> using the distributed mode.
>> But because the input files are so small, running them locally is
>> actually much faster, so I turned them to local mode.
>>
>> But I run 10 of these jobs in parallel, and when 10 mahout jobs are run
>> together, every one of them becomes very slow.
>>
>> Is there existing code that takes a desired shell script, and possibly
>> some archive files (which could contain the jar file, or C++-generated
>> executable code)? I understand that I could use the YARN API to code such
>> a thing, but it would be nice if I could just take it and run it from the
>> shell.
>>
>> Thanks
>> Yang
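Kevin's launch command above is easy to mistype, so a small wrapper can help. A minimal sketch, assuming the jar path/version from the thread (these differ per Hadoop distribution) and a hypothetical helper name `build_dshell_cmd`; it only prints the command so it can be inspected before being run against a real cluster:

```shell
#!/usr/bin/env bash
# Sketch: assemble the DistributedShell client invocation from the thread.
# The jar path/version are taken from the example above and are an
# assumption -- adjust them for your distribution. build_dshell_cmd is a
# hypothetical helper name, not part of Hadoop.
set -euo pipefail

build_dshell_cmd() {
  # Args: <distributedshell-jar> <shell-script> <num-containers>
  local jar="$1" script="$2" containers="$3"
  printf '%s' "hadoop org.apache.hadoop.yarn.applications.distributedshell.Client"
  printf ' %s' -jar "$jar" -num_containers "$containers" -shell_script "$script"
  printf '\n'
}

# Print the command for inspection; pipe to bash (or eval it) to actually
# submit the application to a cluster.
build_dshell_cmd \
  "/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.3.0-cdh5.1.3.jar" \
  myscript.sh 1
```

Printing rather than executing keeps the sketch safe to run without a cluster; once the output looks right, the same line can be executed directly.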
