Hi Oleg,

To simplify packaging and distributing your code, you could set up shared storage (such as NFS), put your project on it, and mount it on all of the slaves as "/projects". In your Spark job scripts, you can then access the project by adding that path to sys.path:

    import sys
    sys.path.append("/projects")
    import myproject
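For example, a minimal driver script following this approach might look like the sketch below. Here "myproject" and its tokenize() function are hypothetical placeholders for whatever your package actually provides, and the same NFS share is assumed to be mounted at /projects on the driver and on every slave:

    # wordcount_with_project.py -- a minimal sketch, not a drop-in script.
    from pyspark import SparkContext

    import sys
    sys.path.append("/projects")   # NFS mount point on the driver
    import myproject               # placeholder for your own package

    def split_words(line):
        # Executors run their own Python workers, so make the mount point
        # visible there too before importing the package; the same share
        # is assumed to be mounted at /projects on every slave.
        import sys
        if "/projects" not in sys.path:
            sys.path.append("/projects")
        import myproject
        return myproject.tokenize(line)   # hypothetical helper

    if __name__ == "__main__":
        sc = SparkContext(appName="wordcount-with-project")
        counts = (sc.textFile(sys.argv[1])
                    .flatMap(split_words)
                    .map(lambda word: (word, 1))
                    .reduceByKey(lambda a, b: a + b))
        print(counts.take(10))
        sc.stop()

You would then submit it the same way as the word count example, for instance:

    ./bin/spark-submit --master yarn --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 wordcount_with_project.py <input path>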
Davies

On Fri, Sep 5, 2014 at 1:28 AM, Oleg Ruchovets <oruchov...@gmail.com> wrote:
> Hi,
> We are evaluating PySpark and have successfully run the PySpark examples
> on YARN.
>
> The next step we want to take:
> We have a Python project (a bunch of Python scripts using Anaconda
> packages).
> Questions:
> What is the way to execute PySpark on YARN with a lot of Python
> files (~50)?
> Should they be packaged into an archive?
> What would the command to execute PySpark on YARN with a lot of files
> look like?
> Currently the command looks like:
>
> ./bin/spark-submit --master yarn --num-executors 3 --driver-memory 4g
> --executor-memory 2g --executor-cores 1
> examples/src/main/python/wordcount.py 1000
>
> Thanks
> Oleg.