Hi Oleg,

To simplify packaging and distributing your code, you could set up shared storage (such as NFS), put your project on it, and mount it on all of the slaves as "/projects". In your Spark job scripts, you can then access the project by adding that path to sys.path:

    import sys
    sys.path.append("/projects")
    import myproject
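For example, a minimal driver script following this approach might look like the sketch below. Here "myproject" and its tokenize() function are hypothetical placeholders for whatever your package actually provides, and the same NFS share is assumed to be mounted at /projects on the driver and on every slave:

    # wordcount_with_project.py -- a minimal sketch, not a drop-in script.
    from pyspark import SparkContext

    import sys
    sys.path.append("/projects")   # NFS mount point on the driver
    import myproject               # placeholder for your own package

    def split_words(line):
        # Executors run their own Python workers, so make the mount point
        # visible there too before importing the package; the same share
        # is assumed to be mounted at /projects on every slave.
        import sys
        if "/projects" not in sys.path:
            sys.path.append("/projects")
        import myproject
        return myproject.tokenize(line)   # hypothetical helper

    if __name__ == "__main__":
        sc = SparkContext(appName="wordcount-with-project")
        counts = (sc.textFile(sys.argv[1])
                    .flatMap(split_words)
                    .map(lambda word: (word, 1))
                    .reduceByKey(lambda a, b: a + b))
        print(counts.take(10))
        sc.stop()

You would then submit it the same way as the word count example, for instance:

    ./bin/spark-submit --master yarn --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 wordcount_with_project.py <input path>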
Davies

On Fri, Sep 5, 2014 at 1:28 AM, Oleg Ruchovets <oruchov...@gmail.com> wrote:
> Hi,
> We are evaluating PySpark and have successfully run the PySpark examples
> on YARN.
>
> The next step we want to take:
> We have a Python project (a bunch of Python scripts using Anaconda
> packages).
> Questions:
> What is the way to execute PySpark on YARN with a lot of Python
> files (~50)?
> Should they be packaged into an archive?
> What would the command to execute PySpark on YARN with a lot of files
> look like?
> Currently the command looks like:
>
> ./bin/spark-submit --master yarn --num-executors 3 --driver-memory 4g
> --executor-memory 2g --executor-cores 1
> examples/src/main/python/wordcount.py 1000
>
> Thanks
> Oleg.