PySpark on YARN: a project with a lot of Python scripts

2014-09-05 Thread Oleg Ruchovets
Hi, We are evaluating PySpark and have successfully executed the PySpark examples on YARN. The next step is what we actually want to do: we have a Python project (a bunch of Python scripts using Anaconda packages). Question: what is the way to execute PySpark on YARN when the project has a lot of Python files (~50)?

Re: PySpark on YARN: a project with a lot of Python scripts

2014-09-05 Thread Davies Liu
Hi Oleg, In order to simplify packaging and distributing your code, you could deploy shared storage (such as NFS), put your project on it, and mount it on all the slaves as /projects. In the Spark job scripts, you can then access your project by putting its path onto sys.path, such as:
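
A minimal sketch of that approach (the mount point /projects/myproject and the module analytics are hypothetical, assuming the same share is mounted on every node). The path insert has to happen on the executors as well, since the functions shipped to them do their imports there:

    import sys
    sys.path.insert(0, "/projects/myproject")  # shared mount, visible on the driver

    from pyspark import SparkContext
    sc = SparkContext(appName="shared-storage-demo")

    def process(record):
        # Executors mount the same share, so make the project importable there too.
        import sys
        if "/projects/myproject" not in sys.path:
            sys.path.insert(0, "/projects/myproject")
        from analytics import transform  # hypothetical module inside the project
        return transform(record)

    print(sc.parallelize(range(100)).map(process).collect())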

Re: PySpark on YARN: a project with a lot of Python scripts

2014-09-05 Thread Dimension Data, LLC.
Hi: Curious... is there any reason not to use one of the pyspark options below? Assuming each file is, say, 10k in size, is 50 files too much? Does that touch some practical limitation?

Usage: ./bin/pyspark [options]
Options:
  --master MASTER_URL   spark://host:port,

Re: PySpark on YARN: a project with a lot of Python scripts

2014-09-05 Thread Oleg Ruchovets
Ok, I didn't explain myself correctly: in the Java case, when you have a lot of classes, a jar should be used. All the PySpark examples I found are a single .py script (Pi, wordcount ...), but in a real environment analytics has more than one .py file. My question is how to use PySpark on YARN analytics in

Re: PySpark on YARN: a project with a lot of Python scripts

2014-09-05 Thread Andrew Or
Hi Oleg, We do support serving Python files in zips. If you use --py-files, you can provide a comma-delimited list of zips instead of Python files. This will allow you to automatically add these files to the Python path on the executors without having to manually copy them to every single
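
A hedged sketch of that workflow (analytics.zip and the transform function are made-up names): zip the project and pass it either via --py-files on the command line or via the pyFiles argument of SparkContext, which ships the archive to every executor and puts it on the Python path:

    # Equivalent CLI: spark-submit --master yarn --py-files analytics.zip main.py
    from pyspark import SparkContext

    sc = SparkContext(appName="zip-deps-demo",
                      pyFiles=["analytics.zip"])  # distributed to all executors

    from analytics import transform  # importable from the zip on driver and executors
    print(sc.parallelize(range(10)).map(transform).collect())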

Re: PySpark on YARN: a project with a lot of Python scripts

2014-09-05 Thread Marcelo Vanzin
On Fri, Sep 5, 2014 at 10:50 AM, Davies Liu dav...@databricks.com wrote: In daily development, it's common to modify your projects and re-run the jobs. If you use a zip or egg to package your code, you need to do this every time after a modification; I think that gets tedious. That's why shell
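
That repackaging step is easy to script away, though. A minimal sketch (the analytics package directory and analytics.zip are hypothetical names) that rebuilds the archive before each submit:

    import os
    import zipfile

    def build_zip(pkg_dir, zip_path):
        """Pack every .py file under pkg_dir into zip_path, keeping the package prefix."""
        base = os.path.dirname(os.path.abspath(pkg_dir))
        with zipfile.ZipFile(zip_path, "w") as zf:
            for root, _, files in os.walk(pkg_dir):
                for name in files:
                    if name.endswith(".py"):
                        full = os.path.join(root, name)
                        zf.write(full, os.path.relpath(full, base))  # e.g. analytics/job.py

    build_zip("analytics", "analytics.zip")
    # then: spark-submit --master yarn --py-files analytics.zip main.py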

Re: PySpark on YARN: a project with a lot of Python scripts

2014-09-05 Thread Davies Liu
Here is a story about how shared storage simplifies all of this: at Douban, we use MooseFS [1] instead of HDFS as the distributed file system; it's POSIX compatible and can be mounted just like NFS. We put all the data, tools, and code in it, so we can access them easily on all the machines,

Re: PySpark on YARN: a project with a lot of Python scripts

2014-09-05 Thread Marcelo Vanzin
Hi Davies, On Fri, Sep 5, 2014 at 1:04 PM, Davies Liu dav...@databricks.com wrote: In Douban, we use Moose FS[1] instead of HDFS as the distributed file system, it's POSIX compatible and can be mounted just as NFS. Sure, if you already have the infrastructure in place, it might be worthwhile