How pySpark works?

2014-07-11 Thread Egor Pahomov
Hi, I want to use pySpark, but I can't understand how it works. The documentation doesn't provide enough information. 1) How is Python shipped to the cluster? Should the machines in the cluster already have Python installed? 2) What happens when I write some Python code in a map function - is it shipped to the cluster and just

Re: How pySpark works?

2014-07-11 Thread Andrew Or
Hi Egor, Here are a few answers to your questions: 1) Python needs to be installed on all machines, but not pyspark. The way the executors get the pyspark code depends on which cluster manager you use. In standalone mode, your executors need to have the actual python files in their working

Re: How pySpark works?

2014-07-11 Thread Reynold Xin
Also take a look at this: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
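
On question 2 (what happens to Python code written in a map function), here is a rough, stdlib-only sketch of the general idea: the function is turned into bytes on the driver side and rebuilt by a Python interpreter on the worker side, which is why Python has to be installed on every machine. This is not PySpark's actual code. PySpark uses a cloudpickle-based serializer that also handles closures and globals, while this sketch uses `marshal` and only works for closure-free functions. The names `serialize_function`, `deserialize_function`, and `run_on_worker` are hypothetical and exist only for this illustration.

```python
import builtins
import marshal
import types


def serialize_function(func):
    """Driver side (hypothetical): turn a closure-free function into bytes."""
    # marshal can serialize code objects, but not captured variables,
    # so this only works for functions without closures.
    return marshal.dumps(func.__code__)


def deserialize_function(payload, name="shipped_func"):
    """Worker side (hypothetical): rebuild a callable from the bytes."""
    code = marshal.loads(payload)
    # A fresh globals dict stands in for the worker's own namespace.
    return types.FunctionType(code, {"__builtins__": builtins}, name)


def run_on_worker(payload, partition):
    """Rebuild the function and apply it to every record of one partition."""
    func = deserialize_function(payload)
    return [func(record) for record in partition]


# The "driver" serializes the map function; the "worker" only sees bytes.
payload = serialize_function(lambda x: x * 2)
result = run_on_worker(payload, [1, 2, 3])
print(result)
```

In a real job both sides run in separate processes, usually on separate machines, and should use the same Python version, since serialized code objects are not portable across versions.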