PySpark doesn't include a Python interpreter; by default, it will use your
system `python`.  The pyspark script (
https://github.com/apache/incubator-spark/blob/master/pyspark) just
performs some environment-variable setup, adds the PySpark Python
dependencies to PYTHONPATH, and runs a bit of startup code that creates a
SparkContext in the Python REPL.
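
As a rough sketch of the effect (the real launcher is a shell wrapper, and
the SPARK_HOME default here is just for illustration), it amounts to
something like:

```python
import os
import sys

# Where the Spark distribution lives; "/opt/spark" is only an example default.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
pyspark_python = os.path.join(spark_home, "python")

# 1. Put the bundled PySpark sources on the import path of this process.
sys.path.insert(0, pyspark_python)

# 2. Export PYTHONPATH so worker subprocesses inherit the same path.
os.environ["PYTHONPATH"] = (
    pyspark_python + os.pathsep + os.environ.get("PYTHONPATH", ""))

# 3. The real script then hands control to the Python REPL with a startup
#    file that creates a SparkContext bound to the name `sc`.
```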

I suppose we could split the Python classes out into a proper Python
package that could be installed with easy_install / pip, one that assumes
SPARK_HOME contains the right JARs.  To avoid weird bugs from using
incompatible versions of the Python pyspark package and the Java classes,
we'd probably need some mechanism to detect version mismatches when
connecting to the cluster.  We'd still support the ./pyspark script that
uses the bundled dependencies, too.  The packaging itself is probably as
simple as creating a setup.py file in $SPARK_HOME/python.  Python
packaging experts: please feel free to submit pull requests for this!
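
The mismatch check could be something like the sketch below; to be clear,
PYSPARK_VERSION and check_version() are hypothetical names for
illustration, not an existing Spark API:

```python
# Version the pip-installed pyspark package was built against
# (hypothetical; this constant does not exist in Spark today).
PYSPARK_VERSION = "0.8.0"

def check_version(jvm_version):
    """Raise instead of silently connecting to a mismatched cluster.

    jvm_version would be reported by the Java side at connection time.
    """
    if jvm_version != PYSPARK_VERSION:
        raise RuntimeError(
            "Python pyspark %s does not match Spark JVM %s; "
            "please install matching versions"
            % (PYSPARK_VERSION, jvm_version))
```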


On Tue, Nov 19, 2013 at 11:08 AM, Michal Romaniuk <
[email protected]> wrote:

> Hi,
>
> I would like to use Spark to distribute some computations that rely on
> my existing Python installation. I know that Spark includes its own
> Python but it would be much easier to just install a package and perhaps
> do a bit of configuration.
>
> Thanks,
> Michal
>
>