I am fairly new to Python and am starting a new project that will make
use of Spark and the Python machine learning libraries (matplotlib,
pandas, …). I noticed that the spark-ec2 script set up my AWS cluster with
Python 2.6 and 2.7.

http://spark.apache.org/docs/latest/programming-guide.html#linking-with-spark

"Spark 1.5.1 works with Python 2.6+ or Python 3.4+. It can use the standard
CPython interpreter, so C libraries like NumPy can be used. It also works
with PyPy 2.3+²

" PySpark works with IPython 1.0.0 and later.²


I realize there are a lot of legacy Python packages that are probably
vectorized C code and not easy to port to Python 3.

What would you recommend?

I assume that if I wanted to use Python 3 I would need to install it on all
the workers and the master, and follow the directions in linking-with-spark
to make Spark use the correct version of Python.
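My understanding (please correct me if I am wrong) is that this comes down
to setting PYSPARK_PYTHON on the master and every worker, assuming a
python3 binary is on the PATH of each node; something like:

    # conf/spark-env.sh on the master and all workers
    export PYSPARK_PYTHON=python3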

(Of course, I realize I need to install the third-party packages on all the
workers.)

Kind regards

Andy



