Hi Marcelo, quick question. I am using Spark 1.3 in YARN client mode. It works well, but only because I manually pip-install all the third-party libraries (numpy, etc.) on the executor nodes.
So does the SPARK-5479 fix in 1.5 which you mentioned address this as well? Thanks.

On Thu, Jun 25, 2015 at 2:22 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> That sounds like SPARK-5479, which is not in 1.4...
>
> On Thu, Jun 25, 2015 at 12:17 PM, Elkhan Dadashov <elkhan8...@gmail.com> wrote:
>
>> In addition to my previous emails, when I try to execute this command from the command line:
>>
>> ./bin/spark-submit --verbose --master yarn-cluster --py-files mypython/libs/numpy-1.9.2.zip --deploy-mode cluster mypython/scripts/kmeans.py /kmeans_data.txt 5 1.0
>>
>> - numpy-1.9.2.zip is the downloaded numpy package
>> - kmeans.py is the default example which comes with Spark 1.4
>> - kmeans_data.txt is the default data file which comes with Spark 1.4
>>
>> it fails, saying that it could not find numpy:
>>
>>   File "kmeans.py", line 31, in <module>
>>     import numpy
>> ImportError: No module named numpy
>>
>> Has anyone run a Python Spark application in yarn-cluster mode with third-party Python modules shipped along?
>>
>> What configuration or installation is needed before running a Python Spark job with third-party dependencies on yarn-cluster?
>>
>> Thanks in advance.
>>
>
> --
> Marcelo
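[Editor's note, not part of the thread: `--py-files` distributes the zip and puts it on the Python path of each executor. That mechanism works for pure-Python packages because the CPython interpreter can import modules directly from a zip archive, but numpy fails because its compiled C extension modules cannot be loaded from inside a zip, so it must be installed on the nodes themselves. A minimal local sketch of the zip-import mechanism, using a hypothetical module name `mylib`:]

```python
import os
import sys
import tempfile
import zipfile

# Build a zip containing a pure-Python module
# (stands in for a dependency shipped via --py-files).
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mylib.py", "def greet():\n    return 'hello from zip'\n")

# Roughly what happens on each executor: the zip goes on sys.path.
sys.path.insert(0, zip_path)

import mylib  # pure-Python code imports fine from a zip archive;
              # C extension modules (like numpy's) cannot be loaded this way.
print(mylib.greet())
```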