With SparkContext.addPyFile("xx.zip"), xx.zip will be copied to all the workers and stored in a temporary directory. The path to xx.zip is added to sys.path on the worker machines, so you can "import xx" in your jobs; the package does not need to be installed on the worker machines.
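A minimal, stdlib-only sketch of that mechanism, assuming Python 3: once the zip path is on sys.path, Python's built-in zipimport support can load a package straight from the archive, but only if the package directory sits at the archive root. The helpers build_zip and import_from_zip and the package names demo_pkg and demo_nested are hypothetical stand-ins, not part of Spark. The Spark call itself (sc.addPyFile) appears only in a comment, because it needs a running SparkContext.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_zip(zip_path, package_name, files):
    """Write each (relative name -> source) entry under package_name/."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel_name, source in files.items():
            zf.writestr(package_name + "/" + rel_name, source)
    return zip_path


def import_from_zip(zip_path, module_name):
    """Approximates what a worker does after addPyFile: the zip path is put
    on sys.path and Python's zipimport support loads the module from it."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    return importlib.import_module(module_name)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()

    # Top-level layout: demo_pkg/__init__.py sits at the archive root.
    good = build_zip(os.path.join(tmp, "good.zip"), "demo_pkg",
                     {"__init__.py": "GREETING = 'hello from zip'\n"})
    print(import_from_zip(good, "demo_pkg").GREETING)

    # Nested layout: site-packages/demo_nested/__init__.py is NOT importable
    # as "demo_nested", because only the archive root is on sys.path.
    bad = build_zip(os.path.join(tmp, "bad.zip"), "site-packages/demo_nested",
                    {"__init__.py": ""})
    try:
        import_from_zip(bad, "demo_nested")
    except ImportError as exc:
        print("nested package not importable:", exc)

    # On a real cluster (not executed here) the equivalent is roughly:
    #   sc.addPyFile("path_to_store/textblob.zip")
    #   def process(line):        # hypothetical task function
    #       import textblob       # resolved on the worker from the shipped zip
    #       ...
    #   rdd.map(process)          # rdd: any hypothetical RDD of input lines
```

Running it prints the greeting from the top-level archive and an ImportError message for the nested one, which is the failure the PS below warns about.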
PS: the package or module must be at the top level of xx.zip, otherwise it cannot be imported. For example:

    daviesliu@dm:~/work/tmp$ zipinfo textblob.zip
    Archive:  textblob.zip   3245946 bytes   517 files
    drwxr-xr-x  3.0 unx        0 bx stor 12-Sep-14 10:10 textblob/
    -rw-r--r--  3.0 unx      203 tx defN 12-Sep-14 10:10 textblob/__init__.py
    -rw-r--r--  3.0 unx      563 bx defN 12-Sep-14 10:10 textblob/__init__.pyc
    -rw-r--r--  3.0 unx    61510 tx defN 12-Sep-14 10:10 textblob/_text.py
    -rw-r--r--  3.0 unx    68316 bx defN 12-Sep-14 10:10 textblob/_text.pyc
    -rw-r--r--  3.0 unx     2962 tx defN 12-Sep-14 10:10 textblob/base.py
    -rw-r--r--  3.0 unx     5501 bx defN 12-Sep-14 10:10 textblob/base.pyc
    -rw-r--r--  3.0 unx    27621 tx defN 12-Sep-14 10:10 textblob/blob.py

You can build this textblob.zip with:

    pip install textblob
    cd /xxx/xx/site-package/
    zip -r path_to_store/textblob.zip textblob

Davies

On Fri, Sep 12, 2014 at 1:39 AM, yh18190 <yh18...@gmail.com> wrote:
> Hi all,
>
> I am currently working on pyspark for NLP processing etc. I am using the TextBlob
> python library. Normally, in standalone mode it is easy to install external
> python libraries. In cluster mode I am facing a problem installing
> these libraries on the worker nodes remotely. I cannot access each and every
> worker machine to install these libs in the python path. I tried to use the
> SparkContext pyFiles option to ship .zip files, but the problem is that these
> python packages need to be installed on the worker machines. Could anyone
> let me know what the different ways of doing this are, so that the TextBlob lib
> could be available in the python path?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-ship-external-Python-libraries-in-PYSPARK-tp14074.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>