When you call SparkContext.addPyFile("xx.zip"), xx.zip is copied to all
the workers and stored in a temporary directory. The path to xx.zip is added
to sys.path on the worker machines, so you can "import xx" in your jobs; the
package does not need to be installed on the worker machines.
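
A minimal sketch of the mechanism described above, not Spark's actual
implementation. The package name xx, its shout() function, and the helpers
make_demo_zip and run_on_spark are hypothetical names used only for
illustration. run_on_spark assumes you already have a SparkContext and is
not called here; the lines at the bottom only reproduce locally what happens
on a worker once the zip path is on sys.path.

```python
import os
import sys
import tempfile
import zipfile


def make_demo_zip(directory):
    """Create a tiny zip with a hypothetical package 'xx' at its top level."""
    zip_path = os.path.join(directory, "xx.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("xx/__init__.py", "def shout(s):\n    return s.upper()\n")
    return zip_path


def run_on_spark(sc, zip_path):
    """Ship the zip to the workers and use the package inside a task.

    Assumes `sc` is an existing SparkContext.
    """
    sc.addPyFile(zip_path)

    def use_xx(word):
        import xx  # found through the shipped zip on the worker's sys.path
        return xx.shout(word)

    return sc.parallelize(["a", "b"]).map(use_xx).collect()


# Local illustration only (no Spark needed): a zip on sys.path is importable.
tmp_dir = tempfile.mkdtemp()
zip_path = make_demo_zip(tmp_dir)
sys.path.insert(0, zip_path)

import xx  # noqa: E402  (imported after sys.path was changed on purpose)

result = xx.shout("hello")
print(result)
```

With a real cluster you would call run_on_spark(sc, "/path/to/xx.zip")
instead of touching sys.path yourself.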

PS: the package or module must be at the top level of xx.zip, or it cannot
be imported. For example:

daviesliu@dm:~/work/tmp$ zipinfo textblob.zip
Archive:  textblob.zip   3245946 bytes   517 files
drwxr-xr-x  3.0 unx        0 bx stor 12-Sep-14 10:10 textblob/
-rw-r--r--  3.0 unx      203 tx defN 12-Sep-14 10:10 textblob/__init__.py
-rw-r--r--  3.0 unx      563 bx defN 12-Sep-14 10:10 textblob/__init__.pyc
-rw-r--r--  3.0 unx    61510 tx defN 12-Sep-14 10:10 textblob/_text.py
-rw-r--r--  3.0 unx    68316 bx defN 12-Sep-14 10:10 textblob/_text.pyc
-rw-r--r--  3.0 unx     2962 tx defN 12-Sep-14 10:10 textblob/base.py
-rw-r--r--  3.0 unx     5501 bx defN 12-Sep-14 10:10 textblob/base.pyc
-rw-r--r--  3.0 unx    27621 tx defN 12-Sep-14 10:10 textblob/blob.py
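
If zipinfo is not at hand, the same layout check can be sketched in Python
with the standard zipfile module. The helper name top_level_packages and the
two in-memory archives are assumptions made up for this example; the point is
only that textblob/__init__.py must sit directly at the root of the archive.

```python
import io
import zipfile


def top_level_packages(zip_file):
    """Return the packages that sit directly at the root of the zip.

    A package counts only if its __init__.py is exactly one level deep,
    e.g. 'textblob/__init__.py'.
    """
    with zipfile.ZipFile(zip_file) as zf:
        names = zf.namelist()
    return sorted(
        {
            name.split("/")[0]
            for name in names
            if name.count("/") == 1 and name.endswith("/__init__.py")
        }
    )


def build_zip(entries):
    """Build an in-memory zip containing empty files with the given names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for entry in entries:
            zf.writestr(entry, "")
    buf.seek(0)
    return buf


# Importable layout: the package directory is at the top level.
good = build_zip(["textblob/__init__.py", "textblob/blob.py"])
# Not importable as 'textblob': the package is nested one level too deep.
bad = build_zip(["site-packages/textblob/__init__.py"])

good_packages = top_level_packages(good)
bad_packages = top_level_packages(bad)
print(good_packages, bad_packages)
```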

You can create this textblob.zip with:

pip install textblob
cd /xxx/xx/site-packages/
zip -r path_to_store/textblob.zip textblob
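
If the zip command is not available, roughly the same archive can be built
from Python. This is a sketch under assumptions: zip_package is a
hypothetical helper, and demo_pkg is a throwaway package created just for the
demonstration. For a real library you would pass the installed package
directory (for example the directory that contains textblob/__init__.py) and
write the zip somewhere outside that directory.

```python
import os
import tempfile
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a package directory so that it ends up at the top level of the zip.

    Paths are stored relative to the parent of package_dir, which mirrors
    running 'zip -r' from inside site-packages.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full_path = os.path.join(root, name)
                zf.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# Demonstration with a throwaway package instead of a real installed one.
src_dir = tempfile.mkdtemp()
out_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(src_dir, "demo_pkg", "sub"))
for rel in ("demo_pkg/__init__.py", "demo_pkg/sub/mod.py"):
    with open(os.path.join(src_dir, *rel.split("/")), "w") as fh:
        fh.write("")

archive = zip_package(
    os.path.join(src_dir, "demo_pkg"),
    os.path.join(out_dir, "demo_pkg.zip"),
)
with zipfile.ZipFile(archive) as zf:
    names = sorted(zf.namelist())
print(names)
```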

Davies


On Fri, Sep 12, 2014 at 1:39 AM, yh18190 <yh18...@gmail.com> wrote:
> Hi all,
>
> I am currently working with PySpark for NLP processing, using the TextBlob
> Python library. Normally, in standalone mode, it is easy to install external
> Python libraries. In cluster mode, I am having trouble installing these
> libraries on the worker nodes remotely. I cannot access each and every
> worker machine to install these libraries in the Python path. I tried to use
> the SparkContext pyFiles option to ship .zip files, but the problem is that
> these Python packages need to be installed on the worker machines. Could
> anyone let me know what the different ways of doing this are, so that the
> TextBlob library is available in the Python path?
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Unable-to-ship-external-Python-libraries-in-PYSPARK-tp14074.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
