Hi Qi Ping,

You don't have to distribute these files; they are automatically packaged in the assembly jar, which is already shipped to the worker nodes.
Other people have run into the same issue. See if the instructions here are of any help:
http://mail-archives.apache.org/mod_mbox/spark-user/201406.mbox/%3ccamjob8mr1+ias-sldz_rfrke_na2uubnmhrac4nukqyqnun...@mail.gmail.com%3e

As described in the link, the last resort is to rebuild your assembly jar with JAVA_HOME pointing to Java 6. That usually fixes the problem (more details in the link above).

Cheers,
Andrew

2014-06-10 6:35 GMT-07:00 李奇平 <qiping....@alibaba-inc.com>:

> Dear all,
>
> When I submit a PySpark application using this command:
>
>     ./bin/spark-submit --master yarn-client \
>         examples/src/main/python/wordcount.py "hdfs://..."
>
> I get the following exception:
>
>     Error from python worker:
>       Traceback (most recent call last):
>         File "/usr/ali/lib/python2.5/runpy.py", line 85, in run_module
>           loader = get_loader(mod_name)
>         File "/usr/ali/lib/python2.5/pkgutil.py", line 456, in get_loader
>           return find_loader(fullname)
>         File "/usr/ali/lib/python2.5/pkgutil.py", line 466, in find_loader
>           for importer in iter_importers(fullname):
>         File "/usr/ali/lib/python2.5/pkgutil.py", line 422, in iter_importers
>           __import__(pkg)
>       ImportError: No module named pyspark
>     PYTHONPATH was:
>       /home/xxx/spark/python:/home/xxx/spark_on_yarn/python/lib/py4j-0.8.1-src.zip:/disk11/mapred/tmp/usercache/xxxx/filecache/11/spark-assembly-1.0.0-hadoop2.0.0-ydh2.0.0.jar
>
> Maybe `pyspark/python` and `py4j-0.8.1-src.zip` are not included on the
> YARN worker. How can I distribute these files with my application? Can I
> use `--py-files python.zip,py4j-0.8.1-src.zip`?
> Or how can I package the pyspark modules into a .egg file?
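For anyone puzzled by why the assembly jar appears on PYTHONPATH at all: the worker's Python process imports the `pyspark` and `py4j` packages directly out of that jar using Python's standard zipimport mechanism, which only works if Python can actually read the archive (jars built under Java 7 can use zip features that older Pythons choke on, hence the Java 6 advice). A minimal sketch of that import path, using a hypothetical module name `mymod` and a throwaway zip in place of the real assembly jar:

```python
import os
import sys
import tempfile
import zipfile

# Build a small zip archive containing one module, standing in for
# the way pyspark/ and py4j/ live inside the assembly jar.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "modules.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("mymod.py", "VALUE = 42\n")

# Adding the archive to sys.path is exactly what the PYTHONPATH entry
# for the assembly jar does on the YARN worker.
sys.path.insert(0, archive)
import mymod

print(mymod.VALUE)  # imported straight out of the zip archive
```

If the archive is unreadable to Python's zipimporter, the import falls through and you get the same `ImportError: No module named pyspark` shown in the traceback above.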