Hi Qi Ping,

You don't have to distribute these files; they are automatically packaged
in the assembly jar, which is already shipped to the worker nodes.
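
If you want to double-check, you can list the jar's contents and confirm
that the pyspark package is inside. This is only a quick sketch; the jar
name below is copied from the PYTHONPATH in your error output, so adjust
it to match your build:

jar tf spark-assembly-1.0.0-hadoop2.0.0-ydh2.0.0.jar | grep '^pyspark/'

If the package made it into the assembly, this should print entries like
pyspark/__init__.py.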

Other people have run into the same issue. See if the instructions here are
of any help:
http://mail-archives.apache.org/mod_mbox/spark-user/201406.mbox/%3ccamjob8mr1+ias-sldz_rfrke_na2uubnmhrac4nukqyqnun...@mail.gmail.com%3e

As described in the link, the last resort is to rebuild your assembly jar
with JAVA_HOME pointing to a Java 6 installation. This usually fixes the
problem, since jars built with Java 7 can use Zip64 extensions that
Python's zipimport cannot read (more details in the link provided).
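
For example, a minimal sketch of the rebuild (the JDK path and Hadoop
version here are assumptions; substitute the ones for your environment):

export JAVA_HOME=/usr/lib/jvm/java-1.6.0   # assumed location of a Java 6 JDK
SPARK_HADOOP_VERSION=2.0.0 SPARK_YARN=true sbt/sbt clean assembly

Then re-run spark-submit against the freshly built assembly.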

Cheers,
Andrew


2014-06-10 6:35 GMT-07:00 李奇平 <qiping....@alibaba-inc.com>:

> Dear all,
>
> When I submit a pyspark application using this command:
>
> ./bin/spark-submit --master yarn-client
> examples/src/main/python/wordcount.py "hdfs://..."
>
> I get the following exception:
>
> Error from python worker:
> Traceback (most recent call last):
>   File "/usr/ali/lib/python2.5/runpy.py", line 85, in run_module
>     loader = get_loader(mod_name)
>   File "/usr/ali/lib/python2.5/pkgutil.py", line 456, in get_loader
>     return find_loader(fullname)
>   File "/usr/ali/lib/python2.5/pkgutil.py", line 466, in find_loader
>     for importer in iter_importers(fullname):
>   File "/usr/ali/lib/python2.5/pkgutil.py", line 422, in iter_importers
>     __import__(pkg)
> ImportError: No module named pyspark
> PYTHONPATH was:
>
> /home/xxx/spark/python:/home/xxx/spark_on_yarn/python/lib/py4j-0.8.1-src.zip:/disk11/mapred/tmp/usercache/xxxx/filecache/11/spark-assembly-1.0.0-hadoop2.0.0-ydh2.0.0.jar
>
> Maybe `pyspark/python` and `py4j-0.8.1-src.zip` are not included on the
> YARN workers.
> How can I distribute these files with my application? Can I use
> `--py-files python.zip,py4j-0.8.1-src.zip`?
> Or how can I package the pyspark modules into a .egg file?
