Hello,

2016-02-16 11:03 GMT+01:00 Mohannad Ali <man...@gmail.com>:

> Hello Everyone,
>
> I have code in my project organized into packages and modules, but I
> keep getting the error "ImportError: No module named <package.module>"
> when I run Spark on YARN.
>
> My directory structure is something like this:
>
> project/
>     package/
>         module.py
>         __init__.py
>     bin/
>     docs/
>     setup.py
>     main_script.py
>     requirements.txt
>     tests/
>         package/
>             module_test.py
>             __init__.py
>         __init__.py
>
>
> So when I pass `main_script.py` to spark-submit with master set to
> "yarn-client", the packages aren't found and I get the error above.
>
> With a code structure like this, adding every file individually as a
> py-file to the Spark context seems counterintuitive.
>
> I just want to organize my code as much as possible to make it more
> readable and maintainable. Is there a better way to achieve good code
> organization without running into such problems?
>

According to the docs[1], you should be able to zip your "project/" (or
rather "package/"?) directory and pass the zip file to spark-submit via
--py-files; the archive then ends up on the PYTHONPATH of your Python app.

Best,
Eike

[1]
  --py-files PY_FILES          Comma-separated list of .zip, .egg, or .py files
                               to place on the PYTHONPATH for Python apps.
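
P.S.: If you would rather wire this up programmatically than on the command
line, the SparkContext constructor also takes the dependencies directly. A
minimal sketch, assuming a package.zip built as above:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("main_script")

    # Ship the zipped package to the executors when the context is
    # created; PySpark also puts it on the driver's sys.path.
    sc = SparkContext(conf=conf, pyFiles=["package.zip"])

    # Alternatively, add it after the context already exists:
    # sc.addPyFile("package.zip")

    from package import module  # should now resolve on driver and executors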
