Thanks Ashish, nice blog, but it does not cover my issue. I actually have PyCharm running and loading pyspark and the rest of the libraries perfectly fine. My issue is that I am not sure what is triggering this error:
Error from python worker:
  /cube/PY/Python27/bin/python: No module named pyspark
PYTHONPATH was:
  /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar

The question is: why is YARN not getting the Python package needed to run on the single node? Some people say to run with Java 6 because of zip library changes between Java 6/7/8, some identified a bug on Red Hat (I am on Debian), and others point to documentation errors, but nothing is really clear. I have binaries for Spark and Hadoop, and I did just fine with the Spark SQL module, Hive, Python, pandas and YARN. Locally, as I said, the app works fine (pandas to Spark DataFrame to Parquet), but as soon as I move to yarn-client mode, YARN does not get the packages required to run the app.

If someone confirms that I need to build everything from source with a specific version of the software, I will do that, but at this point I am not sure what to do to remedy this situation...

--sasha

On Sun, Sep 6, 2015 at 8:27 PM, Ashish Dutt <ashish.du...@gmail.com> wrote:
> Hi Aleksandar,
> Quite some time ago I faced the same problem, and I found a solution which
> I have posted here on my blog
> <https://edumine.wordpress.com/category/apache-spark/>.
> See if that helps you; if it does not, you can check out these
> questions & solutions on the Stack Overflow
> <http://stackoverflow.com/search?q=no+module+named+pyspark> website.
>
> Sincerely,
> Ashish Dutt
>
> On Mon, Sep 7, 2015 at 7:17 AM, Sasha Kacanski <skacan...@gmail.com> wrote:
>> Hi,
>> I am successfully running the python app via PyCharm in local mode with
>> setMaster("local[*]").
>>
>> When I turn on SparkConf().setMaster("yarn-client")
>> and run via
>>
>>   spark-submit PysparkPandas.py
>>
>> I run into this issue:
>>
>> Error from python worker:
>>   /cube/PY/Python27/bin/python: No module named pyspark
>> PYTHONPATH was:
>>   /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar
>>
>> I am running java:
>> hadoop@pluto:~/pySpark$ /opt/java/jdk/bin/java -version
>> java version "1.8.0_31"
>> Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
>>
>> Should I try the same thing with Java 6/7?
>>
>> Is this a packaging issue, or do I have something wrong in my configuration?
>>
>> Regards,
>>
>> --
>> Aleksandar Kacanski
>

--
Aleksandar Kacanski
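
One possible remedy, not confirmed in this thread: point the YARN executors at the pyspark sources directly instead of the assembly jar, since the Java 6/7/8 zip change mentioned above can leave Python unable to import modules from an assembly built with a newer JDK. Below is a minimal sketch assuming a stock Spark 1.4.1 layout; SPARK_HOME is a placeholder and the py4j zip name may differ per release:

    import os
    from pyspark import SparkConf, SparkContext

    # Placeholder install location; adjust to the actual box.
    SPARK_HOME = "/opt/spark"

    # Use the same interpreter on the workers as on the driver
    # (this path is taken from the error message in the thread).
    os.environ["PYSPARK_PYTHON"] = "/cube/PY/Python27/bin/python"

    # Point the executors at the pyspark sources and the py4j zip
    # directly, instead of relying on the assembly jar.
    py_paths = ":".join([
        os.path.join(SPARK_HOME, "python"),
        os.path.join(SPARK_HOME, "python/lib/py4j-0.8.2.1-src.zip"),
    ])

    conf = (SparkConf()
            .setMaster("yarn-client")
            .setAppName("PysparkPandas")
            .set("spark.executorEnv.PYTHONPATH", py_paths))

    sc = SparkContext(conf=conf)

spark.executorEnv.[EnvironmentVariableName] is a documented Spark configuration key, so the same effect can also be had on the command line, e.g. spark-submit --conf spark.executorEnv.PYTHONPATH=... PysparkPandas.py, without touching the application code.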