Thanks Ashish, nice blog, but it does not cover my issue. I actually have PyCharm running and loading pyspark and the rest of the libraries perfectly fine. My issue is that I am not sure what is triggering this error:
Error from python worker:
  /cube/PY/Python27/bin/python: No module named pyspark
PYTHONPATH was:
  /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar

The question is: why is YARN not getting the Python package needed to run on the single node? Some people say to run with Java 6 because of zip library changes between Java 6/7/8, some identified a bug on Red Hat (I am on Debian), and others point to documentation errors, but nothing is really clear. I have binaries for Spark and Hadoop, and I did just fine with the Spark SQL module, Hive, Python, pandas and YARN. Locally, as I said, the app works fine (pandas to Spark DataFrame to Parquet), but as soon as I move to yarn-client mode, YARN does not get the packages required to run the app.

If someone confirms that I need to build everything from source with a specific version of the software, I will do that, but at this point I am not sure what to do to remedy this situation...

--sasha

On Sun, Sep 6, 2015 at 8:27 PM, Ashish Dutt <ashish.du...@gmail.com> wrote:
> Hi Aleksandar,
> Quite some time ago I faced the same problem, and I found a solution which
> I have posted here on my blog
> <https://edumine.wordpress.com/category/apache-spark/>.
> See if that helps you; if it does not, you can check out these
> questions & solutions on the Stack Overflow
> <http://stackoverflow.com/search?q=no+module+named+pyspark> website.
>
> Sincerely,
> Ashish Dutt
>
> On Mon, Sep 7, 2015 at 7:17 AM, Sasha Kacanski <skacan...@gmail.com> wrote:
>> Hi,
>> I am successfully running the python app via PyCharm in local mode with
>> setMaster("local[*]").
>>
>> When I turn on SparkConf().setMaster("yarn-client")
>> and run via
>>
>>   spark-submit PysparkPandas.py
>>
>> I run into this issue:
>>
>> Error from python worker:
>>   /cube/PY/Python27/bin/python: No module named pyspark
>> PYTHONPATH was:
>>   /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar
>>
>> I am running java:
>> hadoop@pluto:~/pySpark$ /opt/java/jdk/bin/java -version
>> java version "1.8.0_31"
>> Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
>>
>> Should I try the same thing with Java 6/7?
>>
>> Is this a packaging issue, or do I have something wrong in my configuration?
>>
>> Regards,
>>
>> --
>> Aleksandar Kacanski
>

--
Aleksandar Kacanski
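
One possible remedy, not confirmed in this thread: point the YARN executors at the pyspark sources directly instead of the assembly jar, since the Java 6/7/8 zip change mentioned above can leave Python unable to import modules from an assembly built with a newer JDK. Below is a minimal sketch assuming a stock Spark 1.4.1 layout; SPARK_HOME is a placeholder and the py4j zip name may differ per release:

    import os
    from pyspark import SparkConf, SparkContext

    # Placeholder install location; adjust to the actual box.
    SPARK_HOME = "/opt/spark"

    # Use the same interpreter on the workers as on the driver
    # (this path is taken from the error message in the thread).
    os.environ["PYSPARK_PYTHON"] = "/cube/PY/Python27/bin/python"

    # Point the executors at the pyspark sources and the py4j zip
    # directly, instead of relying on the assembly jar.
    py_paths = ":".join([
        os.path.join(SPARK_HOME, "python"),
        os.path.join(SPARK_HOME, "python/lib/py4j-0.8.2.1-src.zip"),
    ])

    conf = (SparkConf()
            .setMaster("yarn-client")
            .setAppName("PysparkPandas")
            .set("spark.executorEnv.PYTHONPATH", py_paths))

    sc = SparkContext(conf=conf)

spark.executorEnv.[EnvironmentVariableName] is a documented Spark configuration key, so the same effect can also be had on the command line, e.g. spark-submit --conf spark.executorEnv.PYTHONPATH=... PysparkPandas.py, without touching the application code.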