Hi Ashish,
Thanks for the update.
I tried all of that, but what I don't get is that I run the cluster with one
node, so presumably I should have the PySpark binaries there, as I am
developing on the same host.
Could you tell me where you placed the parcels, or whatever Cloudera is using?
My understanding of YARN and Spark is that these binaries get compressed and
packaged, along with the Java pieces, to be pushed to the worker node.
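To make my question concrete, this is roughly what I assumed Spark does under
the hood, written out by hand (the interpreter path is the one from my error;
the /opt/spark paths are only examples of wherever a Spark distribution might
be unpacked, not Cloudera-specific locations):

import os
from pyspark import SparkConf, SparkContext

# Assumption: this interpreter exists on every node that runs executors
# (it is the path reported in the "Error from python worker" message).
os.environ["PYSPARK_PYTHON"] = "/cube/PY/Python27/bin/python"

conf = (SparkConf()
        .setAppName("PysparkPandas")
        .setMaster("yarn-client")
        # point the executors at the same interpreter
        .set("spark.executorEnv.PYSPARK_PYTHON", "/cube/PY/Python27/bin/python"))

sc = SparkContext(conf=conf)

# Ship the pyspark sources explicitly so the YARN containers can import them.
# Example locations only; they depend on where Spark is installed on the
# driver host and on which py4j version the build bundles.
sc.addPyFile("/opt/spark/python/lib/pyspark.zip")
sc.addPyFile("/opt/spark/python/lib/py4j-0.8.2.1-src.zip")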
Regards,
On Sep 7, 2015 9:00 PM, "Ashish Dutt" <ashish.du...@gmail.com> wrote:

> Hello Sasha,
>
> I have no answer for Debian. My cluster is on Linux and I'm using CDH 5.4.
> Your question was: "Error from python worker:
>   /cube/PY/Python27/bin/python: No module named pyspark"
>
> On a single node (i.e. one server/machine/computer) I installed the pyspark
> binaries and it worked. I connected it to PyCharm and that worked too.
>
> Next, I tried executing the pyspark command on another node (say, a worker)
> in the cluster and I got this error message: "Error from python worker: PATH:
> No module named pyspark".
>
> My first guess was that the worker was not picking up the path of the pyspark
> binaries installed on the server. I tried many things: hard-coding the
> pyspark path in the config.sh file on the worker (no luck); passing the path
> dynamically from the code in PyCharm (no luck); searching the web and asking
> the question in almost every online forum (no luck); banging my head against
> pyspark/Hadoop books (no luck). Finally, one fine day, a 'watermelon' dropped
> while I was brooding on this problem, and I installed the pyspark binaries on
> all the worker machines. Now, when I execute just the pyspark command on the
> workers, it works. I tried some simple program snippets on each worker, and
> they work too.
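> In case it helps, one of the simple snippets I tried on each worker was just
> along these lines (the script name is made up; run it with whatever Python
> interpreter the executors use):
>
> # check_pyspark.py -- confirm this machine can import the pyspark package
> import pyspark
> print(pyspark.__file__)  # shows where the module is being picked up from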
>
> I am not sure whether this will help with your use case or not.
>
>
>
> Sincerely,
> Ashish
>
> On Mon, Sep 7, 2015 at 11:04 PM, Sasha Kacanski <skacan...@gmail.com>
> wrote:
>
>> Thanks Ashish,
>> Nice blog, but it does not cover my issue. Actually, I have PyCharm running
>> and loading pyspark and the rest of the libraries perfectly fine.
>> My issue is that I am not sure what is triggering:
>>
>> Error from python worker:
>>   /cube/PY/Python27/bin/python: No module named pyspark
>> PYTHONPATH was:
>>
>> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar
>>
>> The question is: why is YARN not getting the Python package to run on the
>> single node? Some people say to run with Java 6 because of zip library
>> changes between Java 6/7/8, some have identified a bug with Red Hat (I am on
>> Debian), and some point to documentation errors, but nothing is really
>> clear.
>>
>> I have binaries for Spark and Hadoop, and I did just fine with the Spark SQL
>> module, Hive, Python, pandas, and YARN.
>> Locally, as I said, the app is working fine (pandas to Spark DataFrame to
>> Parquet).
>> But as soon as I move to yarn-client mode, YARN is not getting the packages
>> required to run the app.
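>> If it helps to narrow this down, a check along these lines should at least
>> show whether Python can read the assembly jar that YARN puts on the
>> PYTHONPATH, and whether the pyspark package is inside it (the path is copied
>> from the error above; the Java 6/7 zip theory is just one I want to rule
>> out):
>>
>> import zipfile
>>
>> jar = ("/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/"
>>        "spark-assembly-1.4.1-hadoop2.6.0.jar")
>> zf = zipfile.ZipFile(jar)  # raises if Python cannot read the archive at all
>> # check whether the pyspark package is actually bundled in the jar
>> print(any(name.startswith("pyspark/") for name in zf.namelist()))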
>>
>> If someone confirms that I need to build everything from source with a
>> specific version of the software, I will do that, but at this point I am not
>> sure what to do to remedy this situation...
>>
>> --sasha
>>
>>
>> On Sun, Sep 6, 2015 at 8:27 PM, Ashish Dutt <ashish.du...@gmail.com>
>> wrote:
>>
>>> Hi Aleksandar,
>>> Quite some time ago, I faced the same problem and I found a solution
>>> which I have posted here on my blog
>>> <https://edumine.wordpress.com/category/apache-spark/>.
>>> See if that helps you; if it does not, you can check out these questions
>>> and solutions on the Stack Overflow website
>>> <http://stackoverflow.com/search?q=no+module+named+pyspark>.
>>>
>>>
>>> Sincerely,
>>> Ashish Dutt
>>>
>>>
>>> On Mon, Sep 7, 2015 at 7:17 AM, Sasha Kacanski <skacan...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I am successfully running a Python app via PyCharm in local mode with
>>>> setMaster("local[*]").
>>>>
>>>> When I turn on SparkConf().setMaster("yarn-client")
>>>>
>>>> and run via
>>>>
>>>> spark-submit PysparkPandas.py
>>>>
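>>>> For context, the driver script is roughly this shape (a simplified sketch;
>>>> the column names and the output path are placeholders, not the real ones):
>>>>
>>>> import pandas as pd
>>>> from pyspark import SparkConf, SparkContext
>>>> from pyspark.sql import SQLContext
>>>>
>>>> conf = SparkConf().setAppName("PysparkPandas").setMaster("yarn-client")
>>>> sc = SparkContext(conf=conf)
>>>> sqlContext = SQLContext(sc)
>>>>
>>>> # pandas frame -> Spark DataFrame -> Parquet; this works in local mode
>>>> pdf = pd.DataFrame({"id": [1, 2, 3], "val": ["a", "b", "c"]})
>>>> df = sqlContext.createDataFrame(pdf)
>>>> df.write.parquet("/tmp/pyspark_pandas_out")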
>>>>
>>>> I run into this issue:
>>>> Error from python worker:
>>>>   /cube/PY/Python27/bin/python: No module named pyspark
>>>> PYTHONPATH was:
>>>>
>>>> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/18/spark-assembly-1.4.1-hadoop2.6.0.jar
>>>>
>>>> I am running this version of Java:
>>>> hadoop@pluto:~/pySpark$ /opt/java/jdk/bin/java -version
>>>> java version "1.8.0_31"
>>>> Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
>>>>
>>>> Should I try the same thing with Java 6 or 7?
>>>>
>>>> Is this a packaging issue, or do I have something wrong in my configuration
>>>> ...
>>>>
>>>> Regards,
>>>>
>>>> --
>>>> Aleksandar Kacanski
>>>>
>>>
>>>
>>
>>
>> --
>> Aleksandar Kacanski
>>
>
>
