Glad you got it figured out!

On Tue, Dec 17, 2013 at 8:43 AM, Jie Deng <[email protected]> wrote:

> Don't bother... the fix was to use spark-0.9 instead of 0.8; 0.9
> fixed the bug that prevented running from Eclipse.
>
>
> 2013/12/17 Jie Deng <[email protected]>
>
>> When I start a task on the master, I can see a
>> CoarseGrainedExecutorBackend Java process running on the worker. Does that
>> tell us anything?
>>
>>
>> 2013/12/17 Jie Deng <[email protected]>
>>
>>> Hi Andrew,
>>>
>>> Thanks for helping!
>>> Sorry, I did not make myself clear. Here is the output from iptables
>>> (identical on both master and worker); ufw is inactive and every chain's
>>> policy is ACCEPT, so nothing is being filtered:
>>>
>>> jie@jie-OptiPlex-7010:~/spark$ sudo ufw status
>>> Status: inactive
>>> jie@jie-OptiPlex-7010:~/spark$ sudo iptables -L
>>> Chain INPUT (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>> Chain FORWARD (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>>
>>>
>>>
>>> 2013/12/17 Andrew Ash <[email protected]>
>>>
>>>> Hi Jie,
>>>>
>>>> When you say the firewall is closed, does that mean ports are blocked
>>>> between the worker nodes?  I believe workers start up on a random port and
>>>> send data directly between each other during shuffles.  Your firewall may
>>>> be blocking those connections.  Can you try with the firewall temporarily
>>>> disabled?
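>>>>
>>>> (For reference, a minimal, hypothetical sketch of pinning the driver's
>>>> listening port so a single firewall rule can allow it. In Spark 0.8/0.9,
>>>> configuration goes through Java system properties set before the context
>>>> is created; the port number and master URL below are made up. Note that
>>>> executors still pick random ports for shuffle traffic, which is why
>>>> temporarily disabling the firewall is the simplest first test:)
>>>>
>>>> import java.util.Arrays;
>>>>
>>>> import org.apache.spark.api.java.JavaSparkContext;
>>>>
>>>> public class FirewallFriendlyDriver {
>>>>     public static void main(String[] args) {
>>>>         // Fix the driver's listening port (normally chosen at random)
>>>>         // so executors can connect back to it through the firewall.
>>>>         System.setProperty("spark.driver.port", "51000");
>>>>         JavaSparkContext sc = new JavaSparkContext(
>>>>                 "spark://master-host:7077", "FirewallFriendlyDriver");
>>>>         // Trivial job to verify that tasks actually complete.
>>>>         System.out.println(sc.parallelize(Arrays.asList(1, 2, 3)).count());
>>>>         sc.stop();
>>>>     }
>>>> }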
>>>>
>>>> Andrew
>>>>
>>>>
>>>> On Mon, Dec 16, 2013 at 9:58 AM, Jie Deng <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> Thanks for reading,
>>>>>
>>>>> I am trying to run a Spark program on a cluster. The program runs
>>>>> successfully in local mode; the standalone topology is working, and I can
>>>>> see the workers from the master web UI. The master and workers are on
>>>>> different machines, and the worker status is ALIVE.
>>>>> The thing is, whether I start the program from Eclipse or with
>>>>> ./run-example, it always stops at the same point:
>>>>> Stage Id: 0
>>>>> Description: count at SparkExample.java:31
>>>>>     <http://jie-optiplex-7010.local:4040/stages/stage?id=0>
>>>>> Submitted: 2013/12/16 14:50:36
>>>>> Duration: 7 min
>>>>> Tasks: Succeeded/Total: 0/2
>>>>> Shuffle Read / Shuffle Write: (blank)
>>>>> And after a while, the worker's state becomes DEAD.
>>>>>
>>>>> The Spark directory on the worker was copied from the master using
>>>>> ./make-distribution, and the firewall is all closed.
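>>>>>
>>>>> (For context, a rough sketch of how the context is created; the host
>>>>> name, paths, and jar handling below are illustrative rather than my
>>>>> exact code:)
>>>>>
>>>>> import java.util.Arrays;
>>>>>
>>>>> import org.apache.spark.api.java.JavaSparkContext;
>>>>>
>>>>> public class SparkExample {
>>>>>     public static void main(String[] args) {
>>>>>         // Point at the standalone master instead of "local", and pass
>>>>>         // the application jar so executors on the workers can load the
>>>>>         // job's classes.
>>>>>         JavaSparkContext sc = new JavaSparkContext(
>>>>>                 "spark://jie-optiplex-7010.local:7077",
>>>>>                 "SparkExample",
>>>>>                 "/home/jie/spark",  // Spark home on the workers
>>>>>                 JavaSparkContext.jarOfClass(SparkExample.class));
>>>>>         long n = sc.parallelize(Arrays.asList(1, 2, 3)).count();
>>>>>         System.out.println("count = " + n);
>>>>>         sc.stop();
>>>>>     }
>>>>> }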
>>>>>
>>>>> Has anyone had the same issue before?
>>>>>
>>>>
>>>>
>>>
>>
>
