Hi Andrew,

Thanks for helping! Sorry I did not make myself clear; here is the output from ufw and iptables (both master and worker):
jie@jie-OptiPlex-7010:~/spark$ sudo ufw status
Status: inactive
jie@jie-OptiPlex-7010:~/spark$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination


2013/12/17 Andrew Ash <[email protected]>

> Hi Jie,
>
> When you say firewall is closed does that mean ports are blocked between
> the worker nodes? I believe workers start up on a random port and send
> data directly between each other during shuffles. Your firewall may be
> blocking those connections. Can you try with the firewall temporarily
> disabled?
>
> Andrew
>
>
> On Mon, Dec 16, 2013 at 9:58 AM, Jie Deng <[email protected]> wrote:
>
>> Hi,
>> Thanks for reading.
>>
>> I am trying to run a Spark program on a cluster. The program runs
>> successfully in local mode;
>> the standalone topology is working, and I can see the workers from the master web UI;
>> master and worker are different machines, and the worker status is ALIVE.
>> The thing is, whether I start the program from Eclipse or via ./run-example,
>> it always stops at some point, with the web UI stage table showing:
>>
>> Stage Id: 0
>> Description: count at SparkExample.java:31 <http://jie-optiplex-7010.local:4040/stages/stage?id=0>
>> Submitted: 2013/12/16 14:50:36
>> Duration: 7 m
>> Tasks: Succeeded/Total: 0/2
>> Shuffle Read / Shuffle Write: (empty)
>>
>> And after a while, the worker's state becomes DEAD.
>>
>> The Spark directory on the worker was copied from the master via ./make-distribution,
>> and the firewall is all closed.
>>
>> Has anyone had the same issue before?
>>
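If blocked ports between the nodes do turn out to be the problem, one workaround is to pin the standalone daemons to fixed ports so a firewall can allow them explicitly instead of having to be disabled. A minimal sketch of conf/spark-env.sh, assuming the standard standalone-mode variables (the port numbers below are arbitrary examples, not values from this thread):

# conf/spark-env.sh on every node -- a sketch; adjust values for your setup
export SPARK_MASTER_PORT=7077        # master RPC port (7077 is the usual default)
export SPARK_MASTER_WEBUI_PORT=8080  # master web UI port
export SPARK_WORKER_PORT=7078        # fix the worker port instead of letting it pick a random one
export SPARK_WORKER_WEBUI_PORT=8081  # worker web UI port

With the ports pinned, an iptables or ufw rule for just those ports on each machine covers the master/worker handshake; shuffle connections between executors may still use ephemeral ports on older Spark versions, so temporarily disabling the firewall, as Andrew suggests, remains the quickest way to confirm the diagnosis.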
