Never mind. The fix was to use spark-0.9 instead of 0.8, because 0.9 fixed a bug that kept programs from running from Eclipse.
2013/12/17 Jie Deng <[email protected]>

> When I start a task on the master, I can see a
> CoarseGrainedExecutorBackend Java process running on the worker. Does that
> tell us anything?
>
>
> 2013/12/17 Jie Deng <[email protected]>
>
>> Hi Andrew,
>>
>> Thanks for helping!
>> Sorry I did not make myself clear. Here is the output from iptables
>> (both master and worker):
>>
>> jie@jie-OptiPlex-7010:~/spark$ sudo ufw status
>> Status: inactive
>> jie@jie-OptiPlex-7010:~/spark$ sudo iptables -L
>> Chain INPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain FORWARD (policy ACCEPT)
>> target     prot opt source               destination
>>
>> Chain OUTPUT (policy ACCEPT)
>> target     prot opt source               destination
>>
>>
>> 2013/12/17 Andrew Ash <[email protected]>
>>
>>> Hi Jie,
>>>
>>> When you say firewall is closed does that mean ports are blocked between
>>> the worker nodes? I believe workers start up on a random port and send
>>> data directly between each other during shuffles. Your firewall may be
>>> blocking those connections. Can you try with the firewall temporarily
>>> disabled?
>>>
>>> Andrew
>>>
>>>
>>> On Mon, Dec 16, 2013 at 9:58 AM, Jie Deng <[email protected]> wrote:
>>>
>>>> Hi,
>>>> Thanks for reading,
>>>>
>>>> I am trying to run a Spark program on a cluster. The program runs
>>>> successfully in local mode.
>>>> The standalone topology is working: I can see the workers in the master
>>>> web UI. Master and worker are different machines, and the worker status
>>>> is ALIVE.
>>>> The thing is, no matter whether I start a program from Eclipse or with
>>>> ./run-example, both stop at some point like:
>>>>
>>>> Stage Id:               0
>>>> Description:            count at SparkExample.java:31
>>>>                         <http://jie-optiplex-7010.local:4040/stages/stage?id=0>
>>>> Submitted:              2013/12/16 14:50:36
>>>> Duration:               7 m
>>>> Tasks: Succeeded/Total: 0/2
>>>> Shuffle Read:
>>>> Shuffle Write:
>>>>
>>>> And after a while, the worker's state becomes DEAD.
>>>>
>>>> The Spark directory on the worker was copied from the master using
>>>> ./make-distribution, and the firewall is all closed.
>>>>
>>>> Has anyone had the same issue before?
>>>>
>>>
>>>
>>
>
