Glad you got it figured out!
On Tue, Dec 17, 2013 at 8:43 AM, Jie Deng <[email protected]> wrote:

> Don't bother... my problem was solved by using spark-0.9 instead of 0.8,
> because 0.9 fixed a bug, so it can run from Eclipse.
>
>
> 2013/12/17 Jie Deng <[email protected]>
>
>> When I start a task on the master, I can see a
>> CoarseGrainedExecutorBackend Java process running on the worker; does
>> that tell us anything?
>>
>>
>> 2013/12/17 Jie Deng <[email protected]>
>>
>>> Hi Andrew,
>>>
>>> Thanks for helping!
>>> Sorry I did not make myself clear; here is the output from iptables
>>> (on both master and worker):
>>>
>>> jie@jie-OptiPlex-7010:~/spark$ sudo ufw status
>>> Status: inactive
>>> jie@jie-OptiPlex-7010:~/spark$ sudo iptables -L
>>> Chain INPUT (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>> Chain FORWARD (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target     prot opt source               destination
>>>
>>>
>>> 2013/12/17 Andrew Ash <[email protected]>
>>>
>>>> Hi Jie,
>>>>
>>>> When you say the firewall is closed, does that mean ports are blocked
>>>> between the worker nodes? I believe workers start up on a random port
>>>> and send data directly between each other during shuffles. Your
>>>> firewall may be blocking those connections. Can you try with the
>>>> firewall temporarily disabled?
>>>>
>>>> Andrew
>>>>
>>>>
>>>> On Mon, Dec 16, 2013 at 9:58 AM, Jie Deng <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> Thanks for reading.
>>>>>
>>>>> I am trying to run a Spark program on a cluster. The program runs
>>>>> successfully in local mode. The standalone topology is working: I can
>>>>> see the workers from the master web UI; master and worker are
>>>>> different machines, and the worker status is ALIVE.
>>>>> The thing is, whether I start the program from Eclipse or with
>>>>> ./run-example, it stops at some point like this in the web UI:
>>>>>
>>>>> Stage Id  Description                    Submitted            Duration  Tasks: Succeeded/Total  Shuffle Read  Shuffle Write
>>>>> 0         count at SparkExample.java:31  2013/12/16 14:50:36  7 m       0/2
>>>>> <http://jie-optiplex-7010.local:4040/stages/stage?id=0>
>>>>>
>>>>> And after a while, the worker's state becomes DEAD.
>>>>>
>>>>> The Spark directory on the worker was copied from the master via
>>>>> ./make-distribution; the firewall is all closed.
>>>>>
>>>>> Has anyone had the same issue before?
>>>>>
>>>>
>>>>
>>>
>>
>
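For anyone landing on this thread with the same symptom (an action stuck at 0/2
tasks until the worker goes DEAD): besides the 0.8-vs-0.9 bug Jie hit, a common
cause of that exact hang when driving a standalone cluster from an IDE is that
the job's classes are never shipped to the executors. Below is a minimal sketch
of a Java driver against the Spark 0.8/0.9-era API; the master URL, Spark home,
and jar path are placeholders, not values confirmed in this thread.

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkExampleSketch {
        public static void main(String[] args) {
            // Connect to the standalone master instead of "local".
            // Host name, SPARK_HOME, and jar path below are illustrative.
            JavaSparkContext sc = new JavaSparkContext(
                    "spark://jie-OptiPlex-7010:7077",  // standalone master URL
                    "SparkExampleSketch",              // app name shown in the web UI
                    "/home/jie/spark",                 // SPARK_HOME on the cluster
                    new String[] { "target/spark-example.jar" }); // ship job classes to workers

            // A trivial job: if the jar list above is missing or wrong, an
            // action like this count can sit at 0/n tasks because executors
            // cannot load the driver's classes.
            JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4));
            System.out.println("count = " + nums.count());

            sc.stop();
        }
    }

Running from Eclipse, the jar must be rebuilt after each change so the workers
see the current classes; ./run-example handles this packaging for you.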

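On Andrew's point about workers and drivers listening on random ports: if the
firewall cannot simply be disabled, one option in this era of Spark is to pin
the driver's port so a firewall rule can allow it explicitly. A sketch,
assuming the Spark 0.9 SparkConf API; the port number and master URL are
arbitrary choices for illustration.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class FixedPortDriver {
        public static void main(String[] args) {
            // By default the driver listens on a random ephemeral port, which
            // is what makes per-port firewall rules awkward in standalone mode.
            SparkConf conf = new SparkConf()
                    .setMaster("spark://jie-OptiPlex-7010:7077") // placeholder master URL
                    .setAppName("FixedPortDriver")
                    .set("spark.driver.port", "51000");          // pin to a port the firewall allows

            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... job here ...
            sc.stop();
        }
    }

Note this only pins the driver side; worker-to-worker shuffle connections still
use random ports, so a restrictive firewall between workers can still block
shuffles, as described above.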