Re: can't get jobs to run on cluster (enough memory and cpus are available on worker)

2014-07-21 Thread Matt Work Coarr
I got this working by having our sysadmin update our security group to allow incoming traffic from the local subnet on ports 1-65535. I'm not sure if there's a more specific range I could have used, but so far, everything is running! Thanks for all the responses, Marcelo and Andrew!! Matt
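[Editor's note: for anyone who lands on this thread later, here is a minimal sketch of that security-group rule applied programmatically with boto, the library ec2/spark_ec2.py itself uses. The region, group name, and subnet CIDR are placeholders, not values from this thread:]

```python
import boto.ec2

# Hypothetical region; use the one your cluster runs in.
conn = boto.ec2.connect_to_region("us-east-1")

# Allow all TCP traffic from the local subnet so the driver, master,
# and workers can reach each other on the ephemeral ports Spark picks.
conn.authorize_security_group(
    group_name="my-spark-nodes",  # hypothetical security group name
    ip_protocol="tcp",
    from_port=1,
    to_port=65535,
    cidr_ip="10.0.0.0/24",        # hypothetical local subnet
)
```

A narrower rule is possible if you pin Spark's ports explicitly (see the driver-port sketch further down), but opening the full TCP range to the local subnet only is what's confirmed to work here.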

Re: can't get jobs to run on cluster (enough memory and cpus are available on worker)

2014-07-17 Thread Marcelo Vanzin
On Wed, Jul 16, 2014 at 12:36 PM, Matt Work Coarr mattcoarr.w...@gmail.com wrote: Thanks, Marcelo, I'm not seeing anything in the logs that clearly explains what's causing this to break. One interesting point we just discovered is that if we run the driver and the slave (worker) on the

Re: can't get jobs to run on cluster (enough memory and cpus are available on worker)

2014-07-17 Thread Matt Work Coarr
Thanks, Marcelo! This is a huge help!! Looking at the executor logs (in a vanilla Spark install, I'm finding them in $SPARK_HOME/work/*)... It launches the executor, but it looks like the CoarseGrainedExecutorBackend is having trouble talking to the driver (exactly what you said!!!). Do you
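[Editor's note: the usual culprit here is that CoarseGrainedExecutorBackend connects back to the driver, so the driver's advertised host and port must be reachable from the worker machines. A minimal PySpark sketch that pins both, using the standard spark.driver.host and spark.driver.port settings; the master URL, IP address, and port number are hypothetical:]

```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://master-host:7077")  # hypothetical master URL
        .setAppName("connectivity-check")
        # Advertise an address the workers can actually reach, and pin
        # the callback port so it can be opened in the security group.
        .set("spark.driver.host", "10.0.0.5")   # hypothetical driver IP
        .set("spark.driver.port", "51000"))     # hypothetical fixed port
sc = SparkContext(conf=conf)

# If the executors can reach the driver, this trivial job completes.
print(sc.parallelize(range(100)).sum())
sc.stop()
```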

Re: can't get jobs to run on cluster (enough memory and cpus are available on worker)

2014-07-17 Thread Marcelo Vanzin
Hi Matt, I'm not very familiar with setup on EC2; the closest I can point you at is the launch_cluster function in ec2/spark_ec2.py, where the ports seem to be configured. On Thu, Jul 17, 2014 at 1:29 PM, Matt Work Coarr mattcoarr.w...@gmail.com wrote: Thanks, Marcelo! This is a huge help!!

Re: can't get jobs to run on cluster (enough memory and cpus are available on worker)

2014-07-17 Thread Andrew Or
Hi Matt, The security group shouldn't be an issue; the ports listed in `spark_ec2.py` are only for communication with the outside world. How did you launch your application? I notice you did not launch your driver from your Master node. What happens if you do? Another thing is that there seems

Re: can't get jobs to run on cluster (enough memory and cpus are available on worker)

2014-07-16 Thread Matt Work Coarr
Thanks, Marcelo, I'm not seeing anything in the logs that clearly explains what's causing this to break. One interesting point we just discovered is that if we run the driver and the slave (worker) on the same host, the job runs, but if we run the driver on a separate host, it does not.

can't get jobs to run on cluster (enough memory and cpus are available on worker)

2014-07-15 Thread Matt Work Coarr
Hello Spark folks, I have a simple Spark cluster set up, but I can't get jobs to run on it. I am using standalone mode. One master, one slave. Both machines have 32GB of RAM and 8 cores. The slave is set up with one worker that has 8 cores and 24GB of memory allocated. My application requires 2
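[Editor's note: for reference, a minimal sketch of how an application asks the standalone master for resources; the per-executor memory and the core cap must fit within what the worker advertises (8 cores / 24GB here). The master URL and the specific numbers are hypothetical, not taken from this thread:]

```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://master-host:7077")  # hypothetical master URL
        .setAppName("resource-check")
        .set("spark.executor.memory", "2g")     # per-executor heap
        .set("spark.cores.max", "2"))           # total cores for the app
sc = SparkContext(conf=conf)

# If a worker has the requested memory and cores free, this runs.
print(sc.parallelize([1, 2, 3]).count())
sc.stop()
```

If no worker can satisfy the request, the standalone scheduler leaves the application waiting and repeatedly logs Spark's "Initial job has not accepted any resources" warning, which appears to be the symptom discussed in this thread.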

Re: can't get jobs to run on cluster (enough memory and cpus are available on worker)

2014-07-15 Thread Marcelo Vanzin
Have you looked at the slave machine to see if the process has actually launched? If it has, have you tried peeking into its log file? (That error is printed whenever the executors fail to report back to the driver. Insufficient resources to launch the executor is the most common cause of that,