Hi all! Thanks for answering!
@Sean, I tried to run with 30 executor cores, and 1 machine is still not processing. @Vanzin, I checked the RM's web UI, and all nodes were detected and "RUNNING". The interesting fact is that the available memory and available cores of 1 node were different from the other 2: just 1 available core and 1 available GB of RAM. @All, I created a new cluster with 10 slaves and 1 master, and now 9 of my slaves are working and 1 is still not processing. It's fine by me! I'm just wondering why YARN is doing it... Does anyone know the answer?

2014-11-18 16:18 GMT-02:00 Sean Owen <so...@cloudera.com>:

> My guess is you're asking for all cores of all machines, but the driver
> needs at least one core, so one executor is unable to find a machine to fit
> on.
>
> On Nov 18, 2014 7:04 PM, "Alan Prando" <a...@scanboo.com.br> wrote:
>
>> Hi Folks!
>>
>> I'm running Spark on a YARN cluster installed with Cloudera Manager Express.
>> The cluster has 1 master and 3 slaves, each machine with 32 cores and 64 GB
>> of RAM.
>>
>> My Spark job is working fine; however, it seems that only 2 of the 3 slaves
>> are working (htop shows 2 slaves running at 100% on all 32 cores, and 1 slave
>> without any processing).
>>
>> I'm using this command:
>> ./spark-submit --master yarn --num-executors 3 --executor-cores 32
>> --executor-memory 32g feature_extractor.py -r 390
>>
>> Additionally, Spark's log shows communication with only 2 slaves:
>> 14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
>> Actor[akka.tcp://sparkExecutor@ip-172-31-13-180.ec2.internal:33177/user/Executor#-113177469]
>> with ID 1
>> 14/11/18 17:19:38 INFO RackResolver: Resolved
>> ip-172-31-13-180.ec2.internal to /default
>> 14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
>> Actor[akka.tcp://sparkExecutor@ip-172-31-13-179.ec2.internal:51859/user/Executor#-323896724]
>> with ID 2
>> 14/11/18 17:19:38 INFO RackResolver: Resolved
>> ip-172-31-13-179.ec2.internal to /default
>> 14/11/18 17:19:38 INFO BlockManagerMasterActor: Registering block manager
>> ip-172-31-13-180.ec2.internal:50959 with 16.6 GB RAM
>> 14/11/18 17:19:39 INFO BlockManagerMasterActor: Registering block manager
>> ip-172-31-13-179.ec2.internal:53557 with 16.6 GB RAM
>> 14/11/18 17:19:51 INFO YarnClientSchedulerBackend: SchedulerBackend is
>> ready for scheduling beginning after waiting
>> maxRegisteredResourcesWaitingTime: 30000(ms)
>>
>> Is there a configuration to run a Spark job on a YARN cluster with all
>> slaves?
>>
>> Thanks in advance! =]
>>
>> ---
>> Regards,
>> Alan Vidotti Prando.
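
P.S. For reference, this is the kind of headroom Sean's answer suggests: request a bit less than the full machine per executor so the YARN ApplicationMaster and container overhead can still be placed. The exact numbers below are just an illustration for 32-core / 64 GB nodes, not settings I have verified on this cluster:

    # Leave a couple of cores and a few GB per node for the YARN AM,
    # the driver and per-container overhead (illustrative values only;
    # adjust to whatever your NodeManagers actually advertise):
    ./spark-submit --master yarn \
      --num-executors 3 \
      --executor-cores 30 \
      --executor-memory 28g \
      feature_extractor.py -r 390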