OK, yarn.scheduler.maximum-allocation-mb is 16384. I have run it again; the command to run it is:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
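For reference, the dynamic-allocation settings behind this run are roughly the following (a minimal sketch in SparkConf form; the executor count of 50 is taken from the DEBUG message quoted later in this thread, and the shuffle-service line is the usual prerequisite for dynamic allocation on YARN rather than something copied from my spark-defaults.conf; the same keys can equally be passed to spark-submit as --conf key=value):

import org.apache.spark.SparkConf

// Sketch only: the values below are assumed from this thread, not copied from a real configuration.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")     // turn on dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "50")  // min == max, as discussed below
  .set("spark.dynamicAllocation.maxExecutors", "50")
  .set("spark.shuffle.service.enabled", "true")       // external shuffle service is required for dynamic allocation on YARN

The ApplicationMaster log from this run follows: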
15/11/24 16:15:56 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/11/24 16:15:57 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1447834709734_0120_000001
15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
15/11/24 16:15:58 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization
15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
15/11/24 16:15:58 INFO spark.SparkContext: Running Spark version 1.5.0
15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
15/11/24 16:15:58 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/11/24 16:15:59 INFO Remoting: Starting remoting
15/11/24 16:15:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@X.X.X.X]
15/11/24 16:15:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 61904.
15/11/24 16:15:59 INFO spark.SparkEnv: Registering MapOutputTracker
15/11/24 16:15:59 INFO spark.SparkEnv: Registering BlockManagerMaster
15/11/24 16:15:59 INFO storage.DiskBlockManager: Created local directory at /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/blockmgr-33fbe6c4-5138-4eff-83b4-fb0c886667b7
15/11/24 16:15:59 INFO storage.MemoryStore: MemoryStore started with capacity 1966.1 MB
15/11/24 16:15:59 INFO spark.HttpFileServer: HTTP File server directory is /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/spark-fbbfa2bd-6d30-421e-a634-4546134b3b5f/httpd-e31d7b8e-ca8f-400e-8b4b-d2993fb6f1d1
15/11/24 16:15:59 INFO spark.HttpServer: Starting HTTP Server
15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/24 16:15:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14692
15/11/24 16:15:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 14692.
15/11/24 16:15:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/11/24 16:15:59 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/24 16:15:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:15948
15/11/24 16:15:59 INFO util.Utils: Successfully started service 'SparkUI' on port 15948.
15/11/24 16:15:59 INFO ui.SparkUI: Started SparkUI at X.X.X.X
15/11/24 16:15:59 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/11/24 16:15:59 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/11/24 16:15:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41830.
15/11/24 16:15:59 INFO netty.NettyBlockTransferService: Server created on 41830
15/11/24 16:15:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/11/24 16:15:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager X.X.X.X:41830 with 1966.1 MB RAM, BlockManagerId(driver, 10.12.30.2, 41830)
15/11/24 16:15:59 INFO storage.BlockManagerMaster: Registered BlockManager
15/11/24 16:16:00 INFO scheduler.EventLoggingListener: Logging events to hdfs:///tmp/latest-spark-events/application_1447834709734_0120_1
15/11/24 16:16:00 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#293602859])
15/11/24 16:16:00 INFO client.RMProxy: Connecting to ResourceManager at X.X.X.X
15/11/24 16:16:00 INFO yarn.YarnRMClient: Registering the ApplicationMaster
15/11/24 16:16:00 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
15/11/24 16:16:29 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
15/11/24 16:16:29 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
15/11/24 16:16:29 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 200 output partitions
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:36)
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Missing parents: List()
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=2061647216
15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 1966.1 MB)
15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=2061647216
15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 1966.1 MB)
15/11/24 16:16:30 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on X.X.X.X:41830 (size: 1202.0 B, free: 1966.1 MB)
15/11/24 16:16:30 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
15/11/24 16:16:30 INFO scheduler.DAGScheduler: Submitting 200 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
15/11/24 16:16:30 INFO cluster.YarnClusterScheduler: Adding task set 0.0 with 200 tasks
15/11/24 16:16:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:17:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:17:15 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:17:30 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:17:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:18:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

2015-11-24 15:14 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:

> What about this configuration in YARN: "yarn.scheduler.maximum-allocation-mb"?
>
> I'm curious why 49 executors work but 50 fail. Would you provide your application master log? If a container request is issued, there will be log lines like:
>
> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>
> On Tue, Nov 24, 2015 at 2:56 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>
>> OK, the YARN conf is listed in the following:
>>
>> yarn.nodemanager.resource.memory-mb: 115200
>> yarn.nodemanager.resource.cpu-vcores: 50
>>
>> I think the YARN resources are sufficient. In my previous mail I said that I think the Spark application didn't request resources from YARN.
>>
>> Thanks
>>
>> 2015-11-24 14:30 GMT+08:00 cherrywayb...@gmail.com <cherrywayb...@gmail.com>:
>>
>>> Can you show your parameter values in your env?
>>> yarn.nodemanager.resource.cpu-vcores
>>> yarn.nodemanager.resource.memory-mb
>>>
>>> ------------------------------
>>> cherrywayb...@gmail.com
>>>
>>> *From:* 谢廷稳 <xieting...@gmail.com>
>>> *Date:* 2015-11-24 12:13
>>> *To:* Saisai Shao <sai.sai.s...@gmail.com>
>>> *CC:* spark users <user@spark.apache.org>
>>> *Subject:* Re: A Problem About Running Spark 1.5 on YARN with Dynamic Allocation
>>> OK, the YARN cluster is used only by myself; it has 6 nodes which can run over 100 executors, and the YARN RM logs showed that the Spark application did not request resources from it.
>>>
>>> Is this a bug? Should I create a JIRA for this problem?
>>>
>>> 2015-11-24 12:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>
>>>> OK, so this looks like your YARN cluster does not allocate the containers, which you suppose should be 50. Does the YARN cluster have enough resources after allocating the AM container? If not, that is the problem.
>>>>
>>>> From my reading of your description, the problem does not lie in dynamic allocation. As I said, it works for me with min and max executors set to the same number.
>>>>
>>>> On Tue, Nov 24, 2015 at 11:54 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>
>>>>> Hi Saisai,
>>>>> I'm sorry for not describing it clearly. The YARN debug log said I have 50 executors, but the ResourceManager showed that I only have 1 container, for the AppMaster.
>>>>>
>>>>> I have checked the YARN RM logs; after the AppMaster changed state from ACCEPTED to RUNNING, there was no further log about this job. So the problem is that I did not have any executors, but the ExecutorAllocationManager thinks I do. Would you mind running a test in your cluster environment?
>>>>> Thanks,
>>>>> Weber
>>>>>
>>>>> 2015-11-24 11:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>
>>>>>> I think this behavior is expected, since you already have 50 executors launched, so there is no need to acquire additional executors. Your change is not solid; it just hides the log.
>>>>>>
>>>>>> Again, I think you should check the logs of YARN and Spark to see whether the executors are started correctly, and why resources are still not enough when you already have 50 executors.
>>>>>>
>>>>>> On Tue, Nov 24, 2015 at 10:48 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi SaiSai,
>>>>>>> I have changed "if (numExecutorsTarget >= maxNumExecutors)" to "if (numExecutorsTarget > maxNumExecutors)" in the first line of ExecutorAllocationManager#addExecutors() and it ran well.
>>>>>>> In my opinion, when I set minExecutors equal to maxExecutors, then the first time executors are to be added, numExecutorsTarget already equals maxNumExecutors and it repeatedly prints "DEBUG ExecutorAllocationManager: Not adding executors because our current target total is already 50 (limit 50)".
>>>>>>> Thanks
>>>>>>> Weber
>>>>>>>
>>>>>>> 2015-11-23 21:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>
>>>>>>>> Hi Tingwen,
>>>>>>>>
>>>>>>>> Would you mind sharing your changes in ExecutorAllocationManager#addExecutors()?
>>>>>>>>
>>>>>>>> From my understanding and testing, dynamic allocation works when you set the min and max number of executors to the same number.
>>>>>>>>
>>>>>>>> Please check your Spark and YARN logs to make sure the executors are correctly started; the warning log means there are currently not enough resources to submit tasks.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Saisai
>>>>>>>>
>>>>>>>> On Mon, Nov 23, 2015 at 8:41 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>> I ran SparkPi on YARN with Dynamic Allocation enabled and set spark.dynamicAllocation.maxExecutors equal to spark.dynamicAllocation.minExecutors, then I submitted an application using:
>>>>>>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
>>>>>>>>>
>>>>>>>>> The application was submitted successfully, but the AppMaster keeps saying "15/11/23 20:13:08 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources",
>>>>>>>>> and when I turned on DEBUG, I found "15/11/23 20:24:00 DEBUG ExecutorAllocationManager: Not adding executors because our current target total is already 50 (limit 50)" in the console.
>>>>>>>>>
>>>>>>>>> I have fixed it by modifying code in ExecutorAllocationManager.addExecutors. Is this a bug, or was it designed that we can't set maxExecutors equal to minExecutors?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Weber
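For anyone reading this thread in the archive, here is a minimal, self-contained sketch of the guard being discussed (this is an illustration, not the actual Spark 1.5 source; the names numExecutorsTarget and maxNumExecutors and the limit of 50 are taken from the messages above, everything else is assumed):

object AddExecutorsGuardSketch {
  val maxNumExecutors = 50        // spark.dynamicAllocation.maxExecutors
  var numExecutorsTarget = 50     // starts at the limit because minExecutors == maxExecutors

  // Simplified stand-in for ExecutorAllocationManager#addExecutors: with the
  // >= check, the method returns before computing any delta, which is why the
  // DEBUG message above repeats; whether the initial 50 executors should have
  // been requested elsewhere is the open question in this discussion.
  def addExecutors(maxNeeded: Int): Int = {
    if (numExecutorsTarget >= maxNumExecutors) {  // the proposed change replaces >= with >
      println(s"Not adding executors because our current target total " +
        s"is already $numExecutorsTarget (limit $maxNumExecutors)")
      return 0
    }
    // ...otherwise raise numExecutorsTarget and ask the cluster manager for more containers...
    0
  }

  def main(args: Array[String]): Unit = {
    addExecutors(200)  // with min == max this always takes the early return
  }
}

With the one-character change described above (>= to >), the guard no longer fires when the target exactly equals the limit, which matches the behavior Weber reports after his modification.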