Thank you very much; after changing to the newer version, it worked well!
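For anyone who finds this thread later: below is a minimal sketch of the setup under discussion (this snippet is not from the thread itself; the values are the ones used here, and the shuffle-service property is an addition the thread does not mention but that dynamic allocation on YARN requires):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch of the dynamic-allocation settings discussed below.
val conf = new SparkConf()
  .setAppName("SparkPi")
  .set("spark.shuffle.service.enabled", "true")           // assumption: needed for dynamic allocation, not mentioned in the thread
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "50")
  .set("spark.dynamicAllocation.maxExecutors", "50")
  .set("spark.dynamicAllocation.initialExecutors", "50")  // the explicit setting used as a workaround below
val sc = new SparkContext(conf)

On Spark 1.5.0 the explicit initialExecutors setting was the workaround; upgrading to a release that includes the SPARK-10790 fix removes the need for it.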
2015-11-24 17:15 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
> The document is right. Because of a bug introduced in https://issues.apache.org/jira/browse/SPARK-9092, this configuration fails to work.
>
> It is fixed in https://issues.apache.org/jira/browse/SPARK-10790, so you could upgrade to a newer version of Spark.
>
> On Tue, Nov 24, 2015 at 5:12 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>> @Sab Thank you for your reply, but the cluster has 6 nodes with 300 cores in total, and the Spark application did not request resources from YARN.
>>
>> @SaiSai I ran it successfully with "spark.dynamicAllocation.initialExecutors" set to 50, but http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation says that "spark.dynamicAllocation.initialExecutors" defaults to "spark.dynamicAllocation.minExecutors". So I think something is wrong here. Is that right?
>>
>> Thanks.
>>
>> 2015-11-24 16:47 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>> Did you set the configuration "spark.dynamicAllocation.initialExecutors"?
>>>
>>> You can set spark.dynamicAllocation.initialExecutors to 50 and try again.
>>>
>>> I guess you might be hitting this issue since you're running 1.5.0: https://issues.apache.org/jira/browse/SPARK-9092. But it still does not explain why 49 executors work.
>>>
>>> On Tue, Nov 24, 2015 at 4:42 PM, Sabarish Sasidharan <sabarish.sasidha...@manthan.com> wrote:
>>>> If YARN has only 50 cores then it can support at most 49 executors plus 1 driver application master.
>>>>
>>>> Regards
>>>> Sab
>>>>
>>>> On 24-Nov-2015 1:58 pm, "谢廷稳" <xieting...@gmail.com> wrote:
>>>>> OK, yarn.scheduler.maximum-allocation-mb is 16384.
>>>>>
>>>>> I have run it again; the command is:
>>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
>>>>>
>>>>>> 15/11/24 16:15:56 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
>>>>>> 15/11/24 16:15:57 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1447834709734_0120_000001
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>>>>>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
>>>>>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization
>>>>>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
>>>>>> 15/11/24 16:15:58 INFO spark.SparkContext: Running Spark version 1.5.0
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>>>>>> 15/11/24 16:15:58 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>>>>> 15/11/24 16:15:59 INFO Remoting: Starting remoting
>>>>>> 15/11/24 16:15:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@X.X.X.X]
>>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 61904.
>>>>>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering MapOutputTracker
>>>>>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering BlockManagerMaster
>>>>>> 15/11/24 16:15:59 INFO storage.DiskBlockManager: Created local directory at /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/blockmgr-33fbe6c4-5138-4eff-83b4-fb0c886667b7
>>>>>> 15/11/24 16:15:59 INFO storage.MemoryStore: MemoryStore started with capacity 1966.1 MB
>>>>>> 15/11/24 16:15:59 INFO spark.HttpFileServer: HTTP File server directory is /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/spark-fbbfa2bd-6d30-421e-a634-4546134b3b5f/httpd-e31d7b8e-ca8f-400e-8b4b-d2993fb6f1d1
>>>>>> 15/11/24 16:15:59 INFO spark.HttpServer: Starting HTTP Server
>>>>>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14692
>>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 14692.
>>>>>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>>>>>> 15/11/24 16:15:59 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
>>>>>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:15948
>>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'SparkUI' on port 15948.
>>>>>> 15/11/24 16:15:59 INFO ui.SparkUI: Started SparkUI at X.X.X.X
>>>>>> 15/11/24 16:15:59 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
>>>>>> 15/11/24 16:15:59 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41830.
>>>>>> 15/11/24 16:15:59 INFO netty.NettyBlockTransferService: Server created on 41830
>>>>>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
>>>>>> 15/11/24 16:15:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager X.X.X.X:41830 with 1966.1 MB RAM, BlockManagerId(driver, 10.12.30.2, 41830)
>>>>>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Registered BlockManager
>>>>>> 15/11/24 16:16:00 INFO scheduler.EventLoggingListener: Logging events to hdfs:///tmp/latest-spark-events/application_1447834709734_0120_1
>>>>>> 15/11/24 16:16:00 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#293602859])
>>>>>> 15/11/24 16:16:00 INFO client.RMProxy: Connecting to ResourceManager at X.X.X.X
>>>>>> 15/11/24 16:16:00 INFO yarn.YarnRMClient: Registering the ApplicationMaster
>>>>>> 15/11/24 16:16:00 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
>>>>>> 15/11/24 16:16:29 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
>>>>>> 15/11/24 16:16:29 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
>>>>>> 15/11/24 16:16:29 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 200 output partitions
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:36)
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Parents of final stage: List()
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Missing parents: List()
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
>>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=2061647216
>>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 1966.1 MB)
>>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=2061647216
>>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 1966.1 MB)
>>>>>> 15/11/24 16:16:30 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on X.X.X.X:41830 (size: 1202.0 B, free: 1966.1 MB)
>>>>>> 15/11/24 16:16:30 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
>>>>>> 15/11/24 16:16:30 INFO scheduler.DAGScheduler: Submitting 200 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
>>>>>> 15/11/24 16:16:30 INFO cluster.YarnClusterScheduler: Adding task set 0.0 with 200 tasks
>>>>>> 15/11/24 16:16:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>>> 15/11/24 16:17:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>>> 15/11/24 16:17:15 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>>> 15/11/24 16:17:30 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>>> 15/11/24 16:17:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>>> 15/11/24 16:18:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>>
>>>>> 2015-11-24 15:14 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>> What about this YARN configuration: "yarn.scheduler.maximum-allocation-mb"?
>>>>>>
>>>>>> I'm curious why 49 executors work but 50 fail. Would you provide your application master log? If a container request is issued, there will be log lines like:
>>>>>>
>>>>>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
>>>>>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>>>>>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>>>>>>
>>>>>> On Tue, Nov 24, 2015 at 2:56 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>> OK, the YARN configuration is listed below:
>>>>>>>
>>>>>>> yarn.nodemanager.resource.memory-mb: 115200
>>>>>>> yarn.nodemanager.resource.cpu-vcores: 50
>>>>>>>
>>>>>>> I think the YARN resources are sufficient. As I said in my previous email, I think the Spark application did not request resources from YARN.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> 2015-11-24 14:30 GMT+08:00 cherrywayb...@gmail.com <cherrywayb...@gmail.com>:
>>>>>>>> Can you show the values of these parameters in your environment?
>>>>>>>> yarn.nodemanager.resource.cpu-vcores
>>>>>>>> yarn.nodemanager.resource.memory-mb
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>> cherrywayb...@gmail.com
>>>>>>>>
>>>>>>>> *From:* 谢廷稳 <xieting...@gmail.com>
>>>>>>>> *Date:* 2015-11-24 12:13
>>>>>>>> *To:* Saisai Shao <sai.sai.s...@gmail.com>
>>>>>>>> *CC:* spark users <user@spark.apache.org>
>>>>>>>> *Subject:* Re: A Problem About Running Spark 1.5 on YARN with Dynamic Alloction
>>>>>>>> OK, the YARN cluster is used only by me; it has 6 nodes, which can run over 100 executors, and the YARN RM logs showed that the Spark application did not request resources from it.
>>>>>>>>
>>>>>>>> Is this a bug? Should I create a JIRA for this problem?
>>>>>>>>
>>>>>>>> 2015-11-24 12:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>>> OK, so it looks like your YARN cluster does not allocate the containers, which you expect should be 50.
>>>>>>>>> Does the YARN cluster have enough resources left after allocating the AM container? If not, that is the problem.
>>>>>>>>>
>>>>>>>>> From your description, my guess is that the problem does not lie in dynamic allocation. As I said, it works for me with the min and max number of executors set to the same value.
>>>>>>>>>
>>>>>>>>> On Tue, Nov 24, 2015 at 11:54 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>>> Hi Saisai,
>>>>>>>>>> I'm sorry I did not describe it clearly. The YARN debug log said I have 50 executors, but the ResourceManager showed that I only have 1 container, for the AppMaster.
>>>>>>>>>>
>>>>>>>>>> I have checked the YARN RM logs; after the AppMaster changed state from ACCEPTED to RUNNING, there were no more log entries about this job. So the problem is that I do not have any executors, but the ExecutorAllocationManager thinks I do. Would you mind running a test in your cluster environment?
>>>>>>>>>> Thanks,
>>>>>>>>>> Weber
>>>>>>>>>>
>>>>>>>>>> 2015-11-24 11:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>>>>> I think this behavior is expected: since you already have 50 executors launched, there is no need to acquire additional executors. Your change is not a real fix; it just hides the log message.
>>>>>>>>>>>
>>>>>>>>>>> Again, I think you should check the YARN and Spark logs to see whether the executors started correctly, and why resources are still not enough when you already have 50 executors.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 24, 2015 at 10:48 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>>>>> Hi SaiSai,
>>>>>>>>>>>> I have changed "if (numExecutorsTarget >= maxNumExecutors)" to "if (numExecutorsTarget > maxNumExecutors)" on the first line of ExecutorAllocationManager#addExecutors(), and it ran well.
>>>>>>>>>>>> In my opinion, when I set minExecutors equal to maxExecutors, then on the first attempt to add executors numExecutorsTarget already equals maxNumExecutors, and it repeatedly prints "DEBUG ExecutorAllocationManager: Not adding executors because our current target total is already 50 (limit 50)".
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Weber
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-11-23 21:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>>>>>>> Hi Tingwen,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would you mind sharing your changes in ExecutorAllocationManager#addExecutors()?
>>>>>>>>>>>>>
>>>>>>>>>>>>> From my understanding and testing, dynamic allocation works when you set the min and max number of executors to the same value.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please check your Spark and YARN logs to make sure the executors are started correctly; the warning log means there are currently not enough resources to submit tasks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Saisai
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Nov 23, 2015 at 8:41 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> I ran SparkPi on YARN with dynamic allocation enabled and spark.dynamicAllocation.maxExecutors set equal to spark.dynamicAllocation.minExecutors, then I submitted the application using:
>>>>>>>>>>>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The application was submitted successfully, but the AppMaster kept saying "15/11/23 20:13:08 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", and when I turned on DEBUG logging I found "15/11/23 20:24:00 DEBUG ExecutorAllocationManager: Not adding executors because our current target total is already 50 (limit 50)" in the console.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have fixed it by modifying the code in ExecutorAllocationManager.addExecutors. Is this a bug, or is it by design that we can't set maxExecutors equal to minExecutors?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Weber
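The DEBUG message quoted in the original post comes from a guard at the top of ExecutorAllocationManager#addExecutors(). Below is a simplified, hypothetical Scala sketch of that guard's behaviour (it is not the actual Spark 1.5.0 source; the names and values only mirror the log output in this thread). It shows why, once the target already equals the maximum, no further executor containers are requested even though none have actually been granted:

// Hypothetical sketch only; not the real ExecutorAllocationManager code.
object AddExecutorsGuardSketch {
  val maxNumExecutors = 50        // spark.dynamicAllocation.maxExecutors
  var numExecutorsTarget = 50     // starts at minExecutors, here equal to the max

  // Returns the number of additional executors requested.
  def addExecutors(maxNeeded: Int): Int = {
    if (numExecutorsTarget >= maxNumExecutors) {
      // With minExecutors == maxExecutors this branch is taken on the very
      // first call, so this sketch never issues a container request even
      // though no executor has actually been granted yet.
      println(s"Not adding executors because our current target total " +
        s"is already $numExecutorsTarget (limit $maxNumExecutors)")
      return 0
    }
    val delta = math.min(maxNeeded, maxNumExecutors) - numExecutorsTarget
    numExecutorsTarget += delta
    delta
  }

  def main(args: Array[String]): Unit = {
    addExecutors(maxNeeded = 200)  // prints the message seen in the thread and requests nothing
  }
}

As discussed above, relaxing this guard (changing >= to >) only masks the symptom; the underlying problem on Spark 1.5.0 is the SPARK-9092 regression, fixed by SPARK-10790.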