OK, yarn.scheduler.maximum-allocation-mb is 16384. I have run it again; the command to run it is:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
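For reference, the dynamic-allocation settings behind this run are roughly the following (a minimal sketch in SparkConf form; the executor count of 50 is taken from the DEBUG message quoted later in this thread, and the shuffle-service line is the usual prerequisite for dynamic allocation on YARN rather than something copied from my spark-defaults.conf; the same keys can equally be passed to spark-submit as --conf key=value):

import org.apache.spark.SparkConf

// Sketch only: the values below are assumed from this thread, not copied from a real configuration.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")     // turn on dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "50")  // min == max, as discussed below
  .set("spark.dynamicAllocation.maxExecutors", "50")
  .set("spark.shuffle.service.enabled", "true")       // external shuffle service is required for dynamic allocation on YARN

The ApplicationMaster log from this run follows: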
15/11/24 16:15:56 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/11/24 16:15:57 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1447834709734_0120_000001
15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
15/11/24 16:15:58 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization
15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
15/11/24 16:15:58 INFO spark.SparkContext: Running Spark version 1.5.0
15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
15/11/24 16:15:58 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/11/24 16:15:59 INFO Remoting: Starting remoting
15/11/24 16:15:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@X.X.X.X]
15/11/24 16:15:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 61904.
15/11/24 16:15:59 INFO spark.SparkEnv: Registering MapOutputTracker
15/11/24 16:15:59 INFO spark.SparkEnv: Registering BlockManagerMaster
15/11/24 16:15:59 INFO storage.DiskBlockManager: Created local directory at /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/blockmgr-33fbe6c4-5138-4eff-83b4-fb0c886667b7
15/11/24 16:15:59 INFO storage.MemoryStore: MemoryStore started with capacity 1966.1 MB
15/11/24 16:15:59 INFO spark.HttpFileServer: HTTP File server directory is /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/spark-fbbfa2bd-6d30-421e-a634-4546134b3b5f/httpd-e31d7b8e-ca8f-400e-8b4b-d2993fb6f1d1
15/11/24 16:15:59 INFO spark.HttpServer: Starting HTTP Server
15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/24 16:15:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14692
15/11/24 16:15:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 14692.
15/11/24 16:15:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/11/24 16:15:59 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/11/24 16:15:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:15948
15/11/24 16:15:59 INFO util.Utils: Successfully started service 'SparkUI' on port 15948.
15/11/24 16:15:59 INFO ui.SparkUI: Started SparkUI at X.X.X.X
15/11/24 16:15:59 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/11/24 16:15:59 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/11/24 16:15:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41830.
15/11/24 16:15:59 INFO netty.NettyBlockTransferService: Server created on 41830
15/11/24 16:15:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/11/24 16:15:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager X.X.X.X:41830 with 1966.1 MB RAM, BlockManagerId(driver, 10.12.30.2, 41830)
15/11/24 16:15:59 INFO storage.BlockManagerMaster: Registered BlockManager
15/11/24 16:16:00 INFO scheduler.EventLoggingListener: Logging events to hdfs:///tmp/latest-spark-events/application_1447834709734_0120_1
15/11/24 16:16:00 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#293602859])
15/11/24 16:16:00 INFO client.RMProxy: Connecting to ResourceManager at X.X.X.X
15/11/24 16:16:00 INFO yarn.YarnRMClient: Registering the ApplicationMaster
15/11/24 16:16:00 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
15/11/24 16:16:29 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
15/11/24 16:16:29 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
15/11/24 16:16:29 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 200 output partitions
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:36)
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Missing parents: List()
15/11/24 16:16:29 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=2061647216
15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 1966.1 MB)
15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=2061647216
15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 1966.1 MB)
15/11/24 16:16:30 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on X.X.X.X:41830 (size: 1202.0 B, free: 1966.1 MB)
15/11/24 16:16:30 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
15/11/24 16:16:30 INFO scheduler.DAGScheduler: Submitting 200 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
15/11/24 16:16:30 INFO cluster.YarnClusterScheduler: Adding task set 0.0 with 200 tasks
15/11/24 16:16:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:17:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:17:15 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:17:30 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:17:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/11/24 16:18:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

2015-11-24 15:14 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:

> What about this configuration in YARN: "yarn.scheduler.maximum-allocation-mb"?
>
> I'm curious why 49 executors work but 50 fail. Would you provide your application master log? If a container request is issued, there will be log lines like:
>
> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>
> On Tue, Nov 24, 2015 at 2:56 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>
>> OK, the YARN conf is listed in the following:
>>
>> yarn.nodemanager.resource.memory-mb: 115200
>> yarn.nodemanager.resource.cpu-vcores: 50
>>
>> I think the YARN resources are sufficient. In my previous mail I said that I think the Spark application didn't request resources from YARN.
>>
>> Thanks
>>
>> 2015-11-24 14:30 GMT+08:00 cherrywayb...@gmail.com <cherrywayb...@gmail.com>:
>>
>>> Can you show your parameter values in your env?
>>> yarn.nodemanager.resource.cpu-vcores
>>> yarn.nodemanager.resource.memory-mb
>>>
>>> ------------------------------
>>> cherrywayb...@gmail.com
>>>
>>> *From:* 谢廷稳 <xieting...@gmail.com>
>>> *Date:* 2015-11-24 12:13
>>> *To:* Saisai Shao <sai.sai.s...@gmail.com>
>>> *CC:* spark users <user@spark.apache.org>
>>> *Subject:* Re: A Problem About Running Spark 1.5 on YARN with Dynamic Allocation
>>> OK, the YARN cluster is used only by myself; it has 6 nodes which can run over 100 executors, and the YARN RM logs showed that the Spark application did not request resources from it.
>>>
>>> Is this a bug? Should I create a JIRA for this problem?
>>>
>>> 2015-11-24 12:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>
>>>> OK, so this looks like your YARN cluster does not allocate the containers, which you suppose should be 50. Does the YARN cluster have enough resources after allocating the AM container? If not, that is the problem.
>>>>
>>>> From my reading of your description, the problem does not lie in dynamic allocation. As I said, it works for me with min and max executors set to the same number.
>>>>
>>>> On Tue, Nov 24, 2015 at 11:54 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>
>>>>> Hi Saisai,
>>>>> I'm sorry for not describing it clearly. The YARN debug log said I have 50 executors, but the ResourceManager showed that I only have 1 container, for the AppMaster.
>>>>>
>>>>> I have checked the YARN RM logs; after the AppMaster changed state from ACCEPTED to RUNNING, there was no further log about this job. So the problem is that I did not have any executors, but the ExecutorAllocationManager thinks I do. Would you mind running a test in your cluster environment?
>>>>> Thanks,
>>>>> Weber
>>>>>
>>>>> 2015-11-24 11:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>
>>>>>> I think this behavior is expected, since you already have 50 executors launched, so there is no need to acquire additional executors. Your change is not solid; it just hides the log.
>>>>>>
>>>>>> Again, I think you should check the logs of YARN and Spark to see whether the executors are started correctly, and why resources are still not enough when you already have 50 executors.
>>>>>>
>>>>>> On Tue, Nov 24, 2015 at 10:48 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi SaiSai,
>>>>>>> I have changed "if (numExecutorsTarget >= maxNumExecutors)" to "if (numExecutorsTarget > maxNumExecutors)" in the first line of ExecutorAllocationManager#addExecutors() and it ran well.
>>>>>>> In my opinion, when I set minExecutors equal to maxExecutors, then the first time executors are to be added, numExecutorsTarget already equals maxNumExecutors and it repeatedly prints "DEBUG ExecutorAllocationManager: Not adding executors because our current target total is already 50 (limit 50)".
>>>>>>> Thanks
>>>>>>> Weber
>>>>>>>
>>>>>>> 2015-11-23 21:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>
>>>>>>>> Hi Tingwen,
>>>>>>>>
>>>>>>>> Would you mind sharing your changes in ExecutorAllocationManager#addExecutors()?
>>>>>>>>
>>>>>>>> From my understanding and testing, dynamic allocation works when you set the min and max number of executors to the same number.
>>>>>>>>
>>>>>>>> Please check your Spark and YARN logs to make sure the executors are correctly started; the warning log means there are currently not enough resources to submit tasks.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Saisai
>>>>>>>>
>>>>>>>> On Mon, Nov 23, 2015 at 8:41 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>> I ran SparkPi on YARN with Dynamic Allocation enabled and set spark.dynamicAllocation.maxExecutors equal to spark.dynamicAllocation.minExecutors, then I submitted an application using:
>>>>>>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
>>>>>>>>>
>>>>>>>>> The application was submitted successfully, but the AppMaster keeps saying "15/11/23 20:13:08 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources",
>>>>>>>>> and when I turned on DEBUG, I found "15/11/23 20:24:00 DEBUG ExecutorAllocationManager: Not adding executors because our current target total is already 50 (limit 50)" in the console.
>>>>>>>>>
>>>>>>>>> I have fixed it by modifying code in ExecutorAllocationManager.addExecutors. Is this a bug, or was it designed that we can't set maxExecutors equal to minExecutors?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Weber
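For anyone reading this thread in the archive, here is a minimal, self-contained sketch of the guard being discussed (this is an illustration, not the actual Spark 1.5 source; the names numExecutorsTarget and maxNumExecutors and the limit of 50 are taken from the messages above, everything else is assumed):

object AddExecutorsGuardSketch {
  val maxNumExecutors = 50        // spark.dynamicAllocation.maxExecutors
  var numExecutorsTarget = 50     // starts at the limit because minExecutors == maxExecutors

  // Simplified stand-in for ExecutorAllocationManager#addExecutors: with the
  // >= check, the method returns before computing any delta, which is why the
  // DEBUG message above repeats; whether the initial 50 executors should have
  // been requested elsewhere is the open question in this discussion.
  def addExecutors(maxNeeded: Int): Int = {
    if (numExecutorsTarget >= maxNumExecutors) {  // the proposed change replaces >= with >
      println(s"Not adding executors because our current target total " +
        s"is already $numExecutorsTarget (limit $maxNumExecutors)")
      return 0
    }
    // ...otherwise raise numExecutorsTarget and ask the cluster manager for more containers...
    0
  }

  def main(args: Array[String]): Unit = {
    addExecutors(200)  // with min == max this always takes the early return
  }
}

With the one-character change described above (>= to >), the guard no longer fires when the target exactly equals the limit, which matches the behavior Weber reports after his modification.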