Thank you very much! After changing to a newer version, it works well.

2015-11-24 17:15 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:

> The document is right. A bug introduced in
> https://issues.apache.org/jira/browse/SPARK-9092 makes this
> configuration fail to work.
>
> It is fixed by https://issues.apache.org/jira/browse/SPARK-10790, so you
> could switch to a newer version of Spark.
>
> On Tue, Nov 24, 2015 at 5:12 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>
>> @Sab Thank you for your reply, but the cluster has 6 nodes with 300 cores
>> in total, and the Spark application did not request resources from YARN.
>>
>> @SaiSai I ran it successfully with
>> "spark.dynamicAllocation.initialExecutors" set to 50, but
>> http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
>> says that "spark.dynamicAllocation.initialExecutors" defaults to
>> "spark.dynamicAllocation.minExecutors". So I think something is wrong,
>> isn't it?
>>
>> Thanks.
>>
>>
>>
>> 2015-11-24 16:47 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>
>>> Did you set the configuration "spark.dynamicAllocation.initialExecutors"?
>>>
>>> You can set spark.dynamicAllocation.initialExecutors to 50 and try again.
>>>
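>>> For example, roughly like this (just a sketch; the same property can be
>>> passed with --conf on spark-submit or put in spark-defaults.conf, and the
>>> 50 is assumed to match your min/max setting):
>>>
>>>   import org.apache.spark.SparkConf
>>>
>>>   val conf = new SparkConf()
>>>     // Request the full 50 executors up front instead of relying on ramp-up.
>>>     .set("spark.dynamicAllocation.initialExecutors", "50")
>>>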
>>> I guess you might be hitting this issue since you're running 1.5.0:
>>> https://issues.apache.org/jira/browse/SPARK-9092. But that still cannot
>>> explain why 49 executors work.
>>>
>>> On Tue, Nov 24, 2015 at 4:42 PM, Sabarish Sasidharan <
>>> sabarish.sasidha...@manthan.com> wrote:
>>>
>>>> If YARN has only 50 cores, then it can support at most 49 executors plus
>>>> 1 for the driver/application master.
>>>>
>>>> Regards
>>>> Sab
>>>> On 24-Nov-2015 1:58 pm, "谢廷稳" <xieting...@gmail.com> wrote:
>>>>
>>>>> OK, yarn.scheduler.maximum-allocation-mb is 16384.
>>>>>
>>>>> I have run it again; the command to run it is:
>>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
>>>>> yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> 15/11/24 16:15:56 INFO yarn.ApplicationMaster: Registered signal 
>>>>>> handlers for [TERM, HUP, INT]
>>>>>>
>>>>>> 15/11/24 16:15:57 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
>>>>>> appattempt_1447834709734_0120_000001
>>>>>>
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: 
>>>>>> hdfs-test
>>>>>>
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: 
>>>>>> hdfs-test
>>>>>>
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: 
>>>>>> authentication disabled; ui acls disabled; users with view permissions: 
>>>>>> Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>>>>>>
>>>>>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Starting the user 
>>>>>> application in a separate Thread
>>>>>>
>>>>>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context 
>>>>>> initialization
>>>>>>
>>>>>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context 
>>>>>> initialization ...
>>>>>> 15/11/24 16:15:58 INFO spark.SparkContext: Running Spark version 1.5.0
>>>>>>
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: 
>>>>>> hdfs-test
>>>>>>
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: 
>>>>>> hdfs-test
>>>>>>
>>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: 
>>>>>> authentication disabled; ui acls disabled; users with view permissions: 
>>>>>> Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>>>>>> 15/11/24 16:15:58 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>>>>> 15/11/24 16:15:59 INFO Remoting: Starting remoting
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO Remoting: Remoting started; listening on 
>>>>>> addresses :[akka.tcp://sparkDriver@X.X.X.X
>>>>>> ]
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 
>>>>>> 'sparkDriver' on port 61904.
>>>>>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering MapOutputTracker
>>>>>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering BlockManagerMaster
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO storage.DiskBlockManager: Created local directory 
>>>>>> at 
>>>>>> /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/blockmgr-33fbe6c4-5138-4eff-83b4-fb0c886667b7
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO storage.MemoryStore: MemoryStore started with 
>>>>>> capacity 1966.1 MB
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO spark.HttpFileServer: HTTP File server directory 
>>>>>> is 
>>>>>> /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/spark-fbbfa2bd-6d30-421e-a634-4546134b3b5f/httpd-e31d7b8e-ca8f-400e-8b4b-d2993fb6f1d1
>>>>>> 15/11/24 16:15:59 INFO spark.HttpServer: Starting HTTP Server
>>>>>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started
>>>>>> SocketConnector@0.0.0.0:14692
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'HTTP 
>>>>>> file server' on port 14692.
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering 
>>>>>> OutputCommitCoordinator
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO ui.JettyUtils: Adding filter: 
>>>>>> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
>>>>>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started
>>>>>> SelectChannelConnector@0.0.0.0:15948
>>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 
>>>>>> 'SparkUI' on port 15948.
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO ui.SparkUI: Started SparkUI at X.X.X.X
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO cluster.YarnClusterScheduler: Created 
>>>>>> YarnClusterScheduler
>>>>>>
>>>>>> 15/11/24 16:15:59 WARN metrics.MetricsSystem: Using default name 
>>>>>> DAGScheduler for source because
>>>>>> spark.app.id is not set.
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 
>>>>>> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41830.
>>>>>> 15/11/24 16:15:59 INFO netty.NettyBlockTransferService: Server created 
>>>>>> on 41830
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Trying to register 
>>>>>> BlockManager
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO storage.BlockManagerMasterEndpoint: Registering 
>>>>>> block manager X.X.X.X:41830 with 1966.1 MB RAM, BlockManagerId(driver, 
>>>>>> 10.12.30.2, 41830)
>>>>>>
>>>>>>
>>>>>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Registered 
>>>>>> BlockManager
>>>>>> 15/11/24 16:16:00 INFO scheduler.EventLoggingListener: Logging events to 
>>>>>> hdfs:///tmp/latest-spark-events/application_1447834709734_0120_1
>>>>>>
>>>>>> 15/11/24 16:16:00 INFO 
>>>>>> cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster 
>>>>>> registered as 
>>>>>> AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#293602859])
>>>>>>
>>>>>> 15/11/24 16:16:00 INFO client.RMProxy: Connecting to ResourceManager at 
>>>>>> X.X.X.X
>>>>>>
>>>>>>
>>>>>> 15/11/24 16:16:00 INFO yarn.YarnRMClient: Registering the 
>>>>>> ApplicationMaster
>>>>>>
>>>>>> 15/11/24 16:16:00 INFO yarn.ApplicationMaster: Started progress reporter 
>>>>>> thread with (heartbeat : 3000, initial allocation : 200) intervals
>>>>>>
>>>>>> 15/11/24 16:16:29 INFO cluster.YarnClusterSchedulerBackend: 
>>>>>> SchedulerBackend is ready for scheduling beginning after waiting 
>>>>>> maxRegisteredResourcesWaitingTime: 30000(ms)
>>>>>>
>>>>>> 15/11/24 16:16:29 INFO cluster.YarnClusterScheduler: 
>>>>>> YarnClusterScheduler.postStartHook done
>>>>>>
>>>>>> 15/11/24 16:16:29 INFO spark.SparkContext: Starting job: reduce at 
>>>>>> SparkPi.scala:36
>>>>>>
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Got job 0 (reduce at 
>>>>>> SparkPi.scala:36) with 200 output partitions
>>>>>>
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 
>>>>>> 0(reduce at SparkPi.scala:36)
>>>>>>
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Parents of final stage: 
>>>>>> List()
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Missing parents: List()
>>>>>>
>>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Submitting ResultStage 0 
>>>>>> (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing 
>>>>>> parents
>>>>>>
>>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1888) called 
>>>>>> with curMem=0, maxMem=2061647216
>>>>>>
>>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0 stored as 
>>>>>> values in memory (estimated size 1888.0 B, free 1966.1 MB)
>>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1202) called 
>>>>>> with curMem=1888, maxMem=2061647216
>>>>>>
>>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0_piece0 
>>>>>> stored as bytes in memory (estimated size 1202.0 B, free 1966.1 MB)
>>>>>>
>>>>>> 15/11/24 16:16:30 INFO storage.BlockManagerInfo: Added 
>>>>>> broadcast_0_piece0 in memory on X.X.X.X:41830 (size: 1202.0 B, free: 
>>>>>> 1966.1 MB)
>>>>>>
>>>>>>
>>>>>> 15/11/24 16:16:30 INFO spark.SparkContext: Created broadcast 0 from 
>>>>>> broadcast at DAGScheduler.scala:861
>>>>>>
>>>>>> 15/11/24 16:16:30 INFO scheduler.DAGScheduler: Submitting 200 missing 
>>>>>> tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
>>>>>>
>>>>>> 15/11/24 16:16:30 INFO cluster.YarnClusterScheduler: Adding task set 0.0 
>>>>>> with 200 tasks
>>>>>>
>>>>>> 15/11/24 16:16:45 WARN cluster.YarnClusterScheduler: Initial job has not 
>>>>>> accepted any resources; check your cluster UI to ensure that workers are 
>>>>>> registered and have sufficient resources
>>>>>>
>>>>>> 15/11/24 16:17:00 WARN cluster.YarnClusterScheduler: Initial job has not 
>>>>>> accepted any resources; check your cluster UI to ensure that workers are 
>>>>>> registered and have sufficient resources
>>>>>>
>>>>>> 15/11/24 16:17:15 WARN cluster.YarnClusterScheduler: Initial job has not 
>>>>>> accepted any resources; check your cluster UI to ensure that workers are 
>>>>>> registered and have sufficient resources
>>>>>>
>>>>>> 15/11/24 16:17:30 WARN cluster.YarnClusterScheduler: Initial job has not 
>>>>>> accepted any resources; check your cluster UI to ensure that workers are 
>>>>>> registered and have sufficient resources
>>>>>>
>>>>>> 15/11/24 16:17:45 WARN cluster.YarnClusterScheduler: Initial job has not 
>>>>>> accepted any resources; check your cluster UI to ensure that workers are 
>>>>>> registered and have sufficient resources
>>>>>>
>>>>>> 15/11/24 16:18:00 WARN cluster.YarnClusterScheduler: Initial job has not 
>>>>>> accepted any resources; check your cluster UI to ensure that workers are 
>>>>>> registered and have sufficient resources
>>>>>>
>>>>>>
>>>>> 2015-11-24 15:14 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>
>>>>>> What about this configuration in YARN:
>>>>>> "yarn.scheduler.maximum-allocation-mb"?
>>>>>>
>>>>>> I'm curious why 49 executors work but 50 fail. Would you provide your
>>>>>> application master log? If a container request is issued, there will be
>>>>>> log lines like:
>>>>>>
>>>>>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Will request 2 executor
>>>>>> containers, each with 1 cores and 1408 MB memory including 384 MB 
>>>>>> overhead
>>>>>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host:
>>>>>> Any, capability: <memory:1408, vCores:1>)
>>>>>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host:
>>>>>> Any, capability: <memory:1408, vCores:1>)
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 24, 2015 at 2:56 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>
>>>>>>> OK, the YARN conf is listed below:
>>>>>>>
>>>>>>> yarn.nodemanager.resource.memory-mb:115200
>>>>>>> yarn.nodemanager.resource.cpu-vcores:50
>>>>>>>
>>>>>>> I think the YARN resources are sufficient. As I said in my previous
>>>>>>> mail, I think the Spark application did not request resources from
>>>>>>> YARN.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> 2015-11-24 14:30 GMT+08:00 cherrywayb...@gmail.com <
>>>>>>> cherrywayb...@gmail.com>:
>>>>>>>
>>>>>>>> Can you show the values of these parameters in your environment?
>>>>>>>>     yarn.nodemanager.resource.cpu-vcores
>>>>>>>>     yarn.nodemanager.resource.memory-mb
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>> cherrywayb...@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* 谢廷稳 <xieting...@gmail.com>
>>>>>>>> *Date:* 2015-11-24 12:13
>>>>>>>> *To:* Saisai Shao <sai.sai.s...@gmail.com>
>>>>>>>> *CC:* spark users <user@spark.apache.org>
>>>>>>>> *Subject:* Re: A Problem About Running Spark 1.5 on YARN with
>>>>>>>> Dynamic Allocation
>>>>>>>> OK, the YARN cluster is used only by me. It has 6 nodes, which can run
>>>>>>>> over 100 executors, and the YARN RM logs showed that the Spark
>>>>>>>> application did not request resources from it.
>>>>>>>>
>>>>>>>> Is this a bug? Should I create a JIRA for this problem?
>>>>>>>>
>>>>>>>> 2015-11-24 12:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>>
>>>>>>>>> OK, so it looks like your YARN cluster does not allocate the
>>>>>>>>> containers, which you expect should be 50. Does the YARN cluster have
>>>>>>>>> enough resources left after allocating the AM container? If not, that
>>>>>>>>> is the problem.
>>>>>>>>>
>>>>>>>>> From your description, my guess is that the problem does not lie in
>>>>>>>>> dynamic allocation. As I said, it works for me with min and max
>>>>>>>>> executors set to the same number.
>>>>>>>>>
>>>>>>>>> On Tue, Nov 24, 2015 at 11:54 AM, 谢廷稳 <xieting...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Saisai,
>>>>>>>>>> I'm sorry I did not describe it clearly. The YARN debug log said I
>>>>>>>>>> have 50 executors, but the ResourceManager showed that I only have 1
>>>>>>>>>> container, for the AppMaster.
>>>>>>>>>>
>>>>>>>>>> I have checked the YARN RM logs: after the AppMaster changed state
>>>>>>>>>> from ACCEPTED to RUNNING, there were no further log entries about
>>>>>>>>>> this job. So the problem is that I do not have any executors, but the
>>>>>>>>>> ExecutorAllocationManager thinks I do. Would you mind running a test
>>>>>>>>>> in your cluster environment?
>>>>>>>>>> Thanks,
>>>>>>>>>> Weber
>>>>>>>>>>
>>>>>>>>>> 2015-11-24 11:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> I think this behavior is expected: since you already have 50
>>>>>>>>>>> executors launched, there is no need to acquire additional
>>>>>>>>>>> executors. Your change is not a solid fix; it just hides the log
>>>>>>>>>>> message.
>>>>>>>>>>>
>>>>>>>>>>> Again, I think you should check the YARN and Spark logs to see
>>>>>>>>>>> whether the executors are started correctly, and why resources are
>>>>>>>>>>> still not enough when you already have 50 executors.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 24, 2015 at 10:48 AM, 谢廷稳 <xieting...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi SaiSai,
>>>>>>>>>>>> I have changed "if (numExecutorsTarget >= maxNumExecutors)"
>>>>>>>>>>>> to "if (numExecutorsTarget > maxNumExecutors)" in the first line of
>>>>>>>>>>>> ExecutorAllocationManager#addExecutors() and it runs well.
>>>>>>>>>>>> In my opinion, when minExecutors is set equal to maxExecutors, then
>>>>>>>>>>>> on the first attempt to add executors numExecutorsTarget already
>>>>>>>>>>>> equals maxNumExecutors, so it repeatedly prints "DEBUG
>>>>>>>>>>>> ExecutorAllocationManager: Not adding executors because our current
>>>>>>>>>>>> target total is already 50 (limit 50)".
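>>>>>>>>>>>>
>>>>>>>>>>>> To make the reasoning concrete, here is a small standalone sketch
>>>>>>>>>>>> of that check (an illustration of the logic only, not copied from
>>>>>>>>>>>> the Spark source):
>>>>>>>>>>>>
>>>>>>>>>>>>   // With minExecutors == maxExecutors, the target already equals
>>>>>>>>>>>>   // the limit, so the ">=" check bails out on the very first call
>>>>>>>>>>>>   // and only the debug message above is printed.
>>>>>>>>>>>>   object AddExecutorsGuardSketch {
>>>>>>>>>>>>     def notAdding(numExecutorsTarget: Int, maxNumExecutors: Int): Boolean =
>>>>>>>>>>>>       numExecutorsTarget >= maxNumExecutors   // check before my change
>>>>>>>>>>>>
>>>>>>>>>>>>     def notAddingPatched(numExecutorsTarget: Int, maxNumExecutors: Int): Boolean =
>>>>>>>>>>>>       numExecutorsTarget > maxNumExecutors    // check after my change
>>>>>>>>>>>>
>>>>>>>>>>>>     def main(args: Array[String]): Unit = {
>>>>>>>>>>>>       println(notAdding(50, 50))        // true  -> no executors ever requested
>>>>>>>>>>>>       println(notAddingPatched(50, 50)) // false -> a request would be attempted
>>>>>>>>>>>>     }
>>>>>>>>>>>>   }
>>>>>>>>>>>>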
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Weber
>>>>>>>>>>>>
>>>>>>>>>>>> 2015-11-23 21:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>
>>>>>>>>>>>> :
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Tingwen,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would you mind sharing your changes to
>>>>>>>>>>>>> ExecutorAllocationManager#addExecutors()?
>>>>>>>>>>>>>
>>>>>>>>>>>>> From my understanding and testing, dynamic allocation works when
>>>>>>>>>>>>> you set the min and max number of executors to the same number.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please check your Spark and YARN logs to make sure the executors
>>>>>>>>>>>>> are started correctly; the warning log means there are currently
>>>>>>>>>>>>> not enough resources to submit tasks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Saisai
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Nov 23, 2015 at 8:41 PM, 谢廷稳 <xieting...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> I ran SparkPi on YARN with dynamic allocation enabled and
>>>>>>>>>>>>>> spark.dynamicAllocation.maxExecutors set equal to
>>>>>>>>>>>>>> spark.dynamicAllocation.minExecutors, then I submitted the
>>>>>>>>>>>>>> application using:
>>>>>>>>>>>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi
>>>>>>>>>>>>>> --master yarn-cluster --driver-memory 4g --executor-memory 8g
>>>>>>>>>>>>>> lib/spark-examples*.jar 200
>>>>>>>>>>>>>>
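>>>>>>>>>>>>>> The dynamic-allocation settings were roughly the following (a
>>>>>>>>>>>>>> sketch of the equivalent SparkConf calls; the 50 matches the
>>>>>>>>>>>>>> limit shown in the debug message below, and in practice the same
>>>>>>>>>>>>>> properties can be set in spark-defaults.conf or via --conf):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   import org.apache.spark.SparkConf
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   val conf = new SparkConf()
>>>>>>>>>>>>>>     .set("spark.dynamicAllocation.enabled", "true")
>>>>>>>>>>>>>>     .set("spark.shuffle.service.enabled", "true")      // required by dynamic allocation
>>>>>>>>>>>>>>     .set("spark.dynamicAllocation.minExecutors", "50")
>>>>>>>>>>>>>>     .set("spark.dynamicAllocation.maxExecutors", "50") // equal to minExecutors
>>>>>>>>>>>>>>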
>>>>>>>>>>>>>> The application was submitted successfully, but the AppMaster
>>>>>>>>>>>>>> kept saying "15/11/23 20:13:08 WARN
>>>>>>>>>>>>>> cluster.YarnClusterScheduler: Initial job has not accepted any
>>>>>>>>>>>>>> resources; check your cluster UI to ensure that workers are
>>>>>>>>>>>>>> registered and have sufficient resources",
>>>>>>>>>>>>>> and when I enabled DEBUG logging I found "15/11/23 20:24:00 DEBUG
>>>>>>>>>>>>>> ExecutorAllocationManager: Not adding executors because our
>>>>>>>>>>>>>> current target total is already 50 (limit 50)" in the console.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have fixed it by modifying the code in
>>>>>>>>>>>>>> ExecutorAllocationManager.addExecutors. Is this a bug, or is it
>>>>>>>>>>>>>> by design that we can't set maxExecutors equal to minExecutors?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Weber
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>
