[ 
https://issues.apache.org/jira/browse/YARN-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jin Gary resolved YARN-10404.
-----------------------------
    Resolution: Duplicate

> ResourceManager use HA mode both in Standby
> -------------------------------------------
>
>                 Key: YARN-10404
>                 URL: https://issues.apache.org/jira/browse/YARN-10404
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.1
>            Reporter: Jin Gary
>            Priority: Major
>         Attachments: image-2020-08-24-11-07-50-899.png
>
>
> Our YARN cluster uses the Capacity Scheduler with the following configuration:
> {code:java}
> yarn.scheduler.capacity.maximum-am-resource-percent=0.2
> yarn.scheduler.capacity.maximum-applications=10000
> yarn.scheduler.capacity.node-locality-delay=20
> yarn.scheduler.capacity.queue-mappings=u:username:queuename,...
> yarn.scheduler.capacity.queue-mappings-override.enable=true
> yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
> yarn.scheduler.capacity.root.acl_administer_queue=*
> yarn.scheduler.capacity.root.acl_submit_applications=*
> yarn.scheduler.capacity.root.capacity=100
> yarn.scheduler.capacity.root.default.acl_submit_applications=*
> yarn.scheduler.capacity.root.default.capacity=1
> yarn.scheduler.capacity.root.default.maximum-am-resource-percent=1
> yarn.scheduler.capacity.root.default.maximum-applications=10
> yarn.scheduler.capacity.root.default.maximum-capacity=30
> yarn.scheduler.capacity.root.default.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.default.ordering-policy=fair
> yarn.scheduler.capacity.root.default.ordering-policy.fair.enable-size-based-weight=false
> yarn.scheduler.capacity.root.default.priority=0
> yarn.scheduler.capacity.root.default.state=RUNNING
> yarn.scheduler.capacity.root.default.user-limit-factor=10
> yarn.scheduler.capacity.root.developer.acl_administer_queue=*
> yarn.scheduler.capacity.root.developer.acl_submit_applications=*
> yarn.scheduler.capacity.root.developer.capacity=7
> yarn.scheduler.capacity.root.developer.developer_data.acl_administer_queue=*
> yarn.scheduler.capacity.root.developer.developer_data.acl_submit_applications=*
> yarn.scheduler.capacity.root.developer.developer_data.capacity=50
> yarn.scheduler.capacity.root.developer.developer_data.maximum-am-resource-percent=0.8
> yarn.scheduler.capacity.root.developer.developer_data.maximum-capacity=70
> yarn.scheduler.capacity.root.developer.developer_data.minimum-user-limit-percent=30
> yarn.scheduler.capacity.root.developer.developer_data.ordering-policy=fair
> yarn.scheduler.capacity.root.developer.developer_data.ordering-policy.fair.enable-size-based-weight=false
> yarn.scheduler.capacity.root.developer.developer_data.priority=20
> yarn.scheduler.capacity.root.developer.developer_data.state=RUNNING
> yarn.scheduler.capacity.root.developer.developer_data.user-limit-factor=5
> yarn.scheduler.capacity.root.developer.developer_recsys.acl_administer_queue=*
> yarn.scheduler.capacity.root.developer.developer_recsys.acl_submit_applications=*
> yarn.scheduler.capacity.root.developer.developer_recsys.capacity=50
> yarn.scheduler.capacity.root.developer.developer_recsys.maximum-am-resource-percent=0.8
> yarn.scheduler.capacity.root.developer.developer_recsys.maximum-capacity=70
> yarn.scheduler.capacity.root.developer.developer_recsys.minimum-user-limit-percent=30
> yarn.scheduler.capacity.root.developer.developer_recsys.ordering-policy=fair
> yarn.scheduler.capacity.root.developer.developer_recsys.ordering-policy.fair.enable-size-based-weight=false
> yarn.scheduler.capacity.root.developer.developer_recsys.priority=20
> yarn.scheduler.capacity.root.developer.developer_recsys.state=RUNNING
> yarn.scheduler.capacity.root.developer.developer_recsys.user-limit-factor=5
> yarn.scheduler.capacity.root.developer.maximum-am-resource-percent=1
> yarn.scheduler.capacity.root.developer.maximum-capacity=40
> yarn.scheduler.capacity.root.developer.minimum-user-limit-percent=30
> yarn.scheduler.capacity.root.developer.ordering-policy=priority-utilization
> yarn.scheduler.capacity.root.developer.priority=20
> yarn.scheduler.capacity.root.developer.queues=developer_data,developer_recsys
> yarn.scheduler.capacity.root.developer.state=RUNNING
> yarn.scheduler.capacity.root.developer.user-limit-factor=3
> yarn.scheduler.capacity.root.olap.acl_administer_queue=*
> yarn.scheduler.capacity.root.olap.acl_submit_applications=*
> yarn.scheduler.capacity.root.olap.capacity=5
> yarn.scheduler.capacity.root.olap.maximum-am-resource-percent=1
> yarn.scheduler.capacity.root.olap.maximum-capacity=30
> yarn.scheduler.capacity.root.olap.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.olap.ordering-policy=fifo
> yarn.scheduler.capacity.root.olap.priority=50
> yarn.scheduler.capacity.root.olap.state=RUNNING
> yarn.scheduler.capacity.root.olap.user-limit-factor=3
> yarn.scheduler.capacity.root.ordering-policy=priority-utilization
> yarn.scheduler.capacity.root.pipeline.acl_administer_queue=*
> yarn.scheduler.capacity.root.pipeline.acl_submit_applications=*
> yarn.scheduler.capacity.root.pipeline.capacity=30
> yarn.scheduler.capacity.root.pipeline.maximum-allocation-mb=21504
> yarn.scheduler.capacity.root.pipeline.maximum-capacity=30
> yarn.scheduler.capacity.root.pipeline.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.pipeline.ordering-policy=fifo
> yarn.scheduler.capacity.root.pipeline.priority=60
> yarn.scheduler.capacity.root.pipeline.state=RUNNING
> yarn.scheduler.capacity.root.pipeline.user-limit-factor=1
> yarn.scheduler.capacity.root.priority=0
> yarn.scheduler.capacity.root.queues=default,developer,pipeline,task,olap,realtime,yarn-system
> yarn.scheduler.capacity.root.realtime.acl_administer_queue=*
> yarn.scheduler.capacity.root.realtime.acl_submit_applications=*
> yarn.scheduler.capacity.root.realtime.capacity=6
> yarn.scheduler.capacity.root.realtime.maximum-capacity=6
> yarn.scheduler.capacity.root.realtime.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.realtime.ordering-policy=fifo
> yarn.scheduler.capacity.root.realtime.priority=60
> yarn.scheduler.capacity.root.realtime.state=RUNNING
> yarn.scheduler.capacity.root.realtime.user-limit-factor=1
> yarn.scheduler.capacity.root.task.acl_administer_queue=*
> yarn.scheduler.capacity.root.task.acl_submit_applications=*
> yarn.scheduler.capacity.root.task.capacity=51
> yarn.scheduler.capacity.root.task.maximum-am-resource-percent=1
> yarn.scheduler.capacity.root.task.maximum-capacity=59
> yarn.scheduler.capacity.root.task.minimum-user-limit-percent=60
> yarn.scheduler.capacity.root.task.ordering-policy=priority-utilization
> yarn.scheduler.capacity.root.task.priority=40
> yarn.scheduler.capacity.root.task.queues=task_data,task_recsys
> yarn.scheduler.capacity.root.task.state=RUNNING
> yarn.scheduler.capacity.root.task.task_data.acl_administer_queue=*
> yarn.scheduler.capacity.root.task.task_data.acl_submit_applications=*
> yarn.scheduler.capacity.root.task.task_data.capacity=70
> yarn.scheduler.capacity.root.task.task_data.maximum-am-resource-percent=0.9
> yarn.scheduler.capacity.root.task.task_data.maximum-capacity=80
> yarn.scheduler.capacity.root.task.task_data.minimum-user-limit-percent=90
> yarn.scheduler.capacity.root.task.task_data.ordering-policy=fifo
> yarn.scheduler.capacity.root.task.task_data.priority=40
> yarn.scheduler.capacity.root.task.task_data.state=RUNNING
> yarn.scheduler.capacity.root.task.task_data.user-limit-factor=3
> yarn.scheduler.capacity.root.task.task_recsys.acl_administer_queue=*
> yarn.scheduler.capacity.root.task.task_recsys.acl_submit_applications=*
> yarn.scheduler.capacity.root.task.task_recsys.capacity=30
> yarn.scheduler.capacity.root.task.task_recsys.maximum-am-resource-percent=0.9
> yarn.scheduler.capacity.root.task.task_recsys.maximum-capacity=50
> yarn.scheduler.capacity.root.task.task_recsys.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.task.task_recsys.ordering-policy=fifo
> yarn.scheduler.capacity.root.task.task_recsys.priority=40
> yarn.scheduler.capacity.root.task.task_recsys.state=RUNNING
> yarn.scheduler.capacity.root.task.task_recsys.user-limit-factor=3
> yarn.scheduler.capacity.root.task.user-limit-factor=2
> yarn.scheduler.capacity.root.yarn-system.acl_administer_queue=yarn-ats
> yarn.scheduler.capacity.root.yarn-system.acl_submit_applications=yarn-ats
> yarn.scheduler.capacity.root.yarn-system.capacity=0
> yarn.scheduler.capacity.root.yarn-system.default-application-lifetime=-1
> yarn.scheduler.capacity.root.yarn-system.disable_preemption=true
> yarn.scheduler.capacity.root.yarn-system.maximum-am-resource-percent=0.5
> yarn.scheduler.capacity.root.yarn-system.maximum-application-lifetime=-1
> yarn.scheduler.capacity.root.yarn-system.maximum-capacity=100
> yarn.scheduler.capacity.root.yarn-system.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.yarn-system.ordering-policy=fifo
> yarn.scheduler.capacity.root.yarn-system.priority=32768
> yarn.scheduler.capacity.root.yarn-system.state=RUNNING
> yarn.scheduler.capacity.root.yarn-system.user-limit-factor=1
> {code}
> Preemption is enabled between the queues.
> We are seeing two faults:
>  # During operation the active RM enters standby mode, but the standby RM 
> does not automatically switch to active. Both RM logs then repeatedly print 
> messages like the following: 
> {panel}
> 2020-08-18 22:35:19,451 WARN rmapp.RMAppImpl (RMAppImpl.java:<init>(473)) - The specific max attempts: 0 for application: 2769 is invalid, because it is out of the range [1, 3]. Use the global max attempts instead.
> 2020-08-18 22:35:19,451 INFO collector.TimelineCollectorManager (TimelineCollectorManager.java:putIfAbsent(149)) - the collector for application_1597677167198_2769 already exists!
> 2020-08-18 22:35:19,451 INFO placement.UserGroupMappingPlacementRule (UserGroupMappingPlacementRule.java:getPlacementForApp(201)) - Application application_1597677167198_2770 user kylin mapping [olap] to [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@4ad0f252] override true
> 2020-08-18 22:35:19,451 WARN rmapp.RMAppImpl (RMAppImpl.java:<init>(473)) - The specific max attempts: 0 for application: 2770 is invalid, because it is out of the range [1, 3]. Use the global max attempts instead.
> 2020-08-18 22:35:19,451 INFO collector.TimelineCollectorManager (TimelineCollectorManager.java:putIfAbsent(149)) - the collector for application_1597677167198_2770 already exists!
> 2020-08-18 22:35:19,452 INFO placement.UserGroupMappingPlacementRule (UserGroupMappingPlacementRule.java:getPlacementForApp(201)) - Application application_1597677167198_2771 user kylin mapping [olap] to [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@2287c6d0] override true
> {panel}
> After a while, a ZooKeeper timeout exception was logged: 
> ERROR curator.ConnectionState (ConnectionState.java:checkTimeouts(228)) - Connection timed out for connection string (zookeeper_host1:2181,zookeeper_host2:2181,zookeeper_host3:2181,zookeeper_host4:2181,zookeeper_host5:2181) and timeout (15000) / elapsed (13677)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>     at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225)
>     at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
>     at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>     at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  # No applications in the queue are running, but the resources the queue 
> reports as occupied are never released, so newly submitted applications 
> remain in the ACCEPTED state.
> !image-2020-08-24-11-07-50-899.png!
> PS: We occasionally run
>       yarn application -movetoqueue application_1478676388082_963529 -queue task_data
> so that an application gets sufficient queue resources.
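
The queue capacities quoted above can be sanity-checked offline. The following is only a sketch, not part of the original report: it assumes the properties are available as plain key=value text, and the helper names (parse_properties, child_capacity_sums) and the SAMPLE excerpt are made up for illustration. It adds up the capacity values of the children listed under each *.queues property, which the Capacity Scheduler expects to total 100 per parent when capacities are given as percentages.

```python
PREFIX = "yarn.scheduler.capacity."


def parse_properties(text):
    """Parse key=value lines into a dict, skipping blank and comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def child_capacity_sums(props):
    """Map each parent queue path to the sum of its children's capacity values."""
    sums = {}
    for key, value in props.items():
        if not (key.startswith(PREFIX) and key.endswith(".queues")):
            continue
        parent = key[len(PREFIX):-len(".queues")]
        total = 0.0
        for child in value.split(","):
            # A child without an explicit capacity is counted as 0 here (assumption).
            total += float(props.get(f"{PREFIX}{parent}.{child.strip()}.capacity", 0))
        sums[parent] = total
    return sums


# Excerpt of the capacity values from the report (all other keys omitted).
SAMPLE = """\
yarn.scheduler.capacity.root.queues=default,developer,pipeline,task,olap,realtime,yarn-system
yarn.scheduler.capacity.root.default.capacity=1
yarn.scheduler.capacity.root.developer.capacity=7
yarn.scheduler.capacity.root.pipeline.capacity=30
yarn.scheduler.capacity.root.task.capacity=51
yarn.scheduler.capacity.root.olap.capacity=5
yarn.scheduler.capacity.root.realtime.capacity=6
yarn.scheduler.capacity.root.yarn-system.capacity=0
yarn.scheduler.capacity.root.developer.queues=developer_data,developer_recsys
yarn.scheduler.capacity.root.developer.developer_data.capacity=50
yarn.scheduler.capacity.root.developer.developer_recsys.capacity=50
yarn.scheduler.capacity.root.task.queues=task_data,task_recsys
yarn.scheduler.capacity.root.task.task_data.capacity=70
yarn.scheduler.capacity.root.task.task_recsys.capacity=30
"""

if __name__ == "__main__":
    for parent, total in sorted(child_capacity_sums(parse_properties(SAMPLE)).items()):
        status = "ok" if total == 100.0 else "CHECK"
        print(f"{parent}: children sum to {total:g} ({status})")
```

For the values in the report, the children of root, root.developer and root.task each sum to 100, so the capacity split itself does not look like the source of either fault.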

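To confirm the first fault (both RMs in standby), the HA state of each RM can be polled. The sketch below is an assumption-laden illustration rather than a tested tool: it reads the haState field from the ResourceManager REST endpoint /ws/v1/cluster/info, assumes that endpoint also answers on a standby node, and uses hypothetical base URLs (rm_host1, rm_host2) and RM ids (rm1, rm2) that are not from the report.

```python
import json
from urllib.request import urlopen


def extract_ha_state(payload):
    """Pull haState out of a parsed /ws/v1/cluster/info response."""
    return payload.get("clusterInfo", {}).get("haState", "UNKNOWN")


def fetch_ha_state(rm_base_url, timeout=5.0):
    """Query one RM over HTTP and return its HA state string."""
    with urlopen(f"{rm_base_url}/ws/v1/cluster/info", timeout=timeout) as resp:
        return extract_ha_state(json.load(resp))


def diagnose(states):
    """Summarize a {rm_id: ha_state} mapping as a one-line verdict."""
    active = sorted(rm for rm, state in states.items() if state == "ACTIVE")
    if len(active) == 1:
        return f"OK: {active[0]} is active"
    if not active:
        return "FAULT: no active ResourceManager"
    return "FAULT: more than one active ResourceManager: " + ", ".join(active)


if __name__ == "__main__":
    # Hypothetical RM ids and web addresses; replace with the real ones.
    rms = {"rm1": "http://rm_host1:8088", "rm2": "http://rm_host2:8088"}
    states = {}
    for rm_id, url in rms.items():
        try:
            states[rm_id] = fetch_ha_state(url)
        except OSError as exc:  # connection refused, timeout, DNS failure
            states[rm_id] = f"UNREACHABLE ({exc})"
    print(states)
    print(diagnose(states))
```

The same information is available from the command line with yarn rmadmin -getServiceState <rm-id>; the REST variant is shown only because it is easy to run periodically alongside the ZooKeeper timeout logs.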