[ 
https://issues.apache.org/jira/browse/YARN-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791533#comment-17791533
 ] 

Jepson edited comment on YARN-11455 at 11/30/23 9:39 AM:
---------------------------------------------------------

I am seeing this bug on my side as well.

2023-11-28 09:57:36,339 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bigdata 
OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS 
APPID=application_1699669356032_6755 
CONTAINERID=container_e159_1699669356032_6755_01_042429 RESOURCE=<memory:4096, 
vCores:1>

2023-11-28 09:57:36,347 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
*{color:#ffab00}Session disconnected. Entering neutral mode...{color}*
2023-11-28 09:57:36,347 WARN 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService:
 Lost contact with Zookeeper. Transitioning to standby in 10000 ms if 
connection is not reestablished.

2023-11-28 09:57:36,684 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_e159_1699669356032_6754_01_002305 Container Transitioned from NEW to 
ALLOCATED
2023-11-28 09:57:36,684 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bigdata 
OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS 
APPID=application_1699669356032_6754 
CONTAINERID=container_e159_1699669356032_6754_01_002305 RESOURCE=<memory:4096, 
vCores:1>
2023-11-28 09:57:36,796 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server bdpuat01/10.100.15.145:2181. Will not attempt to 
authenticate using SASL (unknown error)
2023-11-28 09:57:36,796 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: 
checking for deactivate of application :application_1699669356032_6755
2023-11-28 09:57:36,797 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to bdpuat01/10.100.15.145:2181, initiating session
2023-11-28 09:57:36,797 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server bdpuat01/10.100.15.145:2181, sessionid = 
0x28bea27f845085e, negotiated timeout = 60000
2023-11-28 09:57:36,798 INFO 
org.apache.curator.framework.state.ConnectionStateManager: State change: 
RECONNECTED
2023-11-28 09:57:36,840 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server bdpuat01/10.100.15.145:2181. Will not attempt to 
authenticate using SASL (unknown error)
2023-11-28 09:57:36,841 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to bdpuat01/10.100.15.145:2181, initiating session
2023-11-28 09:57:36,850 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server bdpuat01/10.100.15.145:2181, sessionid = 
0x28bea27f845085f, negotiated timeout = 60000
2023-11-28 09:57:36,850 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session 
connected.
2023-11-28 09:57:36,853 INFO org.apache.hadoop.conf.Configuration: found 
resource yarn-site.xml at 
file:/home/yarn/software/hadoop-2.9.2/etc/hadoop/yarn-site.xml
2023-11-28 09:57:36,853 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn 
OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS


2023-11-28 09:57:36,854 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to 
standby state
2023-11-28 09:57:36,857 WARN 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher:
 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
 interrupted. Returning.
2023-11-28 09:57:36,861 INFO org.apache.hadoop.ipc.Server: Stopping server on 
23140
2023-11-28 09:57:36,865 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
listener on 23140
2023-11-28 09:57:36,865 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
2023-11-28 09:57:36,877 INFO org.apache.hadoop.ipc.Server: Stopping server on 
23130
2023-11-28 09:57:36,878 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
2023-11-28 09:57:36,882 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
listener on 23130
2023-11-28 09:57:36,886 INFO org.apache.hadoop.ipc.Server: Stopping server on 
8031
2023-11-28 09:57:36,887 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
listener on 8031
2023-11-28 09:57:36,890 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
2023-11-28 09:57:36,892 INFO 
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor 
thread interrupted
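
For reference, the gap between the "Lost contact with Zookeeper" warning (09:57:36,347) and the "Transitioning to standby state" line (09:57:36,854) in the log above can be computed directly from the timestamps. This is a minimal sketch, assuming the log4j timestamp layout shown in the log; the class and method names are hypothetical and not part of Hadoop.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hadoop: measures the time between two log lines.
class LogGap {
    // Timestamp layout as it appears in the ResourceManager log above.
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the number of milliseconds from the start timestamp to the end timestamp.
    static long elapsedMillis(String start, String end) {
        LocalDateTime t0 = LocalDateTime.parse(start, LOG_TS);
        LocalDateTime t1 = LocalDateTime.parse(end, LOG_TS);
        return Duration.between(t0, t1).toMillis();
    }

    public static void main(String[] args) {
        String lostContact = "2023-11-28 09:57:36,347"; // "Lost contact with Zookeeper"
        String toStandby = "2023-11-28 09:57:36,854";   // "Transitioning to standby state"
        System.out.println(elapsedMillis(lostContact, toStandby) + " ms"); // prints "507 ms"
    }
}
```

In this log the transition to standby is logged about 507 ms after the disconnect warning, after the session had already reconnected and well short of the 10000 ms mentioned in the warning.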


> All RMs in HA are stuck in standby when the ZK connection is disconnected
> -------------------------------------------------------------------------
>
>                 Key: YARN-11455
>                 URL: https://issues.apache.org/jira/browse/YARN-11455
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.10.1, 3.3.3
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>
> All RMs in HA are stuck in standby when the ZK connection held by the active 
> RM is disconnected.
> {code:java}
> 2023-02-22 13:08:19,832 INFO org.apache.hadoop.ha.ActiveStandbyElector 
> (main-EventThread): Session disconnected. Entering neutral mode...
> 2023-02-22 13:08:19,832 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  (main-EventThread): Lost contact with Zookeeper. Transitioning to standby in 
> 10000 ms if connection is not reestablished.{code}
>  
> *Repro:*
> Send a Disconnected event to the active RM using the code below.
> {code:java}
> zkConnectionState = ConnectionState.DISCONNECTED;
> enterNeutralMode();
> {code}
>  


