[ https://issues.apache.org/jira/browse/YARN-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791533#comment-17791533 ]
Jepson commented on YARN-11455:
-------------------------------
I am hitting this bug on my side as well.
2023-11-28 09:57:36,339 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bigdata OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1699669356032_6755 CONTAINERID=container_e159_1699669356032_6755_01_042429 RESOURCE=<memory:4096, vCores:1>
2023-11-28 09:57:36,347 INFO org.apache.hadoop.ha.ActiveStandbyElector:*{color:#FFAB00} Session disconnected. Entering neutral mode...{color}*
2023-11-28 09:57:36,347 WARN org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService: Lost contact with Zookeeper. Transitioning to standby in 10000 ms if connection is not reestablished.
2023-11-28 09:57:36,684 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e159_1699669356032_6754_01_002305 Container Transitioned from NEW to ALLOCATED
2023-11-28 09:57:36,684 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bigdata OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1699669356032_6754 CONTAINERID=container_e159_1699669356032_6754_01_002305 RESOURCE=<memory:4096, vCores:1>
2023-11-28 09:57:36,796 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bdpuat01/10.100.15.145:2181. Will not attempt to authenticate using SASL (unknown error)
2023-11-28 09:57:36,796 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate of application :application_1699669356032_6755
2023-11-28 09:57:36,797 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bdpuat01/10.100.15.145:2181, initiating session
2023-11-28 09:57:36,797 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bdpuat01/10.100.15.145:2181, sessionid = 0x28bea27f845085e, negotiated timeout = 60000
2023-11-28 09:57:36,798 INFO org.apache.curator.framework.state.ConnectionStateManager: State change: RECONNECTED
2023-11-28 09:57:36,840 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bdpuat01/10.100.15.145:2181. Will not attempt to authenticate using SASL (unknown error)
2023-11-28 09:57:36,841 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bdpuat01/10.100.15.145:2181, initiating session
2023-11-28 09:57:36,850 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bdpuat01/10.100.15.145:2181, sessionid = 0x28bea27f845085f, negotiated timeout = 60000
2023-11-28 09:57:36,850 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2023-11-28 09:57:36,853 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/home/yarn/software/hadoop-2.9.2/etc/hadoop/yarn-site.xml
2023-11-28 09:57:36,853 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
-- normal
2023-11-28 09:57:36,854 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2023-11-28 09:57:36,857 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2023-11-28 09:57:36,861 INFO org.apache.hadoop.ipc.Server: Stopping server on 23140
2023-11-28 09:57:36,865 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 23140
2023-11-28 09:57:36,865 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2023-11-28 09:57:36,877 INFO org.apache.hadoop.ipc.Server: Stopping server on 23130
2023-11-28 09:57:36,878 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2023-11-28 09:57:36,882 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 23130
2023-11-28 09:57:36,886 INFO org.apache.hadoop.ipc.Server: Stopping server on 8031
2023-11-28 09:57:36,887 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8031
2023-11-28 09:57:36,890 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2023-11-28 09:57:36,892 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor thread interrupted
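The WARN line from ActiveStandbyElectorBasedElectorService describes a grace period: after the ZooKeeper session is lost, the RM should only transition to standby if the connection is not reestablished within 10000 ms. Below is a minimal, hypothetical sketch of that pattern (a cancellable timer), not the actual Hadoop code. The class `DelayedStandbySketch`, its `HaState` enum and the `onDisconnected`/`onReconnected` methods are invented for illustration. A generation counter is used so that a timer task which has already started running when the reconnect arrives does not still flip the state.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT the Hadoop implementation: it only models the
// "transition to standby in N ms if connection is not reestablished" rule.
class DelayedStandbySketch {
  enum HaState { ACTIVE, STANDBY }

  // Daemon thread so a forgotten shutdown() does not keep the JVM alive.
  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "zk-disconnect-timer");
        t.setDaemon(true);
        return t;
      });
  private final long graceMs;
  private ScheduledFuture<?> pendingStandby; // guarded by "this"
  private long generation;                   // guarded by "this"
  private volatile HaState state = HaState.ACTIVE;

  DelayedStandbySketch(long graceMs) {
    this.graceMs = graceMs;
  }

  // ZooKeeper session lost: start the grace timer unless one is already pending.
  synchronized void onDisconnected() {
    if (pendingStandby == null) {
      final long gen = ++generation;
      pendingStandby = timer.schedule(
          () -> transitionToStandby(gen), graceMs, TimeUnit.MILLISECONDS);
    }
  }

  // Session re-established in time: cancel the pending transition.
  synchronized void onReconnected() {
    generation++; // invalidates a task that may already be running
    if (pendingStandby != null) {
      pendingStandby.cancel(false);
      pendingStandby = null;
    }
  }

  private synchronized void transitionToStandby(long gen) {
    if (gen != generation) {
      return; // a reconnect happened after this task was scheduled
    }
    pendingStandby = null;
    state = HaState.STANDBY;
  }

  HaState getState() {
    return state;
  }

  void shutdown() {
    timer.shutdownNow();
  }
}
```

With this design, a reconnect inside the grace period leaves the state untouched; only a disconnect that lasts longer than `graceMs` moves the node to STANDBY.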
> All RMs in HA are stuck in standby when the ZK connection is disconnected
> -------------------------------------------------------------------------
>
> Key: YARN-11455
> URL: https://issues.apache.org/jira/browse/YARN-11455
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.10.1, 3.3.3
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
>
> All RMs in HA are stuck in standby when the ZK connection held by the active
> RM is disconnected.
> {code:java}
> 2023-02-22 13:08:19,832 INFO org.apache.hadoop.ha.ActiveStandbyElector (main-EventThread): Session disconnected. Entering neutral mode...
> 2023-02-22 13:08:19,832 WARN org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService (main-EventThread): Lost contact with Zookeeper. Transitioning to standby in 10000 ms if connection is not reestablished.{code}
>
> *Repro:*
> Send a Disconnected event to the active RM using the code below.
> {code:java}
> // Injected into ActiveStandbyElector: mimics its handling of a ZooKeeper Disconnected event
> zkConnectionState = ConnectionState.DISCONNECTED;
> enterNeutralMode();
> {code}
>
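The two repro lines only make sense inside ActiveStandbyElector, where, as far as I can tell, they mirror what the elector does when ZooKeeper reports a Disconnected event. To see the effect in isolation, here is a hypothetical toy model, not Hadoop code: only the names `zkConnectionState`, `ConnectionState.DISCONNECTED` and `enterNeutralMode()` come from the repro. The `ToyElector` class, its `State` enum and the `Callback` interface (standing in for the RM's elector service) are assumptions.

```java
// Hypothetical toy model, NOT org.apache.hadoop.ha.ActiveStandbyElector.
class ToyElector {
  enum ConnectionState { CONNECTED, DISCONNECTED }
  enum State { ACTIVE, STANDBY, NEUTRAL }

  // Assumed stand-in for the RM's elector service that receives the callback.
  interface Callback {
    void enterNeutralMode();
  }

  ConnectionState zkConnectionState = ConnectionState.CONNECTED;
  State state = State.ACTIVE;
  private final Callback appClient;

  ToyElector(Callback appClient) {
    this.appClient = appClient;
  }

  // The two repro lines: behave as if ZooKeeper had reported Disconnected.
  void simulateDisconnected() {
    zkConnectionState = ConnectionState.DISCONNECTED;
    enterNeutralMode();
  }

  // Notify the application only on the first entry into neutral mode.
  private void enterNeutralMode() {
    if (state != State.NEUTRAL) {
      state = State.NEUTRAL;
      appClient.enterNeutralMode();
    }
  }
}
```

In this model a repeated Disconnected event does not re-notify the callback, so whatever the callback starts (for example a grace timer) is started once per neutral period.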
--
This message was sent by Atlassian Jira
(v8.20.10#820010)