[ https://issues.apache.org/jira/browse/HDDS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marton Elek updated HDDS-3902:
------------------------------
    Issue Type: Bug  (was: Improvement)

> OM HA client failover switcher to a wrong OM server
> ---------------------------------------------------
>
>                 Key: HDDS-3902
>                 URL: https://issues.apache.org/jira/browse/HDDS-3902
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: OM HA
>            Reporter: Marton Elek
>            Priority: Blocker
>              Labels: 0.7.0
>
> Found this problem with the PR/branch HDDS-3878, but it seems to be
> independent.
> 1. ozone sh volume create /vol1 works well with HA
> 2. ozone freon omkg (rpc client) doesn't work
> {code}
> ozone freon omkg | grep "Failing over"
> 2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 1, nodeId: om2
> 2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 2, nodeId: om3
> 2020-06-30 14:15:34 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 0, nodeId: omNodeIdDummy
> {code}
> om2 seems to be the leader, but for some reason the failover logic switches
> back to an unknown node (?)
> {code}
> 2020-06-30 14:16:35 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 2, nodeId: om3
> 2020-06-30 14:16:35 DEBUG Client:63 - getting client out of cache:
> org.apache.hadoop.ipc.Client@f5acb9d
> 2020-06-30 14:16:35 DEBUG Client:497 - The ping interval is 60000 ms.
> 2020-06-30 14:16:35 DEBUG Client:795 - Connecting to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862
> 2020-06-30 14:16:35 DEBUG Client:1074 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root:
> starting, having connections 3
> 2020-06-30 14:16:35 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root
> sending #0
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got
> value #0
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took
> 439ms
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root
> sending #1
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got
> value #1
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 2ms
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root
> sending #2
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got
> value #2
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 1ms
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root
> sending #3
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got
> value #3
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 1ms
> 2020-06-30 14:16:36 DEBUG Client:63 - getting client out of cache:
> org.apache.hadoop.ipc.Client@f5acb9d
> 2020-06-30 14:16:36 DEBUG Groups:312 - GroupCacheLoader - load.
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #5
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #11
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #8
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #12
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #10
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #6
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #9
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #7
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #4
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #13
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #5
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #8
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #11
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #10
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #12
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #7
> 2020-06-30 14:16:36 DEBUG Hadoop3OmTransport:140 - RetryProxy: OM:om1 is not
> the leader. Suggested leader is OM:om3.
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createNotLeaderException(OzoneManagerProtocolServerSideTranslatorPB.java:198)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:141)
>     at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:113)
>     at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #9
> 2020-06-30 14:16:36 DEBUG OMFailoverProxyProvider:299 - Incrementing OM proxy
> index to 0, nodeId: omNodeIdDummy
> {code}
> As you can see, after a few failovers om2 is finally found and a few
> requests are handled. But after that the client switches back to om0 (???)
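
For illustration only: the log shows two kinds of events, a NotLeader reply that names a suggested leader and a failover that simply increments the proxy index. The Python sketch below shows one way a client could prefer the suggested leader and fall back to round-robin only when no usable hint is available. It is not the code of OMFailoverProxyProvider; the class FailoverSketch and the function parse_suggested_leader are hypothetical names, and the regular expression is only matched against the message text quoted above.

```python
import re
from typing import List, Optional

# Pattern for the NotLeader message text quoted in the log above.
_NOT_LEADER = re.compile(
    r"OM:(\w+) is not the leader\. Suggested leader is OM:(\w+)"
)


def parse_suggested_leader(message: str) -> Optional[str]:
    """Return the suggested leader node id, or None if the text has no hint."""
    match = _NOT_LEADER.search(message)
    return match.group(2) if match else None


class FailoverSketch:
    """Hypothetical client-side picker; an assumption, not Ozone's implementation."""

    def __init__(self, node_ids: List[str]) -> None:
        if not node_ids:
            raise ValueError("at least one node id is required")
        self.node_ids = list(node_ids)
        self.current = 0

    def fail_over(self, suggested_leader: Optional[str] = None) -> str:
        """Move to the suggested leader if it is known, else to the next index."""
        if suggested_leader in self.node_ids:
            self.current = self.node_ids.index(suggested_leader)
        else:
            # No hint, or a hint naming a node the client does not know.
            self.current = (self.current + 1) % len(self.node_ids)
        return self.node_ids[self.current]


if __name__ == "__main__":
    picker = FailoverSketch(["om1", "om2", "om3"])
    hint = parse_suggested_leader(
        "RetryProxy: OM:om1 is not the leader. Suggested leader is OM:om3."
    )
    print(picker.fail_over(hint))
```

Note that the hint can only be honoured when the client's node list contains the suggested id. If the list holds a placeholder id instead of the real one, the lookup misses and the sketch falls back to incrementing the index.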