[ https://issues.apache.org/jira/browse/YARN-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
YCozy updated YARN-10301:
-------------------------
    Description:
We observed the "Mismatched response." error in RM's log when a NM gets network-partitioned after RM failover. Here's how it happens:

Initially, we have a sleeper YARN service running in a cluster with two RMs (an active RM1 and a standby RM2) and one NM. At some point, we perform a RM failover from RM1 to RM2.

RM1's log:
{noformat}
2020-06-01 16:29:20,387 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
{noformat}
RM2's log:
{noformat}
2020-06-01 16:29:27,818 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
{noformat}

After the RM failover, the NM encounters a network partition and fails to register with RM2. In other words, there's no "NodeManager from node *** registered" in RM2's log.

This does not affect the sleeper YARN service. The sleeper service successfully recovers after the RM failover. We can see in RM2's log:
{noformat}
2020-06-01 16:30:06,703 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_6_0001_000001 State change from LAUNCHED to RUNNING on event = REGISTERED
{noformat}

Then, we stop the sleeper service. In RM2's log, we can see that:
{noformat}
2020-06-01 16:30:12,157 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: application_6_0001 unregistered successfully.
...
2020-06-01 16:31:09,861 INFO org.apache.hadoop.yarn.service.webapp.ApiServer: Successfully stopped service sleeper1
{noformat}
And in AM's log, we can see that:
{noformat}
2020-06-01 16:30:12,651 [shutdown-hook-0] INFO service.ServiceMaster - SHUTDOWN_MSG:
{noformat}

Some time later, we observe the "Mismatched response" in RM2's log:
{noformat}
2020-06-01 16:43:20,699 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server
org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:376)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:623)
    at org.apache.hadoop.ipc.Client$Connection.access$2400(Client.java:414)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:827)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:823)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:823)
    at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1667)
    at org.apache.hadoop.ipc.Client.call(Client.java:1483)
    at org.apache.hadoop.ipc.Client.call(Client.java:1436)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy102.stopContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.stopContainers(ContainerManagementProtocolPBClientImpl.java:147)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy103.stopContainers(Unknown Source)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.cleanup(AMLauncher.java:153)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:354)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2020-06-01 16:43:20,700 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error cleaning master
javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
    at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
    at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.stopContainers(ContainerManagementProtocolPBClientImpl.java:150)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy103.stopContainers(Unknown Source)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.cleanup(AMLauncher.java:153)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:354)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1593)
    at org.apache.hadoop.ipc.Client.call(Client.java:1539)
    at org.apache.hadoop.ipc.Client.call(Client.java:1436)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy102.stopContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.stopContainers(ContainerManagementProtocolPBClientImpl.java:147)
    ... 15 more
{noformat}

> "DIGEST-MD5: digest response format violation. Mismatched response." when network partition occurs
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10301
>                 URL: https://issues.apache.org/jira/browse/YARN-10301
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: YCozy
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)