[
https://issues.apache.org/jira/browse/YARN-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419614#comment-17419614
]
Hadoop QA commented on YARN-10969:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m
19s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m
0s{color} | {color:green}{color} | {color:green} No case conflicting files
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m
0s{color} | {color:green}{color} | {color:green} The patch does not contain any
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to
include any new or modified tests. Please justify why no new tests are needed
for this patch. Also please list what manual steps were performed to verify
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m
14s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
29s{color} | {color:green}{color} | {color:green} trunk passed with JDK
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
25s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
34s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
15m 5s{color} | {color:green}{color} | {color:green} branch has no errors when
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
38s{color} | {color:green}{color} | {color:green} trunk passed with JDK
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 17m
42s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 1m
22s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m
38s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
21s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m
21s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m
14s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m
14s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m
26s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m
38s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m
0s{color} | {color:green}{color} | {color:green} The patch has no whitespace
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}
13m 24s{color} | {color:green}{color} | {color:green} patch has no errors when
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
34s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m
33s{color} | {color:green}{color} | {color:green} the patch passed with JDK
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 1m
28s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 23m
6s{color} | {color:green}{color} | {color:green} hadoop-yarn-server-nodemanager
in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m
32s{color} | {color:green}{color} | {color:green} The patch does not generate
ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}100m 22s{color} |
{color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base:
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1212/artifact/out/Dockerfile
|
| JIRA Issue | YARN-10969 |
| JIRA Patch URL |
https://issues.apache.org/jira/secure/attachment/13034089/YARN-10969.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite
unit shadedclient findbugs checkstyle spotbugs |
| uname | Linux 32cd908f535a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 3113a119af8 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions |
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
/usr/lib/jvm/java-8-openjdk-amd64:Private
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results |
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1212/testReport/ |
| Max. process+thread count | 683 (vs. ulimit of 5500) |
| modules | C:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
U:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
|
| Console output |
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1212/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
This message was automatically generated.
> After RM fail-over, getContainerStatus fails from ApplicationMaster to
> NodeManager
> ----------------------------------------------------------------------------------
>
> Key: YARN-10969
> URL: https://issues.apache.org/jira/browse/YARN-10969
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.2
> Reporter: Lee young gon
> Priority: Major
> Attachments: YARN-10969.001.patch
>
>
> If the artifact type of yarn-service spec is docker, getContainerStatus is
> periodically requested through the NMClient.
> And when RM fail-over occurs, getContainerStatus fails after a
> specific(yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs *2)
> time.
> Then the following log occurs in AM
> {code:java}
> 2021-04-05 19:18:47,381 [pool-5-thread-2] ERROR instance.ComponentInstance -
> [COMPINSTANCE regionserver-2 : container_e82_1612399098156_879545_01_000004]
> Failed to get container status on ac3iax2079.bdp.bdata.ai:9454, will try
> again javax.security.sasl.SaslException: DIGEST-MD5: digest response format
> violation. Mismatched response. [Caused by
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException):
> DIGEST-MD5: digest response format violation. Mismatched response.] at
> sun.reflect.GeneratedConstructorAccessor35.newInstance(Unknown Source) at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80) at
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
> at
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.getContainerStatuses(ContainerManagementProtocolPBClientImpl.java:159)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498) at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy47.getContainerStatuses(Unknown Source) at
> org.apache.hadoop.yarn.client.api.impl.NMClientImpl.getContainerStatus(NMClientImpl.java:339)
> at
> org.apache.hadoop.yarn.service.component.instance.ComponentInstance$ContainerStatusRetriever.run(ComponentInstance.java:958)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745) Caused by:
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException):
> DIGEST-MD5: digest response format violation. Mismatched response. at
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511) at
> org.apache.hadoop.ipc.Client.call(Client.java:1457) at
> org.apache.hadoop.ipc.Client.call(Client.java:1367) at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy46.getContainerStatuses(Unknown Source) at
> org.apache.hadoo
> {code}
> The overall flow is as follows.
> # Started AM
> # AM requests containers
> # RM assigned a container
> ## RM makes tokens for each NM assigned to NMToken Master Key and delivers
> them to AM
> ## This NMToken master key is rolling
> periodically(yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs)
> for each NM. There's a timing issue, but it's the same value as RM
> # NM is assigned a container and stores the NMToken master key (same as the
> master key used by RM) in the old Master Keys (hashmap) at that point with
> ApplicationAttemptId as the key
> # After that, requests from AM to NM (getContainerStatus) are made through
> the issued token
> ** Even if the master key is rolled, the request succeeds because it is
> stored in NM's oldMasterKeys (stored in NMStateStore)
> # But it becomes a problem if the AM loses that token for any reason(e.g. RM
> failover, AM restart)
> # For example, when the AM restarts, the AM uses the token created with the
> NMToken master key at that point and is only effective for a
> specific(yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs*2)
> time
> ** If there is an old MasterKey of ApplicationAttemptId in NM's
> oldMasterKey, currentMasterKey and preliminaryMasterKey are valid, but
> subsequent tokens are invalid
> That is, for any reason, when AM is re-issued with a token, it is only valid
> for a specific time
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]