[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334279#comment-16334279 ]
genericqa commented on YARN-7176:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 12s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 54s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 2s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| | org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getApplicationACLs() is unsynchronized, org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.setApplicationACLs(Map) is synchronized At ContainerLaunchContextPBImpl.java:synchronized At ContainerLaunchContextPBImpl.java:[lines 457-458] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7176 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12907103/YARN_7176_2.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 454c1d3a344f 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6e27b20 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/19369/artifact/out/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.html |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19369/testReport/ |
| Max. process+thread count | 408 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19369/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Similar to YARN-2387:Resource Manager crashes with NPE due to lack of synchronization
> -------------------------------------------------------------------------------------
>
> Key: YARN-7176
> URL: https://issues.apache.org/jira/browse/YARN-7176
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 2.6.0
> Reporter: lujie
> Assignee: lujie
> Priority: Blocker
> Attachments: YARN-7176.patch, YARN_7176_2.patch, logs.rar
>
>
> Submit a job and, while it is starting the AppMaster containers (e.g. startContainers), send a kill command. After the RM receives the kill command, it performs a state-store update (e.g. updateApplicationStateInternal).
> The startContainers path and updateApplicationStateInternal both call the same method, ContainerLaunchContextPBImpl.getProto, which lacks synchronization (it can also be called from the reInitializeContainer method). The RM log then shows the following:
> {code:java}
> 2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_000001.
Got exception: java.lang.ArrayIndexOutOfBoundsException: 3
>     at java.util.ArrayList.add(ArrayList.java:441)
>     at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
>     at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
>     at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
>     at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001
> java.lang.NullPointerException
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
>     at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:148)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:810)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:864)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:859)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-09-08 02:34:37,978 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
> java.lang.NullPointerException
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
>     at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:148)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:810)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:864)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:859)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-09-08 02:34:37,987 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1504809243340_0001_01_000001 Container Transitioned from ACQUIRED to KILLED
> 2017-09-08 02:34:37,987 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_1504809243340_0001_01_000001 in state: KILLED event:KILL
> 2017-09-08 02:34:37,987 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hires OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1504809243340_0001 CONTAINERID=container_1504809243340_0001_01_000001
> 2017-09-08 02:34:37,988 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_1504809243340_0001_01_000001 of capacity <memory:2048, vCores:1> on host hadoop11:45454, which currently has 0 containers, <memory:0, vCores:0> used and <memory:8096, vCores:8> available, release resources=true
> 2017-09-08 02:34:37,988 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
> 2017-09-08 02:34:37,988 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=<memory:0, vCores:0> numContainers=0 user=hires user-resources=<memory:0, vCores:0>
> 2017-09-08 02:34:37,989 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1504809243340_0001_01_000001, NodeId: hadoop11:45454, NodeHttpAddress: hadoop11:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.3.1.11:45454 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:16192, vCores:16>
> 2017-09-08 02:34:37,989 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:16192, vCores:16>
> 2017-09-08 02:34:37,990 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0
> 2017-09-08 02:34:37,990 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1504809243340_0001_000001 released container container_1504809243340_0001_01_000001 on node: host: hadoop11:45454 #containers=0 available=8096 used=0 with event: KILL
> 2017-09-08 02:34:37,990 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1504809243340_0001 requests cleared
> 2017-09-08 02:34:37,990 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1504809243340_0001 user: hires queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
> 2017-09-08 02:34:38,001 ERROR
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
> 2017-09-08 02:34:38,005 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@hadoop11:8088
> 2017-09-08 02:34:38,005 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
> 2017-09-08 02:34:38,006 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
> 2017-09-08 02:34:38,108 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
> 2017-09-08 02:34:38,113 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
> 2017-09-08 02:34:38,113 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
> 2017-09-08 02:34:38,114 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> 2017-09-08 02:34:38,114 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
> 2017-09-08 02:34:38,114 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
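
Note on the FindBugs item and the race described in the issue: the report flags getApplicationACLs() as unsynchronized while setApplicationACLs(Map) is synchronized, and the issue text attributes the crash to two RM threads entering ContainerLaunchContextPBImpl.getProto at the same time. The sketch below is only an illustration of that pattern under stated assumptions. AclRecordSketch, its fields, and countBadSnapshots are hypothetical stand-ins, not the real Hadoop class and not the contents of YARN_7176_2.patch. It assumes the fix direction is to put the getter, the setter, and the proto-building method under the same monitor, so that the clear-then-refill of the shared builder state cannot interleave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a PBImpl-style record; NOT the real Hadoop class.
public class AclRecordSketch {
    // Locally cached state, playing the role of the application ACL map.
    private Map<String, String> acls = new HashMap<>();
    // Stand-in for the repeated field held by a shared protobuf builder.
    private final List<String> builderEntries = new ArrayList<>();

    // Setter and getter share one monitor (this), unlike the flagged code.
    public synchronized void setApplicationACLs(Map<String, String> newAcls) {
        acls = new HashMap<>(newAcls);
    }

    public synchronized Map<String, String> getApplicationACLs() {
        return new HashMap<>(acls);
    }

    // Clears and refills the shared builder state, then returns a snapshot.
    // Without 'synchronized', two callers could interleave clear() and add().
    public synchronized List<String> getProto() {
        builderEntries.clear();
        for (Map.Entry<String, String> e : acls.entrySet()) {
            builderEntries.add(e.getKey() + "=" + e.getValue());
        }
        return new ArrayList<>(builderEntries);
    }

    // Calls getProto() from several threads and counts snapshots whose size
    // differs from the two ACL entries that were stored.
    public static int countBadSnapshots(int threads, int iterations) throws Exception {
        AclRecordSketch record = new AclRecordSketch();
        Map<String, String> acls = new HashMap<>();
        acls.put("VIEW_APP", "userA");
        acls.put("MODIFY_APP", "userB");
        record.setApplicationACLs(acls);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> results = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            results.add(pool.submit(() -> {
                int bad = 0;
                for (int i = 0; i < iterations; i++) {
                    if (record.getProto().size() != 2) {
                        bad++;
                    }
                }
                return bad;
            }));
        }
        int total = 0;
        for (Future<Integer> f : results) {
            total += f.get();
        }
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bad snapshots: " + countBadSnapshots(4, 10000));
    }
}
```

Removing synchronized from getProto() in this sketch reintroduces the kind of interleaving on the shared list that the ArrayIndexOutOfBoundsException in the log above is consistent with. Whether the real class needs every accessor synchronized, or a narrower lock, is a question for the patch review rather than something this sketch settles.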