[ https://issues.apache.org/jira/browse/YARN-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295013#comment-17295013 ]
Hadoop QA commented on YARN-10642:
----------------------------------
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m 49s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 43s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 36s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 33s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 32s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 45s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 13s{color} | {color:green}{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green}{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}109m 38s{color} | {color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/715/artifact/out/Dockerfile |
| JIRA Issue | YARN-10642 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13021541/YARN-10642.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a773323e4531 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 7e8040e6adc |
| Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/715/testReport/ |
| Max. process+thread count | 515 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/715/console |
| versions | git=2.25.1 maven=3.6.3 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
This message was automatically generated.
> AsyncDispatcher will stuck introduced by YARN-8995.
> ---------------------------------------------------
>
> Key: YARN-10642
> URL: https://issues.apache.org/jira/browse/YARN-10642
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Critical
> Attachments: MockForDeadLoop.java, YARN-10642.001.patch,
> YARN-10642.002.patch, YARN-10642.003.patch, YARN-10642.004.patch,
> deadloop.png, debugfornode.png, put.png, take.png
>
>
> In our cluster, the ResourceManager hung twice within twenty days, and YARN
> clients could not submit applications. I captured jstack output the second
> time and found the reason.
> Analyzing the jstack output, I found many threads stuck because they could not
> acquire LinkedBlockingQueue.putLock. (Note: for lack of space, the full
> analysis is omitted.)
> The reason is that one thread holds putLock indefinitely:
> printEventQueueDetails calls forEachRemaining, which holds both putLock and
> takeLock and never returns, so the AsyncDispatcher hangs.
> {code}
> Thread 6526 (IPC Server handler 454 on default port 8030):
> State: RUNNABLE
> Blocked count: 29988
> Waited count: 2035029
> Stack:
> java.util.concurrent.LinkedBlockingQueue$LBQSpliterator.forEachRemaining(LinkedBlockingQueue.java:926)
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.printEventQueueDetails(AsyncDispatcher.java:270)
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:295)
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.handleProgress(DefaultAMSProcessor.java:408)
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:215)
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:432)
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1040)
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:958)
> java.security.AccessController.doPrivileged(Native Method)
> {code}
> I analyzed LinkedBlockingQueue's source code and found that forEachRemaining
> in LinkedBlockingQueue.LBQSpliterator may get stuck when forEachRemaining and
> take are called from different threads.
> YARN-8995 introduced the printEventQueueDetails method, whose
> "eventQueue.stream().collect" call invokes forEachRemaining.
> Why does this happen? "put.png" shows how put("a") works, and "take.png"
> shows how take() works. Special note: a removed Node points to itself to help
> GC!
> The key code is in forEachRemaining. LBQSpliterator uses forEachRemaining to
> visit every Node, but after reading a Node's item it releases the lock. If
> take() is called at that moment, the variable 'p' in forEachRemaining may end
> up pointing to a Node that points to itself, and forEachRemaining then loops
> forever. See "deadloop.png".
> A simple unit test reproduces the problem by making forEachRemaining run
> slower than take. The unit test is MockForDeadLoop.java.
> I debugged MockForDeadLoop.java and saw a Node pointing to itself. See
> "debugfornode.png".
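The self-link mechanism described above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: `SelfLinkModel`, `Node`, `hopsToEnd` and `demo` are hypothetical names, the class is a simplified stand-in and not the JDK's LinkedBlockingQueue, and the walk is capped at a hop limit so the example terminates. It shows that a traversal holding a stale pointer `p` can never reach the end of the list once take() has unlinked that node.

```java
/** Simplified, single-threaded model of a linked queue whose dequeued head links to itself. */
class SelfLinkModel {
    static final class Node {
        String item;
        Node next;
        Node(String item) { this.item = item; }
    }

    private Node head = new Node(null); // dummy head, its item is always null
    private Node last = head;

    void put(String item) {
        Node n = new Node(item);
        last.next = n;
        last = n;
    }

    String take() {
        Node h = head;
        Node first = h.next;
        if (first == null) {
            return null;            // empty queue (the real queue would block here)
        }
        h.next = h;                 // old head now points at itself, to help GC
        head = first;               // first becomes the new dummy head
        String x = first.item;
        first.item = null;
        return x;
    }

    Node first() { return head.next; }

    /** Follows next pointers from p; returns the hop count, or -1 if maxHops is reached. */
    static int hopsToEnd(Node p, int maxHops) {
        int hops = 0;
        while (p != null) {
            if (hops == maxHops) {
                return -1;          // gave up: the walk is not making progress
            }
            p = p.next;
            hops++;
        }
        return hops;
    }

    /** Returns {hops before take(), hops after two take() calls} for the same saved pointer. */
    static int[] demo() {
        SelfLinkModel q = new SelfLinkModel();
        q.put("a");
        q.put("b");
        q.put("c");
        Node p = q.first();                 // position a traversal saved before releasing the lock
        int before = hopsToEnd(p, 1000);
        q.take();                           // "a" becomes the dummy head
        q.take();                           // node "a" is unlinked and now points at itself
        int after = hopsToEnd(p, 1000);
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("hops before take(): " + r[0]);
        System.out.println("hops after take():  " + r[1] + " (-1 means the walk never ended)");
    }
}
```

In the real queue the traversal has no hop limit and runs while holding the queue's locks, which is why every other thread that needs putLock blocks behind it.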
> Environment:
> OS: CentOS Linux release 7.5.1804 (Core)
> JDK: jdk1.8.0_281
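For illustration only, one way to count queued events by type without going through `stream().collect`, and therefore without `LBQSpliterator.forEachRemaining`, is to walk the queue with its iterator, which the `LinkedBlockingQueue` Javadoc documents as weakly consistent. This is a sketch, not necessarily what the attached patches do; `QueueTypeCounter`, `EventType` and `countByType` are hypothetical names, and the enum stands in for real YARN event types.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueTypeCounter {
    /** Hypothetical stand-in for the event types held in the dispatcher queue. */
    enum EventType { NODE_UPDATE, APP_ATTEMPT_ADDED }

    /** Counts queued elements per type using the queue's iterator instead of a stream. */
    static Map<EventType, Long> countByType(BlockingQueue<EventType> queue) {
        Map<EventType, Long> counts = new HashMap<>();
        Iterator<EventType> it = queue.iterator();
        while (it.hasNext()) {
            counts.merge(it.next(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        BlockingQueue<EventType> queue = new LinkedBlockingQueue<>();
        queue.add(EventType.NODE_UPDATE);
        queue.add(EventType.APP_ATTEMPT_ADDED);
        queue.add(EventType.NODE_UPDATE);
        System.out.println(countByType(queue));
    }
}
```

Because the iterator is only weakly consistent, the counts are a best-effort snapshot when other threads are calling put or take at the same time, which is usually acceptable for a diagnostic log line.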