[ 
https://issues.apache.org/jira/browse/YARN-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293939#comment-17293939
 ] 

Hadoop QA commented on YARN-10642:
----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m 
30s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
 4s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 24s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m  
4s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; 
considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/706/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt{color}
 | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch 
generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 43s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
12s{color} | {color:green}{color} | {color:green} hadoop-yarn-common in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green}{color} | {color:green} The patch does not generate 
ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}104m  6s{color} | 
{color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/706/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10642 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13020857/YARN-10642.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 3bb4011ffe01 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 
05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 1f1a1ef52df |
| Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
|  Test Results | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/706/testReport/ |
| Max. process+thread count | 535 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/706/console |
| versions | git=2.25.1 maven=3.6.3 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> AsyncDispatcher will stuck introduced by YARN-8995.
> ---------------------------------------------------
>
>                 Key: YARN-10642
>                 URL: https://issues.apache.org/jira/browse/YARN-10642
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.2.1
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Critical
>         Attachments: MockForDeadLoop.java, YARN-10642.001.patch, 
> YARN-10642.002.patch, YARN-10642.003.patch, deadloop.png, debugfornode.png, 
> put.png, take.png
>
>
> In our cluster, ResouceManager stuck twice within twenty days. Yarn client 
> can't submit application. I got jstack info at second time, then found the 
> reason.
> I analyze all the jstack, I found many thread stuck because can't get 
> LinkedBlockingQueue.putLock. (Note: Sorry for limited space , omit the 
> analytical process)
> The reason is that one thread hold the putLock all the time, 
> printEventQueueDetails will called forEachRemaining, then hold putLock and 
> readLock. The AsyncDispatcher will stuck.
> {code}
> Thread 6526 (IPC Server handler 454 on default port 8030):
>   State: RUNNABLE
>   Blocked count: 29988
>   Waited count: 2035029
>   Stack:
>     
> java.util.concurrent.LinkedBlockingQueue$LBQSpliterator.forEachRemaining(LinkedBlockingQueue.java:926)
>     java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>     
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>     java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>     java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>     
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.printEventQueueDetails(AsyncDispatcher.java:270)
>     
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:295)
>     
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.handleProgress(DefaultAMSProcessor.java:408)
>     
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:215)
>     
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>     
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>     
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:432)
>     
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>     
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>     
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>     org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1040)
>     org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:958)
>     java.security.AccessController.doPrivileged(Native Method)
> {code}
> I analyze LinkedBlockingQueue's source code. I found forEachRemaining in 
> LinkedBlockingQueue.LBQSpliterator may stuck, when forEachRemaining and take 
> are called in different thread. 
> YARN-8995 introduce printEventQueueDetails method, 
> "eventQueue.stream().collect" will called forEachRemaining method.
> Let's see why? "put.png" shows that how to put("a"), "take.png" shows that 
> how to take()。Specical Node: The removed Node will point itself for help gc!!!
> The key point code is in forEachRemaining, we see LBQSpliterator use 
> forEachRemaining to visit all Node. But when got item value from Node, will 
> release the lock. If at this time, take() will be called. 
> The variable 'p' in forEachRemaining may point a Node which point itself, 
> then forEachRemaining will be in dead loop. You can see it in "deadloop.png"
> Let's see a simple uni-test, Let's forEachRemaining called more slow than 
> take, the problem will reproduction。uni-test is MockForDeadLoop.java.
> I debug MockForDeadLoop.java, and see a Node point itself. You can see pic 
> "debugfornode.png"
> Environment:
>   OS: CentOS Linux release 7.5.1804 (Core) 
>   JDK: jdk1.8.0_281



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to