[jira] [Updated] (YARN-7200) SLS generates a realtimetrack.json file but that file is missing the closing ']'

2020-11-17 Thread Agshin Kazimli (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Agshin Kazimli updated YARN-7200:
-
Attachment: (was: realtimetrack.json)

> SLS generates a realtimetrack.json file but that file is missing the closing 
> ']'
> 
>
> Key: YARN-7200
> URL: https://issues.apache.org/jira/browse/YARN-7200
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Grant Sohn
>Assignee: Agshin Kazimli
>Priority: Minor
>  Labels: newbie, newbie++
> Attachments: YARN-7200-branch-trunk.patch, YARN-7200.002.patch, 
> snemeth-testing-20201113.zip
>
>
> File 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java
>  shows:
> {noformat}
> void tearDown() throws Exception {
>   if (metricsLogBW != null) {
>     metricsLogBW.write("]");
>     metricsLogBW.close();
>   }
> }
> {noformat}
> So the exit logic is flawed.
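For illustration: if the JVM exits before tearDown() is invoked, the buffered "]" is never written and the JSON array stays unterminated. A minimal sketch of one way to make the close unconditional, assuming a BufferedWriter field like metricsLogBW; this is a hypothetical illustration, not the attached patch:

{code:java}
import java.io.BufferedWriter;
import java.io.IOException;

// Hypothetical sketch, not the attached patch: register a JVM shutdown hook so
// the closing ']' is written even when tearDown() is never reached.
public class MetricsLogCloser {
  public static void install(final BufferedWriter metricsLogBW) {
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      try {
        metricsLogBW.write("]");   // terminate the JSON array
        metricsLogBW.close();      // flush buffered events to disk
      } catch (IOException e) {
        // too late in shutdown to recover; ignore
      }
    }));
  }
}
{code}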






[jira] [Commented] (YARN-10486) FS-CS converter: handle case when weight=0

2020-11-17 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234296#comment-17234296
 ] 

Peter Bacsko commented on YARN-10486:
-

" In this case maybe the property name should be more telling, like allow zero 
capacity sum."

I renamed the property + related variables, please check.

> FS-CS converter: handle case when weight=0
> --
>
> Key: YARN-10486
> URL: https://issues.apache.org/jira/browse/YARN-10486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10486-001.patch, YARN-10486-002.patch, 
> YARN-10486-003-approach2.patch, YARN-10486-004-approach2.patch, 
> YARN-10486-005.patch, YARN-10486-006.patch
>
>
> We can encounter an ArithmeticException if there are one or more queues under
> a parent with zero weight.
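For illustration, a minimal sketch of the failure mode with assumed names (not the fs2cs converter's actual code): integer division by the sum of sibling weights throws ArithmeticException when every child of a parent has weight 0, so the conversion needs a guard:

{code:java}
// Hypothetical sketch with assumed names, not the converter's code.
public class WeightToCapacity {
  static int toCapacityPercent(int weight, int sumOfSiblingWeights) {
    if (sumOfSiblingWeights == 0) {
      // guard: an all-zero parent would otherwise divide by zero
      return 0;
    }
    return weight * 100 / sumOfSiblingWeights;  // "/ by zero" without the guard
  }
}
{code}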






[jira] [Updated] (YARN-10486) FS-CS converter: handle case when weight=0

2020-11-17 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10486:

Attachment: YARN-10486-006.patch

> FS-CS converter: handle case when weight=0
> --
>
> Key: YARN-10486
> URL: https://issues.apache.org/jira/browse/YARN-10486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10486-001.patch, YARN-10486-002.patch, 
> YARN-10486-003-approach2.patch, YARN-10486-004-approach2.patch, 
> YARN-10486-005.patch, YARN-10486-006.patch
>
>
> We can encounter an ArithmeticException if there is a single or multiple 
> queues under a parent with zero weight.






[jira] [Commented] (YARN-10492) deadlock in rm

2020-11-17 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234230#comment-17234230
 ] 

Peter Bacsko commented on YARN-10492:
-

We need the full thread dump. We can see one thread holding the
"AllocateResponseLock" monitor and another waiting for it, but we cannot see
which thread holds "0x0003c0d0d6c0".



> deadlock in rm 
> ---
>
> Key: YARN-10492
> URL: https://issues.apache.org/jira/browse/YARN-10492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1
>Reporter: brick yang
>Priority: Critical
>  Labels: 3.1.1
>
> Version: HDP-3.1.5.0-152 (Hadoop 3.1)
> Capacity Scheduler
> YARN sometimes does not transition to active.
> We found that the jstack dump shows a deadlock:
> "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 
> tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry 
> [0x7fee09633000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323)
>  - waiting to lock <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 
> tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0003c0d0d6c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>  - locked <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at 

[jira] [Commented] (YARN-10492) deadlock in rm

2020-11-17 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233880#comment-17233880
 ] 

Wangda Tan commented on YARN-10492:
---

cc: [~snemeth], [~pbacsko], [~tangzhankun], [~sunil.gov...@gmail.com].

> deadlock in rm 
> ---
>
> Key: YARN-10492
> URL: https://issues.apache.org/jira/browse/YARN-10492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1
>Reporter: brick yang
>Priority: Critical
>  Labels: 3.1.1
>
> Version: HDP-3.1.5.0-152 (Hadoop 3.1)
> Capacity Scheduler
> YARN sometimes does not transition to active.
> We found that the jstack dump shows a deadlock:
> "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 
> tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry 
> [0x7fee09633000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323)
>  - waiting to lock <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 
> tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0003c0d0d6c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>  - locked <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at 

[jira] [Commented] (YARN-10457) Add a configuration switch to change between legacy and JSON placement rule format

2020-11-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233861#comment-17233861
 ] 

Hadoop QA commented on YARN-10457:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 1 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
47s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 10s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
51s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 17s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} |  | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 21s{color} 
| 

[jira] [Commented] (YARN-7200) SLS generates a realtimetrack.json file but that file is missing the closing ']'

2020-11-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233846#comment-17233846
 ] 

Hadoop QA commented on YARN-7200:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} |  | {color:red} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
46s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 32s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
50s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 24s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} |  | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  2m 39s{color} 
| 

[jira] [Updated] (YARN-10494) CLI tool for docker-to-squashfs conversion (pure Java)

2020-11-17 Thread Craig Condit (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit updated YARN-10494:

Affects Version/s: 3.3.0

> CLI tool for docker-to-squashfs conversion (pure Java)
> --
>
> Key: YARN-10494
> URL: https://issues.apache.org/jira/browse/YARN-10494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Major
> Attachments: docker-to-squashfs-conversion-tool-design.pdf
>
>
> *YARN-9564* defines a docker-to-squashfs image conversion tool that relies on 
> python2, multiple libraries, squashfs-tools and root access in order to 
> convert Docker images to squashfs images for use with the runc container 
> runtime in YARN.
> *YARN-9943* was created to investigate alternatives, as the response to 
> merging YARN-9564 has not been very positive. This proposal outlines the 
> design for a CLI conversion tool in 100% pure Java that will work out of the 
> box.






[jira] [Commented] (YARN-9943) Investigate image/tag management and how to do docker-to-squash conversion at scale

2020-11-17 Thread Craig Condit (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233783#comment-17233783
 ] 

Craig Condit commented on YARN-9943:


I've created YARN-10493 (to handle repository/tag management changes) and
YARN-10494 (to handle creation of a CLI tool to manage images).

> Investigate image/tag management and how to do docker-to-squash conversion at 
> scale
> ---
>
> Key: YARN-9943
> URL: https://issues.apache.org/jira/browse/YARN-9943
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>
> Currently, the RuncContainerRuntime creates its images using the
> docker-to-squash.py script attached to YARN-9564. However, this script
> requires root to run and is not the most user-friendly. This JIRA is to
> investigate how to make the process of uploading/importing Docker images
> easier for the RuncContainerRuntime.






[jira] [Created] (YARN-10494) CLI tool for docker-to-squashfs conversion (pure Java)

2020-11-17 Thread Craig Condit (Jira)
Craig Condit created YARN-10494:
---

 Summary: CLI tool for docker-to-squashfs conversion (pure Java)
 Key: YARN-10494
 URL: https://issues.apache.org/jira/browse/YARN-10494
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn
Reporter: Craig Condit
Assignee: Craig Condit
 Attachments: docker-to-squashfs-conversion-tool-design.pdf

*YARN-9564* defines a docker-to-squashfs image conversion tool that relies on 
python2, multiple libraries, squashfs-tools and root access in order to convert 
Docker images to squashfs images for use with the runc container runtime in 
YARN.

*YARN-9943* was created to investigate alternatives, as the response to merging 
YARN-9564 has not been very positive. This proposal outlines the design for a 
CLI conversion tool in 100% pure Java that will work out of the box.






[jira] [Commented] (YARN-7200) SLS generates a realtimetrack.json file but that file is missing the closing ']'

2020-11-17 Thread Agshin Kazimli (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233777#comment-17233777
 ] 

Agshin Kazimli commented on YARN-7200:
--

Keeping the old exit logic as-is and without changing any field or attribute
signatures, I have created a second patch which ensures that
schedulerMetrics.tearDown() is called; this in turn closes the JSON file and
stops the thread.
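A minimal sketch of that approach, with an assumed driver method (the actual change is in the attached YARN-7200.002.patch):

{code:java}
// Hypothetical sketch: make the tearDown() call unconditional on every exit
// path, so the JSON file is closed and the metrics thread is stopped.
void runAndClose(SchedulerMetrics schedulerMetrics) throws Exception {
  try {
    runSimulation();                  // assumed driver method, not real SLS API
  } finally {
    if (schedulerMetrics != null) {
      schedulerMetrics.tearDown();    // writes ']' and closes metricsLogBW
    }
  }
}
{code}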

> SLS generates a realtimetrack.json file but that file is missing the closing 
> ']'
> 
>
> Key: YARN-7200
> URL: https://issues.apache.org/jira/browse/YARN-7200
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Grant Sohn
>Assignee: Agshin Kazimli
>Priority: Minor
>  Labels: newbie, newbie++
> Attachments: YARN-7200-branch-trunk.patch, YARN-7200.002.patch, 
> realtimetrack.json, snemeth-testing-20201113.zip
>
>
> File 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java
>  shows:
> {noformat}
> void tearDown() throws Exception {
>   if (metricsLogBW != null) {
>     metricsLogBW.write("]");
>     metricsLogBW.close();
>   }
> }
> {noformat}
> So the exit logic is flawed.






[jira] [Updated] (YARN-7200) SLS generates a realtimetrack.json file but that file is missing the closing ']'

2020-11-17 Thread Agshin Kazimli (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Agshin Kazimli updated YARN-7200:
-
Attachment: YARN-7200.002.patch

> SLS generates a realtimetrack.json file but that file is missing the closing 
> ']'
> 
>
> Key: YARN-7200
> URL: https://issues.apache.org/jira/browse/YARN-7200
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Grant Sohn
>Assignee: Agshin Kazimli
>Priority: Minor
>  Labels: newbie, newbie++
> Attachments: YARN-7200-branch-trunk.patch, YARN-7200.002.patch, 
> realtimetrack.json, snemeth-testing-20201113.zip
>
>
> File 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java
>  shows:
> {noformat}
> void tearDown() throws Exception {
>   if (metricsLogBW != null) {
>     metricsLogBW.write("]");
>     metricsLogBW.close();
>   }
> }
> {noformat}
> So the exit logic is flawed.






[jira] [Commented] (YARN-8558) NM recovery level db not cleaned up properly on container finish

2020-11-17 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233767#comment-17233767
 ] 

Jim Brennan commented on YARN-8558:
---

I have committed this to branch-2.10.

 

> NM recovery level db not cleaned up properly on container finish
> 
>
> Key: YARN-8558
> URL: https://issues.apache.org/jira/browse/YARN-8558
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Bibin Chundatt
>Assignee: Bibin Chundatt
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-8558-branch-2.10.001.patch, 
> YARN-8558-branch-3.0.002.patch, YARN-8558-branch-3.0.003.patch, 
> YARN-8558.001.patch, YARN-8558.002.patch
>
>
> {code}
> 2018-07-20 16:49:23,117 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1531994217928_0054 transitioned from NEW to INITING
> 2018-07-20 16:49:23,204 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_18 with incomplete 
> records
> 2018-07-20 16:49:23,204 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_19 with incomplete 
> records
> 2018-07-20 16:49:23,204 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_20 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_21 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_22 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_23 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_24 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_25 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_38 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_39 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_41 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_44 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_46 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_49 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_52 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_54 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_73 with incomplete 
> records
> 2018-07-20 16:49:23,207 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_74 with incomplete 
> records
> 2018-07-20 16:49:23,207 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_75 with incomplete 
> records
> 2018-07-20 16:49:23,207 WARN 
> 

[jira] [Created] (YARN-10493) RunC container repository v2

2020-11-17 Thread Craig Condit (Jira)
Craig Condit created YARN-10493:
---

 Summary: RunC container repository v2
 Key: YARN-10493
 URL: https://issues.apache.org/jira/browse/YARN-10493
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, yarn
Affects Versions: 3.3.0
Reporter: Craig Condit
Assignee: Craig Condit
 Attachments: runc-container-repository-v2-design.pdf

The current runc container repository design has scalability and usability 
issues which will likely limit widespread adoption. We should address this with 
a new, V2 layout.






[jira] [Updated] (YARN-10457) Add a configuration switch to change between legacy and JSON placement rule format

2020-11-17 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10457:
--
Attachment: YARN-10457.002.patch

> Add a configuration switch to change between legacy and JSON placement rule 
> format
> --
>
> Key: YARN-10457
> URL: https://issues.apache.org/jira/browse/YARN-10457
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10457.001.patch, YARN-10457.002.patch
>
>







[jira] [Commented] (YARN-8558) NM recovery level db not cleaned up properly on container finish

2020-11-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233674#comment-17233674
 ] 

Hadoop QA commented on YARN-8558:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 
22s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 1 new or modified 
test files. {color} |
|| || || || {color:brown} branch-2.10 Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
11s{color} |  | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} |  | {color:green} branch-2.10 passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} |  | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~16.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} |  | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} |  | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} |  | {color:green} branch-2.10 passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} |  | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~16.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m  
7s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} |  | {color:green} branch-2.10 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~16.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 16s{color} | 
[/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/303/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt]
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 19 unchanged - 0 fixed = 20 total (was 19) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} |  | {color:green} the patch passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~16.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} |  | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 
56s{color} |  | {color:green} hadoop-yarn-server-nodemanager in the patch 
passed. {color} |
| 

[jira] [Reopened] (YARN-8558) NM recovery level db not cleaned up properly on container finish

2020-11-17 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reopened YARN-8558:
---

Re-opening so I can put up a patch for branch-2.10.

> NM recovery level db not cleaned up properly on container finish
> 
>
> Key: YARN-8558
> URL: https://issues.apache.org/jira/browse/YARN-8558
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Bibin Chundatt
>Assignee: Bibin Chundatt
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-8558-branch-3.0.002.patch, 
> YARN-8558-branch-3.0.003.patch, YARN-8558.001.patch, YARN-8558.002.patch
>
>
> {code}
> 2018-07-20 16:49:23,117 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1531994217928_0054 transitioned from NEW to INITING
> 2018-07-20 16:49:23,204 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_18 with incomplete 
> records
> 2018-07-20 16:49:23,204 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_19 with incomplete 
> records
> 2018-07-20 16:49:23,204 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_20 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_21 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_22 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_23 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_24 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_25 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_38 with incomplete 
> records
> 2018-07-20 16:49:23,205 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_39 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_41 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_44 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_46 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_49 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_52 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_54 with incomplete 
> records
> 2018-07-20 16:49:23,206 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_73 with incomplete 
> records
> 2018-07-20 16:49:23,207 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_74 with incomplete 
> records
> 2018-07-20 16:49:23,207 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container container_1531994217928_0001_01_75 with incomplete 
> records
> 2018-07-20 16:49:23,207 WARN 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService:
>  Remove container 

[jira] [Commented] (YARN-10486) FS-CS converter: handle case when weight=0

2020-11-17 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233545#comment-17233545
 ] 

Benjamin Teke commented on YARN-10486:
--

Thanks [~pbacsko].

_Note that there are two places in ParentQueue.java where you have to switch
between normal and relaxed validation. The previous patch was incorrect. The
"if" conditions might not be completely obvious; let me know if it's confusing
and I'll try to rewrite it._

With the added comments I think the ifs are clear. Small nit: in both cases 
there are unnecessary parens.

 
_Another thing is that we can't just let the queues have arbitrary capacity.
It would be good, but the capacity values propagate towards the top (root) and
eventually somewhere along the line validation will fail. So basically you'd
need to set the new property to "true" for a given hierarchy, which is really
not convenient. So the sum of capacities must be either 0% or 100%. But it's
still more lenient than the current code._

I see, and I agree on the convenience point. In this case maybe the property
name should be more telling, like _allow zero capacity sum_.
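For illustration, a sketch of what such a flag could gate during queue validation (assumed names and shape, not the actual patch): children must sum to 100%, or to 0% when the flag allows it.

{code:java}
// Hypothetical sketch with assumed names, not ParentQueue.java itself.
static void validateChildCapacitySum(float sumPercent,
    boolean allowZeroCapacitySum) {
  final float delta = 0.001f;
  boolean isHundred = Math.abs(sumPercent - 100.0f) < delta;
  boolean isZero = Math.abs(sumPercent) < delta;
  if (!isHundred && !(allowZeroCapacitySum && isZero)) {
    throw new IllegalArgumentException(
        "Illegal child capacity sum: " + sumPercent + "%");
  }
}
{code}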
 

> FS-CS converter: handle case when weight=0
> --
>
> Key: YARN-10486
> URL: https://issues.apache.org/jira/browse/YARN-10486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10486-001.patch, YARN-10486-002.patch, 
> YARN-10486-003-approach2.patch, YARN-10486-004-approach2.patch, 
> YARN-10486-005.patch
>
>
> We can encounter an ArithmeticException if there are one or more queues under
> a parent with zero weight.






[jira] [Commented] (YARN-10486) FS-CS converter: handle case when weight=0

2020-11-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233471#comment-17233471
 ] 

Hadoop QA commented on YARN-10486:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 4 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 25m 
54s{color} | 
[/branch-mvninstall-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/302/artifact/out/branch-mvninstall-root.txt]
 | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} |  | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
24s{color} | 
[/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/302/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 | {color:red} hadoop-yarn-server-resourcemanager in trunk failed. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  5m 
55s{color} | 
[/branch-shadedclient.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/302/artifact/out/branch-shadedclient.txt]
 | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
51s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.9+11-Ubuntu-0ubuntu1.18.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_272-8u272-b10-0ubuntu1~18.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | 
[/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/302/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 128 unchanged - 0 fixed = 129 total (was 128) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} |  | {color:green} the patch passed 

[jira] [Updated] (YARN-10492) deadlock in rm

2020-11-17 Thread brick yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

brick yang updated YARN-10492:
--
Priority: Critical  (was: Blocker)

> deadlock in rm 
> ---
>
> Key: YARN-10492
> URL: https://issues.apache.org/jira/browse/YARN-10492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1
>Reporter: brick yang
>Priority: Critical
>  Labels: 3.1.1
>
> Version: HDP-3.1.5.0-152 (Hadoop 3.1)
> Capacity Scheduler
> YARN sometimes does not transition to active.
> We found that the jstack dump shows a deadlock:
> "IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 
> tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry 
> [0x7fee09633000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323)
>  - waiting to lock <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> "IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 
> tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0003c0d0d6c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>  - locked <0x00043e2e19d0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>  at java.security.AccessController.doPrivileged(Native Method)
> 

[jira] [Updated] (YARN-10492) deadlock in rm

2020-11-17 Thread brick yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

brick yang updated YARN-10492:
--
Description: 
Version: HDP-3.1.5.0-152 (Hadoop 3.1)

Capacity Scheduler

YARN sometimes does not transition to active.

We found that the jstack dump shows a deadlock:

"IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 
tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry 
[0x7fee09633000]
 java.lang.Thread.State: BLOCKED (on object monitor)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:323)
 - waiting to lock <0x00043e2e19d0> (a 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
 at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75)
 at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

"IPC Server handler 8 on 8030" #280 daemon prio=5 os_prio=0 
tid=0x7fee83823800 nid=0x63eb8 waiting on condition [0x7fee0ba57000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x0003c0d0d6c0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
 at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1664)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1997)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:676)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.releaseContainers(AbstractYarnScheduler.java:753)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1182)
 at 
org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:279)
 at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
 at 
org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
 - locked <0x00043e2e19d0> (a 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService$AllocateResponseLock)
 at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
 at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

  was:
Version: HDP-3.1.5.0-152 (Hadoop 3.1)

YARN sometimes does not transition to active.

We found that the jstack dump shows a deadlock:

"IPC Server handler 44 on 8030" #316 daemon prio=5 os_prio=0 
tid=0x7fee8216e800 nid=0x63edc waiting for monitor entry 
[0x7fee09633000]
 java.lang.Thread.State: BLOCKED (on object