[jira] [Updated] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-27 Thread Tao Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9714:
---
Attachment: YARN-9714.005.patch

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch, 
> YARN-9714.003.patch, YARN-9714.004.patch, YARN-9714.005.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack output, I found two places in the RM that may 
> cause memory leaks after the RM transitions to standby:
>  # The release-cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To fix these leaks, we should cancel the timer and close the connection when 
> the services are stopping.
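
A minimal sketch of the intended cleanup, using stand-in classes rather than the real AbstractYarnScheduler/ZKRMStateStore (the field names and the AutoCloseable stand-in for the ZK connection are illustrative assumptions):

{code}
import java.util.Timer;
import java.util.TimerTask;

// Sketch only: resources acquired in serviceStart() must be released in
// serviceStop(), otherwise every active->standby transition leaks a live
// timer thread and an open ZooKeeper connection.
public class StandbyCleanupSketch {
  private Timer releaseCache;          // stands in for the release-cache cleanup timer
  private AutoCloseable zkConnection;  // stands in for the state store's ZK connection

  public void serviceStart(AutoCloseable zkConnection) {
    this.zkConnection = zkConnection;
    releaseCache = new Timer("release-cache-cleanup", true);
    releaseCache.schedule(new TimerTask() {
      @Override public void run() { /* clean the release cache */ }
    }, 10_000L, 10_000L);
  }

  public void serviceStop() throws Exception {
    if (releaseCache != null) {
      releaseCache.cancel();   // stops the timer thread
      releaseCache = null;
    }
    if (zkConnection != null) {
      zkConnection.close();    // closes the ZooKeeper connection
      zkConnection = null;
    }
  }
}
{code}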






[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-27 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917482#comment-16917482
 ] 

Tao Yang commented on YARN-9714:


{quote}
Instead of comparing, how about checking for resourceManager.getZKManager() == 
null? This basically syncs the code that initializes zkManager with the code 
that closes it.
{quote}
Makes sense to me. Attached the v5 patch for this, thanks!

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch, 
> YARN-9714.003.patch, YARN-9714.004.patch, YARN-9714.005.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack output, I found two places in the RM that may 
> cause memory leaks after the RM transitions to standby:
>  # The release-cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To fix these leaks, we should cancel the timer and close the connection when 
> the services are stopping.






[jira] [Commented] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917436#comment-16917436
 ] 

Hadoop QA commented on YARN-9789:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 
15s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9789 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978721/YARN-9789-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6253fb013fd6 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b1eee8b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24648/testReport/ |
| Max. process+thread count | 813 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24648/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Disable Option for Write Ahead Logs o

[jira] [Commented] (YARN-9664) Improve response of scheduler/app activities for better understanding

2019-08-27 Thread Weiwei Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917421#comment-16917421
 ] 

Weiwei Yang commented on YARN-9664:
---

Just got time to take a look at this.
Wow, this is a huge set of changes. [~Tao Yang], have you verified that the 
output of these changes is as expected?
I will try to go through the changes and hopefully I can contribute enough 
review comments.
Thx

> Improve response of scheduler/app activities for better understanding
> -
>
> Key: YARN-9664
> URL: https://issues.apache.org/jira/browse/YARN-9664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9664.001.patch, YARN-9664.002.patch
>
>
> Currently some diagnostics are not easy for common users to understand, and 
> I found some places that still need improvement, such as missing partition 
> information and a lack of necessary activities. This issue is to address 
> these shortcomings.






[jira] [Commented] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission

2019-08-27 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917408#comment-16917408
 ] 

Bilwa S T commented on YARN-9738:
-

Hi [~bibinchundatt], I have fixed the test cases by adding a null check in 
ClusterNodeTracker#exists and ClusterNodeTracker#getNode.

> Remove lock on ClusterNodeTracker#getNodeReport as it blocks application 
> submission
> ---
>
> Key: YARN-9738
> URL: https://issues.apache.org/jira/browse/YARN-9738
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9738-001.patch, YARN-9738-002.patch, 
> YARN-9738-003.patch
>
>
> *Env:*
> Server OS: Ubuntu
> No. of cluster nodes: 9120 NMs
> Env mode: Secure
> *Preconditions:*
> ~9120 NMs were running
> ~1250 applications were in the running state
> 35K applications were in the pending state
> *Test Steps:*
> 1. Submit applications from 5 clients, each client running 2 threads, across 
> 10 queues in total
> 2. As application submission increases (each distributed shell application 
> calls getClusterNodes)
> *ClientRMService#getClusterNodes tries to call 
> ClusterNodeTracker#getNodeReport, where the nodes map is locked:*
> {quote}
> "IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 
> tid=0x7f75095de000 nid=0x1949c waiting on condition [0x7f74cff78000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x7f759f6d8858> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
> {quote}
> *Instead, we can make the nodes map a ConcurrentHashMap and remove the read 
> lock.*
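
A minimal sketch of the proposal with stand-in generics (not the actual ClusterNodeTracker): a ConcurrentHashMap lets lookups proceed without the fair read lock the handlers above are queueing on, and the explicit null checks match the fix mentioned in the comment, since ConcurrentHashMap rejects null keys:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class LockFreeNodeTracker<K, N> {
  private final ConcurrentHashMap<K, N> nodes = new ConcurrentHashMap<>();

  public void addNode(K id, N node) { nodes.put(id, node); }
  public void removeNode(K id)      { nodes.remove(id); }

  public boolean exists(K id) {
    return id != null && nodes.containsKey(id);  // CHM throws NPE on null keys
  }

  public N getNode(K id) {
    return id == null ? null : nodes.get(id);
  }

  public List<N> getNodeReport() {
    // Weakly consistent snapshot; no lock held, so RPC handlers never block here.
    return new ArrayList<>(nodes.values());
  }
}
{code}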






[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-27 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917407#comment-16917407
 ] 

Rohith Sharma K S commented on YARN-9714:
-

Thanks [~Tao Yang] for the patch! Instead of comparing, how about checking for 
resourceManager.getZKManager() == null? This basically syncs the code that 
initializes zkManager with the code that closes it. That said, I don't see any 
issue with the current patch either.
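
A minimal sketch of that shape with stand-in types (the real logic lives in ZKRMStateStore and ResourceManager; only the ownership rule is illustrated here):

{code}
// Sketch only: the store closes the ZK manager exactly when it created one
// itself, i.e. when the RM-level shared manager is null -- the same condition
// used at initialization time.
public class ZkOwnershipSketch {
  interface ZkManager extends AutoCloseable {}

  private final ZkManager rmSharedManager;  // analog of resourceManager.getZKManager()
  private ZkManager storeOwnedManager;      // created by the store when the shared one is null

  ZkOwnershipSketch(ZkManager rmSharedManager, ZkManager storeOwnedManager) {
    this.rmSharedManager = rmSharedManager;
    this.storeOwnedManager = storeOwnedManager;
  }

  void closeInternal() throws Exception {
    if (rmSharedManager == null && storeOwnedManager != null) {
      storeOwnedManager.close();  // mirrors the init-time null check
      storeOwnedManager = null;
    }
    // If rmSharedManager != null, the RM owns the connection and closes it itself.
  }
}
{code}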

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch, 
> YARN-9714.003.patch, YARN-9714.004.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating 
> the memory dump and jstack output, I found two places in the RM that may 
> cause memory leaks after the RM transitions to standby:
>  # The release-cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To fix these leaks, we should cancel the timer and close the connection when 
> the services are stopping.






[jira] [Commented] (YARN-5913) Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext

2019-08-27 Thread Yousef Abu-Salah (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917402#comment-16917402
 ] 

Yousef Abu-Salah commented on YARN-5913:


Can I get a more descriptive rundown of the problem?

> Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext
> --
>
> Key: YARN-5913
> URL: https://issues.apache.org/jira/browse/YARN-5913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Priority: Minor
>  Labels: newbie
>
> Usage of these two variables overlaps and causes confusion. 






[jira] [Commented] (YARN-5913) Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext

2019-08-27 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917399#comment-16917399
 ] 

Yufei Gu commented on YARN-5913:


[~ykabusalah] feel free to take any Jira without an assignee.

> Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext
> --
>
> Key: YARN-5913
> URL: https://issues.apache.org/jira/browse/YARN-5913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Priority: Minor
>  Labels: newbie
>
> Usage of these two variables overlaps and causes confusion. 






[jira] [Commented] (YARN-6425) Move out FS state dump code out of method update()

2019-08-27 Thread Yufei Gu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917398#comment-16917398
 ] 

Yufei Gu commented on YARN-6425:


[~ykabusalah] feel free to do that.

> Move out FS state dump code out of method update()
> --
>
> Key: YARN-6425
> URL: https://issues.apache.org/jira/browse/YARN-6425
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Yufei Gu
>Priority: Major
>  Labels: newbie++
>
> Better to move the FS state dump code out of update():
> {code}
> if (LOG.isDebugEnabled()) {
>   if (--updatesToSkipForDebug < 0) {
> updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
> dumpSchedulerState();
>   }
> }
> {code}
> And after that, we should distinguish between the update-call duration and 
> the update-thread duration, as it was before YARN-6112.
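
A minimal sketch of the requested extraction, with placeholders for the logger and the dump body (names mirror the snippet above; the surrounding class is an assumption):

{code}
public class DumpExtractionSketch {
  private static final int UPDATE_DEBUG_FREQUENCY = 25;  // placeholder value
  private int updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
  private boolean debugEnabled = true;  // placeholder for LOG.isDebugEnabled()

  void update() {
    // Scheduling work only; the state dump no longer inflates its duration.
  }

  void dumpStateIfDebugEnabled() {
    if (debugEnabled && --updatesToSkipForDebug < 0) {
      updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
      dumpSchedulerState();
    }
  }

  void dumpSchedulerState() { /* write FS state to the log */ }

  // The update thread can then time update() alone and call
  // dumpStateIfDebugEnabled() outside that measurement, restoring the
  // distinction between update-call and update-thread duration.
}
{code}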






[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917374#comment-16917374
 ] 

Hadoop QA commented on YARN-9562:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 36s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 37 new + 689 unchanged - 2 fixed = 726 total (was 691) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
33s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 50s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 44s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
|  |  Nullcheck of NodeManager.context at line 535 of value previously 
dereferenced in 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop()  At 
NodeManager.java:535 of value previously dereferenced in 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop()  At 
NodeManager.java:[line 532] |
|  |  Unused field:NodeManager.java |
| Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
|   | hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
\\
\\
|| Subsystem || Report/Not

[jira] [Commented] (YARN-9755) RM fails to start with FileSystemBasedConfigurationProvider

2019-08-27 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917237#comment-16917237
 ] 

Hudson commented on YARN-9755:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17191 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17191/])
YARN-9755. Fixed RM failing to start when (eyang: rev 
717c853873dd3b9112f5c15059a24655b8654607)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/FileSystemBasedConfigurationProvider.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java


> RM fails to start with FileSystemBasedConfigurationProvider
> ---
>
> Key: YARN-9755
> URL: https://issues.apache.org/jira/browse/YARN-9755
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9755-001.patch, YARN-9755-002.patch, 
> YARN-9755-003.patch, YARN-9755-004.patch
>
>
> RM fails to start with the exception below when 
> FileSystemBasedConfigurationProvider is used.
> *Exception:*
> {code}
> 2019-08-16 12:05:33,802 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> java.io.IOException: Filesystem closed
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:868)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:1312)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1335)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1328)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1379)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1567)
> Caused by: java.io.IOException: java.io.IOException: Filesystem closed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:64)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:346)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:445)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> ... 14 more
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:475)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1682)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1586)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1598)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1701)
> at 
> org.apache.hadoop.yarn.FileSystemBase

[jira] [Commented] (YARN-9438) launchTime not written to state store for running applications

2019-08-27 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917242#comment-16917242
 ] 

Hudson commented on YARN-9438:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17191 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17191/])
YARN-9438. launchTime not written to state store for running (jhung: rev 
8ef46595da6aefe4458aa7181670c3d9b13e7ec6)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java


> launchTime not written to state store for running applications
> --
>
> Key: YARN-9438
> URL: https://issues.apache.org/jira/browse/YARN-9438
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9438-branch-2.001.patch, 
> YARN-9438-branch-2.002.patch, YARN-9438.001.patch, YARN-9438.002.patch, 
> YARN-9438.003.patch, YARN-9438.004.patch
>
>
> launchTime is only saved to the state store after an application finishes, so 
> if a restart happens, any running applications will have launchTime set to -1 
> (since this is the default timestamp of the recovery event).
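
A toy sketch of the idea behind the fix (stand-in types only; the real change touches RMAppImpl and the RM state store): persist launchTime when the application launches, so recovery of a still-running app does not fall back to the -1 default.

{code}
import java.util.HashMap;
import java.util.Map;

public class LaunchTimePersistSketch {
  static final long NOT_LAUNCHED = -1L;

  private final Map<String, Long> stateStore = new HashMap<>();  // toy state store

  void onAppLaunched(String appId, long launchTime) {
    stateStore.put(appId, launchTime);  // write immediately, not only at finish
  }

  long recoverLaunchTime(String appId) {
    return stateStore.getOrDefault(appId, NOT_LAUNCHED);
  }
}
{code}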






[jira] [Commented] (YARN-9770) Create a queue ordering policy which picks child queues with equal probability

2019-08-27 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917227#comment-16917227
 ] 

Jonathan Hung commented on YARN-9770:
-

Thanks [~haibochen]. Makes sense. Attached 003 addressing these comments.

> Create a queue ordering policy which picks child queues with equal probability
> --
>
> Key: YARN-9770
> URL: https://issues.apache.org/jira/browse/YARN-9770
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9770.001.patch, YARN-9770.002.patch, 
> YARN-9770.003.patch
>
>
> Ran some simulations with the default queue_utilization_ordering_policy:
> An underutilized queue which receives an application with many (thousands of) 
> resource requests will hog scheduler allocations for a long time (on the 
> order of a minute). In the meantime, apps are being submitted to all other 
> queues, which increases activeUsers in those queues, which in turn drops the 
> user limit in those queues to small values if minimum-user-limit-percent is 
> configured to small values (e.g. 10%).
> To avoid this, we assign to queues with equal probability, so that no queue 
> goes without allocations for a long time.
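
A minimal sketch of an equal-probability ordering (stand-in queue type; the real policy plugs into the CapacityScheduler's queue-ordering interface): shuffling the children makes each queue equally likely to be offered the next allocation, rather than always favoring the most underutilized one.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class ShuffledQueueOrdering<Q> {
  public Iterator<Q> getAssignmentIterator(List<Q> childQueues) {
    List<Q> shuffled = new ArrayList<>(childQueues);
    Collections.shuffle(shuffled);  // uniform random permutation
    return shuffled.iterator();
  }
}
{code}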






[jira] [Updated] (YARN-9770) Create a queue ordering policy which picks child queues with equal probability

2019-08-27 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9770:

Attachment: YARN-9770.003.patch

> Create a queue ordering policy which picks child queues with equal probability
> --
>
> Key: YARN-9770
> URL: https://issues.apache.org/jira/browse/YARN-9770
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9770.001.patch, YARN-9770.002.patch, 
> YARN-9770.003.patch
>
>
> Ran some simulations with the default queue_utilization_ordering_policy:
> An underutilized queue which receives an application with many (thousands of) 
> resource requests will hog scheduler allocations for a long time (on the 
> order of a minute). In the meantime, apps are being submitted to all other 
> queues, which increases activeUsers in those queues, which in turn drops the 
> user limit in those queues to small values if minimum-user-limit-percent is 
> configured to small values (e.g. 10%).
> To avoid this, we assign to queues with equal probability, so that no queue 
> goes without allocations for a long time.






[jira] [Commented] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917226#comment-16917226
 ] 

Hadoop QA commented on YARN-9754:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  3s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 2 
new + 12 unchanged - 0 fixed = 14 total (was 12) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
42s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9754 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978695/YARN-9754.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 543a1d9eb17e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 66cfa48 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24646/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24646/testReport/ |
| Max. process+thread count | 429 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output |

[jira] [Updated] (YARN-9770) Create a queue ordering policy which picks child queues with equal probability

2019-08-27 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9770:

Attachment: (was: YARN-9770.003.patch)

> Create a queue ordering policy which picks child queues with equal probability
> --
>
> Key: YARN-9770
> URL: https://issues.apache.org/jira/browse/YARN-9770
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9770.001.patch, YARN-9770.002.patch
>
>
> Ran some simulations with the default queue_utilization_ordering_policy:
> An underutilized queue which receives an application with many (thousands of) 
> resource requests will hog scheduler allocations for a long time (on the 
> order of a minute). In the meantime, apps are being submitted to all other 
> queues, which increases activeUsers in those queues, which in turn drops the 
> user limit in those queues to small values if minimum-user-limit-percent is 
> configured to small values (e.g. 10%).
> To avoid this, we assign to queues with equal probability, so that no queue 
> goes without allocations for a long time.






[jira] [Updated] (YARN-9770) Create a queue ordering policy which picks child queues with equal probability

2019-08-27 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9770:

Attachment: YARN-9770.003.patch

> Create a queue ordering policy which picks child queues with equal probability
> --
>
> Key: YARN-9770
> URL: https://issues.apache.org/jira/browse/YARN-9770
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9770.001.patch, YARN-9770.002.patch
>
>
> Ran some simulations with the default queue_utilization_ordering_policy:
> An underutilized queue which receives an application with many (thousands of) 
> resource requests will hog scheduler allocations for a long time (on the 
> order of a minute). In the meantime, apps are being submitted to all other 
> queues, which increases activeUsers in those queues, which in turn drops the 
> user limit in those queues to small values if minimum-user-limit-percent is 
> configured to small values (e.g. 10%).
> To avoid this, we assign to queues with equal probability, so that no queue 
> goes without allocations for a long time.






[jira] [Commented] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917148#comment-16917148
 ] 

Hadoop QA commented on YARN-9738:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 83m 
56s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}132m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9738 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978682/YARN-9738-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a4dc6e8d4d94 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 66cfa48 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24645/testReport/ |
| Max. process+thread count | 802 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24645/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Remove lock on ClusterNodeTracker#getN

[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime

2019-08-27 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917140#comment-16917140
 ] 

Eric Badger commented on YARN-9561:
---

Hey [~eyang], in the meantime I got a chance to go back through your patch 
comments from a little while ago. Let me know if you have any questions.

bq. In setup_container_paths, it would be good to use fprintf instead of fputs 
and include the actual path location. This helps system admin to debug the 
error more precisely.
I don't think this should be added to the code created by this patch. The 
functions called from within {{setup_container_paths()}} are the ones that 
should give more fine-grained error logging, because at the 
{{setup_container_paths()}} level I have no idea which directory failed to be 
created; I could only give a large list. So I think this is an improvement for 
the underlying functions, but it isn't relevant to this specific patch.

bq. Can make_string be used instead of strbuf_append_fmt for readability reason 
and reduce the need of string format functions? The 16k size seem like a limit 
that is easy to reach. make_string may use more memory during string 
construction, but maybe it is safer?
The 16k size is just the initial size to create 
{{upperdir=%s,workdir=%s,lowerdir=}}. Upperdir and workdir can only be single 
directories, so the only way the 16k limit would be exceeded is if those 
directory paths were absolutely gigantic. Inside {{strbuf_append_fmt()}} we 
take into account how much memory needs to be allocated and reallocate that 
amount, so unless the system itself is out of memory, we will be able to 
allocate the full buffer regardless of the number of layers. This would be much 
harder to do with {{make_string()}}, because we would have to concatenate the 
strings together or we would end up doing something like what 
{{strbuf_append_fmt()}} does anyway.

bq. This part of code doesn't seem to have any effect:
{noformat}
  de = is_docker_support_enabled() ? enabled : disabled;
  fprintf(stream,
  "%11s launch docker container:  %2d appid containerid workdir "
  "container-script tokens pidfile nm-local-dirs nm-log-dirs "
  "docker-command-file resources ", de, LAUNCH_DOCKER_CONTAINER);
{noformat}
It has no effect on the patch related to the OCI code, yes. We fixed up the 
display code to be a little bit cleaner and so we modified this to be in line 
with the cleaner style.

> Add C changes for the new RuncContainerRuntime
> --
>
> Key: YARN-9561
> URL: https://issues.apache.org/jira/browse/YARN-9561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9561.001.patch, YARN-9561.002.patch, 
> YARN-9561.003.patch, YARN-9561.004.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new RuncContainerRuntime. There should be 
> no changes to existing code paths. 






[jira] [Commented] (YARN-9781) SchedConfCli to get current stored scheduler configuration

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917134#comment-16917134
 ] 

Hadoop QA commented on YARN-9781:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
52s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 21 unchanged - 3 fixed = 21 total (was 24) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
39s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 83m 
50s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 
50s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}194m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9781 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978658/YARN-9781-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 0548de174e9e 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x8

[jira] [Commented] (YARN-9785) Application gets activated even when AM memory has reached

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917093#comment-16917093
 ] 

Hadoop QA commented on YARN-9785:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
40s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9785 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978680/YARN-9785-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5961f47f6030 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 
22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 66cfa48 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24644/testReport/ |
| Max. process+thread count | 308 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24644/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Application gets activated even when AM memory has reached
> --
>
> 

[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime

2019-08-27 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917076#comment-16917076
 ] 

Eric Badger commented on YARN-9561:
---

bq. Do you mean C side? Java side does not have privileges to run modprobe or 
lsmod due to lack of root privileges.
I don't believe we need root privileges to run lsmod. It simply parses 
/proc/modules. On RHEL 7 this file is world-readable. So I think an {{lsmod | 
grep overlay}} would be sufficient.
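For reference, a minimal sketch of what such a Java-side check could look like (a hypothetical helper, not part of the current patch), given that /proc/modules is world-readable:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: returns true if the "overlay" kernel module is loaded.
// It parses /proc/modules directly instead of shelling out to lsmod, so no
// root privileges are needed.
public static boolean isOverlayModuleLoaded() throws IOException {
  return Files.readAllLines(Paths.get("/proc/modules")).stream()
      .anyMatch(line -> line.startsWith("overlay "));
}
{code}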

bq. It took me several days to restore my cluster to a working state with the 
overlay kernel module installed. In the latest patch 004, the mapreduce pi job 
fails:
If you're failing in Java, then that means the overlay mounts all worked and 
runC was correctly invoked. That's fantastic news! We're very close.

bq. Do we need implicit mounting of Hadoop binaries to enable existing workload 
to run with runc? If not, what step can be used to run an example app?
I don't have any Hadoop jars in the image or bind-mounted into it. Instead, 
I'm running from a Hadoop tarball in HDFS.

{noformat:title=mapred-site.xml}
  
mapreduce.application.framework.path

${fs.defaultFS}/user/ebadger/hadoop-3.3.0-SNAPSHOT.tar.gz#hadoop-mapreduce
  

  
mapreduce.application.classpath

./hadoop-mapreduce/hadoop-3.3.0-SNAPSHOT/share/hadoop/common/*,./hadoop-mapreduce/hadoop-3.3.0-SNAPSHOT/share/hadoop/common/lib/*,./hadoop-mapreduce/hadoop-3.3.0-SNAPSHOT/share/hadoop/hdfs/*,./hadoop-mapreduce/hadoop-3.3.0-SNAPSHOT/share/hadoop/hdfs/lib/*,./hadoop-mapreduce/hadoop-3.3.0-SNAPSHOT/share/hadoop/yarn/*,./hadoop-mapreduce/hadoop-3.3.0-SNAPSHOT/share/hadoop/yarn/lib/*,./hadoop-mapreduce/hadoop-3.3.0-SNAPSHOT/share/hadoop/mapreduce/*,./hadoop-mapreduce/hadoop-3.3.0-SNAPSHOT/share/hadoop/mapreduce/lib/*
  
{noformat}

If you would like to bind-mount the Hadoop jars instead, you can add them to 
the default mount list 
{{yarn.nodemanager.runtime.linux.docker.default-rw-mounts}} or 
{{yarn.nodemanager.runtime.linux.docker.default-ro-mounts}} (I don't think you 
need them to be writable). You can choose where in the image you'd like them 
mounted and then set your classpath to reflect where the jars are located.

{noformat:title=Default Mount List Example}
 
   yarn.nodemanager.runtime.linux.docker.default-rw-mounts
   /var/run/nscd:/var/run/nscd
 
{noformat}


> Add C changes for the new RuncContainerRuntime
> --
>
> Key: YARN-9561
> URL: https://issues.apache.org/jira/browse/YARN-9561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9561.001.patch, YARN-9561.002.patch, 
> YARN-9561.003.patch, YARN-9561.004.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new RuncContainerRuntime. There should be 
> no changes to existing code paths. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation

2019-08-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9789:

Attachment: YARN-9789-001.patch

> Disable Option for Write Ahead Logs of LogMutation
> --
>
> Key: YARN-9789
> URL: https://issues.apache.org/jira/browse/YARN-9789
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9789-001.patch
>
>
> When yarn.scheduler.configuration.store.max-logs is set to zero, the 
> YARNConfigurationStore (ZK, LevelDB) still reads the write-ahead logs from 
> the backend, which is not needed.
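A minimal sketch of the kind of guard the patch could add (names are hypothetical; the actual store code may differ):

{code:java}
// Hypothetical guard in the configuration store's initialization: when
// max-logs is zero, skip reading the write-ahead logs from ZK/LevelDB.
int maxLogs = conf.getInt(
    "yarn.scheduler.configuration.store.max-logs", DEFAULT_MAX_LOGS);
List<LogMutation> logs;
if (maxLogs > 0) {
  logs = readLogsFromBackend();  // backend read happens only when WAL is enabled
} else {
  logs = new LinkedList<>();     // WAL disabled: nothing to replay
}
{code}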



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9770) Create a queue ordering policy which picks child queues with equal probability

2019-08-27 Thread Haibo Chen (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917058#comment-16917058
 ] 

Haibo Chen commented on YARN-9770:
--

Thanks [~jhung] for the patch. The patch looks good to me overall. I have two 
minor comments.

1) Can we rename FairQueueOrderingPolicy to RandomQueueOrderingPolicy to reduce 
cognitive load, since the notion of fairness is already used in FairScheduler 
with a different meaning?

2) In the constructor of RandomIterator, given that we implicitly assume the 
swap operation is efficient and we only ever pass in an ArrayList, how about we 
restrict the parameter type to ArrayList?

The checkstyle issue can also be addressed.
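For illustration, a minimal sketch of the swap-based random iteration being discussed (a hypothetical shape only, not the actual patch code, assuming an ArrayList so that get/set and swaps are O(1)):

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Random;

// Hypothetical sketch: iterates elements in random order by swapping a
// randomly chosen remaining element into the front of the unvisited region.
class RandomIterator<E> implements Iterator<E> {
  private final ArrayList<E> list;  // restricting to ArrayList keeps swaps O(1)
  private final Random random = new Random();
  private int next = 0;

  RandomIterator(ArrayList<E> list) {
    this.list = list;
  }

  @Override
  public boolean hasNext() {
    return next < list.size();
  }

  @Override
  public E next() {
    int pick = next + random.nextInt(list.size() - next);
    E tmp = list.get(pick);          // swap picked element into position 'next'
    list.set(pick, list.get(next));
    list.set(next, tmp);
    return list.get(next++);
  }
}
{code}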

> Create a queue ordering policy which picks child queues with equal probability
> --
>
> Key: YARN-9770
> URL: https://issues.apache.org/jira/browse/YARN-9770
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9770.001.patch, YARN-9770.002.patch
>
>
> Ran some simulations with the default queue_utilization_ordering_policy:
> An underutilized queue which receives an application with many (thousands of) 
> resource requests will hog scheduler allocations for a long time (on the 
> order of a minute). In the meantime, apps are being submitted to all other 
> queues, which increases activeUsers in those queues and drops their user 
> limits to small values when minimum-user-limit-percent is configured low 
> (e.g. 10%).
> To avoid this issue, we assign to queues with equal probability, so that no 
> queue goes without allocations for a long time.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-27 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917057#comment-16917057
 ] 

Íñigo Goiri commented on YARN-9754:
---

Thanks [~abmodi], the test with the times looks good.
Minor comments:
* Make assertEquals a static import.
* Add a high-level comment to testGetToBeScheduledContainers explaining that 
you are asking for containers with X, Y and Z delay, etc.
* Is there something else we can assert in TestSLSDagAMSimulator that relates 
to sls_dag.json?

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch, YARN-9754.004.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give 
> the flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9438) launchTime not written to state store for running applications

2019-08-27 Thread Haibo Chen (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917051#comment-16917051
 ] 

Haibo Chen commented on YARN-9438:
--

+1 on the latest 004 patch

> launchTime not written to state store for running applications
> --
>
> Key: YARN-9438
> URL: https://issues.apache.org/jira/browse/YARN-9438
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9438-branch-2.001.patch, 
> YARN-9438-branch-2.002.patch, YARN-9438.001.patch, YARN-9438.002.patch, 
> YARN-9438.003.patch, YARN-9438.004.patch
>
>
> launchTime is only saved to the state store after the application finishes, 
> so if a restart happens, any running application will have its launchTime set 
> to -1 (since this is the default timestamp of the recovery event).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-7585) NodeManager should go unhealthy when state store throws DBException

2019-08-27 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung reopened YARN-7585:
-

Reopening this issue to track the backport to branch-2.

Waiting for the YARN-8200 merge to backport this (since the fix for this JIRA 
touches some of that code).

> NodeManager should go unhealthy when state store throws DBException 
> 
>
> Key: YARN-7585
> URL: https://issues.apache.org/jira/browse/YARN-7585
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>  Labels: release-blocker
> Fix For: 3.1.0
>
> Attachments: YARN-7585.001.patch, YARN-7585.002.patch, 
> YARN-7585.003.patch
>
>
> If work-preserving recovery is enabled, the NM will not start up if the state 
> store does not initialize. However, if the state store becomes unavailable 
> after that for any reason, the NM will not go unhealthy. 
> Since the state store is not available, new containers can no longer be 
> started, so the NM should become unhealthy:
> {code}
> AMLauncher: Error launching appattempt_1508806289867_268617_01. Got 
> exception: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.io.IOException: org.iq80.leveldb.DBException: IO error: 
> /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: 
> Read-only file system
> at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
> ...
> Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: 
> /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: 
> Read-only file system
> at 
> o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime

2019-08-27 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917003#comment-16917003
 ] 

Eric Yang commented on YARN-9561:
-

[~ebadger] {quote}For your 3rd point, I think it would be better to do this 
check in Java. That way we can catch the failure earlier since all containers 
trying to run runC will fail if overlay is not installed. For the check, I was 
thinking of doing an lsmod on "overlay"{quote}

Do you mean C side?  Java side does not have privileges to run modprobe or 
lsmod due to lack of root privileges.

It took me several days to restore my cluster to a working state with the 
overlay kernel module installed. In the latest patch 004, the mapreduce pi job 
fails:
{code}
vars="YARN_CONTAINER_RUNTIME_TYPE=runc,YARN_CONTAINER_RUNTIME_RUNC_IMAGE=local/java-centos:latest"
./bin/hadoop jar 
share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0-SNAPSHOT.jar pi 
-Dmapreduce.map.env=$vars -Dmapreduce.reduce.env=$vars 10 100
{code}

Observed log output:
{code}
2019-08-27 11:46:08,377 INFO mapreduce.Job: Task Id : 
attempt_1566930487263_0002_m_02_2, Status : FAILED
[2019-08-27 11:46:07.131]Exception from container-launch.
Container id: container_1566930487263_0002_01_30
Exit code: 1
Exception message: Launch container failed

[2019-08-27 11:46:07.133]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapred.YarnChild


[2019-08-27 11:46:07.134]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapred.YarnChild
{code}

My base container image only contains centos:latest with the java-1.8.0-openjdk 
rpm installed. It does not have Hadoop binaries in the container. Do we need 
implicit mounting of Hadoop binaries to enable existing workload to run with 
runc?  If not, what step can be used to run an example app?

> Add C changes for the new RuncContainerRuntime
> --
>
> Key: YARN-9561
> URL: https://issues.apache.org/jira/browse/YARN-9561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9561.001.patch, YARN-9561.002.patch, 
> YARN-9561.003.patch, YARN-9561.004.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new RuncContainerRuntime. There should be 
> no changes to existing code paths. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-08-27 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916977#comment-16916977
 ] 

Eric Badger commented on YARN-9562:
---

Patch 008 adds a bunch of checkstyle and findbugs cleanup

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-08-27 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9562:
--
Attachment: YARN-9562.008.patch

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9755) RM fails to start with FileSystemBasedConfigurationProvider

2019-08-27 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916924#comment-16916924
 ] 

Prabhu Joseph commented on YARN-9755:
-

Thanks [~eyang].

> RM fails to start with FileSystemBasedConfigurationProvider
> ---
>
> Key: YARN-9755
> URL: https://issues.apache.org/jira/browse/YARN-9755
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9755-001.patch, YARN-9755-002.patch, 
> YARN-9755-003.patch, YARN-9755-004.patch
>
>
> RM fails to start with the below exception when 
> FileSystemBasedConfigurationProvider is used.
> *Exception:*
> {code}
> 2019-08-16 12:05:33,802 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> java.io.IOException: Filesystem closed
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:868)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:1312)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1335)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1328)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1379)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1567)
> Caused by: java.io.IOException: java.io.IOException: Filesystem closed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:64)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:346)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:445)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> ... 14 more
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:475)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1682)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1586)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1598)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1701)
> at 
> org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider.getConfigurationInputStream(FileSystemBasedConfigurationProvider.java:62)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FileBasedCSConfigurationProvider.loadConfiguration(FileBasedCSConfigurationProvider.java:56)
> {code}
> FileSystemBasedConfigurationProvider uses a cached FileSystem, which causes 
> the issue.
> *Configs:*
> {code}
> yarn.resourcemanager.configuration.provider-classorg.apache.hadoop.yarn.FileSystemBasedConfigurationProvider
> yarn.resourcemanager.configuration.file-system-based-store/yarn/conf
> [yarn@yarndocker-1 yarn]$ hadoop fs -ls /yarn/conf
> -rw-r--r--   3 yarn supergroup   4138 2019-08-16 13:09 
> 

[jira] [Updated] (YARN-9773) Add QueueMetrics for Custom Resources/Resource vectors

2019-08-27 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-9773:
-
Description: Although the custom resource metrics are calculated and saved 
as a QueueMetricsForCustomResources object within the QueueMetrics class, the 
JMX and Simon QueueMetrics do not report that information for custom resources. 

> Add QueueMetrics for Custom Resources/Resource vectors
> --
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> Although the custom resource metrics are calculated and saved as a 
> QueueMetricsForCustomResources object within the QueueMetrics class, the JMX 
> and Simon QueueMetrics do not report that information for custom resources. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9773) Add QueueMetrics for Custom Resources/Resource vectors

2019-08-27 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-9773:
-
Summary: Add QueueMetrics for Custom Resources/Resource vectors  (was: 
PartitionQueueMetrics for Custom Resources/Resource vectors)

> Add QueueMetrics for Custom Resources/Resource vectors
> --
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6425) Move out FS state dump code out of method update()

2019-08-27 Thread Yousef Abu-Salah (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916842#comment-16916842
 ] 

Yousef Abu-Salah commented on YARN-6425:


Can I take this issue? In addition, could I get a more descriptive definition 
of the problem and what needs to be accomplished?

> Move out FS state dump code out of method update()
> --
>
> Key: YARN-6425
> URL: https://issues.apache.org/jira/browse/YARN-6425
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Yufei Gu
>Priority: Major
>  Labels: newbie++
>
> Better to move the FS state dump code out of update():
> {code}
> if (LOG.isDebugEnabled()) {
>   if (--updatesToSkipForDebug < 0) {
> updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
> dumpSchedulerState();
>   }
> }
> {code}
> And after that, we should distinguish between the update-call duration and 
> the update-thread duration, as it was before YARN-6112. 
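One possible shape of the refactor (a sketch only; the helper name is hypothetical):

{code:java}
private void update() {
  // ... existing update logic ...
  dumpStateIfDebugEnabled();   // debug dump moved out of the core update path
}

// Hypothetical extracted helper: keeps the periodic state dump out of update().
private void dumpStateIfDebugEnabled() {
  if (LOG.isDebugEnabled() && --updatesToSkipForDebug < 0) {
    updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
    dumpSchedulerState();
  }
}
{code}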



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5913) Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext

2019-08-27 Thread Yousef Abu-Salah (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916838#comment-16916838
 ] 

Yousef Abu-Salah edited comment on YARN-5913 at 8/27/19 4:21 PM:
-

Can I take this issue? In addition, could I get a more descriptive definition 
of the problem and what needs to be accomplished?


was (Author: ykabusalah):
Can I take this issue?

> Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext
> --
>
> Key: YARN-5913
> URL: https://issues.apache.org/jira/browse/YARN-5913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Priority: Minor
>  Labels: newbie
>
> Usage of these two variables overlaps and causes confusion. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5913) Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext

2019-08-27 Thread Yousef Abu-Salah (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916838#comment-16916838
 ] 

Yousef Abu-Salah commented on YARN-5913:


Can I take this issue?

> Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext
> --
>
> Key: YARN-5913
> URL: https://issues.apache.org/jira/browse/YARN-5913
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Priority: Minor
>  Labels: newbie
>
> Usage of these two variables overlaps and causes confusion. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-27 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916721#comment-16916721
 ] 

Abhishek Modi commented on YARN-9754:
-

Thanks [~elgoiri] for the review. I have addressed all the review comments in 
the latest patch.

Since changing the AMSimulator will also impact MRAMSimulator and 
StreamAMSimulator, I will create a separate Jira for that.

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch, YARN-9754.004.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give 
> the flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-27 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9754:

Attachment: YARN-9754.004.patch

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch, YARN-9754.004.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give 
> the flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9509) Capped cpu usage with cgroup strict-resource-usage based on a mulitplier

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916669#comment-16916669
 ] 

Hadoop QA commented on YARN-9509:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
24s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
32s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 5 new + 219 unchanged - 0 fixed = 224 total (was 219) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
52s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
42s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-

[jira] [Commented] (YARN-9009) Fix flaky test TestEntityGroupFSTimelineStore.testCleanLogs

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916657#comment-16916657
 ] 

Hadoop QA commented on YARN-9009:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} 
| {color:red} https://github.com/apache/hadoop/pull/438 does not apply to 
trunk. Rebase required? Wrong Branch? See 
https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hadoop/pull/438 |
| JIRA Issue | YARN-9009 |
| Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-438/8/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |


This message was automatically generated.



> Fix flaky test TestEntityGroupFSTimelineStore.testCleanLogs
> ---
>
> Key: YARN-9009
> URL: https://issues.apache.org/jira/browse/YARN-9009
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: Ubuntu 18.04
> java version "1.8.0_181"
> Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
>  
> Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 
> 2018-06-17T13:33:14-05:00)
>Reporter: OrDTesters
>Assignee: OrDTesters
>Priority: Minor
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9009-trunk-001.patch
>
>
> In TestEntityGroupFSTimelineStore, testCleanLogs fails when run after 
> testMoveToDone.
> testCleanLogs fails because testMoveToDone moves a file into the same 
> directory that testCleanLogs cleans, causing testCleanLogs to clean 3 files 
> instead of the 2 it expects.
> To fix the failure, we can delete the file after it is moved by 
> testMoveToDone.
> Pull request link: [https://github.com/apache/hadoop/pull/438]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission

2019-08-27 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9738:

Attachment: YARN-9738-003.patch

> Remove lock on ClusterNodeTracker#getNodeReport as it blocks application 
> submission
> ---
>
> Key: YARN-9738
> URL: https://issues.apache.org/jira/browse/YARN-9738
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9738-001.patch, YARN-9738-002.patch, 
> YARN-9738-003.patch
>
>
> *Env :*
> Server OS :- UBUNTU
> No. of Cluster Node:- 9120 NMs
> Env Mode:- [Secure / Non secure]Secure
> *Preconditions:*
> ~9120 NMs were running
> ~1250 applications were in the running state 
> 35K applications were in the pending state
> *Test Steps:*
> 1. Submit applications from 5 clients, each client with 2 threads, across a 
> total of 10 queues
> 2. Once application submission increases (each distributed shell application 
> will call getClusterNodes)
> *ClientRMservice#getClusterNodes tries to get 
> ClusterNodeTracker#getNodeReport where map nodes is locked.*
> {quote}
> "IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 
> tid=0x7f75095de000 nid=0x1949c waiting on condition [0x7f74cff78000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x7f759f6d8858> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
> {quote}
> *Instead, we can make nodes a ConcurrentHashMap and remove the read lock*
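A minimal sketch of the proposed change (field and method shapes are hypothetical; the real ClusterNodeTracker code differs):

{code:java}
// Hypothetical sketch: with a ConcurrentHashMap, lookups no longer go through
// the shared ReentrantReadWriteLock, so getClusterNodes calls cannot pile up
// behind a writer holding that lock.
private final ConcurrentHashMap<NodeId, N> nodes = new ConcurrentHashMap<>();

public List<N> getNodeReport() {
  // Weakly consistent snapshot of the tracked nodes, taken without locking.
  return new ArrayList<>(nodes.values());
}
{code}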



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9468) Fix inaccurate documentations in Placement Constraints

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916640#comment-16916640
 ] 

Hadoop QA commented on YARN-9468:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
31m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-717/9/artifact/out/Dockerfile
 |
| GITHUB PR | https://github.com/apache/hadoop/pull/717 |
| JIRA Issue | YARN-9468 |
| Optional Tests | dupname asflicense mvnsite |
| uname | Linux 15bc56f1d87d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 3329257 |
| Max. process+thread count | 413 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-717/9/console |
| versions | git=2.7.4 maven=3.3.9 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |


This message was automatically generated.



> Fix inaccurate documentations in Placement Constraints
> --
>
> Key: YARN-9468
> URL: https://issues.apache.org/jira/browse/YARN-9468
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> Document Placement Constraints
> *First* 
> {code:java}
> zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code}
>  * place 5 containers with tag “hbase” with affinity to a rack on which 
> containers with tag “zk” are running (i.e., an “hbase” container 
> should{color:#ff} not{color} be placed at a rack where an “zk” container 
> is running, given that “zk” is the TargetTag of the second constraint);
> The word _*not*_ in the parentheses should be deleted.
>  
> *Second*
> {code:java}
> PlacementSpec => "" | KeyVal;PlacementSpec
> {code}
> The semicolon should be replaced by a colon.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission

2019-08-27 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9738:

Attachment: (was: YARN-9738-003.patch)

> Remove lock on ClusterNodeTracker#getNodeReport as it blocks application 
> submission
> ---
>
> Key: YARN-9738
> URL: https://issues.apache.org/jira/browse/YARN-9738
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9738-001.patch, YARN-9738-002.patch
>
>
> *Env :*
> Server OS :- UBUNTU
> No. of Cluster Node:- 9120 NMs
> Env Mode:- [Secure / Non secure]Secure
> *Preconditions:*
> ~9120 NMs were running
> ~1250 applications were in the running state 
> 35K applications were in the pending state
> *Test Steps:*
> 1. Submit applications from 5 clients, each client with 2 threads, across a 
> total of 10 queues
> 2. Once application submission increases (each distributed shell application 
> will call getClusterNodes)
> *ClientRMservice#getClusterNodes tries to get 
> ClusterNodeTracker#getNodeReport where map nodes is locked.*
> {quote}
> "IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 
> tid=0x7f75095de000 nid=0x1949c waiting on condition [0x7f74cff78000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x7f759f6d8858> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
> {quote}
> *Instead, we can make nodes a ConcurrentHashMap and remove the read lock*



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9738) Remove lock on ClusterNodeTracker#getNodeReport as it blocks application submission

2019-08-27 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9738:

Attachment: YARN-9738-003.patch

> Remove lock on ClusterNodeTracker#getNodeReport as it blocks application 
> submission
> ---
>
> Key: YARN-9738
> URL: https://issues.apache.org/jira/browse/YARN-9738
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9738-001.patch, YARN-9738-002.patch
>
>
> *Env :*
> Server OS :- UBUNTU
> No. of Cluster Node:- 9120 NMs
> Env Mode:- [Secure / Non secure]Secure
> *Preconditions:*
> ~9120 NMs were running
> ~1250 applications were in the running state 
> 35K applications were in the pending state
> *Test Steps:*
> 1. Submit applications from 5 clients, each client with 2 threads, across a 
> total of 10 queues
> 2. Once application submission increases (each distributed shell application 
> will call getClusterNodes)
> *ClientRMservice#getClusterNodes tries to get 
> ClusterNodeTracker#getNodeReport where map nodes is locked.*
> {quote}
> "IPC Server handler 36 on 45022" #246 daemon prio=5 os_prio=0 
> tid=0x7f75095de000 nid=0x1949c waiting on condition [0x7f74cff78000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x7f759f6d8858> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.getNodeReport(ClusterNodeTracker.java:123)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getNodeReport(AbstractYarnScheduler.java:449)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.createNodeReports(ClientRMService.java:1067)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getClusterNodes(ClientRMService.java:992)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterNodes(ApplicationClientProtocolPBServiceImpl.java:313)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:589)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2792)
> {quote}
> *Instead, we can make nodes a ConcurrentHashMap and remove the read lock*



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9785) Application gets activated even when AM memory has reached

2019-08-27 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9785:

Attachment: YARN-9785-001.patch

> Application gets activated even when AM memory has reached
> --
>
> Key: YARN-9785
> URL: https://issues.apache.org/jira/browse/YARN-9785
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Blocker
> Attachments: YARN-9785-001.patch
>
>
> Configure the below property in resource-types.xml:
> {quote}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {quote}
> Submit applications even after the AM limit for a queue is reached. Applications 
> get activated even after the limit is reached.
> !queue.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9579) the property of sharedcache in mapred-default.xml

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916628#comment-16916628
 ] 

Hadoop QA commented on YARN-9579:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
32m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m 
50s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-848/10/artifact/out/Dockerfile
 |
| GITHUB PR | https://github.com/apache/hadoop/pull/848 |
| JIRA Issue | YARN-9579 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient xml |
| uname | Linux 1a74f0dda78e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 3329257 |
| Default Java | 1.8.0_222 |
|  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-848/10/testReport/ |
| Max. process+thread count | 1582 (vs. ulimit of 5500) |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
U: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-848/10/console |
| versions | git=2.7.4 maven=3.3.9 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |


This message was automatically generated.



> the property of sharedcache in mapred-default.xml
> -
>
> Key: YARN-9579
> URL: https://issues.apache.org/jira/browse/YARN-9579
> Pr

[jira] [Commented] (YARN-9479) Change String.equals to Objects.equals(String,String) to avoid possible NullPointerException

2019-08-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916608#comment-16916608
 ] 

Hadoop QA commented on YARN-9479:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} https://github.com/apache/hadoop/pull/738 does not apply to 
trunk. Rebase required? Wrong Branch? See 
https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hadoop/pull/738 |
| JIRA Issue | YARN-9479 |
| Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-738/11/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |


This message was automatically generated.



> Change String.equals to Objects.equals(String,String) to avoid possible 
> NullPointerException
> 
>
> Key: YARN-9479
> URL: https://issues.apache.org/jira/browse/YARN-9479
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: bd2019us
>Priority: Major
> Attachments: 1.patch
>
>
> Hello,
> I found that the String "queueName" may cause a potential 
> NullPointerException, since it is used immediately after initialization and 
> there is no null check. One recommended API is 
> Objects.equals(String,String), which can avoid this exception.
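> A minimal before/after sketch (the getQueueName() helper is illustrative, 
> not from the actual patch):
> {code:java}
> import java.util.Objects;
> 
> String queueName = getQueueName();  // hypothetical helper; may return null
> 
> // Before: throws NullPointerException when queueName is null
> // if (queueName.equals("default")) { ... }
> 
> // After: Objects.equals is null-safe and simply returns false here
> if (Objects.equals(queueName, "default")) {
>   // ...
> }
> {code}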



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9785) Application gets activated even when AM memory has reached

2019-08-27 Thread Bibin A Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916544#comment-16916544
 ] 

Bibin A Chundatt commented on YARN-9785:


Thank you [~BilwaST] for raising this

The issue looks like an API-level problem in DominantResourceCalculator#compare, 
or a wrong usage of it. *0* should be returned only when all the resources are 
equal. But in this case, if one resource is greater and the other is less, the 
compare returns *0*.


{code:java}
  /**
   * Compare two resources - if the value for every resource type for the lhs
   * is greater than that of the rhs, return 1. If the value for every resource
   * type in the lhs is less than the rhs, return -1. Otherwise, return 0
   *
   * @param lhs resource to be compared
   * @param rhs resource to be compared
   * @return 0, 1, or -1
   */

 private int compare(Resource lhs, Resource rhs) {

  public int compare(Resource clusterResource, Resource lhs, Resource rhs,
  boolean singleType) {
{code}


Cluster resource: <10,10,0> (memory, cpu, gpu)

||lhs||rhs||result||
|<1,0>|<0,1>|returns 0|

ResourceCalculator#compare expects *0* only if values are equal.

All the callers that expect all the fields to be 
lessThanOrEqual/greaterThanOrEqual are affected.
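
A self-contained illustration of the crossing case (hypothetical code, not the 
actual DominantResourceCalculator; the zero-sized gpu resource is left out to 
avoid dividing by zero):
{code:java}
// Dominant-share comparison only looks at each side's largest share of the
// cluster, so the crossing vectors <1,0> and <0,1> both have dominant share
// 0.1 on a <10,10> cluster and compare as "equal".
public class DominantCompareSketch {
  static int dominantCompare(int[] lhs, int[] rhs, int[] cluster) {
    double lhsDominant = 0, rhsDominant = 0;
    for (int i = 0; i < cluster.length; i++) {
      lhsDominant = Math.max(lhsDominant, (double) lhs[i] / cluster[i]);
      rhsDominant = Math.max(rhsDominant, (double) rhs[i] / cluster[i]);
    }
    return Double.compare(lhsDominant, rhsDominant);
  }

  public static void main(String[] args) {
    int[] cluster = {10, 10};
    // prints 0 even though neither vector dominates the other
    System.out.println(dominantCompare(new int[]{1, 0}, new int[]{0, 1}, cluster));
  }
}
{code}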

[~rohithsharma]/[~tangzhankun]/[~sunil.gov...@gmail.com]


> Application gets activated even when AM memory has reached
> --
>
> Key: YARN-9785
> URL: https://issues.apache.org/jira/browse/YARN-9785
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Blocker
>
> Configure the below property in resource-types.xml:
> {quote}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {quote}
> Submit applications even after the AM limit for a queue is reached. Applications 
> get activated even after the limit is reached.
> !queue.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9781) SchedConfCli to get current stored scheduler configuration

2019-08-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9781:

Attachment: YARN-9781-003.patch

> SchedConfCli to get current stored scheduler configuration
> --
>
> Key: YARN-9781
> URL: https://issues.apache.org/jira/browse/YARN-9781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9781-001.patch, YARN-9781-002.patch, 
> YARN-9781-003.patch
>
>
> SchedConfCLI currently allows adding / updating / removing a queue. It does not 
> support getting the configuration, which RMWebServices provides as part of YARN-8559.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation

2019-08-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9789:

Parent: YARN-5734
Issue Type: Sub-task  (was: Bug)

> Disable Option for Write Ahead Logs of LogMutation
> --
>
> Key: YARN-9789
> URL: https://issues.apache.org/jira/browse/YARN-9789
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> When yarn.scheduler.configuration.store.max-logs is set to zero, the 
> YarnConfigurationStore (ZK, LevelDB) still reads the write-ahead logs from the 
> backend, which is not needed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation

2019-08-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9789:

Component/s: capacity scheduler

> Disable Option for Write Ahead Logs of LogMutation
> --
>
> Key: YARN-9789
> URL: https://issues.apache.org/jira/browse/YARN-9789
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> When yarn.scheduler.configuration.store.max-logs is set to zero, the 
> YarnConfigurationStore (ZK, LevelDB) still reads the write-ahead logs from the 
> backend, which is not needed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation

2019-08-27 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-9789:
---

 Summary: Disable Option for Write Ahead Logs of LogMutation
 Key: YARN-9789
 URL: https://issues.apache.org/jira/browse/YARN-9789
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


When yarn.scheduler.configuration.store.max-logs is set to zero, the 
YarnConfigurationStore (ZK, LevelDB) still reads the write-ahead logs from the 
backend, which is not needed.
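
For reference, this is the configuration that triggers the path (the property 
name is from the description; the snippet is just an illustration):
{code:xml}
<!-- yarn-site.xml: with max-logs set to 0, reading the write-ahead logs
     from the ZK/LevelDB backend should be skipped entirely. -->
<property>
  <name>yarn.scheduler.configuration.store.max-logs</name>
  <value>0</value>
</property>
{code}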



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9789) Disable Option for Write Ahead Logs of LogMutation

2019-08-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9789:

Affects Version/s: 3.3.0

> Disable Option for Write Ahead Logs of LogMutation
> --
>
> Key: YARN-9789
> URL: https://issues.apache.org/jira/browse/YARN-9789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> When yarn.scheduler.configuration.store.max-logs is set to zero, the 
> YarnConfigurationStore (ZK, LevelDB) still reads the write-ahead logs from the 
> backend, which is not needed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9788) Queue Management API - does not support parallel updates

2019-08-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9788:

Affects Version/s: 3.3.0

> Queue Management API - does not support parallel updates
> 
>
> Key: YARN-9788
> URL: https://issues.apache.org/jira/browse/YARN-9788
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> The Queue Management API does not support parallel updates. When there are two 
> parallel scheduler conf updates (logAndApplyMutation), the first update is 
> overwritten by the second one.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9788) Queue Management API - does not support parallel updates

2019-08-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9788:

Parent: YARN-5734
Issue Type: Sub-task  (was: Bug)

> Queue Management API - does not support parallel updates
> 
>
> Key: YARN-9788
> URL: https://issues.apache.org/jira/browse/YARN-9788
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> The Queue Management API does not support parallel updates. When there are two 
> parallel scheduler conf updates (logAndApplyMutation), the first update is 
> overwritten by the second one.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9788) Queue Management API - does not support parallel updates

2019-08-27 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-9788:
---

 Summary: Queue Management API - does not support parallel updates
 Key: YARN-9788
 URL: https://issues.apache.org/jira/browse/YARN-9788
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


The Queue Management API does not support parallel updates. When there are two 
parallel scheduler conf updates (logAndApplyMutation), the first update is 
overwritten by the second one.
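
A self-contained sketch of how such a lost update can happen (hypothetical 
code, not the actual provider):
{code:java}
import java.util.Map;

// The pending mutation is kept in a single unguarded field: if two callers
// invoke logAndApplyMutation before either confirms, the second assignment
// silently replaces the first caller's update.
public class ConfProviderSketch {
  private Map<String, String> pendingMutation;  // shared state

  public void logAndApplyMutation(Map<String, String> update) {
    pendingMutation = update;  // a parallel call clobbers the previous update
  }

  public void confirmPendingMutation() {
    // applies whatever happens to be in pendingMutation at this point
  }
}
{code}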





--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9714) ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby

2019-08-27 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916494#comment-16916494
 ] 

Rohith Sharma K S commented on YARN-9714:
-

[~bibinchundatt] Would you please take a look at the patch? 

> ZooKeeper connection in ZKRMStateStore leaks after RM transitioned to standby
> -
>
> Key: YARN-9714
> URL: https://issues.apache.org/jira/browse/YARN-9714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: memory-leak
> Attachments: YARN-9714.001.patch, YARN-9714.002.patch, 
> YARN-9714.003.patch, YARN-9714.004.patch
>
>
> Recently an RM full GC happened in one of our clusters. After investigating the 
> memory dump and jstack output, I found two places in the RM that may cause 
> memory leaks after the RM transitioned to standby:
>  # The release cache cleanup timer in AbstractYarnScheduler is never canceled.
>  # The ZooKeeper connection in ZKRMStateStore is never closed.
> To solve those leaks, we should close the connection or cancel the timer when 
> the services are stopping.
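> A minimal sketch of that idea (field and method names are illustrative, not 
> the actual patch):
> {code:java}
> // Release leaked resources when the services are stopping.
> @Override
> protected void serviceStop() throws Exception {
>   if (zkConnection != null) {
>     zkConnection.close();        // close the ZooKeeper connection
>     zkConnection = null;
>   }
>   if (releaseCacheTimer != null) {
>     releaseCacheTimer.cancel();  // cancel the release cache cleanup timer
>   }
>   super.serviceStop();
> }
> {code}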



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9785) Application gets activated even when AM memory has reached

2019-08-27 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916485#comment-16916485
 ] 

Bilwa S T edited comment on YARN-9785 at 8/27/19 7:55 AM:
--

In LeafQueue#activateApplications there is a lessThanOrEqual check between 
amResourceRequest and the AM limit for a queue, which in turn calls 
DominantResourceCalculator#compare since the GPU resource is zero. 
DominantResourceCalculator#compare returns 0 when lhsMemory > rhsMemory && 
lhsVcore < rhsVcore, so the check returns true and the application gets 
activated. To solve this, lessThanOrEqual should make sure that none of the lhs 
resources is greater than the corresponding rhs resource.
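
A sketch of that componentwise check (hypothetical code, not the actual 
Resources API):
{code:java}
// Strict check: lhs <= rhs only if no resource component of lhs exceeds the
// corresponding component of rhs.
static boolean lessThanOrEqual(long[] lhs, long[] rhs) {
  for (int i = 0; i < lhs.length; i++) {
    if (lhs[i] > rhs[i]) {
      return false;  // e.g. <1,0> vs <0,1>: memory exceeds, so not <=
    }
  }
  return true;
}
{code}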


was (Author: bilwast):
In LeafQueue#activateApplications there is a lessThanOrEqual check between 
amResourceRequest and the AM limit for a queue, which in turn calls 
DominantResourceCalculator#compare since the GPU resource is zero. 
DominantResourceCalculator#compare returns 0 when lhsMemory > rhsMemory && 
lhsVcore < rhsVcore, so the check returns true and the application gets 
activated. 

> Application gets activated even when AM memory has reached
> --
>
> Key: YARN-9785
> URL: https://issues.apache.org/jira/browse/YARN-9785
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Blocker
>
> Configure the below property in resource-types.xml:
> {quote}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {quote}
> Submit applications even after the AM limit for a queue is reached. Applications 
> get activated even after the limit is reached.
> !queue.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9785) Application gets activated even when AM memory has reached

2019-08-27 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916485#comment-16916485
 ] 

Bilwa S T commented on YARN-9785:
-

In LeafQueue#activateApplications there is a lessThanOrEqual check between 
amResourceRequest and the AM limit for a queue, which in turn calls 
DominantResourceCalculator#compare since the GPU resource is zero. 
DominantResourceCalculator#compare returns 0 when lhsMemory > rhsMemory && 
lhsVcore < rhsVcore, so the check returns true and the application gets 
activated. 

> Application gets activated even when AM memory has reached
> --
>
> Key: YARN-9785
> URL: https://issues.apache.org/jira/browse/YARN-9785
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Blocker
>
> Configure the below property in resource-types.xml:
> {quote}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {quote}
> Submit applications even after the AM limit for a queue is reached. Applications 
> get activated even after the limit is reached.
> !queue.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9786) testCancelledDelegationToken fails intermittently

2019-08-27 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916483#comment-16916483
 ] 

Adam Antal commented on YARN-9786:
--

Ah, indeed. Quite strange; I searched in the Apache JIRA, but there were no 
results for the test name. Closing as it's a dupe.

> testCancelledDelegationToken fails intermittently
> -
>
> Key: YARN-9786
> URL: https://issues.apache.org/jira/browse/YARN-9786
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
> Attachments: testCancelledDelegationToken.txt
>
>
> testCancelledDelegationToken[0] fails intermittently with the following 
> error/stack trace:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 400 for URL: 
> http://localhost:8088/ws/v1/cluster/delegation-token
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.cancelDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:446)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:267)
> {noformat}
> I'll attach a stdout log to this issue as well.
> It seems that the test helper infrastructure does not come up correctly. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9786) testCancelledDelegationToken fails intermittently

2019-08-27 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal resolved YARN-9786.
--
Resolution: Duplicate

> testCancelledDelegationToken fails intermittently
> -
>
> Key: YARN-9786
> URL: https://issues.apache.org/jira/browse/YARN-9786
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
> Attachments: testCancelledDelegationToken.txt
>
>
> testCancelledDelegationToken[0] fails intermittently with the following 
> error/stack trace:
> {noformat}
> java.io.IOException: Server returned HTTP response code: 400 for URL: 
> http://localhost:8088/ws/v1/cluster/delegation-token
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.cancelDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:446)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication.testCancelledDelegationToken(TestRMWebServicesDelegationTokenAuthentication.java:267)
> {noformat}
> I'll attach a stdout log to this issue as well.
> It seems that the test helper infrastructure does not come up correctly. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7145) Identify potential flaky unit tests

2019-08-27 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-7145:


Assignee: Julia Kinga Marton  (was: Szilard Nemeth)

> Identify potential flaky unit tests
> ---
>
> Key: YARN-7145
> URL: https://issues.apache.org/jira/browse/YARN-7145
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager, resourcemanager
>Reporter: Miklos Szegedi
>Assignee: Julia Kinga Marton
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-7145.000.patch, YARN-7145.001.patch
>
>
> I intend to add a 200-millisecond sleep to AsyncDispatcher, and run the 
> job to identify the tests that are potentially flaky.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-08-27 Thread Bibin A Chundatt (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9640:
---
Attachment: YARN-9640-branch-3.2.001.patch

> Slow event processing could cause too many attempt unregister events
> 
>
> Key: YARN-9640
> URL: https://issues.apache.org/jira/browse/YARN-9640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>  Labels: scalability
> Fix For: 3.3.0
>
> Attachments: YARN-9640-branch-3.2.001.patch, YARN-9640.001.patch, 
> YARN-9640.002.patch, YARN-9640.003.patch
>
>
> During verification in one of our test clusters, we found that the number of 
> attempt unregister events was about 300k+.
>  # All of the AM's containers completed.
>  # AMRMClientImpl sends finishApplicationMaster.
>  # AMRMClient checks the finish status every 100ms using the 
> finishApplicationMaster request.
>  # AMRMClientImpl#unregisterApplicationMaster
> {code:java}
>   while (true) {
> FinishApplicationMasterResponse response =
> rmClient.finishApplicationMaster(request);
> if (response.getIsUnregistered()) {
>   break;
> }
> LOG.info("Waiting for application to be successfully unregistered.");
> Thread.sleep(100);
>   }
> {code}
>  # The ApplicationMasterService finishApplicationMaster interface sends 
> unregister events on every status update.
> We should send the unregister event only once and cache that it was sent; 
> subsequent requests should be ignored and a not-unregistered response sent 
> back to the AM, so the event queue is not overloaded.
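> A sketch of the send-once idea (hypothetical; dispatchUnregisterEvent and 
> isUnregistered are assumed helpers, not the actual patch):
> {code:java}
> // Remember attempts whose unregister event was already dispatched, so
> // repeated finishApplicationMaster polls do not flood the event queue.
> private final Set<ApplicationAttemptId> unregistering =
>     ConcurrentHashMap.newKeySet();
> 
> FinishApplicationMasterResponse finishApplicationMaster(
>     ApplicationAttemptId attemptId) {
>   if (unregistering.add(attemptId)) {
>     dispatchUnregisterEvent(attemptId);  // sent once per attempt
>   }
>   // later polls just report the current unregister status
>   return FinishApplicationMasterResponse.newInstance(isUnregistered(attemptId));
> }
> {code}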



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-27 Thread Szalay-Beko Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szalay-Beko Mate updated YARN-9783:
---
Attachment: (was: YARN-9783.001.patch)

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 
> 3.5.5
> --
>
> Key: YARN-9783
> URL: https://issues.apache.org/jira/browse/YARN-9783
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szalay-Beko Mate
>Assignee: Szalay-Beko Mate
>Priority: Major
> Attachments: YARN-9783.001.patch
>
>
> The ZooKeeper 3.5.5 release is the latest stable one. It contains many new 
> features (including SSL-related improvements which are very important for 
> production use; see [the release 
> notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
> should be no backward-incompatible changes in the API, so applications 
> using ZooKeeper clients should build against the new ZooKeeper without any 
> problem, and the new ZooKeeper client should work with the older (3.4) servers 
> without any issue, at least until someone starts to use new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
> YARN yet, but to enable people to rebuild and test Hadoop with the new 
> ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN 
> test case: 
> [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64].
>  This test case seems to use low-level ZooKeeper internal code, which changed 
> in the new ZooKeeper version. Although I am not sure what the original 
> reasoning was for including this test in the YARN code, I propose to remove 
> it; if there is still any missing test case in ZooKeeper, then let's 
> file a ZooKeeper ticket to test this scenario there.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-27 Thread Szalay-Beko Mate (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szalay-Beko Mate updated YARN-9783:
---
Attachment: YARN-9783.001.patch

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 
> 3.5.5
> --
>
> Key: YARN-9783
> URL: https://issues.apache.org/jira/browse/YARN-9783
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szalay-Beko Mate
>Assignee: Szalay-Beko Mate
>Priority: Major
> Attachments: YARN-9783.001.patch
>
>
> The ZooKeeper 3.5.5 release is the latest stable one. It contains many new 
> features (including SSL-related improvements which are very important for 
> production use; see [the release 
> notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
> should be no backward-incompatible changes in the API, so applications 
> using ZooKeeper clients should build against the new ZooKeeper without any 
> problem, and the new ZooKeeper client should work with the older (3.4) servers 
> without any issue, at least until someone starts to use new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
> YARN yet, but to enable people to rebuild and test Hadoop with the new 
> ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN 
> test case: 
> [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64].
>  This test case seems to use low-level ZooKeeper internal code, which changed 
> in the new ZooKeeper version. Although I am not sure what the original 
> reasoning was for including this test in the YARN code, I propose to remove 
> it; if there is still any missing test case in ZooKeeper, then let's 
> file a ZooKeeper ticket to test this scenario there.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org