[jira] [Commented] (YARN-9644) First RMContext always leaked during switch over

2019-06-23 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870789#comment-16870789
 ] 

Tao Yang commented on YARN-9644:


Thanks [~bibinchundatt] for fixing this.
LGTM, +1 for the patch.

> First RMContext always leaked during switch over
> 
>
> Key: YARN-9644
> URL: https://issues.apache.org/jira/browse/YARN-9644
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9644.001.patch, YARN-9644.002.patch, 
> YARN-9644.003.patch
>
>
> As per my understanding, the following 2 issues cause the leak:
> * WebApp holds a reference to the first ApplicationMasterService instance, which 
> has an RMContext with ActiveServiceContext (holding the RMApps + nodes maps). The WebApp 
> remains for the lifetime of the RM process.
> * On transition to active, the RMNMInfo object is registered as an MBean and never 
> unregistered on transitionToStandby.
> On transition to standby and back to active, a new RMContext gets 
> created, but the above 2 issues cause the first RMContext to persist until RM shutdown.
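To make the second bullet concrete, here is a minimal sketch (an assumption about what a fix could look like, not the attached patch) of unregistering the RMNMInfo MBean when the RM leaves the active state; the method and field names are illustrative:

{code:java}
// Hedged sketch, not the YARN-9644 patch: keep the ObjectName returned at registration
// time and unregister it when active services stop, so JMX no longer pins the old RMContext.
import javax.management.ObjectName;
import org.apache.hadoop.metrics2.util.MBeans;

public class RMNMInfoMBeanSketch {
  private ObjectName rmNMInfoBeanName;

  void registerMBean(Object rmNMInfo) {
    // RMNMInfo registers itself under this service/name today.
    rmNMInfoBeanName = MBeans.register("ResourceManager", "RMNMInfo", rmNMInfo);
  }

  void unregisterMBean() {
    // Illustrative hook to be invoked from transitionToStandby / active-service stop.
    if (rmNMInfoBeanName != null) {
      MBeans.unregister(rmNMInfoBeanName);
      rmNMInfoBeanName = null;
    }
  }
}
{code}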






[jira] [Commented] (YARN-9639) DecommissioningNodesWatcher cause memory leak

2019-06-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870774#comment-16870774
 ] 

Bibin A Chundatt commented on YARN-9639:


[~sunilg] I think assigning null is required. 
{{FileSystemTimelineWriter}} and {{MetricsSystemImpl}} do the same.
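For illustration, a minimal sketch of the stop pattern being discussed (an assumption about the shape of the change, not necessarily the exact patch): cancel the poll Timer and null the field so the RMContext captured by PollTimerTask can be collected.

{code:java}
// Hedged sketch; the field name follows DecommissioningNodesWatcher, the body is illustrative.
void stop() {
  if (pollTimer != null) {
    pollTimer.cancel();   // stops PollTimerTask, which holds the RMContext reference
    pollTimer = null;     // drop the field reference, as FileSystemTimelineWriter/MetricsSystemImpl do
  }
}
{code}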


> DecommissioningNodesWatcher cause memory leak
> -
>
> Key: YARN-9639
> URL: https://issues.apache.org/jira/browse/YARN-9639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9639-001.patch
>
>
> Missing cancel() of the Timer task in DecommissioningNodesWatcher could lead to 
> a memory leak.
> PollTimerTask holds a reference to the RMContext.






[jira] [Commented] (YARN-9635) Nodes page displayed duplicate nodes

2019-06-23 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870769#comment-16870769
 ] 

Tao Yang commented on YARN-9635:


Hi, [~jiwq]. 
It seems the NMs were using ephemeral ports, so their nodeIds mostly 
changed after failover and the NMs are seen as new nodes by the RM. You can check whether 
the value of "yarn.nodemanager.address" is "0.0.0.0:0" (the default value 
when not configured); if so, you can set a fixed port to avoid this.
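For illustration (the port 45454 below is only an example value), the NM address is normally pinned in yarn-site.xml via "yarn.nodemanager.address"; the same setting expressed in code looks like this:

{code:java}
// Hedged example: pin yarn.nodemanager.address to a fixed port so the nodeId stays
// stable across NM restarts. 45454 is an example value, not a recommendation.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FixedNmAddressExample {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.NM_ADDRESS, "0.0.0.0:45454");  // instead of the ephemeral 0.0.0.0:0
    System.out.println(conf.get(YarnConfiguration.NM_ADDRESS));
  }
}
{code}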

> Nodes page displayed duplicate nodes
> 
>
> Key: YARN-9635
> URL: https://issues.apache.org/jira/browse/YARN-9635
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api, webapp, yarn-ui-v2
>Affects Versions: 3.2.0
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: UI2-nodes.jpg
>
>
> Steps:
>  * shutdown nodes
>  * start nodes
> Nodes Page:
> !UI2-nodes.jpg!






[jira] [Commented] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-06-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870768#comment-16870768
 ] 

Bibin A Chundatt commented on YARN-9640:


[~tangzhankun]

IMHO the server-side handling is mandatory, since finishApplicationMaster is a public 
interface and {{AMRMClientImpl}} is only one implementation of it.

As additional handling we could make the client-side retry interval 
configurable, but then the load would depend entirely on the client configuring the 
property.
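A rough sketch of the server-side idea (illustrative names only, not the attached patch): remember which attempts already had an unregister event dispatched, and answer later finishApplicationMaster polls without queueing another event.

{code:java}
// Hedged sketch of server-side dedup of attempt unregister events; the guard set and
// method name are illustrative, not taken from the YARN-9640 patch.
private final java.util.Set<ApplicationAttemptId> unregisterSent =
    java.util.concurrent.ConcurrentHashMap.newKeySet();

void maybeSendUnregisterEvent(ApplicationAttemptId attemptId, Event unregisterEvent) {
  // Dispatch the unregister event only for the first finishApplicationMaster call;
  // subsequent polls just get a "not yet unregistered" response instead of a new event.
  if (unregisterSent.add(attemptId)) {
    rmContext.getDispatcher().getEventHandler().handle(unregisterEvent);
  }
}
{code}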

> Slow event processing could cause too many attempt unregister events
> 
>
> Key: YARN-9640
> URL: https://issues.apache.org/jira/browse/YARN-9640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>  Labels: scalability
> Attachments: YARN-9640.001.patch, YARN-9640.002.patch, 
> YARN-9640.003.patch
>
>
> During verification on one of our test clusters we found that the number of attempt 
> unregister events was about 300k+.
>  # All AM containers completed.
>  # AMRMClientImpl sends finishApplicationMaster.
>  # AMRMClient polls the finish status every 100 ms using the 
> finishApplicationMaster request.
>  # AMRMClientImpl#unregisterApplicationMaster:
> {code:java}
>   while (true) {
> FinishApplicationMasterResponse response =
> rmClient.finishApplicationMaster(request);
> if (response.getIsUnregistered()) {
>   break;
> }
> LOG.info("Waiting for application to be successfully unregistered.");
> Thread.sleep(100);
>   }
> {code}
>  # The ApplicationMasterService finishApplicationMaster interface sends an 
> unregister event on every status update.
> We should send the unregister event only once and cache that it was sent; ignore 
> subsequent requests and return a not-yet-unregistered response to the AM so the event queue is not overloaded.






[jira] [Commented] (YARN-9620) oozie completed jobs didn't send to yarn job history server

2019-06-23 Thread ZHOUBEIHUA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870755#comment-16870755
 ] 

ZHOUBEIHUA commented on YARN-9620:
--

Thanks. I found that in Oozie 5.0 the MR launcher has changed from LauncherMapper.java to 
LauncherAM.java, so an Oozie-submitted job is just an AM application, not an MR job, 
and it will not be sent to the job history server.

As a temporary solution, "yarn.resourcemanager.max-completed-applications" can be 
changed from the default 1 to 4, so Oozie jobs stay longer in the RM.
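For illustration, the retention limit mentioned above is "yarn.resourcemanager.max-completed-applications"; a hedged example of raising it (the value below is an arbitrary example, not the reporter's actual number):

{code:java}
// Hedged example only: keep more completed applications in the RM so Oozie launcher AMs
// stay visible longer. The value is an example; normally this is set in yarn-site.xml.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MaxCompletedAppsExample {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setInt(YarnConfiguration.RM_MAX_COMPLETED_APPLICATIONS, 40000);  // example value
    System.out.println(conf.getInt(YarnConfiguration.RM_MAX_COMPLETED_APPLICATIONS, 0));
  }
}
{code}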

> oozie completed jobs didn't send to yarn job history server
> ---
>
> Key: YARN-9620
> URL: https://issues.apache.org/jira/browse/YARN-9620
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 3.0.0
> Environment:  
> Hadoop 3.0.0+cdh6.0.1
> YARN  3.0.0+cdh6.0.1
> Hue 3.9.0+cdh6.0.1
> Oozie 5.0.0-beta1+cdh6.0.1
>Reporter: ZHOUBEIHUA
>Priority: Major
>
> Hi, I am using a CDH 6.0 cluster. When jobs submitted from Oozie complete, they are not 
> sent to the YARN job history server. I can't find Oozie launcher jobs in the JHS.






[jira] [Commented] (YARN-9627) DelegationTokenRenewer could block transitionToStandy

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870598#comment-16870598
 ] 

Hadoop QA commented on YARN-9627:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 38 unchanged - 1 fixed = 38 total (was 39) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 88m 
15s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9627 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972619/YARN-9627.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux df74ec7a6840 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24314/testReport/ |
| Max. process+thread count | 915 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-9644) First RMContext always leaked during switch over

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870597#comment-16870597
 ] 

Hadoop QA commented on YARN-9644:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 55 unchanged - 4 fixed = 55 total (was 59) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 88m 
20s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9644 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972618/YARN-9644.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3552a04b1e85 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24313/testReport/ |
| Max. process+thread count | 909 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-9639) DecommissioningNodesWatcher cause memory leak

2019-06-23 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870594#comment-16870594
 ] 

Sunil Govindan commented on YARN-9639:
--

One small nit: in DecommissioningNodesWatcher, do we need to guard pollTimer 
with a null check in the stop method?

 

> DecommissioningNodesWatcher cause memory leak
> -
>
> Key: YARN-9639
> URL: https://issues.apache.org/jira/browse/YARN-9639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9639-001.patch
>
>
> Missing cancel() of the Timer task in DecommissioningNodesWatcher could lead to 
> a memory leak.
> PollTimerTask holds a reference to the RMContext.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-06-23 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870582#comment-16870582
 ] 

Zhankun Tang commented on YARN-9640:


[~bibinchundatt], thanks for the patch! One question: how about avoiding these 
unnecessary events on the client side?
Not quite sure whether this would cause much overhead or incompatibility for existing 
production workloads.
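For context, a hedged sketch of what a client-side-only mitigation could look like, modifying the unregister loop quoted in the description (the backoff bounds are made-up example values; this is not the attached patch):

{code:java}
// Hedged sketch: back off between finishApplicationMaster polls instead of retrying
// every 100 ms, so a slow RM sees fewer redundant unregister requests.
long waitMs = 100;
while (true) {
  FinishApplicationMasterResponse response =
      rmClient.finishApplicationMaster(request);
  if (response.getIsUnregistered()) {
    break;
  }
  LOG.info("Waiting for application to be successfully unregistered.");
  Thread.sleep(waitMs);
  waitMs = Math.min(waitMs * 2, 10000L);  // exponential backoff, capped at 10 s (example bounds)
}
{code}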

> Slow event processing could cause too many attempt unregister events
> 
>
> Key: YARN-9640
> URL: https://issues.apache.org/jira/browse/YARN-9640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>  Labels: scalability
> Attachments: YARN-9640.001.patch, YARN-9640.002.patch, 
> YARN-9640.003.patch
>
>
> During verification on one of our test clusters we found that the number of attempt 
> unregister events was about 300k+.
>  # All AM containers completed.
>  # AMRMClientImpl sends finishApplicationMaster.
>  # AMRMClient polls the finish status every 100 ms using the 
> finishApplicationMaster request.
>  # AMRMClientImpl#unregisterApplicationMaster:
> {code:java}
>   while (true) {
> FinishApplicationMasterResponse response =
> rmClient.finishApplicationMaster(request);
> if (response.getIsUnregistered()) {
>   break;
> }
> LOG.info("Waiting for application to be successfully unregistered.");
> Thread.sleep(100);
>   }
> {code}
>  # The ApplicationMasterService finishApplicationMaster interface sends an 
> unregister event on every status update.
> We should send the unregister event only once and cache that it was sent; ignore 
> subsequent requests and return a not-yet-unregistered response to the AM so the event queue is not overloaded.






[jira] [Commented] (YARN-9639) DecommissioningNodesWatcher cause memory leak

2019-06-23 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870572#comment-16870572
 ] 

Zhankun Tang commented on YARN-9639:


Thanks [~bilwa_st], [~bibinchundatt]. LGTM, +1.

> DecommissioningNodesWatcher cause memory leak
> -
>
> Key: YARN-9639
> URL: https://issues.apache.org/jira/browse/YARN-9639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9639-001.patch
>
>
> Missing cancel() of the Timer task in DecommissioningNodesWatcher could lead to 
> a memory leak.
> PollTimerTask holds a reference to the RMContext.






[jira] [Updated] (YARN-9627) DelegationTokenRenewer could block transitionToStandy

2019-06-23 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9627:
---
Attachment: YARN-9627.003.patch

> DelegationTokenRenewer could block transitionToStandy
> -
>
> Key: YARN-9627
> URL: https://issues.apache.org/jira/browse/YARN-9627
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: krishna reddy
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9627.001.patch, YARN-9627.002.patch, 
> YARN-9627.003.patch
>
>
> Cluster size: 5K
> Running containers: 55K
> *Scenario*: Large number of pending applications (around 50K) while performing 
> RM switch over.
> Exceptions below:
> {noformat}
> 2019-06-13 17:39:27,594 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: HDFS_DELEGATION_TOKEN, Service: X:1616, Ident: (token 
> for root: HDFS_DELEGATION_TOKEN owner=root/had...@hadoop.com, renewer=yarn, 
> realUser=, issueDate=1560361265181, maxDate=1560966065181, 
> sequenceNumber=104708, masterKeyId=3);exp=1560533965360; 
> apps=[application_1560346941775_20702] in 86397766 ms, appId = 
> [application_1560346941775_20702]
> 2019-06-13 17:39:27,609 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Unable to add the application to the delegation token renewer on recovery.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>  
> 2019-06-13 17:58:20,878 ERROR org.apache.zookeeper.ClientCnxn: Time out error 
> occurred for the packet 'clientPath:null serverPath:null finished:false 
> header:: 27,4  replyHeader:: 27,4295687588,0  request:: 
> '/rmstore1/ZKRMStateRoot/RMDTSecretManagerRoot/RMDTMasterKeysRoot/DelegationKey_49,F
>   response:: 
> #31ff8a16b74ffe129768ffdbffe949ff8dffd517ffcafffa,s{4295423577,4295423577,1560342837789,1560342837789,0,0,0,0,17,0,4295423577}
>  '.
> 2019-06-13 17:58:20,877 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: HDFS_DELEGATION_TOKEN, Service: 
> X:1616, Ident: (token for root: HDFS_DELEGATION_TOKEN 
> owner=root/had...@hadoop.com, renewer=yarn, realUser=, 
> issueDate=1560366110990, maxDate=1560970910990, sequenceNumber=111891, 
> masterKeyId=3);exp=1560534896413; apps=[application_1560346941775_28115]]
> 2019-06-13 17:58:20,924 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Unable to add the application to the delegation token renewer on recovery.
> java.lang.IllegalStateException: Timer already cancelled.
> at java.util.Timer.sched(Timer.java:397)
> at java.util.Timer.schedule(Timer.java:208)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.setTimerForTokenRenewal(DelegationTokenRenewer.java:612)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:523)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748
> {noformat}
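The "Timer already cancelled" trace above suggests renewal work being scheduled after the renewer's timer was cancelled during transitionToStandby. A hedged sketch of the kind of guard that could avoid it (generic stand-in names, an assumption rather than the attached patch):

{code:java}
// Hedged sketch: skip scheduling renewal tasks once the renewer has been stopped,
// so recovery threads do not hit IllegalStateException("Timer already cancelled").
// Names are generic stand-ins for DelegationTokenRenewer internals.
private final java.util.Timer renewalTimer = new java.util.Timer(true);
private volatile boolean stopped = false;

synchronized void scheduleRenewal(java.util.TimerTask task, long delayMs) {
  if (stopped) {
    return;  // renewer already shut down; drop the renewal instead of throwing
  }
  renewalTimer.schedule(task, delayMs);
}

synchronized void stopRenewer() {
  stopped = true;
  renewalTimer.cancel();
}
{code}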




[jira] [Commented] (YARN-9639) DecommissioningNodesWatcher cause memory leak

2019-06-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870560#comment-16870560
 ] 

Bibin A Chundatt commented on YARN-9639:


Thank you [~BilwaST] for uploading the patch.

Looks good to me. Will wait for a day before getting this in.

> DecommissioningNodesWatcher cause memory leak
> -
>
> Key: YARN-9639
> URL: https://issues.apache.org/jira/browse/YARN-9639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9639-001.patch
>
>
> Missing cancel() of the Timer task in DecommissioningNodesWatcher could lead to 
> a memory leak.
> PollTimerTask holds a reference to the RMContext.






[jira] [Updated] (YARN-9644) First RMContext always leaked during switch over

2019-06-23 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9644:
---
Attachment: YARN-9644.003.patch

> First RMContext always leaked during switch over
> 
>
> Key: YARN-9644
> URL: https://issues.apache.org/jira/browse/YARN-9644
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9644.001.patch, YARN-9644.002.patch, 
> YARN-9644.003.patch
>
>
> As per my understanding, the following 2 issues cause the leak:
> * WebApp holds a reference to the first ApplicationMasterService instance, which 
> has an RMContext with ActiveServiceContext (holding the RMApps + nodes maps). The WebApp 
> remains for the lifetime of the RM process.
> * On transition to active, the RMNMInfo object is registered as an MBean and never 
> unregistered on transitionToStandby.
> On transition to standby and back to active, a new RMContext gets 
> created, but the above 2 issues cause the first RMContext to persist until RM shutdown.






[jira] [Commented] (YARN-9639) DecommissioningNodesWatcher cause memory leak

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870519#comment-16870519
 ] 

Hadoop QA commented on YARN-9639:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 82m 
59s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}134m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9639 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972612/YARN-9639-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bb3717c394f1 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/24312/artifact/out/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24312/testReport/ |
| Max. process+thread count | 925 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-9644) First RMContext always leaked during switch over

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870517#comment-16870517
 ] 

Hadoop QA commented on YARN-9644:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 55 unchanged - 4 fixed = 55 total (was 59) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 19s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMHA |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9644 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972611/YARN-9644.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bbd7a8fe87ae 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Assigned] (YARN-5867) DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir

2019-06-23 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-5867:
--

Assignee: (was: Bibin A Chundatt)

> DirectoryCollection#checkDirs can cause incorrect permission of nmlocal dir
> ---
>
> Key: YARN-5867
> URL: https://issues.apache.org/jira/browse/YARN-5867
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Major
>
> Steps to reproduce
> ===
> # Set umask to 077 for user
> # Start nodemanager with nmlocal dir configured
> The nmlocal dir permission is *755*, set in {{LocalDirsHandlerService#serviceInit}}:
> {code} 
> FsPermission perm = new FsPermission((short)0755);
> boolean createSucceeded = localDirs.createNonExistentDirs(localFs, perm);
> createSucceeded &= logDirs.createNonExistentDirs(localFs, perm);
> {code}
> # After startup, delete the nmlocal dir and wait for {{MonitoringTimerTask}} 
> to run (simulated by the delete)
> # Now the permission of the {{nmlocal dir}} will be *700*
> *Root Cause*
> {{DirectoryCollection#testDirs}} does the following check:
> {code}
> // create a random dir to make sure fs isn't in read-only mode
> verifyDirUsingMkdir(testDir);
> {code}
> which causes a new random directory to be created in {{localdir}} via
> {{DiskChecker.checkDir(dir)}} -> {{!mkdirsWithExistsCheck(dir)}}, causing the 
> nmlocal dir to be recreated with the wrong permission (*700*).
> A few applications then fail at container launch with permission denied.
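To illustrate the root cause in isolation (plain JDK, not the YARN code itself): java.io.File#mkdirs honors the process umask, so with umask 077 the recreated directory ends up 700 unless the permission is set explicitly afterwards.

{code:java}
// Hedged illustration of the umask effect described above; the path is an example only.
import java.io.File;
import java.nio.file.Files;
import java.nio.file.attribute.PosixFilePermissions;

public class UmaskMkdirExample {
  public static void main(String[] args) throws Exception {
    File dir = new File("/tmp/nm-local-dir-example");
    dir.mkdirs();                                        // becomes 700 when the process umask is 077
    Files.setPosixFilePermissions(dir.toPath(),
        PosixFilePermissions.fromString("rwxr-xr-x"));   // explicitly restore the intended 755
  }
}
{code}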






[jira] [Updated] (YARN-9639) DecommissioningNodesWatcher cause memory leak

2019-06-23 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9639:

Attachment: YARN-9639-001.patch

> DecommissioningNodesWatcher cause memory leak
> -
>
> Key: YARN-9639
> URL: https://issues.apache.org/jira/browse/YARN-9639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9639-001.patch
>
>
> Missing cancel() of the Timer task in DecommissioningNodesWatcher could lead to 
> a memory leak.
> PollTimerTask holds a reference to the RMContext.






[jira] [Commented] (YARN-9644) First RMContext always leaked during switch over

2019-06-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870476#comment-16870476
 ] 

Bibin A Chundatt commented on YARN-9644:


cc: [~sunilg]/[~cheersyang]/[~wangda] Could you please review the patch?

> First RMContext always leaked during switch over
> 
>
> Key: YARN-9644
> URL: https://issues.apache.org/jira/browse/YARN-9644
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9644.001.patch, YARN-9644.002.patch
>
>
> As per my understanding, the following 2 issues cause the leak:
> * WebApp holds a reference to the first ApplicationMasterService instance, which 
> has an RMContext with ActiveServiceContext (holding the RMApps + nodes maps). The WebApp 
> remains for the lifetime of the RM process.
> * On transition to active, the RMNMInfo object is registered as an MBean and never 
> unregistered on transitionToStandby.
> On transition to standby and back to active, a new RMContext gets 
> created, but the above 2 issues cause the first RMContext to persist until RM shutdown.






[jira] [Updated] (YARN-9644) First RMContext always leaked during switch over

2019-06-23 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9644:
---
   Priority: Critical  (was: Major)
Description: 
As per my understanding, the following 2 issues cause the leak:

* WebApp holds a reference to the first ApplicationMasterService instance, which 
has an RMContext with ActiveServiceContext (holding the RMApps + nodes maps). The WebApp 
remains for the lifetime of the RM process.
* On transition to active, the RMNMInfo object is registered as an MBean and never 
unregistered on transitionToStandby.

On transition to standby and back to active, a new RMContext gets created, 
but the above 2 issues cause the first RMContext to persist until RM shutdown.



  was:
On transition to active, the RMNMInfo object is registered as an MBean and never 
unregistered on transitionToStandby.

This keeps an RMContext reference alive since the MBean is never unregistered.

Summary: First RMContext always leaked during switch over  (was: 
RMNMInfo holds one RMContext causes memory leak)

> First RMContext always leaked during switch over
> 
>
> Key: YARN-9644
> URL: https://issues.apache.org/jira/browse/YARN-9644
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9644.001.patch, YARN-9644.002.patch
>
>
> As per my understanding, the following 2 issues cause the leak:
> * WebApp holds a reference to the first ApplicationMasterService instance, which 
> has an RMContext with ActiveServiceContext (holding the RMApps + nodes maps). The WebApp 
> remains for the lifetime of the RM process.
> * On transition to active, the RMNMInfo object is registered as an MBean and never 
> unregistered on transitionToStandby.
> On transition to standby and back to active, a new RMContext gets 
> created, but the above 2 issues cause the first RMContext to persist until RM shutdown.






[jira] [Updated] (YARN-9644) First RMContext always leaked during switch over

2019-06-23 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9644:
---
Attachment: YARN-9644.002.patch

> First RMContext always leaked during switch over
> 
>
> Key: YARN-9644
> URL: https://issues.apache.org/jira/browse/YARN-9644
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9644.001.patch, YARN-9644.002.patch
>
>
> As per my understanding, the following 2 issues cause the leak:
> * WebApp holds a reference to the first ApplicationMasterService instance, which 
> has an RMContext with ActiveServiceContext (holding the RMApps + nodes maps). The WebApp 
> remains for the lifetime of the RM process.
> * On transition to active, the RMNMInfo object is registered as an MBean and never 
> unregistered on transitionToStandby.
> On transition to standby and back to active, a new RMContext gets 
> created, but the above 2 issues cause the first RMContext to persist until RM shutdown.






[jira] [Commented] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870469#comment-16870469
 ] 

Hadoop QA commented on YARN-9640:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m  8s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9640 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972602/YARN-9640.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a51e8640879b 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24310/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24310/testReport/ |
| Max. process+thread count | 867 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-9374) HBaseTimelineWriterImpl sync writes has to avoid thread blocking if storage down

2019-06-23 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870448#comment-16870448
 ] 

Prabhu Joseph commented on YARN-9374:
-

[~eyang] Thanks for reviewing. Have removed {{isHBaseDown}} and used 
{{IOException}} detection for both {{HbaseTimelineWriter}} and 
{{HbaseTimelineReader}} in  [^YARN-9374-007.patch].

> HBaseTimelineWriterImpl sync writes has to avoid thread blocking if storage 
> down
> 
>
> Key: YARN-9374
> URL: https://issues.apache.org/jira/browse/YARN-9374
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9374-001.patch, YARN-9374-002.patch, 
> YARN-9374-003.patch, YARN-9374-004.patch, YARN-9374-005.patch, 
> YARN-9374-006.patch, YARN-9374-007.patch
>
>
> HBaseTimelineWriterImpl sync writes have to avoid thread blocking if storage 
> is down. Currently we check whether the HBase storage is down in TimelineReader before 
> reading entities and fail immediately (YARN-8302). A similar fix is needed for 
> the write path. The async case is handled in YARN-9335.


