[jira] [Commented] (YARN-10493) RunC container repository v2

2021-04-01 Thread Matthew Sharp (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313424#comment-17313424
 ] 

Matthew Sharp commented on YARN-10493:
--

[~ebadger] The latest PR contains the namespace support that we had discussed.  
I also updated the design doc to outline that a bit more.

> RunC container repository v2
> 
>
> Key: YARN-10493
> URL: https://issues.apache.org/jira/browse/YARN-10493
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, yarn
>Affects Versions: 3.3.0
>Reporter: Craig Condit
>Assignee: Matthew Sharp
>Priority: Major
>  Labels: pull-request-available
> Attachments: runc-container-repository-v2-design.pdf, 
> runc-container-repository-v2-design_updated.pdf
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The current runc container repository design has scalability and usability 
> issues which will likely limit widespread adoption. We should address this 
> with a new, V2 layout.






[jira] [Commented] (YARN-10493) RunC container repository v2

2021-04-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313418#comment-17313418
 ] 

Hadoop QA commented on YARN-10493:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} codespell {color} | {color:blue}  0m  
1s{color} |  | {color:blue} codespell was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 3 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 14m 
53s{color} |  | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 9s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
5s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
54s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
47s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
51s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
28s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
39s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  5m 
32s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 44s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} |  | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
29s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
29s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
46s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
46s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
1s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
40s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
35s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} |  | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
12s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
25s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  5m 
52s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | 

[jira] [Comment Edited] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)

2021-04-01 Thread Anup Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313363#comment-17313363
 ] 

Anup Agarwal edited comment on YARN-10724 at 4/1/21, 6:20 PM:
--

completedContainer getting called multiple times may or may not be an issue, 
but logging the same event multiple times might be. SchedulerApplicationAttempt 
maintains a liveContainers collection and uses it to deduplicate 
container-completion (including preemption) events, whereas LeafQueue does no 
such deduplication. That is why the patch moved the preemption logging to the 
AppAttempt rather than LeafQueue, similar to FSAppAttempt.
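
To make the deduplication idea concrete, here is a minimal, hypothetical sketch 
(class and member names are placeholders for illustration; this is not the 
actual SchedulerApplicationAttempt code, nor the patch itself):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: a liveContainers-style membership check so a duplicate
// completion event for the same container is counted at most once.
class AttemptPreemptionAccounting {
  private final Map<String, Boolean> liveContainers = new ConcurrentHashMap<>();
  private final AtomicLong preemptionCount = new AtomicLong();

  void containerLaunched(String containerId) {
    liveContainers.put(containerId, Boolean.TRUE);
  }

  void containerCompleted(String containerId, boolean preempted) {
    // remove() returns null on a second call for the same container,
    // so duplicate events fall through without touching the metrics.
    if (liveContainers.remove(containerId) == null) {
      return; // already accounted for
    }
    if (preempted) {
      preemptionCount.incrementAndGet();
    }
  }
}
{code}

A queue-level counter without such a membership check increments once per 
duplicate event, which is exactly the over-counting described in this issue.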


was (Author: 108anup):
completedContainer getting called multiple times may or may not be an issue, 
but logging the same event multiple times might be. SchedulerApplicationAttempt 
maintains a liveContainers collection and uses it to deduplicate preemption 
events; while leafQueue does no such thing, that's why the patch moved the 
preemption logging to AppAttempt rather than leafQueue, similar to FSAppAttempt.

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> 
>
> Key: YARN-10724
> URL: https://issues.apache.org/jira/browse/YARN-10724
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anup Agarwal
>Assignee: Anup Agarwal
>Priority: Minor
> Attachments: YARN-10724-trunk.001.patch, YARN-10724-trunk.002.patch
>
>
> Currently CapacityScheduler over-counts preemption metrics inside 
> QueueMetrics.
>  
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the 
> container immediately from the launchedContainers list and instead waits for 
> the NM to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke 
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for 
> containers to preempt based on the launchedContainers list. Both of these 
> calls can create a ContainerPreemptEvent for the same container (as the RM is 
> waiting for the NM to kill the container). This leads LeafQueue to log 
> metrics for the same preemption multiple times.






[jira] [Commented] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)

2021-04-01 Thread Anup Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313363#comment-17313363
 ] 

Anup Agarwal commented on YARN-10724:
-

completedContainer getting called multiple times may or may not be an issue, 
but logging the same event multiple times might be. SchedulerApplicationAttempt 
maintains a liveContainers collection and uses it to deduplicate preemption 
events, whereas LeafQueue does no such deduplication. That is why the patch 
moved the preemption logging to the AppAttempt rather than LeafQueue, similar 
to FSAppAttempt.

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> 
>
> Key: YARN-10724
> URL: https://issues.apache.org/jira/browse/YARN-10724
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anup Agarwal
>Assignee: Anup Agarwal
>Priority: Minor
> Attachments: YARN-10724-trunk.001.patch, YARN-10724-trunk.002.patch
>
>
> Currently CapacityScheduler over-counts preemption metrics inside 
> QueueMetrics.
>  
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the 
> container immediately from the launchedContainers list and instead waits for 
> the NM to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke 
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for 
> containers to preempt based on the launchedContainers list. Both of these 
> calls can create a ContainerPreemptEvent for the same container (as the RM is 
> waiting for the NM to kill the container). This leads LeafQueue to log 
> metrics for the same preemption multiple times.






[jira] [Commented] (YARN-10493) RunC container repository v2

2021-04-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313322#comment-17313322
 ] 

Hadoop QA commented on YARN-10493:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} codespell {color} | {color:blue}  0m  
1s{color} |  | {color:blue} codespell was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 3 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 14m 
39s{color} |  | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
10s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
7s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
53s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
43s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
53s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
29s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
38s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  5m 
36s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  1s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} |  | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
51s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
33s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
33s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
48s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
48s{color} |  | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} blanks {color} | {color:red}  0m  
0s{color} | 
[/blanks-eol.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2848/1/artifact/out/blanks-eol.txt]
 | {color:red} The patch has 1 line(s) that end in blanks. Use git apply 
--whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
40s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
36s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} |  | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
16s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
24s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| 

[jira] [Commented] (YARN-10465) Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation

2021-04-01 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313317#comment-17313317
 ] 

Brahma Reddy Battula commented on YARN-10465:
-

[~dmmkr] thanks for reporting and working on this. At first glance, the patch 
LGTM.

[~subru] any chance you could review this?

> Support getClusterNodes, getNodeToLabels, getLabelsToNodes, 
> getClusterNodeLabels API's for Federation
> -
>
> Key: YARN-10465
> URL: https://issues.apache.org/jira/browse/YARN-10465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
> Attachments: YARN-10465.001.patch
>
>







[jira] [Commented] (YARN-10724) Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)

2021-04-01 Thread Zhengbo Li (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313255#comment-17313255
 ] 

Zhengbo Li commented on YARN-10724:
---

Hi, I may be hitting the same issue if it is as described, so I'm trying to 
understand it better. Do you mean that the issue is LeafQueue's 
`completedContainer` method being incorrectly invoked multiple times? Thanks

> Overcounting of preemptions in CapacityScheduler (LeafQueue metrics)
> 
>
> Key: YARN-10724
> URL: https://issues.apache.org/jira/browse/YARN-10724
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anup Agarwal
>Assignee: Anup Agarwal
>Priority: Minor
> Attachments: YARN-10724-trunk.001.patch, YARN-10724-trunk.002.patch
>
>
> Currently CapacityScheduler over-counts preemption metrics inside 
> QueueMetrics.
>  
> One cause of the over-counting:
> When a container is already running, SchedulerNode does not remove the 
> container immediately from the launchedContainers list and instead waits for 
> the NM to kill the container.
> Both NODE_RESOURCE_UPDATE and NODE_UPDATE invoke 
> signalContainersIfOvercommited (AbstractYarnScheduler), which looks for 
> containers to preempt based on the launchedContainers list. Both of these 
> calls can create a ContainerPreemptEvent for the same container (as the RM is 
> waiting for the NM to kill the container). This leads LeafQueue to log 
> metrics for the same preemption multiple times.






[jira] [Comment Edited] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313243#comment-17313243
 ] 

Qi Zhu edited comment on YARN-10726 at 4/1/21, 3:36 PM:


[~pbacsko]

I tested it locally just now and it passed.

I marked it as fixed.

Thanks.


was (Author: zhuqi):
[~pbacsko]

I tested locally just now, it passed.

Thanks.

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10726.001.patch, YARN-10726.002.patch
>
>







[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313243#comment-17313243
 ] 

Qi Zhu commented on YARN-10726:
---

[~pbacsko]

I tested it locally just now and it passed.

Thanks.

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch, YARN-10726.002.patch
>
>







[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313241#comment-17313241
 ] 

Peter Bacsko commented on YARN-10726:
-

Ok, I strongly believe that the failing tests are flaky.

[~zhuqi] could you verify it by running them locally a couple of times?

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch, YARN-10726.002.patch
>
>







[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313240#comment-17313240
 ] 

Hadoop QA commented on YARN-10726:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
18s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 9s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 40s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 19m 
46s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
49s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  4s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| 

[jira] [Comment Edited] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312219#comment-17312219
 ] 

Qi Zhu edited comment on YARN-10503 at 4/1/21, 3:10 PM:


[~pbacsko] [~gandras] [~ebadger] 

Sorry for the many issues needing review.

Do you have any other advice about this, whenever you are free? :D

Thanks.


was (Author: zhuqi):
[~pbacsko] [~gandras] [~ebadger] 

Sorry for too many issues needed for review.

If you any other advice about this? :D

Thanks.

> Support queue capacity in terms of absolute resources with custom 
> resourceType.
> ---
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, 
> YARN-10503.003.patch, YARN-10503.004.patch, YARN-10503.005.patch, 
> YARN-10503.006.patch, YARN-10503.007.patch, YARN-10503.008.patch, 
> YARN-10503.009.patch
>
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very important for cluster scaling when queues have absolute demands on 
> different resourceTypes.
>  
> This Jira will handle GPU first.
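
To make the limitation concrete, here is a hypothetical sketch of absolute 
capacities keyed by resource name rather than the fixed two-value enum, so 
custom types such as GPUs fit in (names are illustrative only, not the actual 
patch or the real CapacityScheduler configuration code):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: capacities keyed by resource name, so custom resource
// types fit alongside memory and vcores.
class AbsoluteCapacitySketch {
  private final Map<String, Long> capacities = new LinkedHashMap<>();

  void setCapacity(String resourceName, long amount) {
    capacities.put(resourceName, amount);
  }

  long getCapacity(String resourceName) {
    return capacities.getOrDefault(resourceName, 0L);
  }

  public static void main(String[] args) {
    AbsoluteCapacitySketch queue = new AbsoluteCapacitySketch();
    queue.setCapacity("memory-mb", 4096L);
    queue.setCapacity("vcores", 8L);
    queue.setCapacity("yarn.io/gpu", 2L); // not expressible with the enum above
    System.out.println(queue.getCapacity("yarn.io/gpu"));
  }
}
{code}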






[jira] [Comment Edited] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312219#comment-17312219
 ] 

Qi Zhu edited comment on YARN-10503 at 4/1/21, 3:05 PM:


[~pbacsko] [~gandras] [~ebadger] 

Sorry for the many issues needing review.

Do you have any other advice about this? :D

Thanks.


was (Author: zhuqi):
[~pbacsko] [~gandras] [~ebadger] 

If you any other advice? :D

Thanks.

> Support queue capacity in terms of absolute resources with custom 
> resourceType.
> ---
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, 
> YARN-10503.003.patch, YARN-10503.004.patch, YARN-10503.005.patch, 
> YARN-10503.006.patch, YARN-10503.007.patch, YARN-10503.008.patch, 
> YARN-10503.009.patch
>
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very important for cluster scaling when queues have absolute demands on 
> different resourceTypes.
>  
> This Jira will handle GPU first.






[jira] [Commented] (YARN-10693) Add document for YARN-10623 auto refresh queue conf in cs.

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313219#comment-17313219
 ] 

Peter Bacsko commented on YARN-10693:
-

I'll review this as soon as I have some spare cycles.

> Add document for YARN-10623 auto refresh queue conf in cs.
> --
>
> Key: YARN-10693
> URL: https://issues.apache.org/jira/browse/YARN-10693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10693.001.patch, YARN-10693.002.patch, 
> YARN-10693.003.patch
>
>







[jira] [Commented] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313218#comment-17313218
 ] 

Peter Bacsko commented on YARN-10637:
-

Thanks [~zhuqi] I think it's good then.

[~gandras] do you have any comments?

> We should support fs to cs support for auto refresh queues when conf changed, 
> after YARN-10623 finished.
> 
>
> Key: YARN-10637
> URL: https://issues.apache.org/jira/browse/YARN-10637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10637.001.patch, YARN-10637.002.patch, 
> YARN-10637.003.patch, YARN-10637.004.patch
>
>
> cc [~pbacsko] [~gandras] [~bteke]
> We should also fill this, when  YARN-10623 finished.






[jira] [Commented] (YARN-10693) Add document for YARN-10623 auto refresh queue conf in cs.

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313199#comment-17313199
 ] 

Qi Zhu commented on YARN-10693:
---

[~pbacsko]

This is the corresponding document. :D

> Add document for YARN-10623 auto refresh queue conf in cs.
> --
>
> Key: YARN-10693
> URL: https://issues.apache.org/jira/browse/YARN-10693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10693.001.patch, YARN-10693.002.patch, 
> YARN-10693.003.patch
>
>







[jira] [Commented] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313196#comment-17313196
 ] 

Qi Zhu commented on YARN-10637:
---

Thanks [~pbacsko] for the review.

Actually, FS always has this enabled, so we don't need to add anything extra on 
the FS side:
{code:java}
@Override
public void serviceInit(Configuration conf) throws Exception {
  this.allocFile = getAllocationFile(conf);
  if (this.allocFile != null) {
    this.fs = allocFile.getFileSystem(conf);
    reloadThread = new Thread(() -> {
      while (running) {
        try {
          synchronized (this) {
            reloadListener.onCheck();
          }
          long time = clock.getTime();
          long lastModified =
              fs.getFileStatus(allocFile).getModificationTime();
          if (lastModified > lastSuccessfulReload &&
              time > lastModified + ALLOC_RELOAD_WAIT_MS) {
            try {
              reloadAllocations();
            } catch (Exception ex) {
              if (!lastReloadAttemptFailed) {
                LOG.error("Failed to reload fair scheduler config file - " +
                    "will use existing allocations.", ex);
              }
              lastReloadAttemptFailed = true;
            }
          } else if (lastModified == 0l) {
            if (!lastReloadAttemptFailed) {
              LOG.warn("Failed to reload fair scheduler config file because" +
                  " last modified returned 0. File exists: "
                  + fs.exists(allocFile));
            }
            lastReloadAttemptFailed = true;
          }
        } catch (IOException e) {
          LOG.error("Exception while loading allocation file: " + e);
        }
        try {
          Thread.sleep(reloadIntervalMs);
        } catch (InterruptedException ex) {
          LOG.info(
              "Interrupted while waiting to reload alloc configuration");
        }
      }
    });
    reloadThread.setName("AllocationFileReloader");
    reloadThread.setDaemon(true);
  }
  super.serviceInit(conf);
}
{code}

> We should support fs to cs support for auto refresh queues when conf changed, 
> after YARN-10623 finished.
> 
>
> Key: YARN-10637
> URL: https://issues.apache.org/jira/browse/YARN-10637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10637.001.patch, YARN-10637.002.patch, 
> YARN-10637.003.patch, YARN-10637.004.patch
>
>
> cc [~pbacsko] [~gandras] [~bteke]
> We should also fill this, when  YARN-10623 finished.






[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313195#comment-17313195
 ] 

Qi Zhu commented on YARN-10726:
---

Thanks [~pbacsko] for the commit. :D

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch, YARN-10726.002.patch
>
>







[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313192#comment-17313192
 ] 

Peter Bacsko commented on YARN-10726:
-

Ah, I already committed the change. Let's hope Jenkins comes back green :)

+1

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch, YARN-10726.002.patch
>
>







[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313189#comment-17313189
 ] 

Peter Bacsko commented on YARN-10726:
-

"hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer" - this 
is unrelated I believe. This test case has been failing for a long time.

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch, YARN-10726.002.patch
>
>







[jira] [Commented] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313184#comment-17313184
 ] 

Peter Bacsko commented on YARN-10637:
-

Thanks [~zhuqi], this makes sense. Is this always enabled in Fair Scheduler? 
Because we should only add this policy if auto-refresh is enabled on the FS 
side.

> We should support fs to cs support for auto refresh queues when conf changed, 
> after YARN-10623 finished.
> 
>
> Key: YARN-10637
> URL: https://issues.apache.org/jira/browse/YARN-10637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10637.001.patch, YARN-10637.002.patch, 
> YARN-10637.003.patch, YARN-10637.004.patch
>
>
> cc [~pbacsko] [~gandras] [~bteke]
> We should also fill this, when  YARN-10623 finished.






[jira] [Commented] (YARN-10637) We should support fs to cs support for auto refresh queues when conf changed, after YARN-10623 finished.

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313141#comment-17313141
 ] 

Qi Zhu commented on YARN-10637:
---

 [~pbacsko] [~gandras]

Do you have any advice about this, just as we did for auto queue deletion? :D

> We should support fs to cs support for auto refresh queues when conf changed, 
> after YARN-10623 finished.
> 
>
> Key: YARN-10637
> URL: https://issues.apache.org/jira/browse/YARN-10637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10637.001.patch, YARN-10637.002.patch, 
> YARN-10637.003.patch, YARN-10637.004.patch
>
>
> cc [~pbacsko] [~gandras] [~bteke]
> We should also fill this, when  YARN-10623 finished.






[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313139#comment-17313139
 ] 

Qi Zhu commented on YARN-10726:
---

Thanks [~pbacsko] for your review and suggestion.

Your concern is valid; I agree with you.

I have updated it in the latest patch, just as we did in the async 
dispatcher. :D

Thanks.

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch, YARN-10726.002.patch
>
>







[jira] [Updated] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10726:
--
Attachment: YARN-10726.002.patch

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch, YARN-10726.002.patch
>
>







[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313138#comment-17313138
 ] 

Peter Bacsko commented on YARN-10726:
-

This is from {{AsyncDispatcher}}:

{noformat}
if (qSize != 0 && qSize % 1000 == 0
    && lastEventQueueSizeLogged != qSize) {
  lastEventQueueSizeLogged = qSize;
  LOG.info("Size of event-queue is " + qSize);
}
{noformat}

Please update the code to use {{lastEventQueueSizeLogged}}.
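
For reference, a sketch of how the same guard could look when applied to the 
DelegationTokenRenewer event queue (the field name, method name, and log text 
are assumed for illustration, and LOG is the class logger; this is not the 
committed patch):

{code:java}
// Sketch only: rate-limit the queue-size log by remembering the last
// size that was logged, mirroring the AsyncDispatcher pattern above.
private volatile int lastEventQueueSizeLogged = 0;

private void maybeLogQueueSize(int qSize) {
  // Log only when the size reaches a new multiple of 1000, so a queue
  // hovering around a threshold does not flood the log.
  if (qSize != 0 && qSize % 1000 == 0
      && lastEventQueueSizeLogged != qSize) {
    lastEventQueueSizeLogged = qSize;
    LOG.info("Size of DelegationTokenRenewer event queue is " + qSize);
  }
}
{code}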

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch
>
>







[jira] [Commented] (YARN-10714) Remove dangling dynamic queues on reinitialization

2021-04-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313125#comment-17313125
 ] 

Hadoop QA commented on YARN-10714:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
21s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
16s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 24s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
35s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/888/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 2 unchanged - 2 fixed = 3 total (was 4) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 27s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | 

[jira] [Comment Edited] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313123#comment-17313123
 ] 

Peter Bacsko edited comment on YARN-10726 at 4/1/21, 12:01 PM:
---

Thanks [~zhuqi]. I think it's a good idea. My only concern (which might not be 
valid) is that when we have too many events, this code can run too frequently. 
For example, if the queue size goes 998, 999, 1000, 1001, 1002, it prints at 
1000; then, once events start being consumed, the size drops from 1000 to 990, 
and when it climbs back to 1000 it prints the size again.

I think we should limit how often we print this message; we shouldn't log it 
too often. I'm not sure how we do this in other parts of the code. I'll check 
what the best solution might be.


was (Author: pbacsko):
Thanks [~zhuqi]. I think it's a good idea. My only "concern" is that we have 
too many events, this code can possibly run too frequently. For example, if you 
go 998, 998, 999, 1000, 1001, 1002, then it prints at 1000, then it starts to 
consume events, size goes back from 1000 to 990, then it prints the size again.

I think we should limit how often we print this message. We should log it too 
often, I'm not sure how we do this in other parts of the code. I'll check what 
can be the best solution.

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313123#comment-17313123
 ] 

Peter Bacsko commented on YARN-10726:
-

Thanks [~zhuqi]. I think it's a good idea. My only "concern" is that when we 
have too many events, this code can run too frequently. For example, if the 
queue size goes 998, 999, 1000, 1001, 1002, it prints at 1000; then, once 
events start being consumed, the size drops from 1000 to 990, and when it 
climbs back to 1000 it prints the size again.

I think we should limit how often we print this message; we shouldn't log it 
too often. I'm not sure how we do this in other parts of the code. I'll check 
what the best solution might be.

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch
>
>







[jira] [Updated] (YARN-10726) Log the size of DelegationTokenRenewer event queue in case of too many pending events

2021-04-01 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10726:

Summary: Log the size of DelegationTokenRenewer event queue in case of too 
many pending events  (was: We should log size of pending 
DelegationTokenRenewerEvent queue, when pending too many events.)

> Log the size of DelegationTokenRenewer event queue in case of too many 
> pending events
> -
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10726.001.patch
>
>







[jira] [Commented] (YARN-9618) NodesListManager event improvement

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313105#comment-17313105
 ] 

Peter Bacsko commented on YARN-9618:


Thanks for the patch, [~zhuqi], and thanks [~gandras] for the review. I 
committed this to trunk.

> NodesListManager event improvement
> --
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Fix For: 3.4.0
>
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, 
> YARN-9618.006.patch, YARN-9618.007.patch
>
>
> The current NodesListManager event implementation blocks the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each node-usable event creates 
> 1K events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher directly, call the RMApp event 
> handler. (A rough sketch of this wiring follows.)
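
For readers following along, here is a rough sketch of the proposed wiring, 
assuming Hadoop's AsyncDispatcher API (the handler class name is hypothetical; 
this is not the committed patch):

{code:java}
// Sketch only: give NodesListManager its own dispatcher so its events
// no longer block the main RM async dispatcher.
AsyncDispatcher nodesListDispatcher =
    new AsyncDispatcher("NodesListManager event dispatcher");
// Hypothetical handler that calls the RMApp event handler directly
// instead of routing each node-usable/unusable update through the
// central dispatcher.
nodesListDispatcher.register(NodesListManagerEventType.class,
    new NodesListManagerEventHandler());
nodesListDispatcher.init(conf);
nodesListDispatcher.start();
{code}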






[jira] [Commented] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.

2021-04-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313103#comment-17313103
 ] 

Hadoop QA commented on YARN-10726:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
20s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
22s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m  2s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m 
27s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m  
0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
57s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 13s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| 

[jira] [Updated] (YARN-9618) NodesListManager event improvement

2021-04-01 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9618:
---
Summary: NodesListManager event improvement  (was: NodeListManager event 
improvement)

> NodesListManager event improvement
> --
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, 
> YARN-9618.006.patch, YARN-9618.007.patch
>
>
> In the current implementation, NodesListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each usable event creates 1K 
> events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, call the RMApp event handler 
> directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10714) Remove dangling dynamic queues on reinitialization

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313029#comment-17313029
 ] 

Qi Zhu commented on YARN-10714:
---

Thanks [~gandras] for the patch.

The latest patch LGTM, +1.

Waiting for Jenkins.

> Remove dangling dynamic queues on reinitialization
> --
>
> Key: YARN-10714
> URL: https://issues.apache.org/jira/browse/YARN-10714
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10714.001.patch, YARN-10714.002.patch
>
>
> Current logic does not handle orphaned auto-created child queues. The 
> following steps show a scenario in which it is possible to submit 
> applications to an orphaned queue that has an invalid (already removed) 
> ParentQueue.
>  # Auto-create a queue root.a.a-auto
>  # Remove root.a from the config
>  # Reinitialize CS without restarting it (possible via the mutation API)
>  # Submit an application to root.a.a-auto, while root.a is a non-existent queue



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10714) Remove dangling dynamic queues on reinitialization

2021-04-01 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10714:

Attachment: YARN-10714.002.patch

> Remove dangling dynamic queues on reinitialization
> --
>
> Key: YARN-10714
> URL: https://issues.apache.org/jira/browse/YARN-10714
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10714.001.patch, YARN-10714.002.patch
>
>
> Current logic does not handle orphaned auto-created child queues. The 
> following steps show a scenario in which it is possible to submit 
> applications to an orphaned queue that has an invalid (already removed) 
> ParentQueue.
>  # Auto-create a queue root.a.a-auto
>  # Remove root.a from the config
>  # Reinitialize CS without restarting it (possible via the mutation API)
>  # Submit an application to root.a.a-auto, while root.a is a non-existent queue



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313009#comment-17313009
 ] 

Qi Zhu commented on YARN-10726:
---

cc [~pbacsko] [~gandras] 

Actually, there is no monitoring of the pending DelegationTokenRenewerEvent 
queue; I think it is important for us to know when too many 
DelegationTokenRenewerEvents are pending.

It's helpful for big clusters.
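A minimal sketch of the proposed logging (not the actual patch; the queue 
field, threshold, and logger here are illustrative assumptions):

{code:java}
// Hedged sketch: warn when the pending event queue grows past a threshold,
// so operators can see that the renewer pool is falling behind.
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PendingQueueMonitor {
  private static final Logger LOG =
      LoggerFactory.getLogger(PendingQueueMonitor.class);
  // Illustrative threshold; a real patch would make this configurable.
  private static final int WARN_THRESHOLD = 1000;

  private final Queue<Object> pendingEventQueue = new ConcurrentLinkedQueue<>();

  void addPendingEvent(Object evt) {
    pendingEventQueue.add(evt);
    int size = pendingEventQueue.size();
    if (size > WARN_THRESHOLD) {
      LOG.warn("Pending DelegationTokenRenewerEvent queue size is {};"
          + " the renewer may be falling behind.", size);
    }
  }
}
{code}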

 

 

> We should log size of pending DelegationTokenRenewerEvent queue, when pending 
> too many events.
> --
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313009#comment-17313009
 ] 

Qi Zhu edited comment on YARN-10726 at 4/1/21, 8:46 AM:


cc [~pbacsko] [~gandras] 

Actually, there is no monitoring of the pending DelegationTokenRenewerEvent 
queue; I think it is important for us to know when too many 
DelegationTokenRenewerEvents are pending.

It's helpful for big clusters. Updated a patch for review.

Thanks.

 

 


was (Author: zhuqi):
cc [~pbacsko] [~gandras] 

Actually, there is no monitoring of the pending DelegationTokenRenewerEvent 
queue; I think it is important for us to know when too many 
DelegationTokenRenewerEvents are pending.

It's helpful for big clusters.

 

 

> We should log size of pending DelegationTokenRenewerEvent queue, when pending 
> too many events.
> --
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313006#comment-17313006
 ] 

Qi Zhu commented on YARN-9618:
--

Thanks [~pbacsko] [~gandras] for confirming.

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, 
> YARN-9618.006.patch, YARN-9618.007.patch
>
>
> In the current implementation, NodesListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each usable event creates 1K 
> events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, call the RMApp event handler 
> directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.

2021-04-01 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10726:
--
Issue Type: Improvement  (was: Bug)

> We should log size of pending DelegationTokenRenewerEvent queue, when pending 
> too many events.
> --
>
> Key: YARN-10726
> URL: https://issues.apache.org/jira/browse/YARN-10726
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10726) We should log size of pending DelegationTokenRenewerEvent queue, when pending too many events.

2021-04-01 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10726:
-

 Summary: We should log size of pending DelegationTokenRenewerEvent 
queue, when pending too many events.
 Key: YARN-10726
 URL: https://issues.apache.org/jira/browse/YARN-10726
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Qi Zhu
Assignee: Qi Zhu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-04-01 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312997#comment-17312997
 ] 

Andras Gyori commented on YARN-9618:


I think it is well done now, +1.

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, 
> YARN-9618.006.patch, YARN-9618.007.patch
>
>
> In the current implementation, NodesListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each usable event creates 1K 
> events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, call the RMApp event handler 
> directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312989#comment-17312989
 ] 

Peter Bacsko commented on YARN-9618:


+1 LGTM

[~gandras] are you OK with the patch?

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, 
> YARN-9618.006.patch, YARN-9618.007.patch
>
>
> In the current implementation, NodesListManager events block the async 
> dispatcher, which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each usable event creates 1K 
> events; overall this could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, call the RMApp event handler 
> directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10720) YARN WebAppProxyServlet should support connection timeout to prevent proxy server from hanging

2021-04-01 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312945#comment-17312945
 ] 

Peter Bacsko commented on YARN-10720:
-

+1

Thanks [~zhuqi] for the patch, committed to trunk.

> YARN WebAppProxyServlet should support connection timeout to prevent proxy 
> server from hanging
> --
>
> Key: YARN-10720
> URL: https://issues.apache.org/jira/browse/YARN-10720
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10720.001.patch, YARN-10720.002.patch, 
> YARN-10720.003.patch, YARN-10720.004.patch, YARN-10720.005.patch, 
> YARN-10720.006.patch, image-2021-03-29-14-04-33-776.png, 
> image-2021-03-29-14-05-32-708.png
>
>
> The following shows the proxy server with {color:#de350b}too many connections 
> from one client{color}; this caused the proxy server to hang, and the YARN 
> web UI couldn't redirect to the web proxy.
> !image-2021-03-29-14-04-33-776.png|width=632,height=57!
> The following shows the abnormal AM. The proxy server doesn't know the AM is 
> already abnormal, so the connections can't be closed; we should add timeout 
> support to the proxy server to prevent this. One abnormal AM may cause 
> hundreds or even thousands of connections, which is very heavy.
> !image-2021-03-29-14-05-32-708.png|width=669,height=101!
>  
> After I killed the abnormal AM, the proxy server became healthy. This case 
> has happened many times in our production clusters; our clusters are huge, 
> and abnormal AMs occur regularly.
>  
> I will add timeout support to the web proxy server in this jira (a sketch of 
> the idea follows this quote).
>  
> cc  [~pbacsko] [~ebadger] [~Jim_Brennan]  [~ztang]  [~epayne] [~gandras]  
> [~bteke]
>  
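A minimal sketch of the idea behind the fix (assumptions: Apache HttpClient 
4.x as the proxy's HTTP client; the timeout values and class name are 
illustrative, not the committed patch):

{code:java}
// Hedged sketch: give every proxied request a connect and read (socket)
// timeout so a dead or stalled AM cannot pin proxy connections forever.
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public final class TimedProxyClient {

  private TimedProxyClient() {
  }

  public static CloseableHttpClient create(int connectTimeoutMs,
      int readTimeoutMs) {
    RequestConfig config = RequestConfig.custom()
        .setConnectTimeout(connectTimeoutMs) // fail fast if the AM is gone
        .setSocketTimeout(readTimeoutMs)     // don't hang on a stalled read
        .build();
    return HttpClients.custom()
        .setDefaultRequestConfig(config)
        .build();
  }
}
{code}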



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10720) YARN WebAppProxyServlet should support connection timeout to prevent proxy server from hanging

2021-04-01 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10720:

Summary: YARN WebAppProxyServlet should support connection timeout to 
prevent proxy server from hanging  (was: YARN WebAppProxyServlet should support 
connection timeout to prevent proxy server hang.)

> YARN WebAppProxyServlet should support connection timeout to prevent proxy 
> server from hanging
> --
>
> Key: YARN-10720
> URL: https://issues.apache.org/jira/browse/YARN-10720
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10720.001.patch, YARN-10720.002.patch, 
> YARN-10720.003.patch, YARN-10720.004.patch, YARN-10720.005.patch, 
> YARN-10720.006.patch, image-2021-03-29-14-04-33-776.png, 
> image-2021-03-29-14-05-32-708.png
>
>
> The following shows the proxy server with {color:#de350b}too many connections 
> from one client{color}; this caused the proxy server to hang, and the YARN 
> web UI couldn't redirect to the web proxy.
> !image-2021-03-29-14-04-33-776.png|width=632,height=57!
> The following shows the abnormal AM. The proxy server doesn't know the AM is 
> already abnormal, so the connections can't be closed; we should add timeout 
> support to the proxy server to prevent this. One abnormal AM may cause 
> hundreds or even thousands of connections, which is very heavy.
> !image-2021-03-29-14-05-32-708.png|width=669,height=101!
>  
> After I killed the abnormal AM, the proxy server became healthy. This case 
> has happened many times in our production clusters; our clusters are huge, 
> and abnormal AMs occur regularly.
>  
> I will add timeout support to the web proxy server in this jira.
>  
> cc  [~pbacsko] [~ebadger] [~Jim_Brennan]  [~ztang]  [~epayne] [~gandras]  
> [~bteke]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312904#comment-17312904
 ] 

Qi Zhu edited comment on YARN-8631 at 4/1/21, 6:31 AM:
---

cc [~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]

I think YARN-7962 already fixed this case:

We change isServiceStarted to false under the write lock.

 
{code:java}
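// Flip the flag under the write lock so that no reader can still observe
// isServiceStarted == true once the renewer pool starts shutting down.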
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
And the processDelegationTokenRenewerEvent race condition could happen before 
YARN-7962:

 

 
{code:java}
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future<?> future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
Now the race condition no longer happens, and neither does the null pointer 
error, which also occurred on my cluster.

I think we can close this now.

Thanks.

 


was (Author: zhuqi):
[~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]

I think YARN-7962 already fixed this case:

We change isServiceStarted to false under the write lock.

 
{code:java}
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
And the processDelegationTokenRenewerEvent race condition could happen before 
YARN-7962:

 

 
{code:java}
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future<?> future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
Now the race condition no longer happens, and neither does the null pointer 
error, which also occurred on my cluster.

I think we can close this now.

Thanks.

 

> YARN RM fails to add the application to the delegation token renewer on 
> recovery
> 
>
> Key: YARN-8631
> URL: https://issues.apache.org/jira/browse/YARN-8631
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Sanjay Divgi
>Assignee: Umesh Mittal
>Priority: Blocker
> Attachments: YARN-8631.001.patch, 
> hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log
>
>
> On HA 

[jira] [Comment Edited] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312904#comment-17312904
 ] 

Qi Zhu edited comment on YARN-8631 at 4/1/21, 6:31 AM:
---

[~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]

I think YARN-7962 already fixed this case:

We change isServiceStarted to false under the write lock.

 
{code:java}
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
And the processDelegationTokenRenewerEvent race condition could happen before 
YARN-7962:

 

 
{code:java}
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future<?> future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
Now the race condition no longer happens, and neither does the null pointer 
error, which also occurred on my cluster.

I think we can close this now.

Thanks.

 


was (Author: zhuqi):
[~snemeth] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]

I think YARN-7962 already fixed this case:

We change isServiceStarted to false under the write lock.

 
{code:java}
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
And the processDelegationTokenRenewerEvent race condition could happen before 
YARN-7962:

 

 
{code:java}
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future<?> future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
Now the race condition no longer happens, and neither does the null pointer 
error, which also occurred on my cluster.

I think we can close this now.

Thanks.

 

> YARN RM fails to add the application to the delegation token renewer on 
> recovery
> 
>
> Key: YARN-8631
> URL: https://issues.apache.org/jira/browse/YARN-8631
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Sanjay Divgi
>Assignee: Umesh Mittal
>Priority: Blocker
> Attachments: YARN-8631.001.patch, 
> hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log
>
>
> On HA cluster we have 

[jira] [Commented] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery

2021-04-01 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312904#comment-17312904
 ] 

Qi Zhu commented on YARN-8631:
--

[~snemeth] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]

I think YARN-7962 already fixed this case:

We change isServiceStarted to false under the write lock.

 
{code:java}
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
And the processDelegationTokenRenewerEvent race condition could happen before 
YARN-7962:

 

 
{code:java}
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future<?> future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
Now the race condition no longer happens, and neither does the null pointer 
error, which also occurred on my cluster.

I think we can close this now.

Thanks.

 

> YARN RM fails to add the application to the delegation token renewer on 
> recovery
> 
>
> Key: YARN-8631
> URL: https://issues.apache.org/jira/browse/YARN-8631
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Sanjay Divgi
>Assignee: Umesh Mittal
>Priority: Blocker
> Attachments: YARN-8631.001.patch, 
> hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log
>
>
> On an HA cluster we have observed that the YARN ResourceManager fails to add 
> the application to the delegation token renewer on recovery.
> Below is the error:
> {code:java}
> 2018-08-07 08:41:23,850 INFO security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= 
> [Kind: TIMELINE_DELEGATION_TOKEN, Service: 172.27.84.192:8188, Ident: 
> (TIMELINE_DELEGATION_TOKEN owner=hrt_qa_hive_spark, renewer=yarn, realUser=, 
> issueDate=1533624642302, maxDate=1534229442302, sequenceNumber=18, 
> masterKeyId=4);exp=1533717683478; apps=[application_1533623972681_0001]]
> 2018-08-07 08:41:23,855 WARN security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to 
> add the application to the delegation token renewer on recovery.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org