[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021810#comment-17021810 ] Brahma Reddy Battula commented on YARN-10089:
----------------------------------------------

Looks like the test failure is related; I will look into it.

> [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10089
>                 URL: https://issues.apache.org/jira/browse/YARN-10089
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: YARN-10089-001.patch, YARN-10089-002.patch, YARN-10089-003.patch
>
> PhysicalResource will always be null in the following scenario:
> i) Upgrade the RM from 2.7 to 3.0.
> ii) Upgrade the NM from 2.7 to 3.0.
> When the NM re-registers, RMContext already contains this nodeId, and the httpPort is also unchanged, so the node is not added again; hence "PhysicalResource" stays null in the upgraded cluster until the RM restarts.
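The fix implied by the summary (and by the empty setPhysicalResource() implementations mentioned in the review comments below) is to refresh the existing RMNode when a node re-registers instead of leaving the stale one untouched. A minimal sketch of that re-registration path, assuming a setPhysicalResource() hook on the RMNode; this is an illustration, not the actual patch:

{code:java}
// Sketch only: in ResourceTrackerService#registerNodeManager, when the nodeId is already
// known and the httpPort is unchanged, the node is re-registering (e.g. after an NM
// upgrade). Carry over the newly reported physical resource instead of dropping it.
RMNode oldNode = this.rmContext.getRMNodes().get(nodeId);
if (oldNode != null && oldNode.getHttpPort() == request.getHttpPort()) {
  Resource physicalResource = request.getPhysicalResource();
  if (physicalResource != null) {
    oldNode.setPhysicalResource(physicalResource); // assumed setter added by the patch
  }
}
{code}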
[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021722#comment-17021722 ] Hadoop QA commented on YARN-10089: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 17s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 42s{color} | {color:orange} root: The patch generated 1 new + 105 unchanged - 0 fixed = 106 total (was 105) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 23m 30s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}115m 36s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 20s{color} | {color:green} hadoop-sls in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 52s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}263m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing | | | hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10089 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991583/YARN-10089-003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux faf93df9c17f 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021671#comment-17021671 ] Hadoop QA commented on YARN-9768: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 7s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 45s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 3s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}181m 20s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-9768 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991528/YARN-9768.009.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux b928eb10f94c 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021659#comment-17021659 ] Hadoop QA commented on YARN-10084: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 27s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 57s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}162m 43s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10084 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991581/YARN-10084.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a1aaabc6fd54 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021651#comment-17021651 ] Wangda Tan commented on YARN-9879:
----------------------------------

Thanks [~shuzirra] and [~wilfreds] for sharing your thoughts!

1) Regarding changing the semantics of getQueueName() to return the fully qualified queue name vs. using getQueuePath(): if we go the first route, we need to remove the usages of AbstractCSQueue.getQueuePath() (which has 128 usages) and add a getShortQueueName() in some places. So to me there is no significant difference compared with simply changing the internal CS usages to call getQueuePath().

2) Whichever way we decide to go, I think we should make sure of the following:
* API compatibility. This is critical, since I assume there are lots of monitoring frameworks, JMX metrics, etc. based on it; if an existing CS-based cluster is upgraded, users should see the same results. Please refer to the API compatibility guide: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html]
* Internal usage of getQueuePath() (or getShortQueueName(), if we choose the proposed approach). Externally, we should make sure a queue can be retrieved by either the short name or the full name. I want to make sure we only check short name / full name on external calls (like submitting an app to a specified queue); in all other places we operate on the full queue path.

I think introducing a new CSQueueStore sounds good, but I recommend adding a separate method to CSQueueStore that checks both short and full names and having it used by external callers only (in contrast, internal CS methods should check only one HashMap instead of two). We can review the details of CSQueueStore separately.

> Allow multiple leaf queues with the same name in CS
> ---------------------------------------------------
>
>                 Key: YARN-9879
>                 URL: https://issues.apache.org/jira/browse/YARN-9879
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Gergely Pollak
>            Assignee: Gergely Pollak
>            Priority: Major
>         Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch
>
> Currently the leaf queue's name must be unique regardless of its position in the queue hierarchy.
> A design doc and first proposal are being made; I'll attach them as soon as they are done.
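A minimal sketch of the CSQueueStore split described above: one map keyed by full path for internal lookups and one keyed by short name, with the combined lookup reserved for external callers. The class shape and method names are assumptions for illustration, not the YARN-9879 implementation:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: illustrates the two-map idea, not the committed CSQueueStore API.
public class CSQueueStore {
  private final Map<String, CSQueue> byFullPath = new ConcurrentHashMap<>();
  private final Map<String, CSQueue> byShortName = new ConcurrentHashMap<>();

  public void add(CSQueue queue) {
    byFullPath.put(queue.getQueuePath(), queue);
    // Ambiguous short names (the point of YARN-9879) would need extra handling here,
    // e.g. dropping the short-name mapping once two queues share it.
    byShortName.put(queue.getQueueName(), queue);
  }

  // Internal CS code resolves queues by full path only: a single map lookup.
  public CSQueue getByFullName(String fullPath) {
    return byFullPath.get(fullPath);
  }

  // External entry points (e.g. app submission) may pass either form.
  public CSQueue getByShortOrFullName(String name) {
    CSQueue queue = byFullPath.get(name);
    return queue != null ? queue : byShortName.get(name);
  }
}
{code}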
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021613#comment-17021613 ] Eric Badger commented on YARN-10084: Hey [~epayne], the patch looks good, but I have a comment about the tests. You've added a test to make sure that leaf queues correctly inherit their parent's default and max lifetimes. Could you also add a test to check that the leaf queue is able to override the parent's default and max lifetimes? > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch, YARN-10084.004.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
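A hedged sketch of the override test being asked for in YARN-10084 above: set lifetimes at the root, override them on one leaf queue, and assert both the inherited and the overridden values. The property names come from the issue description; getEffectiveMaxLifetime() is a hypothetical helper standing in for whatever accessor the patch actually exposes:

{code:java}
// Sketch only: getEffectiveMaxLifetime() is hypothetical, not the patch's real API.
CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
conf.setLong("yarn.scheduler.capacity.root.maximum-application-lifetime", 7200L);
conf.setLong("yarn.scheduler.capacity.root.default-application-lifetime", 3600L);
// "override" defines its own limits, "inherit" defines none.
conf.setLong("yarn.scheduler.capacity.root.override.maximum-application-lifetime", 600L);
conf.setLong("yarn.scheduler.capacity.root.override.default-application-lifetime", 300L);

// A leaf queue without its own setting inherits the root value...
assertEquals(7200L, getEffectiveMaxLifetime(conf, "root.inherit"));
// ...while a leaf queue with its own setting overrides the parent.
assertEquals(600L, getEffectiveMaxLifetime(conf, "root.override"));
{code}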
[jira] [Commented] (YARN-10094) Add configuration to support NM overuse in RM
[ https://issues.apache.org/jira/browse/YARN-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021610#comment-17021610 ] Eric Payne commented on YARN-10094: --- [~cane], I feel that this JIRA may have the same goal as YARN-291. Several pieces of the overcommit feature are already in YARN. > Add configuration to support NM overuse in RM > - > > Key: YARN-10094 > URL: https://issues.apache.org/jira/browse/YARN-10094 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-10094.001.patch > > > In a large cluster , upgrade NM will cost too much time. > Some times we want to support memory or cpu overuse from RM view. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10090) ApplicationNotFoundException will cause a UndeclaredThrowableException
[ https://issues.apache.org/jira/browse/YARN-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021605#comment-17021605 ] Eric Payne commented on YARN-10090: --- [~yzzjjyy], can you please tell me where you are seeing this exception? When I try this (in 2.8 and 3.3), I don't see any exception either in the UI or in the RM log. If you are seeing it in the RM log, that may be okay, in my opinion. > ApplicationNotFoundException will cause a UndeclaredThrowableException > -- > > Key: YARN-10090 > URL: https://issues.apache.org/jira/browse/YARN-10090 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.2 > Environment: Hadoop 2.9.2 >Reporter: qiwei huang >Priority: Minor > > while entering a non-exist application page(e.g. > RM:8088/cluster/app/application_1234), the getApplicationReport will throw an > ApplicationNotFoundException and would cause UndeclaredThrowableException in > the UserGroupInformation. the log is like: > 2020-01-15 15:10:13,056 [6224200281] - ERROR > [90425890@qtp-1302725372-97757:AppBlock@124] - Failed to read the application > application_1572848307818_1234.2020-01-15 15:10:13,056 [6224200281] - ERROR > [90425890@qtp-1302725372-97757:AppBlock@124] - Failed to read the application > application_1572848307818_2006587.java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911) > at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:114) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:70) > at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at > org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at > org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) > at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:848) at > org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) > at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at > org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212) at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54) > at sun.reflect.GeneratedMethodAccessor222.invoke(Unknown Source) at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:173) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1440) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at
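If the goal for YARN-10090 is simply a cleaner message, the block that renders the app page could unwrap the reflective wrapper before logging. UserGroupInformation.doAs wraps checked exceptions that are neither IOException nor InterruptedException in UndeclaredThrowableException, so the original ApplicationNotFoundException is available as the cause. A rough sketch; renderApplicationPage() is a hypothetical stand-in for the AppBlock#render path shown in the trace above:

{code:java}
import java.lang.reflect.UndeclaredThrowableException;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

// Sketch only: report a missing application as "not found" instead of dumping
// the full UndeclaredThrowableException stack trace at ERROR level.
try {
  renderApplicationPage(appId);   // hypothetical stand-in for the ugi.doAs(...) render call
} catch (UndeclaredThrowableException e) {
  if (e.getCause() instanceof ApplicationNotFoundException) {
    LOG.info("Application {} not found.", appId);
  } else {
    throw e;
  }
}
{code}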
[jira] [Commented] (YARN-10091) Support clean up orphan app's log in LogAggService
[ https://issues.apache.org/jira/browse/YARN-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021587#comment-17021587 ] Eric Payne commented on YARN-10091: --- [~cane], can you please be more specific? What is an orphan app and where is the directory? Are you talking about /user//.staging? > Support clean up orphan app's log in LogAggService > -- > > Key: YARN-10091 > URL: https://issues.apache.org/jira/browse/YARN-10091 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > > In a large cluster, there will exist orphan app log directory which will > cause disk leak.We should support cleanup app log directory for this kind of > app -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021579#comment-17021579 ] Jim Brennan commented on YARN-10084: +1 (non-binding) on patch 004. I built it locally and ran the unit test again. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch, YARN-10084.004.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-003.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch, YARN-10089-002.patch, > YARN-10089-003.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9790) Failed to set default-application-lifetime if maximum-application-lifetime is less than or equal to zero
[ https://issues.apache.org/jira/browse/YARN-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021574#comment-17021574 ] Eric Payne commented on YARN-9790: -- If there are no objections, I'd like to backport this all the way back to branch-2.10. > Failed to set default-application-lifetime if maximum-application-lifetime is > less than or equal to zero > > > Key: YARN-9790 > URL: https://issues.apache.org/jira/browse/YARN-9790 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9790.001.patch, YARN-9790.002.patch, > YARN-9790.003.patch, YARN-9790.004.patch > > > capacity-scheduler > {code} > ... > yarn.scheduler.capacity.root.dev.maximum-application-lifetime=-1 > yarn.scheduler.capacity.root.dev.default-application-lifetime=604800 > {code} > refreshQueue was failed as follows > {code} > 2019-08-28 15:21:57,423 WARN resourcemanager.AdminService > (AdminService.java:logAndWrapException(910)) - Exception refresh queues. > java.io.IOException: Failed to re-init queues : Default lifetime604800 can't > exceed maximum lifetime -1 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:477) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:423) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:394) > at > org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.refreshQueues(ResourceManagerAdministrationProtocolPBServiceImpl.java:114) > at > org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:271) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Default > lifetime604800 can't exceed maximum lifetime -1 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:268) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:162) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:141) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:259) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:283) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:726) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:472) > ... 12 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
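The failure above comes from treating -1 (unlimited) as a hard upper bound when validating the default lifetime. A sketch of the validation the fix implies; the accessor names are assumptions rather than the actual YARN-9790 patch:

{code:java}
// Sketch only: LeafQueue-style lifetime validation with a non-positive max treated as unlimited.
long maxLifetime = getMaximumLifetimePerQueue(getQueuePath());      // assumed accessor
long defaultLifetime = getDefaultLifetimePerQueue(getQueuePath());  // assumed accessor

// A non-positive maximum means "no limit", so it must not be used as an upper bound.
if (maxLifetime > 0 && defaultLifetime > maxLifetime) {
  throw new YarnRuntimeException("Default lifetime " + defaultLifetime
      + " can't exceed maximum lifetime " + maxLifetime);
}
if (defaultLifetime <= 0) {
  defaultLifetime = maxLifetime;  // unset/unlimited default falls back to the maximum
}
{code}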
[jira] [Updated] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10084: -- Attachment: YARN-10084.004.patch > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch, YARN-10084.004.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-002.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch, YARN-10089-002.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8472) YARN Container Phase 2
[ https://issues.apache.org/jira/browse/YARN-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang resolved YARN-8472. - Fix Version/s: 3.3.0 Release Note: - Improved debugging Docker container on YARN - Improved security for running Docker containers - Improved cgroup management for docker container. Resolution: Fixed > YARN Container Phase 2 > -- > > Key: YARN-8472 > URL: https://issues.apache.org/jira/browse/YARN-8472 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > > In YARN-3611, we have implemented basic Docker container support for YARN. > This story is the next phase to improve container usability. > Several area for improvements are: > # Software defined network support > # Interactive shell to container > # User management sss/nscd integration > # Runc/containerd support > # Metrics/Logs integration with Timeline service v2 > # Docker container profiles > # Docker cgroup management -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021554#comment-17021554 ] Eric Payne commented on YARN-10084: --- I clicked on the "compile" link above and it says: {noformat} [ERROR] warning Error running install script for optional dependency: "/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node_modules/phantomjs-prebuilt: Command failed. [ERROR] Exit code: 1 [ERROR] Command: node install.js [ERROR] Arguments: [ERROR] Directory: /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node_modules/phantomjs-prebuilt [ERROR] Output: [ERROR] PhantomJS not found on PATH [INFO] info This module is OPTIONAL, you can safely ignore this error [ERROR] Downloading https://github.com/Medium/phantomjs/releases/download/v2.1.1/phantomjs-2.1.1-linux-x86_64.tar.bz2 [ERROR] Saving to /tmp/phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2 [ERROR] Receiving... [ERROR] [ERROR] Error making request. [ERROR] Error: socket hang up [ERROR] at createHangUpError (_http_client.js:342:15) [ERROR] at TLSSocket.socketOnEnd (_http_client.js:437:23) [ERROR] at emitNone (events.js:111:20) [ERROR] at TLSSocket.emit (events.js:208:7) [ERROR] at endReadableNT (_stream_readable.js:1064:12) [ERROR] at _combinedTickCallback (internal/process/next_tick.js:139:11) [ERROR] at process._tickCallback (internal/process/next_tick.js:181:9) [ERROR] [ERROR] Please report this full log at https://github.com/Medium/phantomjs; {noformat} I don't think this is related to the code in the patch. Hopefully, it's a transient build environment issue. I'm uploading version 004 to address the checkstyle issues. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: (was: YARN-10089-002.patch) > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag
[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021545#comment-17021545 ] Eric Yang commented on YARN-9292:
---------------------------------

From today's YARN Docker community meeting, we have decided to abandon this patch. There is a possibility that the AM fails over to a node that has a different latest tag than the previous node. The frame of reference for the latest tag is relative to the node where the AM is running. If there are inconsistencies in the cluster, this patch will not solve the consistency problem: the newly spawned AM will use a different sha id mapped to the latest tag, which leads to inconsistent sha ids being used by the same application.

The ideal design is to have the YARN client discover what the latest tag references and then propagate that information to the rest of the job. Unfortunately, there is no connection between YARN and wherever the Docker registry might be running, so it is not possible to implement this properly for the YARN and Docker integration. The community settled on documenting this wrinkle and recommending, as a best practice, to avoid the latest tag.

For runC containers, it will be possible to use HDFS as the source of truth to look up the global hash designation of a runC image. The YARN client can query HDFS for the latest tag, and it will be consistent on all nodes. This will add some extra protocol interactions between the YARN client and the RM to solve this problem according to the ideal design.

> Implement logic to keep docker image consistent in application that uses :latest tag
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-9292
>                 URL: https://issues.apache.org/jira/browse/YARN-9292
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: YARN-9292.001.patch, YARN-9292.002.patch, YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, YARN-9292.006.patch, YARN-9292.007.patch, YARN-9292.008.patch
>
> A Docker image with the latest tag can run in a YARN cluster without any validation in the node managers. If an image with the latest tag changes during container launch, it might produce inconsistent results between nodes. This surfaced toward the end of development for YARN-9184, which keeps the docker image consistent within a job. One of the ideas to keep the :latest tag consistent for a job is to use the docker image command to figure out the image id and propagate that image id to the rest of the container requests. There are some challenges to overcome:
> # The latest tag does not exist on the node where the first container starts. The first container will need to download the latest image and find the image id. This can introduce lag time for other containers to start.
> # If the image id is used to start other containers, container-executor may have problems checking whether the image comes from a trusted source. Both the image name and the id must be supplied through the .cmd file to container-executor. However, an attacker could supply an incorrect image id and defeat the container-executor security checks.
> If we can overcome those challenges, it may be possible to keep the docker image consistent within one application.
[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021537#comment-17021537 ] Brahma Reddy Battula commented on YARN-10089:
----------------------------------------------

[~elgoiri], thanks for taking a look. I had just cloned the code onto my new office laptop, hence the formatting issues. Apart from the following item, everything is addressed now. Since the rest of the logs in this file use the same format, I feel we can change them all together in another JIRA. Thoughts?
* Let's use the logger format {} for NodeStatusUpdaterImpl#219.

> [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10089
>                 URL: https://issues.apache.org/jira/browse/YARN-10089
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: YARN-10089-001.patch, YARN-10089-002.patch
>
> PhysicalResource will always be null in the following scenario:
> i) Upgrade the RM from 2.7 to 3.0.
> ii) Upgrade the NM from 2.7 to 3.0.
> When the NM re-registers, RMContext already contains this nodeId, and the httpPort is also unchanged, so the node is not added again; hence "PhysicalResource" stays null in the upgraded cluster until the RM restarts.
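For reference, the change being deferred here is the SLF4J parameterized logging style; the message text below is illustrative rather than the actual line at NodeStatusUpdaterImpl#219:

{code:java}
// String concatenation builds the message even when the log level is disabled:
LOG.info("Registered with ResourceManager as " + nodeId + " with resource " + physicalResource);

// The {} placeholder form defers formatting to the logging framework:
LOG.info("Registered with ResourceManager as {} with resource {}", nodeId, physicalResource);
{code}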
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-002.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch, YARN-10089-002.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021510#comment-17021510 ] Íñigo Goiri commented on YARN-10089:
------------------------------------

Thanks [~brahmareddy] for fixing this. Minor comments:
* Add a comment to the empty {{setPhysicalResource()}} implementations saying that we do this for backwards compatibility or similar.
* Let's use the logger format {} for NodeStatusUpdaterImpl#219.
* Is it correct to compare with != at ResourceTrackerService#469? Should this be !equals()? (See the snippet after this list.)
* Add an extra space in RMNode#137.
* There will probably be a longer-than-80-chars error in TestNMReconnect.

> [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10089
>                 URL: https://issues.apache.org/jira/browse/YARN-10089
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: YARN-10089-001.patch
>
> PhysicalResource will always be null in the following scenario:
> i) Upgrade the RM from 2.7 to 3.0.
> ii) Upgrade the NM from 2.7 to 3.0.
> When the NM re-registers, RMContext already contains this nodeId, and the httpPort is also unchanged, so the node is not added again; hence "PhysicalResource" stays null in the upgraded cluster until the RM restarts.
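On the != question: for Resource objects, != only compares object identity, so two separately constructed Resource instances with identical values still look "different". A small illustrative snippet; the variable names are assumptions, not the actual code at ResourceTrackerService#469:

{code:java}
// Identity comparison: true whenever the instances differ, even if the values match,
// so it would fire on every re-registration.
if (newPhysicalResource != oldNode.getPhysicalResource()) {
  // ...
}

// Value comparison: true only when the reported capability actually changed
// (a null check is still needed for nodes that never reported a physical resource).
if (newPhysicalResource != null
    && !newPhysicalResource.equals(oldNode.getPhysicalResource())) {
  // update the stored physical resource
}
{code}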
[jira] [Assigned] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned YARN-10089: --- Assignee: Brahma Reddy Battula > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: (was: YARN-10089-001.patch) > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-001.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021504#comment-17021504 ] Brahma Reddy Battula commented on YARN-10089:
----------------------------------------------

Uploaded the initial patch. Kindly review. [~elgoiri], could you please review, as you worked on YARN-5356?

*Test case before the fix:*
{noformat}
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestNMReconnect.testReconnect:155 expected:<> but was:
[ERROR]   TestNMReconnect.testReconnect:155 expected:<> but was:
[INFO]
[ERROR] Tests run: 8, Failures: 2, Errors: 0, Skipped: 0
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 10.730 s
[INFO] Finished at: 2020-01-23T02:27:11+05:30
{noformat}

> [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10089
>                 URL: https://issues.apache.org/jira/browse/YARN-10089
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: YARN-10089-001.patch
>
> PhysicalResource will always be null in the following scenario:
> i) Upgrade the RM from 2.7 to 3.0.
> ii) Upgrade the NM from 2.7 to 3.0.
> When the NM re-registers, RMContext already contains this nodeId, and the httpPort is also unchanged, so the node is not added again; hence "PhysicalResource" stays null in the upgraded cluster until the RM restarts.
[jira] [Updated] (YARN-10089) [Rolling Upgrade] PhysicalResource is always null (RMNode should be updated on NM registration)
[ https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10089: Attachment: YARN-10089-001.patch > [Rollingupragde] PhysicalResource be always null (RMNode should be updated NM > registeration)) > - > > Key: YARN-10089 > URL: https://issues.apache.org/jira/browse/YARN-10089 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Priority: Blocker > Attachments: YARN-10089-001.patch > > > PhysicalResource will be null always, in following scenario > i) Upgrade RM from 2.7 to 3.0. > ii) Upgrade NM from 2.7 to 3.0. > Here when NM re-register,as RMContext already have this nodeID so it will not > added again as httpport also same hence "PhysicalResource" will be always > null in the upgraded cluster till RM restart. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021463#comment-17021463 ] Hadoop QA commented on YARN-10084: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 4s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 6m 27s{color} | {color:red} hadoop-yarn in trunk failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 142 unchanged - 0 fixed = 147 total (was 142) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 23s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 46s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}183m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10084 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991550/YARN-10084.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021445#comment-17021445 ] Jim Brennan commented on YARN-10084: Thanks [~epayne]! I am +1 (non-binding) on patch 003. cc: [~ebadger] > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021440#comment-17021440 ] Eric Payne commented on YARN-10084: --- Thanks [~Jim_Brennan]. I uploaded version 003. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10084: -- Attachment: YARN-10084.003.patch > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch, > YARN-10084.003.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6592) [Umbrella] Rich placement constraints in YARN
[ https://issues.apache.org/jira/browse/YARN-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-6592: - Attachment: (was: [YARN-7812] Improvements to Rich Placement Constraints in YARN - ASF JIRA.pdf) > [Umbrella] Rich placement constraints in YARN > - > > Key: YARN-6592 > URL: https://issues.apache.org/jira/browse/YARN-6592 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Konstantinos Karanasos >Priority: Major > Fix For: 3.1.0 > > Attachments: YARN-6592-Rich-Placement-Constraints-Design-V1.pdf > > > This JIRA consolidates the efforts of YARN-5468 and YARN-4902. > It adds support for rich placement constraints to YARN, such as affinity and > anti-affinity between allocations within the same or across applications. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6592) [Umbrella] Rich placement constraints in YARN
[ https://issues.apache.org/jira/browse/YARN-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated YARN-6592: - Attachment: (was: [YARN-5468] Scheduling of long-running applications - ASF JIRA.pdf) > [Umbrella] Rich placement constraints in YARN > - > > Key: YARN-6592 > URL: https://issues.apache.org/jira/browse/YARN-6592 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Konstantinos Karanasos >Priority: Major > Fix For: 3.1.0 > > Attachments: YARN-6592-Rich-Placement-Constraints-Design-V1.pdf > > > This JIRA consolidates the efforts of YARN-5468 and YARN-4902. > It adds support for rich placement constraints to YARN, such as affinity and > anti-affinity between allocations within the same or across applications. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021401#comment-17021401 ] Jim Brennan commented on YARN-10084: Thanks for the update [~epayne]! The code looks good to me. One comment on the documentation: {quote}`yarn.scheduler.capacity.root..default-application-lifetime` | Default lifetime (in seconds) of an application which is submitted to a queue. Any value less than or equal to zero will be considered as disabled. If the user has not submitted application with lifetime value then this value will be taken. It is point-in-time configuration. This feature can be set at any level in the queue hierarchy. Child queues will inherit their parent's value unless overridden at the child level. Child queues can set this property to a value less than or equal to their parent's value. {quote} This sentence is inaccurate. Maybe just remove it or change to something like: If a child queue inherits this from the parent and the parent value is greater than the child's max value, the child's max value will be used for the default. {quote}If set to 0, all the queue's max value must also be unlimited. Note : Default lifetime (if set at this level) can't exceed maximum lifetime. {quote} > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
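For readers following the configuration discussion, a small hedged example of how the inherited and overridden values could be expressed. Only the root-level key is quoted in this issue; the per-queue key for the hypothetical child queue "root.longjobs" simply follows the same naming pattern and is an assumption, as is the use of a bare Configuration object instead of capacity-scheduler.xml.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hedged example: the root-level property comes from this issue's description;
// the key for the hypothetical child queue "root.longjobs" is assumed to follow
// the same naming pattern and is not quoted from the documentation.
public class LifetimeConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // Cluster-wide maximum lifetime: one day, inherited by every child queue
    // that does not set its own value.
    conf.set("yarn.scheduler.capacity.root.maximum-application-lifetime", "86400");
    // One queue overrides the inherited value (a negative or unset value would
    // mean "inherit from the parent" per the discussion above).
    conf.set("yarn.scheduler.capacity.root.longjobs.maximum-application-lifetime", "604800");
    System.out.println(conf.get(
        "yarn.scheduler.capacity.root.longjobs.maximum-application-lifetime"));
  }
}
{code}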
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021368#comment-17021368 ] Íñigo Goiri commented on YARN-9768: --- Let's see what Yetus says. > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Assignee: Manikandan R >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9768.001.patch, YARN-9768.002.patch, > YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, > YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, > YARN-9768.009.patch > > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
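The description above asks for bounding a renew call that can hang indefinitely; a minimal, self-contained sketch of that idea follows. It is not the committed patch: renewToken() is a stand-in for the real DelegationTokenRenewer call, and the timeout and retry values are arbitrary.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the timeout-and-retry idea: run the potentially hanging renew call
// on a separate thread, bound it with Future.get(timeout), and retry a limited
// number of times before giving up.
public class BoundedRenewSketch {
  static long renewToken() throws Exception {
    Thread.sleep(100);                       // simulate the remote NN/Router call
    return System.currentTimeMillis() + 24L * 3600 * 1000;
  }

  static long renewWithTimeout(ExecutorService pool, long timeoutMs, int maxAttempts)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      Callable<Long> task = BoundedRenewSketch::renewToken;
      Future<Long> future = pool.submit(task);
      try {
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        future.cancel(true);                 // interrupt the stuck call
        if (attempt >= maxAttempts) {
          throw e;                           // give up after the configured retries
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    System.out.println("new expiration: " + renewWithTimeout(pool, 1000, 3));
    pool.shutdownNow();
  }
}
{code}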
[jira] [Comment Edited] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021280#comment-17021280 ] Gergely Pollak edited comment on YARN-9879 at 1/22/20 6:10 PM: --- Thank you for your feedback [~leftnoteasy] and [~wilfreds]. Originally I tried to keep getQueueName's behavior, but as I started to investigate its behavior I realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, and what do we want to use it for? (Ok, these are actually 3 questions.) As I see it in the code, the queue name's main purpose is to IDENTIFY a queue, not just to be some nice display string. This means the name MUST uniquely identify the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed its behavior to return a unique identifier (the queue's path). Obviously I must check whether it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact by changing the references internally to the full name everywhere (as you both suggested earlier). About the API breaking: if we have an API which provides us with a queue name, and currently it is a short name, then anyone who uses it to reference the queue by the provided name will fail in the case of name duplicates. If we return the full name of the queue, then it will still work for them, unless they rely on the fact that it is just a short name. As long as the queue name is used for queue identification, and not for string operations, it shouldn't cause any problem. Other cases must be identified. This is why I ended up with this approach. This way we change the queue naming once and for all to use full names, and we adjust services which would fail on this change. But we cannot keep the short queue name as the reference and have multiple queues with the same name; it's just impossible. This patch will already introduce some changes which can cause issues in already working systems, and it might be better to do all invasive changes at once. I could use getQueuePath (almost) everywhere we currently use getQueueName, but the result would be the same, with some severe inconsistencies: using short names would result in you being able to get the name of a queue, but you wouldn't be able to get your queue by that very same name from the queue manager. This is just confusing, inconsistent, and not maintainable in my opinion. The queueManager.get(queue.getQueueName()) call can result in NULL or an error (when the queue name is not unique), which is not good practice in my opinion. We need the ambiguous queue list because we provide a remove method, which can result in a previously ambiguous name becoming unambiguous, and it's much faster to get it from a hashmap O(1) and then check the size of the Set O(1), instead of looking through all queues to see if the collision has been resolved O(n). The short name map has been introduced for the very same reason: when we look up a queue, we just look it up in 2 HashMaps, 2 x O(1), instead of iterating through all queue names and splitting off the last part for the short name, O(n). So all in all, I've sacrificed some memory space for a drastic speed increase. 
O( n ) vs O(1) might not seem a huge improvement in the case of a few queues, but considering the queue parse method will make a get call for each queue to check if it is already present in the store, we have a complexity of O(n*n), which IS something to think about. Please help me to think this through one more time with taking my reasons into consideration, thank you. was (Author: shuzirra): Thank you for your feedback [~leftnoteasy] and [~wilfreds]. Originally I tried to keep the getQueueName's behavior, but as I started to investigate it's behavior I've realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, what do we want to use it for? (Ok these are actually 3 questions) As I see in the code the queue name's main purpose is to IDENTIFY a queue, and not just some nice display string. This means the name MUST identify uniquely the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed it's behavior to return a unique identifier (the queue's path). Obviously I must check if it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact to change the reference internally to full name everywhere (as you both suggested earlier). About the API breaking. If we have an API which provides us with a queue name, and currently it is
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021294#comment-17021294 ] Eric Payne commented on YARN-10084: --- Version 002 attached. I did not change the logic since it already worked as you described. The changes are as follows: - AbstractCSQueue: I added a comment and took out extra parenthesis. - CapacityScheduler.md: I updated the descriptions of maximum-application-lifetime and default-application-lifetime. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021287#comment-17021287 ] Eric Payne commented on YARN-10084: --- Okay. Thanks for your further analysis, [~Jim_Brennan]. bq. a child queue should not have a max lifetime longer than its parent's max lifetime After thinking about it more, there is no reason a child queue can't have a larger max lifetime than a parent queue. {quote} What if you want only one queue to have no max? How would you configure that? Would be nice if you could specify the max at the root once, and only specify zero on the long job queue to specify that it has no max. {quote} So, >= 0 means that the max lifetime was set in the config and it should be used. < 0 means use the parent's max lifetime value. If root queue, then < 0 means no lifetime value, and that will be inherited by child queues unless overridden. > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
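A minimal sketch of the resolution rule spelled out in this comment: a value >= 0 set on a queue is used as-is, a negative (or unset) value means "inherit the parent's effective lifetime", and a negative value at the root means no limit. The method and class names are illustrative, not the ones used in the patch.

{code:java}
// Illustrative only: walk the configured values from root to leaf, letting any
// explicitly set (>= 0) value override the inherited one.
public class LifetimeInheritance {
  static long effectiveMaxLifetime(long[] configuredFromRootToLeaf) {
    long effective = -1;                       // root default: no limit
    for (long configured : configuredFromRootToLeaf) {
      if (configured >= 0) {
        effective = configured;                // explicitly set: override
      }
      // configured < 0: keep the inherited value
    }
    return effective;                          // -1 still means "no limit"
  }

  public static void main(String[] args) {
    // root = 86400s, intermediate queue inherits (-1), leaf overrides with 3600s
    System.out.println(effectiveMaxLifetime(new long[] {86400, -1, 3600})); // 3600
    // nothing set anywhere -> unlimited
    System.out.println(effectiveMaxLifetime(new long[] {-1, -1}));          // -1
  }
}
{code}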
[jira] [Updated] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime
[ https://issues.apache.org/jira/browse/YARN-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10084: -- Attachment: YARN-10084.002.patch > Allow inheritance of max app lifetime / default app lifetime > > > Key: YARN-10084 > URL: https://issues.apache.org/jira/browse/YARN-10084 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10084.001.patch, YARN-10084.002.patch > > > Currently, {{maximum-application-lifetime}} and > {{default-application-lifetime}} must be set for each leaf queue. If it is > not set for a particular leaf queue, then there will be no time limit on apps > running in that queue. It should be possible to set > {{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root > queue and allow child queues to override that value if desired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021280#comment-17021280 ] Gergely Pollak edited comment on YARN-9879 at 1/22/20 5:01 PM: --- Thank you for your feedback [~leftnoteasy] and [~wilfreds]. Originally I tried to keep the getQueueName's behavior, but as I started to investigate it's behavior I've realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, what do we want to use it for? (Ok these are actually 3 questions) As I see in the code the queue name's main purpose is to IDENTIFY a queue, and not just some nice display string. This means the name MUST identify uniquely the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed it's behavior to return a unique identifier (the queue's path). Obviously I must check if it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact to change the reference internally to full name everywhere (as you both suggested earlier). About the API breaking. If we have an API which provides us with a queue name, and currently it is a short name, then anyone who uses it to reference to the queue by the provided name will fail in the case of name duplicates. If we return the full name of the queue, then it will still work for them, unless they build on the fact it is just a short name. As long as the queue name is used for queue identification, and not for string operations, it shouldn't cause any problem. Other cases must be identified. This is why I ended up with this approach. This way we change the queue naming once and for all to use full names, and we adjust services which would fail on this change. But we cannot keep the short queue name as reference and have multiple queues with the same name, it's just impossible. This patch will already introduce some changes which can cause issues in already working systems and it might be better to do all invasive changes at once. I can use the getQueuePath (almost) everywhere where we currently using getQueueName, but the result would be the same, with some severe inconsistencies: Using short names would result you being able to get the name of a queue, but you wouldn't be able to get your queue by that very same name from the queue manager. This is just confusing, inconsistent, and not maintenable in my opinion. The quemanager.get(queue.getQueueName()) call can result in NULL or error! (when the queue name is not unique) This is not good practice in my opinion. We need the ambiguous queue list, because we provide a remove method, which can result in a previously ambiguous name becoming ambiguous, and it's much faster to get it from a hashmap O(1), and then check the size of the Set O(1), instead of looking through all queues to see if the collision have been resolved O(n). The short name map has been introduced for the very same reason, when we look up a queue, we just look it up in 2 HashMaps 2 x O(1), instead of iterating through all queue names and splicing the last part for short name O(n). So all in all, I've sacrificed some memory space for a drastic speed increase. 
O(n) vs O(1) might not seem a huge improvement in the case of a few queues, but considering the queue parse method will make a get call for each queue to check if it is already present in the store, we have a complexity of O(n*n), which IS something to think about. Please help me to think this through one more time with taking my reasons into consideration, thank you. was (Author: shuzirra): Thank you for your feedback Wilfred Spiegelenburg and Wangda Tan. Originally I tried to keep the getQueueName's behavior, but as I started to investigate it's behavior I've realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, what do we want to use it for? (Ok these are actually 3 questions) As I see in the code the queue name's main purpose is to IDENTIFY a queue, and not just some nice display string. This means the name MUST identify uniquely the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed it's behavior to return a unique identifier (the queue's path). Obviously I must check if it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact to change the reference internally to full name everywhere (as you both suggested earlier). About the API breaking. If we have an API which provides us with a queue name, and currently it is
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021280#comment-17021280 ] Gergely Pollak commented on YARN-9879: -- Thank you for your feedback Wilfred Spiegelenburg and Wangda Tan. Originally I tried to keep the getQueueName's behavior, but as I started to investigate it's behavior I've realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, what do we want to use it for? (Ok these are actually 3 questions) As I see in the code the queue name's main purpose is to IDENTIFY a queue, and not just some nice display string. This means the name MUST identify uniquely the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed it's behavior to return a unique identifier (the queue's path). Obviously I must check if it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact to change the reference internally to full name everywhere (as you both suggested earlier). About the API breaking. If we have an API which provides us with a queue name, and currently it is a short name, then anyone who uses it to reference to the queue by the provided name will fail in the case of name duplicates. If we return the full name of the queue, then it will still work for them, unless they build on the fact it is just a short name. As long as the queue name is used for queue identification, and not for string operations, it shouldn't cause any problem. Other cases must be identified. This is why I ended up with this approach. This way we change the queue naming once and for all to use full names, and we adjust services which would fail on this change. But we cannot keep the short queue name as reference and have multiple queues with the same name, it's just impossible. This patch will already introduce some changes which can cause issues in already working systems and it might be better to do all invasive changes at once. I can use the getQueuePath (almost) everywhere where we currently using getQueueName, but the result would be the same, with some severe inconsistencies: Using short names would result you being able to get the name of a queue, but you wouldn't be able to get your queue by that very same name from the queue manager. This is just confusing, inconsistent, and not maintenable in my opinion. The quemanager.get(queue.getQueueName()) call can result in NULL or error! (when the queue name is not unique) This is not good practice in my opinion. We need the ambiguous queue list, because we provide a remove method, which can result in a previously ambiguous name becoming ambiguous, and it's much faster to get it from a hashmap O(1), and then check the size of the Set O(1), instead of looking through all queues to see if the collision have been resolved O(n). The short name map has been introduced for the very same reason, when we look up a queue, we just look it up in 2 HashMaps 2 x O(1), instead of iterating through all queue names and splicing the last part for short name O(n). So all in all, I've sacrificed some memory space for a drastic speed increase. 
O(n) vs O(1) might not seem a huge improvement in the case of a few queues, but considering the queue parse method will make a get call for each queue to check if it is already present in the store, we have a complexity of O(n*n), which IS something to think about. Please help me to think this through one more time with taking my reasons into consideration, thank you. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
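Following up on the data-structure argument in the comment above, here is a self-contained sketch of the described lookup structure: a full-path map, a short-name map whose values are the sets of full paths sharing that leaf name, and ambiguity decided by the size of that set, so both get() and the ambiguity check stay O(1). All class and method names are illustrative; they are not necessarily those used in the POC patch.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative two-map queue store: full-path map for exact lookups, short-name
// map holding the set of full paths sharing that leaf name; a short name is
// ambiguous exactly when its set has more than one element.
public class QueueStoreSketch {
  private final Map<String, String> byFullPath = new HashMap<>();       // path -> queue data
  private final Map<String, Set<String>> byShortName = new HashMap<>(); // leaf name -> paths

  public void add(String fullPath) {
    byFullPath.put(fullPath, fullPath);
    byShortName.computeIfAbsent(shortNameOf(fullPath), k -> new HashSet<>()).add(fullPath);
  }

  public void remove(String fullPath) {
    byFullPath.remove(fullPath);
    Set<String> paths = byShortName.get(shortNameOf(fullPath));
    if (paths != null) {
      paths.remove(fullPath);            // a collision may get resolved here, in O(1)
      if (paths.isEmpty()) {
        byShortName.remove(shortNameOf(fullPath));
      }
    }
  }

  /** Two O(1) lookups; returns null for unknown or ambiguous short names. */
  public String get(String name) {
    String byPath = byFullPath.get(name);
    if (byPath != null) {
      return byPath;
    }
    Set<String> paths = byShortName.get(name);
    return (paths != null && paths.size() == 1)
        ? byFullPath.get(paths.iterator().next()) : null;
  }

  private static String shortNameOf(String fullPath) {
    return fullPath.substring(fullPath.lastIndexOf('.') + 1);
  }

  public static void main(String[] args) {
    QueueStoreSketch store = new QueueStoreSketch();
    store.add("root.users.alpha");
    store.add("root.projects.alpha");        // "alpha" becomes ambiguous
    System.out.println(store.get("alpha"));  // null: ambiguous short name
    store.remove("root.projects.alpha");     // ambiguity resolved
    System.out.println(store.get("alpha"));  // root.users.alpha
  }
}
{code}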
[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021239#comment-17021239 ] Hadoop QA commented on YARN-10085: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 27s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}152m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10085 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991512/YARN-10085-005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 8801c3b30216 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d40d7cc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/25420/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25420/testReport/ | | Max. process+thread
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021228#comment-17021228 ] Hadoop QA commented on YARN-7913: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 4s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 67m 14s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}137m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:70a0ef5d4a6 | | JIRA Issue | YARN-7913 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991516/YARN-7913-branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 53efb2016a9f 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / 96c653d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25418/testReport/ | | Max. process+thread count | 763 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25418/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. >
[jira] [Commented] (YARN-10083) Provide utility to ask whether an application is in final status
[ https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021226#comment-17021226 ] Adam Antal commented on YARN-10083: --- Thanks for the commit [~snemeth]! > Provide utility to ask whether an application is in final status > > > Key: YARN-10083 > URL: https://issues.apache.org/jira/browse/YARN-10083 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: YARN-10083.001.patch, YARN-10083.002.patch, > YARN-10083.002.patch, YARN-10083.003.patch, YARN-10083.branch-3.2.001.patch > > > This code part is severely duplicated across the Hadoop repo: > {code:java} > public static boolean isApplicationFinalState(YarnApplicationState > appState) { > return appState == YarnApplicationState.FINISHED > || appState == YarnApplicationState.FAILED > || appState == YarnApplicationState.KILLED; > } > {code} > This functionality is used heavily by the log aggregation as well, so we may > do some sanitizing here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
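The description quotes the duplicated helper verbatim; the Hudson commit message further down in this digest lists Apps.java among the touched files, which suggests the shared helper landed in an existing utility class, but the exact method name and location are not given here. The sketch below therefore uses a hypothetical holder class and an EnumSet-based variant of the same check, rather than guessing the final signature.

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

// Hedged sketch: "FinalStates" is a hypothetical holder class, not the class
// the patch actually modifies; the set of terminal states matches the snippet
// quoted in the issue description.
public final class FinalStates {
  private static final EnumSet<YarnApplicationState> FINAL_STATES = EnumSet.of(
      YarnApplicationState.FINISHED,
      YarnApplicationState.FAILED,
      YarnApplicationState.KILLED);

  private FinalStates() {
  }

  public static boolean isApplicationFinalState(YarnApplicationState state) {
    return FINAL_STATES.contains(state);
  }
}
{code}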
[jira] [Updated] (YARN-10098) Add interface to get node iterators by scheduler key for AppPlacementAllocator
[ https://issues.apache.org/jira/browse/YARN-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin Chundatt updated YARN-10098: -- Summary: Add interface to get node iterators by scheduler key for AppPlacementAllocator (was: AppPlacementAllocator getPreferredNodeIterator based on scheduler key) > Add interface to get node iterators by scheduler key for AppPlacementAllocator > -- > > Key: YARN-10098 > URL: https://issues.apache.org/jira/browse/YARN-10098 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
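The summary only names the idea, so the following is a purely illustrative guess at the interface shape it describes: letting an AppPlacementAllocator hand out a node iterator per scheduler key instead of one global ordering. All names are hypothetical.

{code:java}
import java.util.Iterator;

// Hypothetical sketch of the interface change named in the summary; the real
// AppPlacementAllocator API may differ in name, generics and key type.
public interface PreferredNodeIteratorProvider<N> {
  /** Nodes to try, in preference order, for the given scheduler request key. */
  Iterator<N> getPreferredNodeIterator(Object schedulerRequestKey);
}
{code}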
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021222#comment-17021222 ] Szilard Nemeth commented on YARN-7913: -- Thanks [~wilfreds] for other patches, committed them to their respective branches. Closing this jira. > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.4 > > Attachments: YARN-7913-branch-3.1.001.patch, > YARN-7913-branch-3.1.001.patch, YARN-7913-branch-3.2.001.patch, > YARN-7913.000.poc.patch, YARN-7913.001.patch, YARN-7913.002.patch, > YARN-7913.003.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-7913: - Fix Version/s: 3.1.4 3.2.2 > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.4 > > Attachments: YARN-7913-branch-3.1.001.patch, > YARN-7913-branch-3.1.001.patch, YARN-7913-branch-3.2.001.patch, > YARN-7913.000.poc.patch, YARN-7913.001.patch, YARN-7913.002.patch, > YARN-7913.003.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
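A minimal sketch of the general idea behind this improvement: recover each application in isolation so that one failure (for example the leaf/parent queue mismatch described above) is logged and rejected instead of aborting the whole recovery and forcing repeated passive-to-active transition attempts. It is not the committed patch; Application here is a stand-in type.

{code:java}
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hedged sketch: catch per-application recovery failures and continue, rather
// than letting the first exception abort recovery for every remaining app.
public class RecoverySketch {
  private static final Logger LOG = LoggerFactory.getLogger(RecoverySketch.class);

  interface Application {
    String getId();
    void recover() throws Exception;
  }

  static void recoverAll(List<Application> apps) {
    for (Application app : apps) {
      try {
        app.recover();
      } catch (Exception e) {
        // Reject just this application and keep going with the rest.
        LOG.error("Failed to recover application " + app.getId()
            + ", rejecting it and continuing", e);
      }
    }
  }
}
{code}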
[jira] [Commented] (YARN-10083) Provide utility to ask whether an application is in final status
[ https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021213#comment-17021213 ] Hudson commented on YARN-10083: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17892 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17892/]) YARN-10083. Provide utility to ask whether an application is in final (snemeth: rev 9520b2ad790bd8527033a03e7ee50da71a85df1d) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppsBlock.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppsBlock.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogToolUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/EntityGroupFSTimelineStore.java * (edit) hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-infra/src/main/java/org/apache/hadoop/tools/dynamometer/Client.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/LogServlet.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java * (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/LogWebServiceUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java > Provide utility to ask whether an application is in final status > > > Key: YARN-10083 > URL: https://issues.apache.org/jira/browse/YARN-10083 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: YARN-10083.001.patch, YARN-10083.002.patch, > YARN-10083.002.patch, YARN-10083.003.patch, YARN-10083.branch-3.2.001.patch > > > This code part is severely duplicated across the Hadoop repo: > {code:java} > public static boolean isApplicationFinalState(YarnApplicationState > appState) { > return appState == YarnApplicationState.FINISHED > || appState == YarnApplicationState.FAILED > || appState == YarnApplicationState.KILLED; > 
} > {code} > This functionality is used heavily by the log aggregation as well, so we may > do some sanitizing here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10099) FS-CS converter: handle allow-undeclared-pools and user-as-default queue properly
[ https://issues.apache.org/jira/browse/YARN-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10099: Description: Based on the latest documentation, there are two important properties that are ignored if we have placement rules: ||Property||Explanation|| |yarn.scheduler.fair.allow-undeclared-pools|If this is true, new queues can be created at application submission time, whether because they are specified as the application’s queue by the submitter or because they are placed there by the user-as-default-queue property. If this is false, any time an app would be placed in a queue that is not specified in the allocations file, it is placed in the “default” queue instead. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| |yarn.scheduler.fair.user-as-default-queue|Whether to use the username associated with the allocation as the default queue name, in the event that a queue name is not specified. If this is set to “false” or unset, all jobs have a shared default queue, named “default”. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| Right now these settings affects the conversion regardless of the placement rules. was: Based on the latest documentation, there are two important properties that are ignored if we have placement rules: ||Property||Explanation|| |yarn.scheduler.fair.allow-undeclared-pools|If this is true, new queues can be created at application submission time, whether because they are specified as the application’s queue by the submitter or because they are placed there by the user-as-default-queue property. If this is false, any time an app would be placed in a queue that is not specified in the allocations file, it is placed in the “default” queue instead. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| |yarn.scheduler.fair.user-as-default-queue|Whether to use the username associated with the allocation as the default queue name, in the event that a queue name is not specified. If this is set to “false” or unset, all jobs have a shared default queue, named “default”. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| | | | Right now these settings affects the conversion regardless of the placement rules. > FS-CS converter: handle allow-undeclared-pools and user-as-default queue > properly > - > > Key: YARN-10099 > URL: https://issues.apache.org/jira/browse/YARN-10099 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > Based on the latest documentation, there are two important properties that > are ignored if we have placement rules: > ||Property||Explanation|| > |yarn.scheduler.fair.allow-undeclared-pools|If this is true, new queues can > be created at application submission time, whether because they are specified > as the application’s queue by the submitter or because they are placed there > by the user-as-default-queue property. If this is false, any time an app > would be placed in a queue that is not specified in the allocations file, it > is placed in the “default” queue instead. Defaults to true. 
*If a queue > placement policy is given in the allocations file, this property is ignored.*| > |yarn.scheduler.fair.user-as-default-queue|Whether to use the username > associated with the allocation as the default queue name, in the event that a > queue name is not specified. If this is set to “false” or unset, all jobs > have a shared default queue, named “default”. Defaults to true. *If a queue > placement policy is given in the allocations file, this property is ignored.*| > Right now these settings affects the conversion regardless of the placement > rules. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10099) FS-CS converter: handle allow-undeclared-pools and user-as-default queue properly
Peter Bacsko created YARN-10099: --- Summary: FS-CS converter: handle allow-undeclared-pools and user-as-default queue properly Key: YARN-10099 URL: https://issues.apache.org/jira/browse/YARN-10099 Project: Hadoop YARN Issue Type: Sub-task Reporter: Peter Bacsko Assignee: Peter Bacsko Based on the latest documentation, there are two important properties that are ignored if we have placement rules: ||Property||Explanation|| |yarn.scheduler.fair.allow-undeclared-pools|If this is true, new queues can be created at application submission time, whether because they are specified as the application’s queue by the submitter or because they are placed there by the user-as-default-queue property. If this is false, any time an app would be placed in a queue that is not specified in the allocations file, it is placed in the “default” queue instead. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| |yarn.scheduler.fair.user-as-default-queue|Whether to use the username associated with the allocation as the default queue name, in the event that a queue name is not specified. If this is set to “false” or unset, all jobs have a shared default queue, named “default”. Defaults to true. *If a queue placement policy is given in the allocations file, this property is ignored.*| | | | Right now these settings affects the conversion regardless of the placement rules. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
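A small hedged sketch of the conversion rule this issue describes: the two FairScheduler properties should only influence the converted placement behaviour when no explicit queue placement policy exists in the allocations file, since FS itself ignores them in that case. The property names and their defaults come from the description above; the method and the combination rule are illustrative, not the converter's real API.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: gate the two properties on the presence of a placement
// policy, mirroring how FairScheduler documents them as ignored in that case.
public class FsConverterSketch {
  static boolean shouldApplyImplicitPlacement(Configuration conf,
      boolean hasPlacementPolicyInAllocFile) {
    if (hasPlacementPolicyInAllocFile) {
      return false; // FS ignores both properties here, so the converter should too
    }
    boolean allowUndeclared =
        conf.getBoolean("yarn.scheduler.fair.allow-undeclared-pools", true);
    boolean userAsDefault =
        conf.getBoolean("yarn.scheduler.fair.user-as-default-queue", true);
    return allowUndeclared || userAsDefault;
  }
}
{code}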
[jira] [Commented] (YARN-10083) Provide utility to ask whether an application is in final status
[ https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021183#comment-17021183 ] Szilard Nemeth commented on YARN-10083: --- Hi [~adam.antal], Latest patch LGTM, committed to trunk and branch-3.2. Closing this jira as well. > Provide utility to ask whether an application is in final status > > > Key: YARN-10083 > URL: https://issues.apache.org/jira/browse/YARN-10083 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Attachments: YARN-10083.001.patch, YARN-10083.002.patch, > YARN-10083.002.patch, YARN-10083.003.patch, YARN-10083.branch-3.2.001.patch > > > This code part is severely duplicated across the Hadoop repo: > {code:java} > public static boolean isApplicationFinalState(YarnApplicationState > appState) { > return appState == YarnApplicationState.FINISHED > || appState == YarnApplicationState.FAILED > || appState == YarnApplicationState.KILLED; > } > {code} > This functionality is used heavily by the log aggregation as well, so we may > do some sanitizing here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
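The duplicated check quoted above is what the committed patch centralizes. A minimal sketch of such a shared helper, assuming a utility class of this shape (the class name is illustrative; the actual class and location introduced by YARN-10083 may differ):
{code:java}
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

// Illustrative utility class; not necessarily the name used in the patch.
public final class ApplicationStateUtils {

  private ApplicationStateUtils() {
    // utility class, no instances
  }

  /** @return true if the application can no longer change state. */
  public static boolean isApplicationFinalState(YarnApplicationState appState) {
    return appState == YarnApplicationState.FINISHED
        || appState == YarnApplicationState.FAILED
        || appState == YarnApplicationState.KILLED;
  }
}
{code}
Callers in log aggregation and elsewhere would then delegate to this single method instead of repeating the three-way comparison.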
[jira] [Updated] (YARN-10083) Provide utility to ask whether an application is in final status
[ https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10083: -- Fix Version/s: 3.2.2 3.3.0 > Provide utility to ask whether an application is in final status > > > Key: YARN-10083 > URL: https://issues.apache.org/jira/browse/YARN-10083 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: YARN-10083.001.patch, YARN-10083.002.patch, > YARN-10083.002.patch, YARN-10083.003.patch, YARN-10083.branch-3.2.001.patch > > > This code part is severely duplicated across the Hadoop repo: > {code:java} > public static boolean isApplicationFinalState(YarnApplicationState > appState) { > return appState == YarnApplicationState.FINISHED > || appState == YarnApplicationState.FAILED > || appState == YarnApplicationState.KILLED; > } > {code} > This functionality is used heavily by the log aggregation as well, so we may > do some sanitizing here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
[ https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021138#comment-17021138 ] Szilard Nemeth commented on YARN-9462: -- Thanks [~prabhujoseph], Pushed 3.2 patch to branch-3.2 as well. > TestResourceTrackerService.testNodeRemovalGracefully fails sporadically > --- > > Key: YARN-9462 > URL: https://issues.apache.org/jira/browse/YARN-9462 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, test >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0 > > Attachments: > TestResourceTrackerService.testNodeRemovalGracefully.txt, > YARN-9462-001.patch, YARN-9462-branch-3.2.001.patch > > > TestResourceTrackerService.testNodeRemovalGracefully fails sporadically > {code} > [ERROR] > testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 3.385 s <<< FAILURE! > java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
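The failure above is a race: the assertion runs before the asynchronous node-removal event has been processed. A common way to stabilize such an assertion is to poll instead of asserting immediately; this is only a sketch of that pattern, assuming the test keeps reading the shutdown-NM count from ClusterMetrics (it is not the actual YARN-9462 patch):
{code:java}
import org.apache.hadoop.test.GenericTestUtils;

// Inside the test method: wait up to 10 seconds for the async node-removal
// event to be processed instead of asserting the counter right away.
GenericTestUtils.waitFor(
    () -> ClusterMetrics.getMetrics().getNumShutdownNMs() == 0,
    100 /* check every 100 ms */,
    10000 /* give up after 10 s */);
{code}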
[jira] [Updated] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
[ https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9462: - Fix Version/s: 3.2.2 > TestResourceTrackerService.testNodeRemovalGracefully fails sporadically > --- > > Key: YARN-9462 > URL: https://issues.apache.org/jira/browse/YARN-9462 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, test >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Fix For: 3.3.0, 3.2.2 > > Attachments: > TestResourceTrackerService.testNodeRemovalGracefully.txt, > YARN-9462-001.patch, YARN-9462-branch-3.2.001.patch > > > TestResourceTrackerService.testNodeRemovalGracefully fails sporadically > {code} > [ERROR] > testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 3.385 s <<< FAILURE! > java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9768: --- Attachment: YARN-9768.009.patch > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Assignee: Manikandan R >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9768.001.patch, YARN-9768.002.patch, > YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, > YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, > YARN-9768.009.patch > > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021106#comment-17021106 ] Manikandan R commented on YARN-9768: Rebased the patch. Can you please take it forward? > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Assignee: Manikandan R >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9768.001.patch, YARN-9768.002.patch, > YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, > YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, > YARN-9768.009.patch > > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
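The gist of the YARN-9768 request is to bound the blocking renew call so that one unresponsive NameNode or Router cannot hang the renewer thread. A rough sketch of that timeout-and-retry pattern, assuming an executor owned by the renewer (method name, timeout and retry count are illustrative, not the actual patch):
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;

// Run the potentially-stuck RPC on a separate thread and bound it with a
// timeout; retry a limited number of times before giving up.
long renewWithTimeout(Token<?> token, Configuration conf,
    ExecutorService executor, long timeoutMs, int maxRetries) throws Exception {
  for (int attempt = 1; ; attempt++) {
    Future<Long> future = executor.submit(() -> token.renew(conf));
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck renew call
      if (attempt >= maxRetries) {
        throw e;
      }
      // otherwise retry against the (possibly recovered) service
    }
  }
}
{code}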
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021104#comment-17021104 ] Hadoop QA commented on YARN-7913: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 29s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m 20s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}146m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:70a0ef5d4a6 | | JIRA Issue | YARN-7913 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991516/YARN-7913-branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 053237d4c20a 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / 96c653d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25417/testReport/ | | Max. process+thread count | 810 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25417/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. >
[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021098#comment-17021098 ] Hadoop QA commented on YARN-10085: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 14s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 53s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10085 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991512/YARN-10085-005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux d4b3f013bf68 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d40d7cc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25416/testReport/ | | Max. process+thread count | 820 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25416/console | |
[jira] [Commented] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource
[ https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021066#comment-17021066 ] Hadoop QA commented on YARN-4575: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 12s{color} | {color:red} YARN-4575 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-4575 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782086/0002-YARN-4575.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25419/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > ApplicationResourceUsageReport should return ALL reserved resource > --- > > Key: YARN-4575 > URL: https://issues.apache.org/jira/browse/YARN-4575 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Priority: Major > Labels: oct16-easy > Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch > > > ApplicationResourceUsageReport reserved resource report covers only the default > partition; it should cover all partitions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
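For YARN-4575 above, the fix amounts to aggregating reserved resources over every partition instead of reporting only the default one. A sketch of that aggregation, assuming a hypothetical per-partition map (the accessor shown is not an existing API):
{code:java}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hypothetical helper: sum the reserved Resource of every partition
// (partition label -> reserved Resource) rather than just the default one.
Resource totalReserved(Map<String, Resource> reservedByPartition) {
  Resource total = Resources.createResource(0, 0);
  for (Resource reserved : reservedByPartition.values()) {
    Resources.addTo(total, reserved);
  }
  return total;
}
{code}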
[jira] [Updated] (YARN-10098) AppPlacementAllocator getPreferredNodeIterator based on scheduler key
[ https://issues.apache.org/jira/browse/YARN-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin Chundatt updated YARN-10098: -- Summary: AppPlacementAllocator getPreferredNodeIterator based on scheduler key (was: AppPlacementAllocator get getPreferredNodeIterator based on scheduler key) > AppPlacementAllocator getPreferredNodeIterator based on scheduler key > -- > > Key: YARN-10098 > URL: https://issues.apache.org/jira/browse/YARN-10098 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10098) AppPlacementAllocator get getPreferredNodeIterator based on scheduler key
Bibin Chundatt created YARN-10098: - Summary: AppPlacementAllocator get getPreferredNodeIterator based on scheduler key Key: YARN-10098 URL: https://issues.apache.org/jira/browse/YARN-10098 Project: Hadoop YARN Issue Type: Improvement Reporter: Bibin Chundatt -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource
[ https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin Chundatt reassigned YARN-4575: Assignee: (was: Bibin Chundatt) > ApplicationResourceUsageReport should return ALL reserved resource > --- > > Key: YARN-4575 > URL: https://issues.apache.org/jira/browse/YARN-4575 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin Chundatt >Priority: Major > Labels: oct16-easy > Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch > > > ApplicationResourceUsageReport reserved resource report covers only the default > partition; it should cover all partitions -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020988#comment-17020988 ] Szilard Nemeth commented on YARN-7913: -- Thanks [~wilfreds], Makes sense. Triggered build for branch-3.1 patch, also reuploaded the patch so Jenkins will pick that up instead of 3.2 patch. > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-7913-branch-3.1.001.patch, > YARN-7913-branch-3.1.001.patch, YARN-7913-branch-3.2.001.patch, > YARN-7913.000.poc.patch, YARN-7913.001.patch, YARN-7913.002.patch, > YARN-7913.003.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7913) Improve error handling when application recovery fails with exception
[ https://issues.apache.org/jira/browse/YARN-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-7913: - Attachment: YARN-7913-branch-3.1.001.patch > Improve error handling when application recovery fails with exception > - > > Key: YARN-7913 > URL: https://issues.apache.org/jira/browse/YARN-7913 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.0.0 >Reporter: Gergo Repas >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-7913-branch-3.1.001.patch, > YARN-7913-branch-3.1.001.patch, YARN-7913-branch-3.2.001.patch, > YARN-7913.000.poc.patch, YARN-7913.001.patch, YARN-7913.002.patch, > YARN-7913.003.patch > > > There are edge cases when the application recovery fails with an exception. > Example failure scenario: > * setup: a queue is a leaf queue in the primary RM's config and the same > queue is a parent queue in the secondary RM's config. > * When failover happens with this setup, the recovery will fail for > applications on this queue, and an APP_REJECTED event will be dispatched to > the async dispatcher. On the same thread (that handles the recovery), a > NullPointerException is thrown when the applicationAttempt is tried to be > recovered > (https://github.com/apache/hadoop/blob/55066cc53dc22b68f9ca55a0029741d6c846be0a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L494). > I don't see a good way to avoid the NPE in this scenario, because when the > NPE occurs the APP_REJECTED has not been processed yet, and we don't know > that the application recovery failed. > Currently the first exception will abort the recovery, and if there are X > applications, there will be ~X passive -> active RM transition attempts - the > passive -> active RM transition will only succeed when the last APP_REJECTED > event is processed on the async dispatcher thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
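The NullPointerException described in YARN-7913 occurs because the attempt-recovery code assumes the application is still known to the scheduler even though it was just rejected. A defensive sketch of the check that avoids aborting the whole recovery, assuming FairScheduler's applications map (variable names and log message are illustrative; the committed fix may differ):
{code:java}
// Fragment of the attempt-recovery path; 'applications' is the scheduler's
// map of SchedulerApplication instances and 'applicationAttemptId' comes
// from the recovery event.
SchedulerApplication<FSAppAttempt> application =
    applications.get(applicationAttemptId.getApplicationId());
if (application == null) {
  // The app was rejected earlier in recovery (e.g. its queue became a parent
  // queue), so there is nothing to attach the attempt to. Skip it instead of
  // letting a NullPointerException abort recovery for all remaining apps.
  LOG.warn("Skipping attempt recovery for " + applicationAttemptId
      + ": application not found, likely rejected during recovery");
  return;
}
{code}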
[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020965#comment-17020965 ] Peter Bacsko commented on YARN-10085: - Fixed checkstyle in patch v5. > FS-CS converter: remove mixed ordering policy check > --- > > Key: YARN-10085 > URL: https://issues.apache.org/jira/browse/YARN-10085 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Critical > Attachments: YARN-10085-001.patch, YARN-10085-002.patch, > YARN-10085-003.patch, YARN-10085-004.patch, YARN-10085-004.patch, > YARN-10085-005.patch > > > In the converter, this part is very strict and probably unnecessary: > {noformat} > // Validate ordering policy > if (queueConverter.isDrfPolicyUsedOnQueueLevel()) { > if (queueConverter.isFifoOrFairSharePolicyUsed()) { > throw new ConversionException( > "DRF ordering policy cannot be used together with fifo/fair"); > } else { > capacitySchedulerConfig.set( > CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS, > DominantResourceCalculator.class.getCanonicalName()); > } > } > {noformat} > It's also misleading, because Fair policy can be used under DRF, so the error > message is incorrect. > Let's remove these checks and rewrite the converter in a way that it > generates a valid config even if fair/drf is somehow mixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10085: Attachment: YARN-10085-005.patch > FS-CS converter: remove mixed ordering policy check > --- > > Key: YARN-10085 > URL: https://issues.apache.org/jira/browse/YARN-10085 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Critical > Attachments: YARN-10085-001.patch, YARN-10085-002.patch, > YARN-10085-003.patch, YARN-10085-004.patch, YARN-10085-004.patch, > YARN-10085-005.patch > > > In the converter, this part is very strict and probably unnecessary: > {noformat} > // Validate ordering policy > if (queueConverter.isDrfPolicyUsedOnQueueLevel()) { > if (queueConverter.isFifoOrFairSharePolicyUsed()) { > throw new ConversionException( > "DRF ordering policy cannot be used together with fifo/fair"); > } else { > capacitySchedulerConfig.set( > CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS, > DominantResourceCalculator.class.getCanonicalName()); > } > } > {noformat} > It's also misleading, because Fair policy can be used under DRF, so the error > message is incorrect. > Let's remove these checks and rewrite the converter in a way that it > generates a valid config even if fair/drf is somehow mixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
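Concretely, the relaxed conversion suggested in YARN-10085 could keep only the calculator switch and drop the exception, so a mixed fifo/fair/drf configuration still produces a valid result (sketch only, built from the snippet quoted in the description; the committed patch may differ):
{code:java}
// If DRF appears anywhere on queue level, switch the resource calculator;
// never reject the configuration because other queues use fifo/fair.
if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
  capacitySchedulerConfig.set(
      CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
      DominantResourceCalculator.class.getCanonicalName());
}
{code}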
[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check
[ https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020944#comment-17020944 ] Hadoop QA commented on YARN-10085: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 41s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 17s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10085 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991473/YARN-10085-004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux d8a857572002 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d40d7cc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25414/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25414/testReport/ | | Max.
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020931#comment-17020931 ] Wilfred Spiegelenburg commented on YARN-9879: - I agree, {{getQueueName()}} should stay as is. We have a {{getQueuePath()}} already. Every CSQueue can already return both. We should change all non-external-facing calls that get the name of a queue to the path version. The only calls that can stay are the ones that provide their data in an externally viewable form (REST, UI or IPC) so as not to break compatibility. I also do not see why we would need the ambiguous queue list. The queue is always unique when a path is used. It does not matter if the current leaf queue name uniqueness is enforced or not. Everything can always be found by its path. If I do not have a path, I expect leaf queue uniqueness and can find the queue by just checking the part after the last _dot_ in the path. i.e. * queue paths defined as: root.parent.child1 child queue unique flag is set find a queue with name: *child1* (no dots, expect leaf queue uniqueness) -> returns the queue correctly * add a queue defined as: root.otherparent.child1 child queue unique flag is not set, allowed find a queue with name: *child1* (no dots, expect leaf queue uniqueness) -> returns an error Internally we would just store everything using the path; that would remove the need to keep things in sync and make the code consistent when combined with using the path everywhere internally > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal are being made; I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
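The lookup rule described in the comment above can be sketched as follows: a dotted name is always a full path, a short name relies on leaf-queue uniqueness and is matched against the last path segment, and multiple matches are an error. This is an illustration of the idea only, not CapacityScheduler code (the map and method are hypothetical):
{code:java}
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueue;

// 'queuesByPath' maps full paths such as "root.parent.child1" to queues.
CSQueue findQueue(Map<String, CSQueue> queuesByPath, String name) {
  if (name.contains(".")) {
    // A dotted name is treated as a full, unambiguous path.
    return queuesByPath.get(name);
  }
  // Short name: expect leaf-queue uniqueness and match the last path segment.
  List<CSQueue> matches = queuesByPath.entrySet().stream()
      .filter(e -> e.getKey().substring(e.getKey().lastIndexOf('.') + 1).equals(name))
      .map(Map.Entry::getValue)
      .collect(Collectors.toList());
  if (matches.size() > 1) {
    throw new IllegalArgumentException("Ambiguous queue short name: " + name);
  }
  return matches.isEmpty() ? null : matches.get(0);
}
{code}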