[jira] [Commented] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857587#comment-15857587
 ] 

Sunil G commented on YARN-6031:
---

Thanks [~Ying Zhang] for the clarification. Makes sense to me.

A new ticket could be raised to track the test case improvement. I will wait for 
the Jenkins run on the branch-2.8 patch.

> Application recovery has failed when node label feature is turned off during 
> RM recovery
> 
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
> Attachments: YARN-6031.001.patch, YARN-6031.002.patch, 
> YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch, 
> YARN-6031.006.patch, YARN-6031.007.patch, YARN-6031-branch-2.8.001.patch
>
>
> Here are the repro steps:
> Enable node labels, restart the RM, configure the CapacityScheduler (CS) properly, and run some jobs;
> Disable node labels, restart the RM, and the following exception is thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 10 more
> {noformat}
> During RM restart, application recovery failed because the application had a 
> node label expression specified while the node label feature was disabled.
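
For context, the node label feature referred to above is controlled by a single 
yarn-site.xml switch. A minimal illustrative snippet of the toggle involved in the 
repro (values shown are only for illustration, not part of any attached patch):
{code}
<!-- Hedged sketch: the repro enables this before the first RM restart and
     disables it again before the second restart. -->
<property>
  <name>yarn.node-labels.enabled</name>
  <value>false</value>
</property>
{code}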



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Attachment: (was: YARN-6031-branch-2.8.001.patch)




[jira] [Updated] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Attachment: YARN-6031-branch-2.8.001.patch




[jira] [Commented] (YARN-6125) The application attempt's diagnostic message should have a maximum size

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857576#comment-15857576
 ] 

Hadoop QA commented on YARN-6125:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
48s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 316 unchanged - 2 fixed = 317 total (was 318) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
34s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
37s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 44s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}103m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851552/YARN-6125.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 61b1acf0a7b5 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 2007e0c |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | 

[jira] [Comment Edited] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857562#comment-15857562
 ] 

Ying Zhang edited comment on YARN-6031 at 2/8/17 7:18 AM:
--

I think it is a separate question. Whether or not we backport YARN-4805, the 
test case itself can be improved to avoid running with the FairScheduler :-)


was (Author: ying zhang):
I'm thinking it is a separate question. No matter we backport YARN-4805 or not, 
the test case itself should be improved to avoid running with FairScheduler:-) 




[jira] [Commented] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857562#comment-15857562
 ] 

Ying Zhang commented on YARN-6031:
--

I think it is a separate question. Whether or not we backport YARN-4805, the 
test case itself should be improved to avoid running with the FairScheduler :-)




[jira] [Commented] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857547#comment-15857547
 ] 

Sunil G commented on YARN-6031:
---

Or do we need to backport YARN-4805 to branch-2.8?




[jira] [Updated] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-02-07 Thread Daniel Sturman (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Sturman updated YARN-6150:
-
Attachment: YARN-6150.005.patch

> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch
>
>
> Repeated runs of {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} 
> on the same codebase can either pass or fail.  Also, the two runs (one in 
> secure mode, one without security) aren't well labeled in JUnit.






[jira] [Updated] (YARN-5912) [YARN-3368] Fix breadcrumb issues in new YARN UI

2017-02-07 Thread Akhil PB (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-5912:
---
Summary: [YARN-3368] Fix breadcrumb issues in new YARN UI  (was: 
[YARN-3368] Fix breadcrumb issues in yarn-node page in new YARN UI)

> [YARN-3368] Fix breadcrumb issues in new YARN UI
> 
>
> Key: YARN-5912
> URL: https://issues.apache.org/jira/browse/YARN-5912
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Minor
> Attachments: YARN-5912.001.patch
>
>
> Fix breadcrumb issues in the yarn-node-app and yarn-node-container pages in 
> the new YARN UI.
> Fix breadcrumb issues in the application page as well.






[jira] [Updated] (YARN-5912) [YARN-3368] Fix breadcrumb issues in yarn-node page in new YARN UI

2017-02-07 Thread Akhil PB (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-5912:
---
Description: 
Fix breadcrumb issues in yarn-node-app and yarn-node-container pages in new 
YARN UI.
Fix breadcrumb issues in application page as well.


  was:
Fix breadcrumb issues in yarn-node-app and yarn-node-container pages in new 
YARN UI.



> [YARN-3368] Fix breadcrumb issues in yarn-node page in new YARN UI
> --
>
> Key: YARN-5912
> URL: https://issues.apache.org/jira/browse/YARN-5912
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Minor
> Attachments: YARN-5912.001.patch
>
>
> Fix breadcrumb issues in the yarn-node-app and yarn-node-container pages in 
> the new YARN UI.
> Fix breadcrumb issues in the application page as well.






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857387#comment-15857387
 ] 

Hadoop QA commented on YARN-6151:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
26s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
18s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
24s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 19 unchanged - 1 fixed = 19 total (was 20) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m  6s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}175m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_121 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_121 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5af2af1 |
| JIRA 

[jira] [Updated] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Attachment: YARN-6031-branch-2.8.001.patch




[jira] [Updated] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Attachment: (was: YARN-6031.branch-2.8.001.patch)




[jira] [Updated] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-6031:
-
Attachment: YARN-6031.branch-2.8.001.patch




[jira] [Commented] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-02-07 Thread Ying Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857371#comment-15857371
 ] 

Ying Zhang commented on YARN-6031:
--

Hi [~sunilg], sorry for the late reply (I was out for the Spring Festival 
holiday). Here is the patch for branch-2.8, please have a look.
I've found a problem with the test case while making the patch for branch-2.8. 
TestRMRestart runs all test cases for the CapacityScheduler and the 
FairScheduler respectively, and this test case can only run successfully for 
the CapacityScheduler, since it involves running an application with a node 
label specified. On trunk we don't see this problem because, due to YARN-4805, 
TestRMRestart now only runs with the CapacityScheduler. I've modified the test 
case a little bit so that it runs only for the CapacityScheduler:
{code}
  public void testRMRestartAfterNodeLabelDisabled() throws Exception {
// Skip this test case if it is not CapacityScheduler since NodeLabel is
// not fully supported yet for FairScheduler and others.
if (!getSchedulerType().equals(SchedulerType.CAPACITY)) {
  return;
}
...
{code}
We should probably make this change on trunk too. Let me know whether you want 
to make the change through this JIRA, or whether I need to open another JIRA to 
address it.
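
For reference, a roughly equivalent JUnit 4 variant would express the skip as an 
assumption, so the FairScheduler run is reported as skipped rather than silently 
passing. This is only a standalone sketch: the enum and {{getSchedulerType()}} 
below stand in for the parameterized test base class and are not from the patch.
{code}
import static org.junit.Assume.assumeTrue;

import org.junit.Test;

// Standalone sketch of the assumption idiom; not the actual TestRMRestart code.
public class NodeLabelTestSkipSketch {
  enum SchedulerType { CAPACITY, FAIR }

  private SchedulerType getSchedulerType() {
    return SchedulerType.FAIR;  // illustrative; the real value comes from the test parameter
  }

  @Test
  public void testRMRestartAfterNodeLabelDisabled() throws Exception {
    // Reported as "skipped" (AssumptionViolatedException) instead of passing silently.
    assumeTrue(getSchedulerType().equals(SchedulerType.CAPACITY));
    // ... the rest of the recovery test would go here ...
  }
}
{code}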




[jira] [Updated] (YARN-6125) The application attempt's diagnostic message should have a maximum size

2017-02-07 Thread Andras Piros (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Piros updated YARN-6125:
---
Attachment: YARN-6125.005.patch

Refactored the new functionality into the nested class 
{{RMAppAttemptImpl.BoundedAppender}} and got rid of the unnecessary stuff.
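
For illustration, a minimal head-truncating bounded appender could look like the 
sketch below. This is hypothetical code, not the actual 
{{RMAppAttemptImpl.BoundedAppender}} from the patch; the class name, limit 
handling and truncation strategy are assumptions.
{code}
// Hedged sketch only: keeps the diagnostics under a fixed limit by dropping
// the oldest characters (head truncation) whenever the limit is exceeded.
public class BoundedAppenderSketch {
  private final int limit;
  private final StringBuilder messages = new StringBuilder();

  public BoundedAppenderSketch(int limit) {
    this.limit = limit;
  }

  public BoundedAppenderSketch append(CharSequence csq) {
    messages.append(csq);
    if (messages.length() > limit) {
      // Remove the oldest text so the buffer never exceeds the limit.
      messages.delete(0, messages.length() - limit);
    }
    return this;
  }

  @Override
  public String toString() {
    return messages.toString();
  }

  public static void main(String[] args) {
    BoundedAppenderSketch diagnostics = new BoundedAppenderSketch(10);
    diagnostics.append("container failed: ").append("out of memory");
    System.out.println(diagnostics);  // prints only the last 10 characters
  }
}
{code}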

> The application attempt's diagnostic message should have a maximum size
> ---
>
> Key: YARN-6125
> URL: https://issues.apache.org/jira/browse/YARN-6125
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Daniel Templeton
>Assignee: Andras Piros
>Priority: Critical
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6125.000.patch, YARN-6125.001.patch, 
> YARN-6125.002.patch, YARN-6125.003.patch, YARN-6125.004.patch, 
> YARN-6125.005.patch
>
>
> We've found through experience that the diagnostic message can grow 
> unbounded.  I've seen attempts that have diagnostic messages over 1MB.  Since 
> the message is stored in the state store, it's a bad idea to allow the 
> message to grow unbounded.  Instead, there should be a property that sets a 
> maximum size on the message.
> I suspect that some of the ZK state store issues we've seen in the past were 
> due to the size of the diagnostic messages and not to the size of the 
> classpath, as is the current prevailing opinion.
> An open question is how best to prune the message once it grows too large.  
> Should we
> # truncate the tail,
> # truncate the head,
> # truncate the middle,
> # add another property to make the behavior selectable, or
> # none of the above?






[jira] [Commented] (YARN-6042) Fairscheduler: Dump scheduler state in log

2017-02-07 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857340#comment-15857340
 ] 

Wilfred Spiegelenburg commented on YARN-6042:
-

I looked at the changes, and this will help a lot with debugging the FS once we 
get it into a release.

A couple of things:
# In the FairScheduler change you add a new method {{dumpSchedulerState()}}; why 
are you not passing the rootQueue into the method? It saves getting it again, 
since you already have it in the update method.
# I am missing one number for the applications in {{dumpStateInternal()}} for 
the FSLeafQueue: {{getNumPendingApps()}} or {{getNumActiveApps()}}. We need one 
of those to have a full view of the application state in the queue.
# We add the LastTimeAtMinShare but not the LastTimeAtFairShare for the leaf 
queue, as per {{getLastTimeAtFairShareThreshold()}}.

I am also a bit worried about the test: in the output we build the debug string 
and get the time in milliseconds for the LastTimeAtMinShare. What if the 
{{updateStarvationStats()}} call ran 1 millisecond earlier than the debug string 
was built? The comparison would fail and the test would fail because of that. I 
don't think we can guarantee that those two calls will happen in the same 
millisecond.
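
One hedged way to make such a comparison robust (a standalone sketch with an 
assumed dump format, not the code from the patch) is to mask the timestamp on 
both sides before comparing:
{code}
// Hedged sketch: mask the millisecond timestamp so a 1 ms skew between
// updateStarvationStats() and building the debug string cannot fail the check.
// The "LastTimeAtMinShare: <millis>" format is an assumption for illustration.
public class DumpStateComparisonSketch {
  static String maskTimestamps(String dump) {
    return dump.replaceAll("LastTimeAtMinShare: \\d+", "LastTimeAtMinShare: <TS>");
  }

  public static void main(String[] args) {
    String expected = maskTimestamps("root.queueA {LastTimeAtMinShare: 1486448000123}");
    String actual   = maskTimestamps("root.queueA {LastTimeAtMinShare: 1486448000124}");
    System.out.println(expected.equals(actual));  // true despite the 1 ms skew
  }
}
{code}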

> Fairscheduler: Dump scheduler state in log
> --
>
> Key: YARN-6042
> URL: https://issues.apache.org/jira/browse/YARN-6042
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6042.001.patch, YARN-6042.002.patch
>
>
> To improve the debugging of scheduler issues it would be a big improvement to 
> be able to dump the scheduler state into a log on request. 
> The Dump the scheduler state at a point in time would allow debugging of a 
> scheduler that is not hung (deadlocked) but also not assigning containers. 
> Currently we do not have a proper overview of what state the scheduler and 
> the queues are in and we have to make assumptions or guess
> The scheduler and queue state needed would include (not exhaustive):
> - instantaneous and steady fair share (app / queue)
> - AM share and resources
> - weight
> - app demand
> - application run state (runnable/non runnable)
> - last time at fair/min share






[jira] [Commented] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857325#comment-15857325
 ] 

Hadoop QA commented on YARN-6137:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 16s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 1 new + 
81 unchanged - 11 fixed = 82 total (was 92) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m  
1s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 29s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6137 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851518/YARN-6137-trunk.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 77475e2b0c12 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a87e385 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14856/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14856/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14856/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Yarn client implicitly invoke ATS client which accesses HDFS
> 
>
> Key: YARN-6137
> URL: https://issues.apache.org/jira/browse/YARN-6137
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: 

[jira] [Commented] (YARN-6144) FairScheduler: preempted resources can become negative

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857288#comment-15857288
 ] 

Hadoop QA commented on YARN-6144:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 201 unchanged - 1 fixed = 202 total (was 202) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 45m  0s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6144 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851496/YARN-6144.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux fcf28fb8c50d 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a87e385 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14854/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14854/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14854/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Assigned] (YARN-5501) Container Pooling in YARN

2017-02-07 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma reassigned YARN-5501:
---

Assignee: Hitesh Sharma

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
> Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.






[jira] [Commented] (YARN-6027) Improve /flows API for more flexible filters fromid, collapse, userid

2017-02-07 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857267#comment-15857267
 ] 

Rohith Sharma K S commented on YARN-6027:
-

Thanks [~varun_saxena] for the review.
bq. Do we need cluster ID in fromId because we are ignoring it completely?
Yes, it is required even though it is currently ignored, considering how fromId is used. We do not want the user to parse something and provide it as fromId; the user should be able to pass the flow entity ID directly as fromId and let the reader server handle it. A cluster ID check can be added to verify that the context cluster and the cluster ID from fromId are equal. Ideally both should match; otherwise we can throw an exception.

bq. If there is a / in cluster ID we may have to escape it to avoid parsing errors.
If we have to escape it to parse the ID safely, then why does the flow entity ID expose the full row key as its ID? I think the flow entity ID format itself needs to change.

bq. If we use collapse, even with fromId, there seems to be a full table scan which will impact
Yes, it does a full table scan. But collapse is expected to be used together with a date range; otherwise the default behavior of /flows should be changed to return one day of flows rather than the full table data. It is an engineering issue, and perhaps we can document that performance will be a bit slower.

bq. Maybe we can send the last real ID in info field of last flow activity entity if previous query was made with collapse field
The initial idea was to send the last real ID as the fromId info field. But flows are stored per day for each user, which makes that not useful. Note that when collapse is used, we must scan all entities and then apply fromId; the scan cannot stop halfway without leaving redundant entries for the user. Given the previous comment is addressed, this should not be an issue.

bq. you have mentioned that fromId validation is happening in getResult method. Could not find it
Ahh, I think I missed it at the global level; I am validating it only in one condition. I will validate at the global level.

bq. In processResults we first get the result from backend while applying limit and then process result for collapse and fromId filters.
If you look at the patch, I have removed the PageFilter while scanning, which returns all the data. One optimization I can do is to apply the PageFilter in non-collapse mode, because there the scan starts from the given fromId; the same logic cannot be used for collapse mode.
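For illustration, a minimal sketch of that optimization (the helper class and method are invented for illustration and are not part of the patch): push the HBase {{PageFilter}} down only in non-collapse mode, since collapse needs the full scan before fromId and limit can be applied.
{code}
// Hypothetical sketch only; class and method names are invented.
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;

public final class FlowScanSketch {
  private FlowScanSketch() {}

  /** Push the page limit down to HBase only when results are not collapsed. */
  public static Scan buildScan(boolean collapse, long limit) {
    Scan scan = new Scan();
    if (!collapse && limit > 0) {
      // Non-collapse mode: the scan already starts at the row derived from
      // fromId, so a PageFilter can safely bound the rows fetched.
      scan.setFilter(new PageFilter(limit));
    }
    // Collapse mode: all rows must be read so per-day duplicates can be
    // merged client-side before fromId and limit are applied.
    return scan;
  }
}
{code}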

> Improve /flows API for more flexible filters fromid, collapse, userid
> -
>
> Key: YARN-6027
> URL: https://issues.apache.org/jira/browse/YARN-6027
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-6027-YARN-5355.0001.patch
>
>
> In YARN-5585 , fromId is supported for retrieving entities. We need similar 
> filter for flows/flowRun apps and flow run and flow as well. 
> Along with supporting fromId, this JIRA should also discuss following points
> * Should we throw an exception for entities/entity retrieval if duplicates 
> found?
> * TimelieEntity :
> ** Should equals method also check for idPrefix?
> ** Does idPrefix is part of identifiers?






[jira] [Commented] (YARN-6135) Node manager REST API documentation is not up to date

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857250#comment-15857250
 ] 

Hudson commented on YARN-6135:
--

ABORTED: Integrated in Jenkins build Hadoop-trunk-Commit #11220 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11220/])
YARN-6135. Node manager REST API documentation is not up to date (templedf: rev 
a87e3850b9240900bca295ad9c4b2e49d0068e98)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md


> Node manager REST API documentation is not up to date
> -
>
> Key: YARN-6135
> URL: https://issues.apache.org/jira/browse/YARN-6135
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, restapi
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Trivial
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6135.000.patch, YARN-6135.001.patch
>
>







[jira] [Commented] (YARN-5258) Document Use of Docker with LinuxContainerExecutor

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857251#comment-15857251
 ] 

Hudson commented on YARN-5258:
--

ABORTED: Integrated in Jenkins build Hadoop-trunk-Commit #11220 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11220/])
YARN-5258. Document Use of Docker with LinuxContainerExecutor (sidharta: rev 
307ec58ffb9ab0bc0d58e69a98767a232e458c22)
* (edit) hadoop-project/src/site/site.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md


> Document Use of Docker with LinuxContainerExecutor
> --
>
> Key: YARN-5258
> URL: https://issues.apache.org/jira/browse/YARN-5258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-easy
> Attachments: YARN-5258.001.patch, YARN-5258.002.patch, 
> YARN-5258.003.patch, YARN-5258.004.patch, YARN-5258.005.patch
>
>
> There aren't currently any docs that explain how to configure Docker and all 
> of its various options aside from reading all of the JIRAs.  We need to 
> document the configuration, use, and troubleshooting, along with helpful 
> examples.






[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-02-07 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153.001.patch

If maybeLastAttempt in RMAppAttemptImpl is true, keepContainers is always ignored.
However, once the AM reset window has elapsed, that attempt is no longer really the last attempt.

I'm attaching a patch: if the last attempt has aged beyond the AM reset window, keepContainers is honored.
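A minimal, self-contained sketch of the intended check (names and signature are invented for illustration; this is not the actual patch):
{code}
public final class KeepContainersSketch {
  private KeepContainersSketch() {}

  /**
   * An attempt only counts as the "last" attempt while it is still inside the
   * AM failure validity window; once it ages out, keepContainers should be
   * honored again.
   */
  static boolean shouldKeepContainers(boolean maybeLastAttempt,
      boolean keepContainersAcrossAttempts,
      long attemptStartTimeMs, long validityIntervalMs, long nowMs) {
    boolean insideWindow = validityIntervalMs <= 0
        || (nowMs - attemptStartTimeMs) <= validityIntervalMs;
    boolean effectivelyLastAttempt = maybeLastAttempt && insideWindow;
    return keepContainersAcrossAttempts && !effectivelyLastAttempt;
  }

  public static void main(String[] args) {
    // Attempt started 10 minutes ago, validity window is 5 minutes:
    // it is no longer the last attempt, so running containers are kept.
    System.out.println(shouldKeepContainers(true, true,
        0L, 5 * 60_000L, 10 * 60_000L)); // prints true
  }
}
{code}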

> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
> Attachments: YARN-6153.001.patch
>
>
> yarn.resourcemanager.am.max-attempts has been configured to 2 in my cluster.
> I submitted a YARN application (slider app) that keepContainers=true, 
> attemptFailuresValidityInterval=30.
> it did work properly when AM was failed firstly.
> all containers launched by previous AM were resynced with new AM (attempt2) 
> without killing containers.
> after 10 minutes, I thought AM failure count was reset by 
> attemptFailuresValidityInterval (5 minutes).
> but, all containers were killed when AM was failed secondly. (new AM attempt3 
> was launched properly)






[jira] [Created] (YARN-6158) FairScheduler: app usage can go to negative

2017-02-07 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6158:


 Summary: FairScheduler: app usage can go to negative
 Key: YARN-6158
 URL: https://issues.apache.org/jira/browse/YARN-6158
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 3.0.0-alpha2
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


FiCaSchedulerApp.containerCompleted checks whether the container being completed is still in the list of live containers:
{code}
  // Remove from the list of containers
  if (null == liveContainers.remove(containerId)) {
return false;
  }
{code}
The Fair Scheduler should do the same; otherwise multiple close events for the same container leave the application with negative resource usage in {{queue.getMetrics().releaseResources}} and {{attemptResourceUsage.decUsed}}.
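A self-contained sketch of the guard pattern with simplified types (not YARN code); it only demonstrates why the first removal from the live-container map must gate the resource release:
{code}
import java.util.HashMap;
import java.util.Map;

public final class CompletionGuardDemo {
  private final Map<String, Integer> liveContainers = new HashMap<>();
  private int usedResources;

  void launch(String containerId, int resource) {
    liveContainers.put(containerId, resource);
    usedResources += resource;
  }

  boolean containerCompleted(String containerId) {
    Integer resource = liveContainers.remove(containerId);
    if (resource == null) {
      // Duplicate or stale completion event: ignore it so usage never goes negative.
      return false;
    }
    usedResources -= resource;
    return true;
  }

  public static void main(String[] args) {
    CompletionGuardDemo app = new CompletionGuardDemo();
    app.launch("c1", 50);
    app.containerCompleted("c1");
    app.containerCompleted("c1"); // second event is ignored
    System.out.println(app.usedResources); // prints 0 instead of -50
  }
}
{code}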






[jira] [Updated] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6151:
---
Attachment: YARN-6151.branch-2.8.003.patch

> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch, 
> YARN-6151.branch-2.8.002.patch, YARN-6151.branch-2.8.003.patch
>
>
> This is preemption bug happens before 2.8.0, which also described in 
> YARN-3405.
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume cluster resource is 100 and all queues have same weights.
> # queue-1-1 and queue-2 has apps. Each get 50 usage and 50 fairshare. 
> # When queue-1-2 is active, supposedly it will preempt 25 from queue-1-1, but 
> this doesn't happen because preemption happens top-down, queue-2 could be the 
> preemption candidate as long as queue-2 is less needy than queue-1, and 
> queue-2 doesn't exceed the fair share which means preemption won't happen.
> We need to filter out queue-2 since it isn't a valid candidate.






[jira] [Commented] (YARN-6112) fsOpDurations.addUpdateCallDuration() should be independent to LOG level

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857219#comment-15857219
 ] 

Hadoop QA commented on YARN-6112:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
30s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} branch-2 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 44s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_121 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestRMRestart |
| JDK v1.7.0_121 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:b59b8b7 |
| JIRA Issue | YARN-6112 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851493/YARN-6112.branch-2.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 36dae673401c 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 

[jira] [Commented] (YARN-6125) The application attempt's diagnostic message should have a maximum size

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857205#comment-15857205
 ] 

Hadoop QA commented on YARN-6125:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 47s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 316 unchanged - 2 fixed = 318 total (was 318) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
14s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
31s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
30s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
26s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 41s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
|  |  Format string should use %n rather than \n in org.apache.hadoop.yarn.util.BoundedAppender.toString()  At BoundedAppender.java:rather than \n in org.apache.hadoop.yarn.util.BoundedAppender.toString()  At BoundedAppender.java:[line 259] |
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851489/YARN-6125.004.patch |
| 

[jira] [Updated] (YARN-5501) Container Pooling in YARN

2017-02-07 Thread Hitesh Sharma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Sharma updated YARN-5501:

Attachment: Container Pooling - one pager.pdf

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
> Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.






[jira] [Commented] (YARN-5501) Container Pooling in YARN

2017-02-07 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857192#comment-15857192
 ] 

Hitesh Sharma commented on YARN-5501:
-

Attaching a one-pager design doc to capture some of the details. This is still an early draft, so I would appreciate some feedback.

> Container Pooling in YARN
> -
>
> Key: YARN-5501
> URL: https://issues.apache.org/jira/browse/YARN-5501
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
> Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.






[jira] [Assigned] (YARN-6157) Inconsistencies in verifying Max Applications

2017-02-07 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-6157:
---

Assignee: Naganarasimha G R

> Inconsistencies in verifying Max Applications
> -
>
> Key: YARN-6157
> URL: https://issues.apache.org/jira/browse/YARN-6157
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> Inconsistencies in verifying Max Applications when the max apps is reduced 
> and either HA is done ow work preserving restart is done.
> # currently Max applications across cluster should not be done for the 
> recovered apps. Seems like currently we are doing it
> #  Max applications for a queue is done @ CapacityScheduler.addApplication 
> which considers sum of Pending and running applications but we add to pending 
> applications in {{CapacityScheduler.addApplicationAttempt -> 
> LeafQueue.addApplicationAttempt}} so between these 2 checks we can activate 
> more apps than what can queue restrict.
> # During recovery of a RMApp, if applicationAttempts are not found then we 
> recover it without recovery false @ {{RMAppImpl.RMAppRecoveredTransition}}, 
> this can lead to failure of apps which were accepted earlier but attempt was 
> not yet created and HA happens when MAX app configuration (for cluster/queue) 
> is modified.






[jira] [Created] (YARN-6157) Inconsistencies in verifying Max Applications

2017-02-07 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-6157:
---

 Summary: Inconsistencies in verifying Max Applications
 Key: YARN-6157
 URL: https://issues.apache.org/jira/browse/YARN-6157
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Naganarasimha G R


Inconsistencies in verifying Max Applications when the max-apps limit is reduced and either an HA failover or a work-preserving restart happens.

# Currently the cluster-wide max-applications check should not be applied to recovered apps, but it seems we are applying it.
# The per-queue max-applications check is done in CapacityScheduler.addApplication, which considers the sum of pending and running applications, but we only add to pending applications in {{CapacityScheduler.addApplicationAttempt -> LeafQueue.addApplicationAttempt}}; between these two checks we can activate more apps than the queue limit allows.
# During recovery of an RMApp, if no applicationAttempts are found we recover it with recovery set to false in {{RMAppImpl.RMAppRecoveredTransition}}; this can lead to failure of apps that were accepted earlier but whose attempt had not yet been created, when HA happens after the max-app configuration (for cluster/queue) has been modified.







[jira] [Updated] (YARN-6137) Yarn client implicitly invoke ATS client which accesses HDFS

2017-02-07 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-6137:

Attachment: YARN-6137-trunk.002.patch

Thanks [~jlowe] for the review! A new patch to address all review comments. 

> Yarn client implicitly invoke ATS client which accesses HDFS
> 
>
> Key: YARN-6137
> URL: https://issues.apache.org/jira/browse/YARN-6137
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Li Lu
> Attachments: YARN-6137-trunk.001.patch, YARN-6137-trunk.002.patch
>
>
> Yarn is implicitly trying to invoke ATS Client even though client does not 
> need it. and ATSClient code is trying to access hdfs. Due to that service is 
> hitting GSS exception. 
> Yarnclient is implicitly creating ats client that tries to access Hdfs.
> All servers that use yarnclient cannot be expected to change to accommodate 
> this behavior.






[jira] [Commented] (YARN-6135) Node manager REST API documentation is not up to date

2017-02-07 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857148#comment-15857148
 ] 

Miklos Szegedi commented on YARN-6135:
--

Thank you for the review and commit [~templedf]!

> Node manager REST API documentation is not up to date
> -
>
> Key: YARN-6135
> URL: https://issues.apache.org/jira/browse/YARN-6135
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, restapi
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Trivial
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6135.000.patch, YARN-6135.001.patch
>
>







[jira] [Commented] (YARN-5258) Document Use of Docker with LinuxContainerExecutor

2017-02-07 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857144#comment-15857144
 ] 

Daniel Templeton commented on YARN-5258:


Thanks, [~sidharta-s]!

> Document Use of Docker with LinuxContainerExecutor
> --
>
> Key: YARN-5258
> URL: https://issues.apache.org/jira/browse/YARN-5258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-easy
> Attachments: YARN-5258.001.patch, YARN-5258.002.patch, 
> YARN-5258.003.patch, YARN-5258.004.patch, YARN-5258.005.patch
>
>
> There aren't currently any docs that explain how to configure Docker and all 
> of its various options aside from reading all of the JIRAs.  We need to 
> document the configuration, use, and troubleshooting, along with helpful 
> examples.






[jira] [Commented] (YARN-6112) fsOpDurations.addUpdateCallDuration() should be independent to LOG level

2017-02-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857137#comment-15857137
 ] 

Karthik Kambatla commented on YARN-6112:


+1 for the branch-2 patch too, pending Jenkins. 

> fsOpDurations.addUpdateCallDuration() should be independent to LOG level
> 
>
> Key: YARN-6112
> URL: https://issues.apache.org/jira/browse/YARN-6112
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6112.001.patch, YARN-6112.002.patch, 
> YARN-6112.003.patch, YARN-6112.branch-2.001.patch
>
>
> In the update thread of Fair Scheduler, the 
> {{fsOpDurations.addUpdateCallDuration()}} records the duration of 
> {{update()}}, it should be independent to LOG level. YARN-4752 put the it 
> inside a {{LOG.isDebugEnabled()}} block. Not sure any particular reason to do 
> that. cc [~kasha]






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857129#comment-15857129
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99968314
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -242,7 +244,9 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
* @return true if the queue can be preempted
*/
   public boolean canBePreempted() {
-assert parent != null;
+Preconditions.checkNotNull(parent, "Parent queue can't be null since"
--- End diff --

Maybe we could make this message clearer: "Parent queue is null. Looks like we are checking if root can be preempted."

Alternatively, can we make the if check (parent != null && ...)? That way the else branch would capture the null case and things should work fine.
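A self-contained sketch of the two options under discussion (the fields and the preemption condition are simplified placeholders, not the real FSQueue logic):
{code}
import com.google.common.base.Preconditions;

class QueueSketch {
  final QueueSketch parent;
  final long usage;
  final long fairShare;

  QueueSketch(QueueSketch parent, long usage, long fairShare) {
    this.parent = parent;
    this.usage = usage;
    this.fairShare = fairShare;
  }

  // Option 1: keep the precondition, with a message that explains the caller bug.
  boolean canBePreempted() {
    Preconditions.checkNotNull(parent,
        "Parent queue is null. Looks like we are checking if root can be preempted.");
    return usage > fairShare;
  }

  // Option 2: fold the null check into the condition so root simply returns
  // false instead of throwing.
  boolean canBePreemptedNullSafe() {
    return parent != null && usage > fairShare;
  }
}
{code}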


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch, 
> YARN-6151.branch-2.8.002.patch
>
>
> This is preemption bug happens before 2.8.0, which also described in 
> YARN-3405.
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume cluster resource is 100 and all queues have same weights.
> # queue-1-1 and queue-2 has apps. Each get 50 usage and 50 fairshare. 
> # When queue-1-2 is active, supposedly it will preempt 25 from queue-1-1, but 
> this doesn't happen because preemption happens top-down, queue-2 could be the 
> preemption candidate as long as queue-2 is less needy than queue-1, and 
> queue-2 doesn't exceed the fair share which means preemption won't happen.
> We need to filter out queue-2 since it isn't a valid candidate.






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857130#comment-15857130
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99967958
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
 ---
@@ -2043,6 +2043,78 @@ public void testPreemptionIsNotDelayedToNextRound() 
throws Exception {
 .size());
   }
 
+  @Test
+  public void testPreemptionFilterOutNonPreemptableQueues() throws 
Exception {
--- End diff --

Can we add this test to TestFairSchedulerPreemption instead? 


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch, 
> YARN-6151.branch-2.8.002.patch
>
>
> This is preemption bug happens before 2.8.0, which also described in 
> YARN-3405.
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume cluster resource is 100 and all queues have same weights.
> # queue-1-1 and queue-2 has apps. Each get 50 usage and 50 fairshare. 
> # When queue-1-2 is active, supposedly it will preempt 25 from queue-1-1, but 
> this doesn't happen because preemption happens top-down, queue-2 could be the 
> preemption candidate as long as queue-2 is less needy than queue-1, and 
> queue-2 doesn't exceed the fair share which means preemption won't happen.
> We need to filter out queue-2 since it isn't a valid candidate.






[jira] [Commented] (YARN-6125) The application attempt's diagnostic message should have a maximum size

2017-02-07 Thread Andras Piros (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857127#comment-15857127
 ] 

Andras Piros commented on YARN-6125:


[~templedf] it's actually always a good idea to throw out code if the feature can remain... but the header message always has to change, because the current and total character counts are appended to it, along with the specified {{limit}}. So in any case, whether in a separate class like {{BoundedAppender}} or within {{RMAppAttemptImpl}}, we have to take care of those pieces of state.

I'm for keeping the separate class; it makes the code more readable for me.
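For illustration, a simplified, self-contained sketch of the idea (this is not the actual {{BoundedAppender}} API; names are invented): the header has to be rebuilt on every read because the counts and the limit feed into it.
{code}
public final class BoundedDiagnostics {
  private final int limit;
  private final StringBuilder messages = new StringBuilder();
  private long totalChars; // total characters ever appended

  public BoundedDiagnostics(int limit) {
    this.limit = limit;
  }

  public void append(CharSequence text) {
    totalChars += text.length();
    messages.append(text);
    // Truncate from the head so the most recent diagnostics are kept.
    int overflow = messages.length() - limit;
    if (overflow > 0) {
      messages.delete(0, overflow);
    }
  }

  @Override
  public String toString() {
    if (totalChars <= limit) {
      return messages.toString();
    }
    // The header changes with every append, because it carries the current
    // and total character counts as well as the configured limit.
    return String.format(
        "Diagnostic messages truncated, showing last %d out of %d characters:%n%s",
        messages.length(), totalChars, messages);
  }
}
{code}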

> The application attempt's diagnostic message should have a maximum size
> ---
>
> Key: YARN-6125
> URL: https://issues.apache.org/jira/browse/YARN-6125
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Daniel Templeton
>Assignee: Andras Piros
>Priority: Critical
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6125.000.patch, YARN-6125.001.patch, 
> YARN-6125.002.patch, YARN-6125.003.patch, YARN-6125.004.patch
>
>
> We've found through experience that the diagnostic message can grow 
> unbounded.  I've seen attempts that have diagnostic messages over 1MB.  Since 
> the message is stored in the state store, it's a bad idea to allow the 
> message to grow unbounded.  Instead, there should be a property that sets a 
> maximum size on the message.
> I suspect that some of the ZK state store issues we've seen in the past were 
> due to the size of the diagnostic messages and not to the size of the 
> classpath, as is the current prevailing opinion.
> An open question is how best to prune the message once it grows too large.  
> Should we
> # truncate the tail,
> # truncate the head,
> # truncate the middle,
> # add another property to make the behavior selectable, or
> # none of the above?






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857120#comment-15857120
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99967724
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -236,6 +236,29 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
   }
 
   /**
+   * Recursively check if the queue can be preempted based on whether the
+   * resource usage is greater than fair share.
+   *
+   * @return true if the queue can be preempted
+   */
+  public boolean canBePreempted() {
--- End diff --

Aah, I keep forgetting branch-2.8 was cut years ago. :(


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch, 
> YARN-6151.branch-2.8.002.patch
>
>
> This is preemption bug happens before 2.8.0, which also described in 
> YARN-3405.
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume cluster resource is 100 and all queues have same weights.
> # queue-1-1 and queue-2 has apps. Each get 50 usage and 50 fairshare. 
> # When queue-1-2 is active, supposedly it will preempt 25 from queue-1-1, but 
> this doesn't happen because preemption happens top-down, queue-2 could be the 
> preemption candidate as long as queue-2 is less needy than queue-1, and 
> queue-2 doesn't exceed the fair share which means preemption won't happen.
> We need to filter out queue-2 since it isn't a valid candidate.






[jira] [Commented] (YARN-5258) Document Use of Docker with LinuxContainerExecutor

2017-02-07 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857115#comment-15857115
 ] 

Sidharta Seethana commented on YARN-5258:
-

Thanks, [~templedf]. This version looks good to me. Committed to trunk and 
branch-2. 

> Document Use of Docker with LinuxContainerExecutor
> --
>
> Key: YARN-5258
> URL: https://issues.apache.org/jira/browse/YARN-5258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-easy
> Attachments: YARN-5258.001.patch, YARN-5258.002.patch, 
> YARN-5258.003.patch, YARN-5258.004.patch, YARN-5258.005.patch
>
>
> There aren't currently any docs that explain how to configure Docker and all 
> of its various options aside from reading all of the JIRAs.  We need to 
> document the configuration, use, and troubleshooting, along with helpful 
> examples.






[jira] [Commented] (YARN-6125) The application attempt's diagnostic message should have a maximum size

2017-02-07 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857113#comment-15857113
 ] 

Daniel Templeton commented on YARN-6125:


Thanks for the update, [~andras.piros].  Now that we're just truncating the 
top, do we still need the full bounded appender?  Seems like the RM could just 
truncate the message and reapply the header after each append.  Because we're 
trimming to the exact length, once we start truncating, the additional header 
will make sure every append ends up truncating.
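A sketch of that simpler alternative (hypothetical and untested, not a proposed patch): trim after every append and re-apply the header inline, without a separate class.
{code}
public final class InlineTruncation {
  static void appendDiagnostics(StringBuilder diagnostics, String update, int limit) {
    diagnostics.append(update);
    if (diagnostics.length() > limit) {
      // Keep only the last 'limit' characters, then put a fresh header on top;
      // once truncation starts, every subsequent append truncates again.
      diagnostics.delete(0, diagnostics.length() - limit);
      diagnostics.insert(0, String.format(
          "Diagnostics truncated to the last %d characters:%n", limit));
    }
  }
}
{code}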

> The application attempt's diagnostic message should have a maximum size
> ---
>
> Key: YARN-6125
> URL: https://issues.apache.org/jira/browse/YARN-6125
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Daniel Templeton
>Assignee: Andras Piros
>Priority: Critical
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6125.000.patch, YARN-6125.001.patch, 
> YARN-6125.002.patch, YARN-6125.003.patch, YARN-6125.004.patch
>
>
> We've found through experience that the diagnostic message can grow 
> unbounded.  I've seen attempts that have diagnostic messages over 1MB.  Since 
> the message is stored in the state store, it's a bad idea to allow the 
> message to grow unbounded.  Instead, there should be a property that sets a 
> maximum size on the message.
> I suspect that some of the ZK state store issues we've seen in the past were 
> due to the size of the diagnostic messages and not to the size of the 
> classpath, as is the current prevailing opinion.
> An open question is how best to prune the message once it grows too large.  
> Should we
> # truncate the tail,
> # truncate the head,
> # truncate the middle,
> # add another property to make the behavior selectable, or
> # none of the above?






[jira] [Comment Edited] (YARN-6145) Improve log message on fail over

2017-02-07 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857056#comment-15857056
 ] 

Daniel Templeton edited comment on YARN-6145 at 2/8/17 12:28 AM:
-

-It would be nice to move the + for concatenation to the end of the line instead of the beginning of the next line, just for consistency.  Otherwise, looks good.-

Nevermind.  Long day.  LGTM.  +1


was (Author: templedf):
It would be nice to move the + for concatenation to the end of the line instead of the beginning of the next line, just for consistency.  Otherwise, looks good.

> Improve log message on fail over
> 
>
> Key: YARN-6145
> URL: https://issues.apache.org/jira/browse/YARN-6145
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-6145.1.patch
>
>
> On failover, a series of exception stack shown in the log, which is harmless, 
> but confusing to user.






[jira] [Commented] (YARN-6145) Improve log message on fail over

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857065#comment-15857065
 ] 

Hadoop QA commented on YARN-6145:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
56s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 37s{color} | {color:orange} root: The patch generated 1 new + 108 unchanged 
- 0 fixed = 109 total (was 108) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 41s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
16s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}102m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6145 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12850906/YARN-6145.1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4b6f82c08ced 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / deb368b |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14851/artifact/patchprocess/diff-checkstyle-root.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14851/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 

[jira] [Commented] (YARN-6145) Improve log message on fail over

2017-02-07 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857056#comment-15857056
 ] 

Daniel Templeton commented on YARN-6145:


It would be nice to move the + for concatenation to the end of the line instead of the beginning of the next line, just for consistency.  Otherwise, looks good.

> Improve log message on fail over
> 
>
> Key: YARN-6145
> URL: https://issues.apache.org/jira/browse/YARN-6145
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-6145.1.patch
>
>
> On failover, a series of exception stack shown in the log, which is harmless, 
> but confusing to user.






[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857053#comment-15857053
 ] 

Yufei Gu commented on YARN-6151:


Thanks [~kasha] for the review. Uploaded patch 002, which adds a unit test that reproduces exactly the situation from the JIRA description.

> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch, 
> YARN-6151.branch-2.8.002.patch
>
>
> This is preemption bug happens before 2.8.0, which also described in 
> YARN-3405.
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume cluster resource is 100 and all queues have same weights.
> # queue-1-1 and queue-2 has apps. Each get 50 usage and 50 fairshare. 
> # When queue-1-2 is active, supposedly it will preempt 25 from queue-1-1, but 
> this doesn't happen because preemption happens top-down, queue-2 could be the 
> preemption candidate as long as queue-2 is less needy than queue-1, and 
> queue-2 doesn't exceed the fair share which means preemption won't happen.
> We need to filter out queue-2 since it isn't a valid candidate.






[jira] [Updated] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6151:
---
Attachment: YARN-6151.branch-2.8.002.patch

> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch, 
> YARN-6151.branch-2.8.002.patch
>
>
> This is preemption bug happens before 2.8.0, which also described in 
> YARN-3405.
> Queue hierarchy described as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume cluster resource is 100 and all queues have same weights.
> # queue-1-1 and queue-2 has apps. Each get 50 usage and 50 fairshare. 
> # When queue-1-2 is active, supposedly it will preempt 25 from queue-1-1, but 
> this doesn't happen because preemption happens top-down, queue-2 could be the 
> preemption candidate as long as queue-2 is less needy than queue-1, and 
> queue-2 doesn't exceed the fair share which means preemption won't happen.
> We need to filter out queue-2 since it isn't a valid candidate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6145) Improve log message on fail over

2017-02-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857048#comment-15857048
 ] 

Junping Du commented on YARN-6145:
--

Patch looks reasonable. The previous warn message for a connection failure could be 
unnecessary given we have another layer of retry on top of RPC. 
+1. Will commit it tomorrow if there are no further comments from others.

> Improve log message on fail over
> 
>
> Key: YARN-6145
> URL: https://issues.apache.org/jira/browse/YARN-6145
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-6145.1.patch
>
>
> On failover, a series of exception stacks is shown in the log, which is harmless 
> but confusing to the user.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6151:
---
Attachment: (was: YARN-6151.branch-2.8.002.patch)

> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that happens before 2.8.0, also described in 
> YARN-3405.
> The queue hierarchy is as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume cluster resource is 100 and all queues have the same weight.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share. 
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but 
> this doesn't happen because preemption works top-down: queue-2 can be chosen 
> as the preemption candidate as long as queue-2 is less needy than queue-1, 
> and since queue-2 doesn't exceed its fair share, no preemption happens.
> We need to filter out queue-2 since it isn't a valid candidate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6151:
---
Attachment: YARN-6151.branch-2.8.002.patch

> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that happens before 2.8.0, also described in 
> YARN-3405.
> The queue hierarchy is as below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume cluster resource is 100 and all queues have the same weight.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share. 
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but 
> this doesn't happen because preemption works top-down: queue-2 can be chosen 
> as the preemption candidate as long as queue-2 is less needy than queue-1, 
> and since queue-2 doesn't exceed its fair share, no preemption happens.
> We need to filter out queue-2 since it isn't a valid candidate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857002#comment-15857002
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99953234
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 ---
@@ -21,10 +21,13 @@
 import java.util.Collection;
 import java.util.Comparator;
 
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.classification.InterfaceAudience.Private;
 import org.apache.hadoop.classification.InterfaceStability.Unstable;
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.server.resourcemanager.resource.ResourceType;
+import 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException;
--- End diff --

Unused import. 


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process, the child queues were all assigned 0 for fair-share 
> vcores.
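For readers following the review, the check under discussion boils down to letting 
each policy veto incompatible child policies. A simplified sketch of that shape, 
using stand-in classes rather than the real {{SchedulingPolicy}}/{{FairSharePolicy}}:

{noformat}
// Simplified sketch: the parent's policy rejects child policies it cannot
// compute meaningful shares for (a 'fair' parent only computes memory shares,
// so 'drf' children would end up with a vcore fair share of 0).
abstract class Policy {
  abstract String getName();

  // Default: any child policy is allowed.
  boolean isChildPolicyAllowed(Policy childPolicy) {
    return true;
  }
}

class DrfPolicy extends Policy {
  @Override
  String getName() { return "drf"; }
}

class FairPolicy extends Policy {
  @Override
  String getName() { return "fair"; }

  @Override
  boolean isChildPolicyAllowed(Policy childPolicy) {
    return !(childPolicy instanceof DrfPolicy);
  }
}
{noformat}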



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857000#comment-15857000
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99955335
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
 ---
@@ -5096,4 +5097,178 @@ public void testUpdateDemand() throws IOException {
 Resources.equals(bQueue.getDemand(), maxResource));
   }
 
+  @Test
+  public void testSchedulingPolicyViolation() throws IOException {
--- End diff --

TestFairScheduler is awfully long. Can we please add these methods 
elsewhere? TestSchedulingPolicy and TestQueueManager are potential candidates. 


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process, the child queues were all assigned 0 for fair-share 
> vcores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857004#comment-15857004
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99953469
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
 ---
@@ -191,4 +164,12 @@ public abstract boolean checkIfUsageOverFairShare(
   public abstract Resource getHeadroom(Resource queueFairShare,
   Resource queueUsage, Resource maxAvailable);
 
+  /**
+   * Check whether the policy of a child queue are allowed.
--- End diff --

s/are/is


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process, the child queues were all assigned 0 for fair-share 
> vcores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857001#comment-15857001
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99953850
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 ---
@@ -175,7 +179,13 @@ public boolean checkIfUsageOverFairShare(Resource 
usage, Resource fairShare) {
   }
 
   @Override
-  public byte getApplicableDepth() {
-return SchedulingPolicy.DEPTH_ANY;
+  public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy) {
+if (childPolicy instanceof DominantResourceFairnessPolicy) {
+  LOG.info("Queue policies can't be " + 
DominantResourceFairnessPolicy.NAME
--- End diff --

s/policies/policy


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process, the child queues were all assigned 0 for fair-share 
> vcores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857007#comment-15857007
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99955463
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestSchedulingPolicy.java
 ---
@@ -79,66 +79,6 @@ public void testParseSchedulingPolicy()
   }
 
   /**
-   * Trivial tests that make sure
-   * {@link SchedulingPolicy#isApplicableTo(SchedulingPolicy, byte)} works 
as
-   * expected for the possible values of depth
-   * 
-   * @throws AllocationConfigurationException
-   */
-  @Test(timeout = 1000)
-  public void testIsApplicableTo() throws AllocationConfigurationException 
{
--- End diff --

Are all the cases in this test covered by other tests added here? If not, 
can we keep the test, maybe rename it, and capture the cases that are not 
covered? 


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process, the child queues were all assigned 0 for fair-share 
> vcores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857006#comment-15857006
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99954044
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java
 ---
@@ -175,7 +179,13 @@ public boolean checkIfUsageOverFairShare(Resource 
usage, Resource fairShare) {
   }
 
   @Override
-  public byte getApplicableDepth() {
-return SchedulingPolicy.DEPTH_ANY;
+  public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy) {
+if (childPolicy instanceof DominantResourceFairnessPolicy) {
+  LOG.info("Queue policies can't be " + 
DominantResourceFairnessPolicy.NAME
+  + " if the parent policy is " + getName() + ". Please choose "
+  + "other polices for child queues instead.");
--- End diff --

IMO, we should either (1) not say anything about other policies or (2) list 
the policies that are allowed. 


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process, the child queues were all assigned 0 for fair-share 
> vcores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857008#comment-15857008
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99953540
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/SchedulingPolicy.java
 ---
@@ -191,4 +164,12 @@ public abstract boolean checkIfUsageOverFairShare(
   public abstract Resource getHeadroom(Resource queueFairShare,
   Resource queueUsage, Resource maxAvailable);
 
+  /**
+   * Check whether the policy of a child queue are allowed.
+   *
+   * @param childPolicy the policy of child queue
+   */
+  public boolean isChildPolicyAllowed(SchedulingPolicy childPolicy) {
--- End diff --

I like that we are adding a non-abstract method. 


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process, the child queues were all assigned 0 for fair-share 
> vcores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857003#comment-15857003
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99954456
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -91,20 +91,22 @@ public FSQueue(String name, FairScheduler scheduler, 
FSParentQueue parent) {
 this.queueEntity = new PrivilegedEntity(EntityType.QUEUE, name);
 this.metrics = FSQueueMetrics.forQueue(getName(), parent, true, 
scheduler.getConf());
 this.parent = parent;
+
setPolicy(scheduler.getAllocationConfiguration().getSchedulingPolicy(name));
 reinit(false);
   }
 
   /**
* Initialize a queue by setting its queue-specific properties and its
-   * metrics.
+   * metrics. This function don't set the policy for queues since there is
--- End diff --

s/function/method - there is one other instance of this in the javadoc

s/don't/does not

Instead of saying there is different logic, can we call out what method 
does that for easier code navigability? And, it might be worth mentioning why 
that logic is separated, either here or at the other method.


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process, the child queues were all assigned 0 for fair-share 
> vcores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857005#comment-15857005
 ] 

ASF GitHub Bot commented on YARN-4212:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/181#discussion_r99954759
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -463,4 +461,33 @@ boolean fitsInMaxShare(Resource additionalResource) {
 }
 return true;
   }
+
+  /**
+   * Recursively check policies for queues in pre-order. Get queue policies
+   * from the allocation file instead of properties of {@link FSQueue} 
objects.
+   * Set the policy for current queue if there is no policy violation for 
its
+   * children.
+   *
+   * @param queueConf allocation configuration
+   * @return true if no policy violation and successfully set polices
+   * for queues; false otherwise
+   */
+  public boolean verifyAndSetPolicyFromConf(AllocationConfiguration 
queueConf) {
--- End diff --

It might be worthwhile to point out the intended caller for this method. 


> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.006.patch, 
> YARN-4212.007.patch, YARN-4212.008.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process, the child queues were all assigned 0 for fair-share 
> vcores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5271) ATS client doesn't work with Jersey 2 on the classpath

2017-02-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856995#comment-15856995
 ] 

Li Lu commented on YARN-5271:
-

Thanks [~cheersyang]. 
bq.  The fix here was trying to alleviate this pain, it prints a warning on 
console and warns user timeline client could not be initialized because of 
dependency issue, more user friendly.
The goal sounds reasonable, but I don't think it justifies catching and swallowing 
an Error. What we can do is clearly document this behavior as a known issue and 
*suggest* users *try* disabling timeline services when they see this error, instead 
of directly assuming the root cause of the error. 
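For completeness, the user-side workaround being suggested amounts to turning the 
timeline service off in the client configuration. A minimal sketch, assuming the 
standard {{YarnConfiguration}}/{{YarnClient}} APIs (double-check the property name 
against your release):

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DisableTimelineClientExample {
  public static void main(String[] args) {
    // Disable the timeline service on the client so no TimelineClient is
    // instantiated; equivalent to setting yarn.timeline-service.enabled=false.
    Configuration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, false);

    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();
    // ... submit applications as usual ...
    yarnClient.stop();
  }
}
{noformat}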

> ATS client doesn't work with Jersey 2 on the classpath
> --
>
> Key: YARN-5271
> URL: https://issues.apache.org/jira/browse/YARN-5271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, timelineserver
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Weiwei Yang
>  Labels: oct16-medium
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5271.01.patch, YARN-5271.02.patch, 
> YARN-5271.branch-2.01.patch, YARN-5271-branch-2.8.01.patch
>
>
> see SPARK-15343 : once Jersey 2 is on the CP, you can't instantiate a 
> timeline client, *even if the server is an ATS1.5 server and publishing is 
> via the FS*



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6144) FairScheduler: preempted resources can become negative

2017-02-07 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856989#comment-15856989
 ] 

Miklos Szegedi commented on YARN-6144:
--

Thank you, [~kasha] for the review. I submitted a new patch addressing your 
comments.

> FairScheduler: preempted resources can become negative
> --
>
> Key: YARN-6144
> URL: https://issues.apache.org/jira/browse/YARN-6144
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Blocker
> Attachments: Screen Shot 2017-02-02 at 12.49.14 PM.png, 
> YARN-6144.000.patch, YARN-6144.001.patch, YARN-6144.002.patch
>
>
> {{preemptContainers()}} calls {{trackContainerForPreemption()}} to collect 
> the list of containers and resources that were preempted for an application. 
> Later the list is reduced when {{containerCompleted()}} calls 
> {{untrackContainerForPreemption()}}. The bug is that the resource variable 
> {{preemptedResources}} is subtracted not just when the container was 
> preempted but also when it has completed successfully. As a result, 
> {{getResourceUsage()}} returns an incorrect value.
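A self-contained sketch of the idea behind the fix, with plain longs standing in for 
{{Resource}} objects (not the actual FSAppAttempt code): subtract only for containers 
that really were preempted, so normal completions cannot drive the counter negative.

{noformat}
final class PreemptionTrackerSketch {
  private final java.util.Set<String> preemptedContainerIds = new java.util.HashSet<>();
  private long preemptedResources = 0;

  void trackContainerForPreemption(String containerId, long resource) {
    if (preemptedContainerIds.add(containerId)) {
      preemptedResources += resource;
    }
  }

  void containerCompleted(String containerId, long resource) {
    // Only untrack (and subtract) when this container was actually preempted.
    if (preemptedContainerIds.remove(containerId)) {
      preemptedResources -= resource;
    }
  }

  long getPreemptedResources() {
    return preemptedResources;
  }
}
{noformat}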



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6144) FairScheduler: preempted resources can become negative

2017-02-07 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-6144:
-
Attachment: YARN-6144.002.patch

> FairScheduler: preempted resources can become negative
> --
>
> Key: YARN-6144
> URL: https://issues.apache.org/jira/browse/YARN-6144
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Blocker
> Attachments: Screen Shot 2017-02-02 at 12.49.14 PM.png, 
> YARN-6144.000.patch, YARN-6144.001.patch, YARN-6144.002.patch
>
>
> {{preemptContainers()}} calls {{trackContainerForPreemption()}} to collect 
> the list of containers and resources that were preempted for an application. 
> Later the list is reduced when {{containerCompleted()}} calls 
> {{untrackContainerForPreemption()}}. The bug is that the resource variable 
> {{preemptedResources}} is subtracted not just when the container was 
> preempted but also when it has completed successfully. As a result, 
> {{getResourceUsage()}} returns an incorrect value.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6112) fsOpDurations.addUpdateCallDuration() should be independent to LOG level

2017-02-07 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6112:
---
Attachment: YARN-6112.branch-2.001.patch

> fsOpDurations.addUpdateCallDuration() should be independent to LOG level
> 
>
> Key: YARN-6112
> URL: https://issues.apache.org/jira/browse/YARN-6112
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6112.001.patch, YARN-6112.002.patch, 
> YARN-6112.003.patch, YARN-6112.branch-2.001.patch
>
>
> In the update thread of the Fair Scheduler, 
> {{fsOpDurations.addUpdateCallDuration()}} records the duration of 
> {{update()}}; it should be independent of the LOG level. YARN-4752 put it 
> inside a {{LOG.isDebugEnabled()}} block. Not sure of any particular reason to do 
> that. cc [~kasha]
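The intended ordering, as a fragment-style sketch (simplified, not the exact 
FairScheduler update-thread code): the metric is always recorded, and only the log 
line stays behind the debug check.

{noformat}
long start = getClock().getTime();
update();
long duration = getClock().getTime() - start;

// Record the duration unconditionally -- independent of the LOG level.
fsOpDurations.addUpdateCallDuration(duration);

if (LOG.isDebugEnabled()) {
  LOG.debug("Update thread: one update cycle took " + duration + " ms");
}
{noformat}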



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6112) fsOpDurations.addUpdateCallDuration() should be independent to LOG level

2017-02-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856977#comment-15856977
 ] 

Yufei Gu commented on YARN-6112:


[~kasha], thanks for the review. Uploaded the patch for branch-2.

> fsOpDurations.addUpdateCallDuration() should be independent to LOG level
> 
>
> Key: YARN-6112
> URL: https://issues.apache.org/jira/browse/YARN-6112
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6112.001.patch, YARN-6112.002.patch, 
> YARN-6112.003.patch, YARN-6112.branch-2.001.patch
>
>
> In the update thread of the Fair Scheduler, 
> {{fsOpDurations.addUpdateCallDuration()}} records the duration of 
> {{update()}}; it should be independent of the LOG level. YARN-4752 put it 
> inside a {{LOG.isDebugEnabled()}} block. Not sure of any particular reason to do 
> that. cc [~kasha]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5685) Non-embedded HA failover is broken

2017-02-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856960#comment-15856960
 ] 

Karthik Kambatla commented on YARN-5685:


My bad. Looked at this patch, but forgot to post my comments. :(

The patch looks good. One nit: I don't think we should lower the visibility or 
stability annotations of fields that are already marked Public-Stable. Can we 
drop those changes and just deprecate them? 

> Non-embedded HA failover is broken
> --
>
> Key: YARN-5685
> URL: https://issues.apache.org/jira/browse/YARN-5685
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-hard
> Attachments: YARN-5685.001.patch, YARN-5685.002.patch, 
> YARN-5685.003.patch
>
>
> If HA is enabled with automatic failover enabled and embedded failover 
> disabled, all RMs come up in standby state.  To make one of them active, 
> the {{\-\-forcemanual}} flag must be used when manually triggering the state 
> change.  Should the active go down, the standby will not become active and 
> must be manually transitioned with the {{\-\-forcemanual}} flag.
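For anyone reproducing this, the manual transition referred to above is driven 
through the admin CLI; the invocation below is the usual form, though the exact 
option order should be checked against the {{yarn rmadmin}} usage on your release.

{noformat}
# Force a manual transition even though automatic failover is configured
yarn rmadmin -transitionToActive --forcemanual rm1

# Confirm the state afterwards
yarn rmadmin -getServiceState rm1
{noformat}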



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6135) Node manager REST API documentation is not up to date

2017-02-07 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856959#comment-15856959
 ] 

Daniel Templeton commented on YARN-6135:


LGTM. +1

> Node manager REST API documentation is not up to date
> -
>
> Key: YARN-6135
> URL: https://issues.apache.org/jira/browse/YARN-6135
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, restapi
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Trivial
> Attachments: YARN-6135.000.patch, YARN-6135.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856951#comment-15856951
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r99949838
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) {
 }
   }
 
-  public void handleTransitionToStandBy() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
+  /**
+   * Transition to standby in a new thread.
+   */
+  public void handleTransitionToStandByInNewThread() {
+Thread standByTransitionThread =
+new Thread(activeServices.standByTransitionRunnable);
+standByTransitionThread.setName("StandByTransitionThread");
+standByTransitionThread.start();
+  }
+
+  private class StandByTransitionRunnable implements Runnable {
+private AtomicBoolean hasRun = new AtomicBoolean(false);
--- End diff --

Maybe, rename this to hasAlreadyRun? And, again add some javadoc here too? 


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch, 
> YARN-6061.006.patch, YARN-6061.007.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when there 
> is a runtime exception inside it. We should bring down the RM when that 
> happens. Otherwise, there may be some weird behavior in the RM. 
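The general shape of such a handler, as a runnable sketch (illustrative only; the 
actual {{RMCriticalThreadUncaughtExceptionHandler}} in the patch is HA-aware and 
transitions the RM to standby instead of exiting when HA is enabled):

{noformat}
import java.lang.Thread.UncaughtExceptionHandler;

public class CriticalThreadHandlerSketch implements UncaughtExceptionHandler {
  @Override
  public void uncaughtException(Thread t, Throwable e) {
    System.err.println("Critical thread " + t.getName() + " exited with: " + e);
    // Bring the daemon down rather than limping along without the thread.
    System.exit(1);
  }

  public static void main(String[] args) throws InterruptedException {
    Thread critical = new Thread(() -> {
      throw new RuntimeException("simulated failure in a scheduler thread");
    }, "FairSchedulerUpdateThread");
    critical.setUncaughtExceptionHandler(new CriticalThreadHandlerSketch());
    critical.start();
    critical.join();
  }
}
{noformat}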



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856953#comment-15856953
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r99948742
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMCriticalThreadUncaughtExceptionHandler.java
 ---
@@ -0,0 +1,60 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.yarn.server.resourcemanager;
+
+import java.lang.Thread.UncaughtExceptionHandler;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.classification.InterfaceAudience.Public;
+import org.apache.hadoop.classification.InterfaceStability.Evolving;
+import org.apache.hadoop.yarn.conf.HAUtil;
+
+/**
+ * This class either shutdowns {@link ResourceManager} or makes
--- End diff --

- s/shutdowns/shuts down
- s/makes RM transition/ transitions the RM
- s/if any uncaught exception.../if a critical thread throws an uncaught 
exception. 


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch, 
> YARN-6061.006.patch, YARN-6061.007.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when there 
> is a runtime exception inside it. We should bring down the RM when that 
> happens. Otherwise, there may be some weird behavior in the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856954#comment-15856954
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r99949581
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) {
 }
   }
 
-  public void handleTransitionToStandBy() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
+  /**
+   * Transition to standby in a new thread.
+   */
+  public void handleTransitionToStandByInNewThread() {
+Thread standByTransitionThread =
+new Thread(activeServices.standByTransitionRunnable);
+standByTransitionThread.setName("StandByTransitionThread");
+standByTransitionThread.start();
+  }
+
+  private class StandByTransitionRunnable implements Runnable {
--- End diff --

Let us add javadoc for this class, and include details on how we use the 
same runnable.


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch, 
> YARN-6061.006.patch, YARN-6061.007.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when there 
> is a runtime exception inside it. We should bring down the RM when that 
> happens. Otherwise, there may be some weird behavior in the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads in RM

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856952#comment-15856952
 ] 

ASF GitHub Bot commented on YARN-6061:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/182#discussion_r99949771
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
 ---
@@ -819,19 +824,39 @@ public void handle(RMFatalEvent event) {
 }
   }
 
-  public void handleTransitionToStandBy() {
-if (rmContext.isHAEnabled()) {
-  try {
-// Transition to standby and reinit active services
-LOG.info("Transitioning RM to Standby mode");
-transitionToStandby(true);
-EmbeddedElector elector = rmContext.getLeaderElectorService();
-if (elector != null) {
-  elector.rejoinElection();
+  /**
+   * Transition to standby in a new thread.
+   */
+  public void handleTransitionToStandByInNewThread() {
+Thread standByTransitionThread =
+new Thread(activeServices.standByTransitionRunnable);
+standByTransitionThread.setName("StandByTransitionThread");
+standByTransitionThread.start();
+  }
+
+  private class StandByTransitionRunnable implements Runnable {
+private AtomicBoolean hasRun = new AtomicBoolean(false);
+
+@Override
+public void run() {
+  // Prevent from running again if it has run.
--- End diff --

Add more detail here: "Run this only once, even if multiple threads end up 
triggering this simultaneously."
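For reference, one way to get the run-at-most-once semantics described here is a 
{{compareAndSet}} guard at the top of {{run()}}; a fragment-style sketch (field and 
method names are illustrative, not necessarily the exact patch):

{noformat}
private final AtomicBoolean hasAlreadyRun = new AtomicBoolean(false);

@Override
public void run() {
  // Run this only once, even if multiple threads trigger it simultaneously.
  if (!hasAlreadyRun.compareAndSet(false, true)) {
    return;
  }
  // ... transition the RM to standby and rejoin leader election ...
}
{noformat}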


> Add a customized uncaughtexceptionhandler for critical threads in RM
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch, YARN-6061.002.patch, 
> YARN-6061.003.patch, YARN-6061.004.patch, YARN-6061.005.patch, 
> YARN-6061.006.patch, YARN-6061.007.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when there 
> is a runtime exception inside it. We should bring down the RM when that 
> happens. Otherwise, there may be some weird behavior in the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6125) The application attempt's diagnostic message should have a maximum size

2017-02-07 Thread Andras Piros (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Piros updated YARN-6125:
---
Attachment: YARN-6125.004.patch

> The application attempt's diagnostic message should have a maximum size
> ---
>
> Key: YARN-6125
> URL: https://issues.apache.org/jira/browse/YARN-6125
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Daniel Templeton
>Assignee: Andras Piros
>Priority: Critical
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6125.000.patch, YARN-6125.001.patch, 
> YARN-6125.002.patch, YARN-6125.003.patch, YARN-6125.004.patch
>
>
> We've found through experience that the diagnostic message can grow 
> unbounded.  I've seen attempts that have diagnostic messages over 1MB.  Since 
> the message is stored in the state store, it's a bad idea to allow the 
> message to grow unbounded.  Instead, there should be a property that sets a 
> maximum size on the message.
> I suspect that some of the ZK state store issues we've seen in the past were 
> due to the size of the diagnostic messages and not to the size of the 
> classpath, as is the current prevailing opinion.
> An open question is how best to prune the message once it grows too large.  
> Should we
> # truncate the tail,
> # truncate the head,
> # truncate the middle,
> # add another property to make the behavior selectable, or
> # none of the above?
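As a concrete example of option 1 above (truncate the tail), here is a 
self-contained sketch of a size-bounded diagnostics buffer; which option the actual 
patch implements is exactly the open question, so treat this purely as an 
illustration.

{noformat}
final class BoundedDiagnosticsSketch {
  private final int limitChars;
  private final StringBuilder buffer = new StringBuilder();

  BoundedDiagnosticsSketch(int limitChars) {
    this.limitChars = limitChars;
  }

  void append(String message) {
    int remaining = limitChars - buffer.length();
    if (remaining <= 0) {
      return;                                   // already full: drop new text
    }
    if (message.length() > remaining) {
      buffer.append(message, 0, remaining);     // keep the head, cut the tail
      buffer.append("...[truncated]");
    } else {
      buffer.append(message);
    }
  }

  String get() {
    return buffer.toString();
  }
}
{noformat}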



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6113) re-direct NM Web Service to get container logs for finished applications

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856943#comment-15856943
 ] 

Hadoop QA commented on YARN-6113:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
8s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 45s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 215 unchanged - 2 fixed = 216 total (was 217) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
36s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
40s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
25s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6113 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12851471/YARN-6113.trunk.v3.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 77ea684560ae 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / deb368b |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14850/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
 |
|  Test Results 

[jira] [Commented] (YARN-6112) fsOpDurations.addUpdateCallDuration() should be independent to LOG level

2017-02-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856930#comment-15856930
 ] 

Karthik Kambatla commented on YARN-6112:


+1 on the trunk patch. [~yufeigu] - can you post another patch for branch-2 
that keeps the metric, but fixes the original problem reported on this JIRA? 

> fsOpDurations.addUpdateCallDuration() should be independent to LOG level
> 
>
> Key: YARN-6112
> URL: https://issues.apache.org/jira/browse/YARN-6112
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6112.001.patch, YARN-6112.002.patch, 
> YARN-6112.003.patch
>
>
> In the update thread of the Fair Scheduler, 
> {{fsOpDurations.addUpdateCallDuration()}} records the duration of 
> {{update()}}; it should be independent of the LOG level. YARN-4752 put it 
> inside a {{LOG.isDebugEnabled()}} block. Not sure of any particular reason to do 
> that. cc [~kasha]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856921#comment-15856921
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user flyrain commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99944929
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -236,6 +236,29 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
   }
 
   /**
+   * Recursively check if the queue can be preempted based on whether the
+   * resource usage is greater than fair share.
+   *
+   * @return true if the queue can be preempted
+   */
+  public boolean canBePreempted() {
--- End diff --

It should be, but allowPreemptionFrom was introduced after 2.8.x. 


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that exists before 2.8.0 and is also described in 
> YARN-3405.
> The queue hierarchy is shown below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume the cluster resource is 100 and all queues have the same weights.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share. 
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but 
> this doesn't happen because preemption works top-down: queue-2 can be chosen 
> as the preemption candidate as long as queue-2 is less needy than queue-1, 
> and since queue-2 doesn't exceed its fair share, no preemption happens.
> We need to filter out queue-2 since it isn't a valid candidate.
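
To make the filtering idea concrete, here is a simplified, self-contained sketch of a recursive candidate check over a toy queue tree built from the numbers above. It is not the real FSQueue/FSParentQueue code; a single resource dimension and one possible reading of "over fair share" are assumed:

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Toy queue model; not the real FSQueue class. */
public class PreemptionFilterSketch {
  final String name;
  final List<PreemptionFilterSketch> children = new ArrayList<>();
  final long usage;      // single resource dimension, for simplicity
  final long fairShare;

  PreemptionFilterSketch(String name, long usage, long fairShare) {
    this.name = name;
    this.usage = usage;
    this.fairShare = fairShare;
  }

  PreemptionFilterSketch child(String name, long usage, long fairShare) {
    PreemptionFilterSketch c = new PreemptionFilterSketch(name, usage, fairShare);
    children.add(c);
    return c;
  }

  /**
   * A leaf is a valid preemption candidate only if it is over its fair
   * share; a parent is a candidate only if some descendant is.
   */
  boolean canBePreempted() {
    if (children.isEmpty()) {
      return usage > fairShare;
    }
    for (PreemptionFilterSketch c : children) {
      if (c.canBePreempted()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    PreemptionFilterSketch root = new PreemptionFilterSketch("root", 100, 100);
    PreemptionFilterSketch q1 = root.child("queue-1", 50, 50);
    PreemptionFilterSketch q2 = root.child("queue-2", 50, 50);
    q1.child("queue-1-1", 50, 25);
    q1.child("queue-1-2", 0, 25);

    System.out.println("queue-1 candidate? " + q1.canBePreempted()); // true
    System.out.println("queue-2 candidate? " + q2.canBePreempted()); // false
  }
}
{code}

With these numbers, queue-2 is filtered out while queue-1 remains a candidate because queue-1-1 is over its fair share.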



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856917#comment-15856917
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99943541
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
 ---
@@ -2036,10 +2036,10 @@ public void testPreemptionIsNotDelayedToNextRound() 
throws Exception {
 .getLeafQueue("queueA.queueA2", false), clock.getTime());
 assertEquals(3277, toPreempt.getMemorySize());
 
-// verify if the 3 containers required by queueA2 are preempted in the 
same
+// verify if the 4 containers required by queueA2 are preempted in the 
same
 // round
 scheduler.preemptResources(toPreempt);
-assertEquals(3, 
scheduler.getSchedulerApp(app1).getPreemptionContainers()
+assertEquals(4, 
scheduler.getSchedulerApp(app1).getPreemptionContainers()
 .size());
   }
--- End diff --

Can we add a new test that verifies the exact scenario in the JIRA 
description? 


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that exists before 2.8.0 and is also described in 
> YARN-3405.
> The queue hierarchy is shown below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume the cluster resource is 100 and all queues have the same weights.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share. 
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but 
> this doesn't happen because preemption works top-down: queue-2 can be chosen 
> as the preemption candidate as long as queue-2 is less needy than queue-1, 
> and since queue-2 doesn't exceed its fair share, no preemption happens.
> We need to filter out queue-2 since it isn't a valid candidate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856915#comment-15856915
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99942869
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -236,6 +236,29 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
   }
 
   /**
+   * Recursively check if the queue can be preempted based on whether the
+   * resource usage is greater than fair share.
+   *
+   * @return true if the queue can be preempted
+   */
+  public boolean canBePreempted() {
+assert parent != null;
--- End diff --

Why is this necessary? 


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that exists before 2.8.0 and is also described in 
> YARN-3405.
> The queue hierarchy is shown below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume the cluster resource is 100 and all queues have the same weights.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share. 
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but 
> this doesn't happen because preemption works top-down: queue-2 can be chosen 
> as the preemption candidate as long as queue-2 is less needy than queue-1, 
> and since queue-2 doesn't exceed its fair share, no preemption happens.
> We need to filter out queue-2 since it isn't a valid candidate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6151) FS Preemption doesn't filter out queues which cannot be preempted

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856916#comment-15856916
 ] 

ASF GitHub Bot commented on YARN-6151:
--

Github user kambatla commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/188#discussion_r99943676
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
 ---
@@ -236,6 +236,29 @@ public void setFairSharePreemptionThreshold(float 
fairSharePreemptionThreshold)
   }
 
   /**
+   * Recursively check if the queue can be preempted based on whether the
+   * resource usage is greater than fair share.
+   *
+   * @return true if the queue can be preempted
+   */
+  public boolean canBePreempted() {
--- End diff --

Should the check of the allowPreemptionFrom flag also be part of this 
method? 


> FS Preemption doesn't filter out queues which cannot be preempted
> -
>
> Key: YARN-6151
> URL: https://issues.apache.org/jira/browse/YARN-6151
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6151.branch-2.8.001.patch
>
>
> This is a preemption bug that exists before 2.8.0 and is also described in 
> YARN-3405.
> The queue hierarchy is shown below:
> {noformat}
>   root
>/ \
>queue-1  queue-2   
>   /  \
> queue-1-1 queue-1-2
> {noformat}
> Assume the cluster resource is 100 and all queues have the same weights.
> # queue-1-1 and queue-2 have apps. Each gets 50 usage and 50 fair share. 
> # When queue-1-2 becomes active, it should preempt 25 from queue-1-1, but 
> this doesn't happen because preemption works top-down: queue-2 can be chosen 
> as the preemption candidate as long as queue-2 is less needy than queue-1, 
> and since queue-2 doesn't exceed its fair share, no preemption happens.
> We need to filter out queue-2 since it isn't a valid candidate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5654) Not be able to run SLS with FairScheduler

2017-02-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856886#comment-15856886
 ] 

Yufei Gu commented on YARN-5654:


[~wangda], sure. I'll look into this soon.

> Not be able to run SLS with FairScheduler
> -
>
> Key: YARN-5654
> URL: https://issues.apache.org/jira/browse/YARN-5654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Yufei Gu
> Attachments: YARN-5654.1.patch
>
>
> With the config:
> https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/configs/hadoop-conf-fs
> And data:
> https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/data/scheduler-load-test-data
> Capacity Scheduler runs fine, but Fair Scheduler cannot run successfully; 
> it reports an NPE from RMAppAttemptImpl



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5654) Not be able to run SLS with FairScheduler

2017-02-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856863#comment-15856863
 ] 

Wangda Tan commented on YARN-5654:
--

[~yufeigu],

It is still assigned to you :).

I don't think I'm the best person to work on this fix, since I have limited 
experience with the FS code base. If you are interested in doing this, please 
continue and feel free to make any changes to the attached preliminary fix.

Thanks,

> Not be able to run SLS with FairScheduler
> -
>
> Key: YARN-5654
> URL: https://issues.apache.org/jira/browse/YARN-5654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Yufei Gu
> Attachments: YARN-5654.1.patch
>
>
> With the config:
> https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/configs/hadoop-conf-fs
> And data:
> https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/data/scheduler-load-test-data
> Capacity Scheduler runs fine, but Fair Scheduler cannot run successfully; 
> it reports an NPE from RMAppAttemptImpl



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5654) Not be able to run SLS with FairScheduler

2017-02-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856852#comment-15856852
 ] 

Yufei Gu commented on YARN-5654:


Hi [~wangda], sorry for the delayed reply. I am interested in working on this. 
But since you've uploaded the patch, feel free to assign it to yourself. I can 
try to do the review, though I'm not the best person for that. And you are free 
to assign it back to me if any further work is needed and you don't want to 
continue. Thanks.

> Not be able to run SLS with FairScheduler
> -
>
> Key: YARN-5654
> URL: https://issues.apache.org/jira/browse/YARN-5654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Yufei Gu
> Attachments: YARN-5654.1.patch
>
>
> With the config:
> https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/configs/hadoop-conf-fs
> And data:
> https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/data/scheduler-load-test-data
> Capacity Scheduler runs fine, but Fair Scheduler cannot run successfully; 
> it reports an NPE from RMAppAttemptImpl



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6113) re-direct NM Web Service to get container logs for finished applications

2017-02-07 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-6113:

Attachment: YARN-6113.trunk.v3.patch

> re-direct NM Web Service to get container logs for finished applications
> 
>
> Key: YARN-6113
> URL: https://issues.apache.org/jira/browse/YARN-6113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-6113.branch-2.v1.patch, 
> YARN-6113.branch-2.v2.patch, YARN-6113.branch-2.v3.patch, 
> YARN-6113.trunk.v2.patch, YARN-6113.trunk.v3.patch
>
>
> In the NM web UI, when we try to get container logs for a finished 
> application, it redirects to the log server based on the configuration 
> yarn.log.server.url. We should do a similar thing for the NM WebService
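
A rough sketch of the redirect pattern the description asks for, using plain JAX-RS types. The method name and the path layout appended to the log server URL are assumptions made purely for illustration; this is not the actual NMWebServices change:

{code:java}
import java.net.URI;
import javax.ws.rs.core.Response;

/** Sketch: send a container-log request to the configured log server. */
public class LogRedirectSketch {

  /**
   * @param logServerUrl value of yarn.log.server.url (may be null or empty)
   * @param containerId  container whose logs were requested
   * @param user         application owner
   */
  public static Response redirectToLogServer(String logServerUrl,
      String containerId, String user) {
    if (logServerUrl == null || logServerUrl.isEmpty()) {
      return Response.status(Response.Status.NOT_FOUND)
          .entity("Logs are no longer on this node and no log server is configured")
          .build();
    }
    // The path layout below is assumed; the real layout depends on how the
    // aggregated-log web service is exposed.
    URI target = URI.create(logServerUrl + "/" + containerId + "/" + user);
    return Response.temporaryRedirect(target).build();
  }
}
{code}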



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6113) re-direct NM Web Service to get container logs for finished applications

2017-02-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856786#comment-15856786
 ] 

Xuan Gong commented on YARN-6113:
-

Fixed the test case failure and the checkstyle issues.

> re-direct NM Web Service to get container logs for finished applications
> 
>
> Key: YARN-6113
> URL: https://issues.apache.org/jira/browse/YARN-6113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-6113.branch-2.v1.patch, 
> YARN-6113.branch-2.v2.patch, YARN-6113.branch-2.v3.patch, 
> YARN-6113.trunk.v2.patch, YARN-6113.trunk.v3.patch
>
>
> In the NM web UI, when we try to get container logs for a finished 
> application, it redirects to the log server based on the configuration 
> yarn.log.server.url. We should do a similar thing for the NM WebService



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6113) re-direct NM Web Service to get container logs for finished applications

2017-02-07 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-6113:

Attachment: YARN-6113.branch-2.v3.patch

> re-direct NM Web Service to get container logs for finished applications
> 
>
> Key: YARN-6113
> URL: https://issues.apache.org/jira/browse/YARN-6113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-6113.branch-2.v1.patch, 
> YARN-6113.branch-2.v2.patch, YARN-6113.branch-2.v3.patch, 
> YARN-6113.trunk.v2.patch
>
>
> In the NM web UI, when we try to get container logs for a finished 
> application, it redirects to the log server based on the configuration 
> yarn.log.server.url. We should do a similar thing for the NM WebService



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2017-02-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856692#comment-15856692
 ] 

Yufei Gu edited comment on YARN-4090 at 2/7/17 8:06 PM:


Hi [~zsl2007], thanks for working on this. I may have misunderstood your patch. 
To make sure we are on the same page, let me clarify: we are trying to remove 
the recursive computation of {{getResourceUsage()}} to improve performance. 
Right? Based on that, here are some thoughts:
# FSParentQueue doesn't do the recursive {{getResourceUsage()}} in your patch, 
and the resource usage is updated while allocating, recovering and moving. Does 
FSParentQueue get its resource usage updated when an application finishes or 
its tasks finish? Resource usage is a critical metric of queues; it would be 
nice to include some unit tests (or modifications of existing tests) to make 
sure we do it right. 
# Any reason we don't refactor {{getResourceUsage()}} in FSLeafQueue?
Some minor nits:
# Use {{fsQueue}} instead of {{queue}} so you don't need to cast. 
# Need one empty line after function move() 



was (Author: yufeigu):
Hi [~zsl2007], thanks for woking on this. I could misunderstand your patch. To 
make sure we are in the same page, let me clarify this: we are trying to remove 
the recursive computation of {{getResourceUsage()}} to improve the performance. 
Right? Base on that, here are some thoughts:
1. FSParentQueue don’t do the recursive {{getResourceUsage()}} in your patch 
and resource usage updates while allocating, recovering and moving. Does 
FSParentQueue get updated on resource usage when an application finishes or its 
tasks finish? Resource usage is a critical metrics of queues, it would be nice 
to including some unit tests (or modification of existing tests) to make sure 
we do it right. 
2. Any reason we don't refactor the {{getResourceUsage()}} in FSLeafQueue?
Some minor nits:
# Use {{fsQueue}} instead of {{queue}} so you don’t need to cast. 
# Need one empty line after function move() 


> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: zhangshilong
> Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, 
> YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, 
> YARN-4090.005.patch, YARN-4090.006.patch, YARN-4090-preview.patch, 
> YARN-4090-TestResult.pdf
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2017-02-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856692#comment-15856692
 ] 

Yufei Gu edited comment on YARN-4090 at 2/7/17 8:06 PM:


Hi [~zsl2007], thanks for working on this. I may have misunderstood your patch. 
To make sure we are on the same page, let me clarify: we are trying to remove 
the recursive computation of {{getResourceUsage()}} to improve performance. 
Right? Based on that, here are some thoughts:
# FSParentQueue doesn't do the recursive {{getResourceUsage()}} in your patch, 
and the resource usage is updated while allocating, recovering and moving. Does 
FSParentQueue get its resource usage updated when an application finishes or 
its tasks finish? Resource usage is a critical metric of queues; it would be 
nice to include some unit tests (or modifications of existing tests) to make 
sure we do it right. 
# Any reason we don't refactor {{getResourceUsage()}} in FSLeafQueue?

Some minor nits:
# Use {{fsQueue}} instead of {{queue}} so you don't need to cast. 
# Need one empty line after function move() 



was (Author: yufeigu):
Hi [~zsl2007], thanks for woking on this. I could misunderstand your patch. To 
make sure we are in the same page, let me clarify this: we are trying to remove 
the recursive computation of {{getResourceUsage()}} to improve the performance. 
Right? Base on that, here are some thoughts:
# FSParentQueue don’t do the recursive {{getResourceUsage()}} in your patch and 
resource usage updates while allocating, recovering and moving. Does 
FSParentQueue get updated on resource usage when an application finishes or its 
tasks finish? Resource usage is a critical metrics of queues, it would be nice 
to including some unit tests (or modification of existing tests) to make sure 
we do it right. 
# Any reason we don't refactor the {{getResourceUsage()}} in FSLeafQueue?
Some minor nits:
# Use {{fsQueue}} instead of {{queue}} so you don’t need to cast. 
# Need one empty line after function move() 


> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: zhangshilong
> Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, 
> YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, 
> YARN-4090.005.patch, YARN-4090.006.patch, YARN-4090-preview.patch, 
> YARN-4090-TestResult.pdf
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2017-02-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856692#comment-15856692
 ] 

Yufei Gu commented on YARN-4090:


Hi [~zsl2007], thanks for working on this. I may have misunderstood your patch. 
To make sure we are on the same page, let me clarify: we are trying to remove 
the recursive computation of {{getResourceUsage()}} to improve performance. 
Right? Based on that, here are some thoughts:
1. FSParentQueue doesn't do the recursive {{getResourceUsage()}} in your patch, 
and the resource usage is updated while allocating, recovering and moving. Does 
FSParentQueue get its resource usage updated when an application finishes or 
its tasks finish? Resource usage is a critical metric of queues; it would be 
nice to include some unit tests (or modifications of existing tests) to make 
sure we do it right. (A small illustrative sketch of this idea follows below.) 
2. Any reason we don't refactor {{getResourceUsage()}} in FSLeafQueue?
Some minor nits:
# Use {{fsQueue}} instead of {{queue}} so you don't need to cast. 
# Need one empty line after function move() 
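
A small, self-contained sketch of the idea raised in point 1, assuming a single memory dimension and invented class names (this is not the FSParentQueue/FSLeafQueue code):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch: instead of recomputing a parent queue's usage by walking all
 * children on every call, keep a cached aggregate that is adjusted whenever
 * a leaf allocates or releases resources.
 */
class UsageTrackingQueue {
  private final UsageTrackingQueue parent;
  private final AtomicLong usedMemory = new AtomicLong();
  private final List<UsageTrackingQueue> children = new ArrayList<>();

  UsageTrackingQueue(UsageTrackingQueue parent) {
    this.parent = parent;
    if (parent != null) {
      parent.children.add(this);
    }
  }

  /** O(1) read instead of a recursive walk over the children. */
  long getResourceUsage() {
    return usedMemory.get();
  }

  /**
   * Must be called on every allocate/release/recover/move, and also when an
   * application or its containers finish; otherwise the cache drifts.
   */
  void incrementUsage(long deltaMemory) {
    usedMemory.addAndGet(deltaMemory);
    if (parent != null) {
      parent.incrementUsage(deltaMemory);   // propagate up the hierarchy
    }
  }
}
{code}

The important part is the contract on {{incrementUsage()}}: every event that changes a leaf's usage, including application and container completion, has to adjust the cached aggregate, otherwise the cheap read returns stale values.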


> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: zhangshilong
> Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, 
> YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, 
> YARN-4090.005.patch, YARN-4090.006.patch, YARN-4090-preview.patch, 
> YARN-4090-TestResult.pdf
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6059) Update paused container state in the state store

2017-02-07 Thread Hitesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856654#comment-15856654
 ] 

Hitesh Sharma commented on YARN-6059:
-

Ping [~asuresh], [~kkaranasos], can you take a look at the patch? The current 
patch is a very raw implementation, and before I refine it, it would be good 
to agree on a high-level approach here. 

Thank you.

> Update paused container state in the state store
> 
>
> Key: YARN-6059
> URL: https://issues.apache.org/jira/browse/YARN-6059
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Hitesh Sharma
>Assignee: Hitesh Sharma
> Attachments: YARN-5216-YARN-6059.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-02-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856489#comment-15856489
 ] 

Wangda Tan commented on YARN-5889:
--

[~eepayne], [~sunilg],

I would prefer to backport this patch once we have user-limit preemption 
support. This is just a refactoring patch with minor behavior changes; it 
should be backported once we get the whole feature ready (intra-queue 
preemption for user limit). We can discuss more once we reach that stage. 

> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, 
> YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, 
> YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, 
> YARN-5889.0009.patch, YARN-5889.0010.patch, YARN-5889.v0.patch, 
> YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently the user limit is computed during every heartbeat allocation cycle 
> with a write lock. To improve performance, this ticket focuses on moving the 
> user-limit calculation out of the heartbeat allocation flow.
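
As a rough illustration of what "moving the calculation out of the heartbeat flow" can look like (a sketch only, with a deliberately simplified formula and invented names; not the CapacityScheduler code):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch: compute the user limit when its inputs change and cache it, so the
 * per-heartbeat allocation path can read it without taking the write lock.
 */
class CachedUserLimit {
  private final AtomicLong cachedUserLimitMb = new AtomicLong();

  /** Cheap read on the allocation (heartbeat) path. */
  long getUserLimitMb() {
    return cachedUserLimitMb.get();
  }

  /**
   * Recompute only on events that can change the result, e.g. an app is
   * added or removed, a container is allocated or released, or the cluster
   * resource or queue configuration changes. The formula here is simplified.
   */
  void recompute(long queueCapacityMb, int activeUsers, float userLimitFactor) {
    if (activeUsers <= 0) {
      cachedUserLimitMb.set(queueCapacityMb);
      return;
    }
    long perUser = queueCapacityMb / activeUsers;
    cachedUserLimitMb.set(
        (long) Math.min(queueCapacityMb, perUser * userLimitFactor));
  }
}
{code}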



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-02-07 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856463#comment-15856463
 ] 

Sunil G commented on YARN-5889:
---

We checked with Junping and there are no major pending changes related to this 
area. So we should be good to go for 2.8.1. I'll first provide a branch-2 
patch, which has some conflicts now. After that, the 2.8 patch could be 
backported.

> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, 
> YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, 
> YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, 
> YARN-5889.0009.patch, YARN-5889.0010.patch, YARN-5889.v0.patch, 
> YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently the user limit is computed during every heartbeat allocation cycle 
> with a write lock. To improve performance, this ticket focuses on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6152) Used queue percentage not accurate in UI for 2.7 and below when using DominantResourceCalculator

2017-02-07 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856428#comment-15856428
 ] 

Jonathan Hung commented on YARN-6152:
-

Great, thanks [~eepayne] and [~sunilg]!

> Used queue percentage not accurate in UI for 2.7 and below when using 
> DominantResourceCalculator
> 
>
> Key: YARN-6152
> URL: https://issues.apache.org/jira/browse/YARN-6152
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Fix For: 2.7.4
>
> Attachments: dominantRC.png, YARN-6152-branch-2.7.001.patch, 
> YARN-6152-branch-2.7.002.patch
>
>
> YARN-4751 adds the {{getUsedCapacity}} and {{getAbsoluteUsedCapacity}} 
> methods to {{AbstractCSQueue}}, which are used to display queue usage in the 
> UI for branch-2.7 and below. However, if there is more than one partition in 
> the cluster, with different dominant resources, then queue usage may not be 
> displayed as expected.
> Contrived example: the default partition has <90GB, 10vcores>, and the "test" 
> partition has <10GB, 90vcores>. The {{root}} queue in the default partition 
> uses <30GB, 10vcores>. Here we expect the queue usage to be 100% since it is 
> using all vcores in the default partition. But the displayed usage will be 
> (30GB/100GB)/(90GB/100GB) = 33%.
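
A small worked example reproducing the arithmetic in the description. The exact formulas in {{AbstractCSQueue}} differ; this only shows why normalizing the dominant share against the cluster total instead of the partition total yields 33% instead of the expected 100%:

{code:java}
/**
 * Numbers from the description (illustrative only).
 * Cluster total:        100 GB, 100 vcores
 * Default partition:     90 GB,  10 vcores
 * root usage (default):  30 GB,  10 vcores
 */
public class DominantShareExample {
  public static void main(String[] args) {
    double[] clusterTotal   = {100, 100};  // {memoryGb, vcores}
    double[] partitionTotal = { 90,  10};
    double[] used           = { 30,  10};

    // Dominant share measured against the *cluster* total: memory dominates.
    double vsCluster = Math.max(used[0] / clusterTotal[0],
                                used[1] / clusterTotal[1]);               // 0.30
    double partitionCapacity = Math.max(partitionTotal[0] / clusterTotal[0],
                                        partitionTotal[1] / clusterTotal[1]); // 0.90
    System.out.printf("displayed: %.0f%%%n",
        100 * vsCluster / partitionCapacity);                             // ~33%

    // Dominant share measured against the *partition* total: vcores dominate.
    double vsPartition = Math.max(used[0] / partitionTotal[0],
                                  used[1] / partitionTotal[1]);           // 1.00
    System.out.printf("expected:  %.0f%%%n", 100 * vsPartition);          // 100%
  }
}
{code}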



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6027) Improve /flows API for more flexible filters fromid, collapse, userid

2017-02-07 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856413#comment-15856413
 ] 

Varun Saxena commented on YARN-6027:


Thanks [~rohithsharma] for the patch. A few high-level comments; I will take a 
detailed look at the patch a little later.
# Do we need the cluster ID in fromId, given that we ignore it completely?
# If there is a / in the cluster ID we may have to escape it to avoid parsing 
errors.
# If we use collapse, even with fromId, there seems to be a full table scan, 
which will impact the run time of this query. Maybe we can send the last real 
ID in the info field of the last flow activity entity if the previous query was 
made with the collapse field. The UI can then send this ID and we can use it to 
make the query from that specific row instead of doing a full table scan (see 
the sketch after this comment).
# In the processResults method you have mentioned that fromId validation 
happens in the getResult method. I could not find it.
# In processResults we first get the result from the backend while applying the 
limit and then process the result for the collapse and fromId filters. In this 
case we may return fewer records than the limit even if they are available, 
because some entities may be skipped due to the fromId check. And even for 
collapse we will merge entities with the same flow and user, which means we 
will return fewer entities than the limit. If we want the limit behavior in 
this case to be different, we should document it.
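
A sketch of the cursor-style paging suggested in point 3. Everything here is hypothetical: {{FlowEntity}}, {{fetchFlows()}} and the {{FROM_ID}} info key merely stand in for whatever the timeline reader would actually expose:

{code:java}
import java.util.List;
import java.util.Map;

/** Hypothetical client-side paging loop; not the timeline reader API. */
class FlowPagingSketch {

  interface FlowEntity {
    Map<String, Object> getInfo();
  }

  interface FlowReader {
    /** One page of collapsed flows, starting at the given cursor (may be null). */
    List<FlowEntity> fetchFlows(String fromId, int limit);
  }

  static void readAll(FlowReader reader, int pageSize) {
    String cursor = null;
    while (true) {
      List<FlowEntity> page = reader.fetchFlows(cursor, pageSize);
      if (page.isEmpty()) {
        break;
      }
      // ... render the page in the UI ...

      // The server puts the last real row ID into the info map of the last
      // entity; the next request resumes from that row instead of scanning
      // the whole table again.
      Object next = page.get(page.size() - 1).getInfo().get("FROM_ID");
      if (next == null) {
        break;        // no more rows
      }
      cursor = next.toString();
    }
  }
}
{code}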

> Improve /flows API for more flexible filters fromid, collapse, userid
> -
>
> Key: YARN-6027
> URL: https://issues.apache.org/jira/browse/YARN-6027
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-6027-YARN-5355.0001.patch
>
>
> In YARN-5585, fromId is supported for retrieving entities. We need a similar 
> filter for flows/flowRun apps, flow runs and flows as well. 
> Along with supporting fromId, this JIRA should also discuss the following 
> points:
> * Should we throw an exception for entities/entity retrieval if duplicates 
> are found?
> * TimelineEntity:
> ** Should the equals method also check idPrefix?
> ** Is idPrefix part of the identifiers?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5703) ReservationAgents are not correctly configured

2017-02-07 Thread Manikandan R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-5703:
---
Attachment: YARN-5703.004.patch

> ReservationAgents are not correctly configured
> --
>
> Key: YARN-5703
> URL: https://issues.apache.org/jira/browse/YARN-5703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Po
>Assignee: Manikandan R
> Attachments: YARN-5703.001.patch, YARN-5703.002.patch, 
> YARN-5703.003.patch, YARN-5703.004.patch
>
>
> In AbstractReservationSystem, the method that instantiates a ReservationAgent 
> does not properly initialize it with the appropriate configuration because it 
> expects the ReservationAgent to implement Configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5703) ReservationAgents are not correctly configured

2017-02-07 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856417#comment-15856417
 ] 

Manikandan R commented on YARN-5703:


Thanks [~Naganarasimha] for the review. Incorporated all review comments.

Just want to highlight this point: 
with these new changes, the reservation agent property 
(yarn.resourcemanager.reservation.agent) would be honored if it has been passed 
through config XML files like yarn-site.xml. Otherwise, we would fall back to 
the default reservation agent (AlignedPlannerWithGreedy), whereas the current 
code (trunk etc.) always uses AlignedPlannerWithGreedy as the reservation agent.
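
As an illustration of the initialization pattern being described (a sketch under assumptions: {{Agent}} stands in for the real ReservationAgent interface, {{DefaultAgent}} for AlignedPlannerWithGreedy, and the config key is the one named in the comment above):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

/** Sketch of configuration-aware agent instantiation. */
public class AgentInitSketch {

  interface Agent { }

  static class DefaultAgent implements Agent { }

  static Agent createAgent(Configuration conf) {
    // Honor the configured class if present, otherwise use the default.
    Class<? extends Agent> clazz = conf.getClass(
        "yarn.resourcemanager.reservation.agent",
        DefaultAgent.class, Agent.class);
    // ReflectionUtils.newInstance() passes the Configuration to the new
    // object (via setConf) when the class implements Configurable.
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}

{{ReflectionUtils.newInstance()}} calls {{setConf()}} when the instantiated class implements {{Configurable}}, which is one way to make sure the agent actually receives the configuration.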

> ReservationAgents are not correctly configured
> --
>
> Key: YARN-5703
> URL: https://issues.apache.org/jira/browse/YARN-5703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Po
>Assignee: Manikandan R
> Attachments: YARN-5703.001.patch, YARN-5703.002.patch, 
> YARN-5703.003.patch
>
>
> In AbstractReservationSystem, the method that instantiates a ReservationAgent 
> does not properly initialize it with the appropriate configuration because it 
> expects the ReservationAgent to implement Configurable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-02-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856400#comment-15856400
 ] 

Eric Payne commented on YARN-5889:
--

[~leftnoteasy]
bq. Since we're close to 2.8 release now, let's try to see if this patch can go 
to 2.8.1 or not after 2.8.0 release.
Since the branch-2.8.0 branch has already been created, wouldn't it be safe to 
go into branch-2.8(.1)? Or are you concerned that if they need to pull more 
things into the 2.8.0 branch before the RC, this patch may conflict?

> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, 
> YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, 
> YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, 
> YARN-5889.0009.patch, YARN-5889.0010.patch, YARN-5889.v0.patch, 
> YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently the user limit is computed during every heartbeat allocation cycle 
> with a write lock. To improve performance, this ticket focuses on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6152) Used queue percentage not accurate in UI for 2.7 and below when using DominantResourceCalculator

2017-02-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-6152:
-
Fix Version/s: 2.7.4

> Used queue percentage not accurate in UI for 2.7 and below when using 
> DominantResourceCalculator
> 
>
> Key: YARN-6152
> URL: https://issues.apache.org/jira/browse/YARN-6152
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Fix For: 2.7.4
>
> Attachments: dominantRC.png, YARN-6152-branch-2.7.001.patch, 
> YARN-6152-branch-2.7.002.patch
>
>
> YARN-4751 adds the {{getUsedCapacity}} and {{getAbsoluteUsedCapacity}} 
> methods to {{AbstractCSQueue}}, which are used to display queue usage in the 
> UI for branch-2.7 and below. However, if there is more than one partition in 
> the cluster, with different dominant resources, then queue usage may not be 
> displayed as expected.
> Contrived example: the default partition has <90GB, 10vcores>, and the "test" 
> partition has <10GB, 90vcores>. The {{root}} queue in the default partition 
> uses <30GB, 10vcores>. Here we expect the queue usage to be 100% since it is 
> using all vcores in the default partition. But the displayed usage will be 
> (30GB/100GB)/(90GB/100GB) = 33%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6152) Used queue percentage not accurate in UI for 2.7 and below when using DominantResourceCalculator

2017-02-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856365#comment-15856365
 ] 

Eric Payne commented on YARN-6152:
--

The patch LGTM. +1

The unit test failures are unrelated.

> Used queue percentage not accurate in UI for 2.7 and below when using 
> DominantResourceCalculator
> 
>
> Key: YARN-6152
> URL: https://issues.apache.org/jira/browse/YARN-6152
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: dominantRC.png, YARN-6152-branch-2.7.001.patch, 
> YARN-6152-branch-2.7.002.patch
>
>
> YARN-4751 adds the {{getUsedCapacity}} and {{getAbsoluteUsedCapacity}} 
> methods to {{AbstractCSQueue}}, which are used to display queue usage in the 
> UI for branch-2.7 and below. However, if there is more than one partition in 
> the cluster, with different dominant resources, then queue usage may not be 
> displayed as expected.
> Contrived example: the default partition has <90GB, 10vcores>, and the "test" 
> partition has <10GB, 90vcores>. The {{root}} queue in the default partition 
> uses <30GB, 10vcores>. Here we expect the queue usage to be 100% since it is 
> using all vcores in the default partition. But the displayed usage will be 
> (30GB/100GB)/(90GB/100GB) = 33%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4675) Reorganize TimeClientImpl into TimeClientV1Impl and TimeClientV2Impl

2017-02-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856360#comment-15856360
 ] 

Hadoop QA commented on YARN-4675:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  2m 
12s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
5s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 
58s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 46s{color} | {color:orange} root: The patch generated 11 new + 744 unchanged 
- 10 fixed = 755 total (was 754) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common 
generated 0 new + 4575 unchanged - 4 fixed = 4575 total (was 4579) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-yarn-server-tests in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} hadoop-yarn-applications-distributedshell in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
40s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
38s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| 

[jira] [Commented] (YARN-5917) Make navigation link active when selecting sub tabs in "Applications" and "Nodes" page for new UI

2017-02-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856323#comment-15856323
 ] 

Hudson commented on YARN-5917:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11217 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11217/])
YARN-5917. Make navigation link active when selecting sub tabs in (sunilg: rev 
7a8f3f237e9eb644da830a79889d530c1e8f52ba)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/application.hbs


> Make navigation link active when selecting sub tabs in "Applications" and 
> "Nodes" page for new UI
> -
>
> Key: YARN-5917
> URL: https://issues.apache.org/jira/browse/YARN-5917
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Affects Versions: 3.0.0-alpha2
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Minor
> Fix For: 3.0.0-alpha3
>
> Attachments: Screen Shot 2016-11-20 at 20.37.53.png, Screen Shot 
> 2016-11-20 at 20.38.01.png, YARN-5917.01.patch
>
>
> When we select "Long Running Services" under "Applications" and "Nodes 
> Heatmap Chart" under "Nodes", the navigation links become inactive.
> They should stay active when child components are selected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5956) Refactor ClientRMService

2017-02-07 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856302#comment-15856302
 ] 

Sunil G edited comment on YARN-5956 at 2/7/17 4:41 PM:
---

Hi [~kaisasak]

Thanks for the patch. I have some doubts about this patch. 
{{GetApplicationReportResponse getApplicationReport}} is now invoking 
{{getCallerUgi}} and {{verifyUserAccessForRMApp}}. Internally, 
{{verifyUserAccessForRMApp}} already does the {{checkAccess}} check, but it is 
still invoked from *getApplicationReport*.

I think the same comment applies to a few more APIs as well. Could you please 
check?


was (Author: sunilg):
HI [~kaisasak]

Thanks for the patch. I have some doubts in this patch. 
{{GetApplicationReportResponse getApplicationReport}} is now invoking 
{{getCallerUgi} } and {{verifyUserAccessForRMApp}}. Internally 
{{verifyUserAccessForRMApp}} is already doing {{checkAccess}} check. But its 
still invoked from *getApplicationReport*.

I think same comment applies to few more apis as well. Could you please check 
the same.

> Refactor ClientRMService
> 
>
> Key: YARN-5956
> URL: https://issues.apache.org/jira/browse/YARN-5956
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Minor
> Attachments: YARN-5956.01.patch, YARN-5956.02.patch, 
> YARN-5956.03.patch, YARN-5956.04.patch, YARN-5956.05.patch, 
> YARN-5956.06.patch, YARN-5956.07.patch, YARN-5956.08.patch
>
>
> Some refactoring can be done in {{ClientRMService}}.
> - Remove redundant variable declaration
> - Fill in missing javadocs
> - Proper variable access modifier
> - Fix some typos in method name and exception messages



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5956) Refactor ClientRMService

2017-02-07 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856302#comment-15856302
 ] 

Sunil G commented on YARN-5956:
---

Hi [~kaisasak]

Thanks for the patch. I have some doubts about this patch. 
{{GetApplicationReportResponse getApplicationReport}} is now invoking 
{{getCallerUgi}} and {{verifyUserAccessForRMApp}}. Internally, 
{{verifyUserAccessForRMApp}} already does the {{checkAccess}} check, but it is 
still invoked from *getApplicationReport*.

I think the same comment applies to a few more APIs as well. Could you please 
check?

> Refactor ClientRMService
> 
>
> Key: YARN-5956
> URL: https://issues.apache.org/jira/browse/YARN-5956
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Minor
> Attachments: YARN-5956.01.patch, YARN-5956.02.patch, 
> YARN-5956.03.patch, YARN-5956.04.patch, YARN-5956.05.patch, 
> YARN-5956.06.patch, YARN-5956.07.patch, YARN-5956.08.patch
>
>
> Some refactoring can be done in {{ClientRMService}}.
> - Remove redundant variable declaration
> - Fill in missing javadocs
> - Proper variable access modifier
> - Fix some typos in method name and exception messages



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


