[jira] [Assigned] (YARN-6075) Yarn top for FairScheduler

2017-01-09 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-6075:
--

Assignee: Yufei Gu

> Yarn top for FairScheduler
> --
>
> Key: YARN-6075
> URL: https://issues.apache.org/jira/browse/YARN-6075
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Prabhu Joseph
>Assignee: Yufei Gu
> Attachments: Yarn_Top_FairScheduler.png
>
>
> Yarn top output for FairScheduler shows empty values (see the attached output). We 
> need to make yarn top work with FairScheduler.






[jira] [Commented] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814173#comment-15814173
 ] 

Hudson commented on YARN-6073:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11096 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11096/])
YARN-6073. Misuse of format specifier in Preconditions.checkArgument (templedf: 
rev 6332a318bc1e2e9d73d7159eab26347bb3f1f9b3)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java


> Misuse of format specifier in Preconditions.checkArgument
> -
>
> Key: YARN-6073
> URL: https://issues.apache.org/jira/browse/YARN-6073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yuanbo Liu
>Priority: Trivial
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-6073.001.patch
>
>
> RMAdminCLI.java
> {code}
>  int nLabels = map.get(nodeId).size();
>   Preconditions.checkArgument(nLabels <= 1, "%d labels specified on 
> host=%s"
>   + ", please note that we do not support specifying multiple"
>   + " labels on a single host for now.", nLabels, nodeIdStr);
> {code}
> The {{%d}} should be replaced with {{%s}}, per
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html
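
For reference, a minimal sketch of the corrected call (assuming the surrounding RMAdminCLI variables from the snippet above; Guava's Preconditions only substitutes %s placeholders, so the %d above would be printed literally rather than replaced):

{code}
int nLabels = map.get(nodeId).size();
// Use %s for both placeholders; Guava's checkArgument does not interpret %d.
Preconditions.checkArgument(nLabels <= 1,
    "%s labels specified on host=%s"
        + ", please note that we do not support specifying multiple"
        + " labels on a single host for now.", nLabels, nodeIdStr);
{code}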






[jira] [Comment Edited] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814163#comment-15814163
 ] 

zhangyubiao edited comment on YARN-5995 at 1/10/17 7:17 AM:


Thanks [~sunilg]. IMO, Histogram belongs to the com.codahale.metrics package while 
MutableCounter belongs to the org.apache.hadoop.metrics2.lib package. If we use 
these two metric types together, it will mix up the metrics libraries.
So I think it's better to write a MutableHistogram and MutableTimeHistogram 
modeled on HBase's MutableHistogram and MutableTimeHistogram. What do you 
think, [~sunilg]?


was (Author: piaoyu zhang):
Thanks [~sunilg] .  IMO, Histogram belong com.codahale.metrics  package , 
MutableCounter belong org.apache.hadoop.metrics2.lib package,If we use these 
two metrics counter .it will mess up the metrics lib.
So,I think it's better to write a MutableHistogram,MutableTimeHistogram 
reference HBase's MutableHistogram,MutableTimeHistogram.  What do you 
thought,[~sunilg] ?

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814163#comment-15814163
 ] 

zhangyubiao commented on YARN-5995:
---

Thanks [~sunilg]. IMO, Histogram belongs to the com.codahale.metrics package while 
MutableCounter belongs to the org.apache.hadoop.metrics2.lib package. If we use 
these two metric types together, it will mix up the metrics libraries.
So I think it's better to write a MutableHistogram and MutableTimeHistogram 
modeled on HBase's MutableHistogram and MutableTimeHistogram. What do you 
think, [~sunilg]?

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814127#comment-15814127
 ] 

Hadoop QA commented on YARN-6072:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 19s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 75 unchanged - 0 fixed = 76 total (was 75) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 40m 
52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6072 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846492/YARN-6072.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux a3c667d13668 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 945db55 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14616/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/14616/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14616/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814049#comment-15814049
 ] 

Sunil G commented on YARN-5995:
---

Yes. Thanks for clarifying, [~jianhe].

So ideally we can support:
- #total write ops
- #total failed ops
- time cost of each write op (average)
- write latency

The first two are counters, hence we can use MutableCounter. The latter two may 
suit a histogram well. [~piaoyu zhang], please share your thoughts.
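
To make the counter/histogram split concrete, below is a rough, hypothetical sketch 
wired with existing org.apache.hadoop.metrics2.lib primitives (MutableCounterLong 
for the two op counters, MutableRate for the average cost, and MutableQuantiles as a 
histogram-like stand-in for the latency distribution). The class name 
RMStateStoreOpMetrics and all metric names are illustrative only and not part of any 
attached patch:

{code}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Hypothetical metrics source for RM state-store write ops; names are illustrative.
@Metrics(about = "RM state store write metrics", context = "yarn")
public class RMStateStoreOpMetrics {
  private final MetricsRegistry registry = new MetricsRegistry("RMStateStoreOpMetrics");

  // #total write ops and #total failed ops: plain monotonic counters.
  @Metric("Total state-store write ops") MutableCounterLong numWriteOps;
  @Metric("Failed state-store write ops") MutableCounterLong numFailedWriteOps;

  // Average time cost of each write op.
  @Metric("State-store write op time") MutableRate writeOpTime;

  // Latency distribution over a rolling 60s window, standing in for a histogram.
  private final MutableQuantiles writeLatency = registry.newQuantiles(
      "StateStoreWriteLatency", "State-store write latency", "ops", "latencyMs", 60);

  public static RMStateStoreOpMetrics create() {
    MetricsSystem ms = DefaultMetricsSystem.instance();
    // register() injects the @Metric-annotated mutable metrics into this instance.
    return ms.register("RMStateStoreOpMetrics", "RM state store write metrics",
        new RMStateStoreOpMetrics());
  }

  // Called once per state-store write, e.g. from the store's event dispatcher.
  public void recordWrite(long elapsedMs, boolean failed) {
    numWriteOps.incr();
    if (failed) {
      numFailedWriteOps.incr();
    }
    writeOpTime.add(elapsedMs);
    writeLatency.add(elapsedMs);
  }
}
{code}

MutableQuantiles only approximates a histogram (rolling quantiles per interval), so a 
dedicated MutableHistogram/MutableTimeHistogram along the lines of HBase's, as 
suggested above, may still be the better long-term fit.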



> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-09 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814032#comment-15814032
 ] 

Yuanbo Liu commented on YARN-6073:
--

[~templedf] Thanks for your review.

> Misuse of format specifier in Preconditions.checkArgument
> -
>
> Key: YARN-6073
> URL: https://issues.apache.org/jira/browse/YARN-6073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yuanbo Liu
>Priority: Trivial
> Attachments: YARN-6073.001.patch
>
>
> RMAdminCLI.java
> {code}
>  int nLabels = map.get(nodeId).size();
>   Preconditions.checkArgument(nLabels <= 1, "%d labels specified on 
> host=%s"
>   + ", please note that we do not support specifying multiple"
>   + " labels on a single host for now.", nLabels, nodeIdStr);
> {code}
> The {{%d}} should be replaced with {{%s}}, per
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html






[jira] [Updated] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6072:
--
Attachment: YARN-6072.01.branch-2.patch

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-6072.01.branch-2.patch, YARN-6072.01.patch, 
> hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During resource manager service start(), EmbeddedElector starts first and 
> invokes {{AdminService#refreshAll()}}, but {{AdminService#serviceStart()}} 
> happens after {{ActiveStandbyElectorBasedElectorService}} service start is 

[jira] [Updated] (YARN-6022) Revert changes of AbstractResourceRequest

2017-01-09 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-6022:
---
Attachment: (was: YARN-6022.branch-2.006.patch)

> Revert changes of AbstractResourceRequest
> -
>
> Key: YARN-6022
> URL: https://issues.apache.org/jira/browse/YARN-6022
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6022.001.patch, YARN-6022.002.patch, 
> YARN-6022.003.patch, YARN-6022.004.patch, YARN-6022.005.patch, 
> YARN-6022.branch-2.005.patch, YARN-6022.branch-2.006.patch
>
>
> YARN-5774 added AbstractResourceRequest to make internal scheduler changes 
> easier. This is not a correct approach: for example, with this change, we 
> would need to make AbstractResourceRequest public/stable, and end users could 
> use it like:
> {code}
> AbstractResourceRequest request = ...
> request.setCapability(...)
> {code}
> But AbstractResourceRequest should not be visible to applications at all. 
> We need to revert it from branch-2.8 / branch-2 / trunk.






[jira] [Updated] (YARN-6022) Revert changes of AbstractResourceRequest

2017-01-09 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-6022:
---
Attachment: YARN-6022.branch-2.006.patch

> Revert changes of AbstractResourceRequest
> -
>
> Key: YARN-6022
> URL: https://issues.apache.org/jira/browse/YARN-6022
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6022.001.patch, YARN-6022.002.patch, 
> YARN-6022.003.patch, YARN-6022.004.patch, YARN-6022.005.patch, 
> YARN-6022.branch-2.005.patch, YARN-6022.branch-2.006.patch
>
>
> YARN-5774 added AbstractResourceRequest to make internal scheduler changes 
> easier. This is not a correct approach: for example, with this change, we 
> would need to make AbstractResourceRequest public/stable, and end users could 
> use it like:
> {code}
> AbstractResourceRequest request = ...
> request.setCapability(...)
> {code}
> But AbstractResourceRequest should not be visible to applications at all. 
> We need to revert it from branch-2.8 / branch-2 / trunk.






[jira] [Commented] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-09 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814006#comment-15814006
 ] 

Daniel Templeton commented on YARN-6073:


Patch looks good to me.  +1

> Misuse of format specifier in Preconditions.checkArgument
> -
>
> Key: YARN-6073
> URL: https://issues.apache.org/jira/browse/YARN-6073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yuanbo Liu
>Priority: Trivial
> Attachments: YARN-6073.001.patch
>
>
> RMAdminCLI.java
> {code}
>  int nLabels = map.get(nodeId).size();
>   Preconditions.checkArgument(nLabels <= 1, "%d labels specified on 
> host=%s"
>   + ", please note that we do not support specifying multiple"
>   + " labels on a single host for now.", nLabels, nodeIdStr);
> {code}
> The {{%d}} should be replaced with {{%s}}, per
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html






[jira] [Commented] (YARN-6062) nodemanager memory leak

2017-01-09 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813955#comment-15813955
 ] 

Varun Saxena commented on YARN-6062:


Is NM recovery enabled?

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10 GB.
> The NodeManager heap is configured to 2 GB in yarn-env.sh:
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"






[jira] [Commented] (YARN-6008) Fetch container list for failed application attempt

2017-01-09 Thread David Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813936#comment-15813936
 ] 

David Yan commented on YARN-6008:
-

Thanks, I'm just a YARN user, not exactly a dev, at least not yet. :)

> Fetch container list for failed application attempt
> ---
>
> Key: YARN-6008
> URL: https://issues.apache.org/jira/browse/YARN-6008
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: hadoop version 2.6
>Reporter: Priyanka Gugale
>
> When we run command "yarn container -list" on using failed application 
> attempt we should either get containers from that attempt or get a back list 
> as containers are no longer in running state.
> Steps to reproduce:
> 1. Launch a yarn application. 
> 2. Kill app master, it tries to restart application with new attempt id. 
> 3. Now run yarn command,
> yarn container -list 
> Where Application Attempt ID is of failed attempt, 
> it lists the container from next attempt which is in "RUNNING" state right 
> now.
> Expected behavior:
> It should return list of killed containers from attempt 1 or empty list.






[jira] [Commented] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2017-01-09 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813915#comment-15813915
 ] 

Tsuyoshi Ozawa commented on YARN-3774:
--

Thanks Jordan for the notification! I think we should use 3.3.0, 2.12.0 or 
later.

> ZKRMStateStore should use Curator 3.0 and avail CuratorOp
> -
>
> Key: YARN-3774
> URL: https://issues.apache.org/jira/browse/YARN-3774
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
>
> YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
> somewhat involved, and could be improved using CuratorOp introduced in 
> Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
> and make this change. 
> Curator is considering shading guava through CURATOR-200. In Hadoop 3, we 
> should upgrade to the next Curator version.






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813903#comment-15813903
 ] 

Jian He commented on YARN-5995:
---

Sorry, I meant that an external tool such as the Ambari Metrics Server can store 
these metrics as long as the RM emits them in the ideal way. I don't actually mean 
to store these metrics in this jira.



> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Updated] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated YARN-6072:
--
Attachment: YARN-6072.01.patch

Attaching the patch for trunk. Will update for branch-2 and branch-2.8 shortly.

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: YARN-6072.01.patch, 
> hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During resource manager service start(), EmbeddedElector starts first and 
> invokes {{AdminService#refreshAll()}}, but {{AdminService#serviceStart()}} 
> happens after 

[jira] [Commented] (YARN-6062) nodemanager memory leak

2017-01-09 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813788#comment-15813788
 ] 

Bibin A Chundatt commented on YARN-6062:


[~gehaijiang]
Could you please have a look at 
[YARN-6017|https://issues.apache.org/jira/browse/YARN-6017?focusedCommentId=15812167=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15812167]

Is it possible to change the JRE version and check, by any chance?

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10 GB.
> The NodeManager heap is configured to 2 GB in yarn-env.sh:
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"






[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813739#comment-15813739
 ] 

Sunil G commented on YARN-5995:
---

Thanks [~jianhe] for chipping in.

I also agree about the read ops. We can skip them for now.

As mentioned, I think we can redefine the list of metrics as below.
- #total write operations
- Time-series metric to store the cost of each write op (time duration taken 
for each operation). A histogram will suit here, I think.
- Write latency (data written for each op)

I am also thinking along similar lines. At time t1, if these data are stored 
as (#total write ops, time cost of write ops, data written in the last time 
interval), then we can store the same info at time t2 as well. If we store all 
these data in a time-series way, it may make external modules' job easier, but 
it is a complex metrics module to design internally. Do you think we need to 
jump to this immediately, or can we do something like the 3 metrics I mentioned 
earlier? Thoughts, [~jianhe]?


> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance






[jira] [Commented] (YARN-3955) Support for application priority ACLs in queues of CapacityScheduler

2017-01-09 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813710#comment-15813710
 ] 

Sunil G commented on YARN-3955:
---

Thank you [~leftnoteasy] for the review and commit. Thanks [~jianhe] for 
additional reviews.

> Support for application priority ACLs in queues of CapacityScheduler
> 
>
> Key: YARN-3955
> URL: https://issues.apache.org/jira/browse/YARN-3955
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: ApplicationPriority-ACL.pdf, 
> ApplicationPriority-ACLs-v2.pdf, YARN-3955.0001.patch, YARN-3955.0002.patch, 
> YARN-3955.0003.patch, YARN-3955.0004.patch, YARN-3955.0005.patch, 
> YARN-3955.0006.patch, YARN-3955.0007.patch, YARN-3955.0008.patch, 
> YARN-3955.0009.patch, YARN-3955.0010.patch, YARN-3955.v0.patch, 
> YARN-3955.v1.patch, YARN-3955.wip1.patch
>
>
> Support will be added for User-level access permission to use different 
> application-priorities. This is to avoid situations where all users try 
> running max priority in the cluster and thus degrading the value of 
> priorities.
> Access Control Lists can be set per priority level within each queue. Below 
> is an example configuration that can be added in capacity scheduler 
> configuration
> file for each Queue level.
> yarn.scheduler.capacity.root...acl=user1,user2






[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-01-09 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813705#comment-15813705
 ] 

Sunil G commented on YARN-5889:
---

Thank you, Eric. Yes, I missed this. Will update in the next patch.

> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket focuses on moving the 
> user-limit calculation out of the heartbeat allocation flow.






[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy

2017-01-09 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813673#comment-15813673
 ] 

Vrushali C commented on YARN-5980:
--

Thanks Sangjin! Will make these changes.

Table-specific coprocessors require dynamic loading of coprocessors to work for 
classes starting with org.apache.hadoop. That was only fixed (allowed) in hbase 
1.2.x, so we should use that version now. Let me confirm that this works fine, 
and I will update the patch further to include all the changes. 

> Update documentation for single node hbase deploy
> -
>
> Key: YARN-5980
> URL: https://issues.apache.org/jira/browse/YARN-5980
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5980.001.patch, YARN-5980.002.patch, 
> YARN-5980.003.patch
>
>
> Per HBASE-17272, a single node hbase deployment (single jvm running daemons + 
> hdfs writes) will be added to hbase shortly. 
> We should update the timeline service documentation in the setup/deployment 
> context accordingly; this will help users who are a bit wary of hbase 
> deployments get started with the timeline service more easily.






[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM

2017-01-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813658#comment-15813658
 ] 

Hudson commented on YARN-4148:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11095 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11095/])
YARN-4148. When killing app, RM releases app's resource before they are 
(junping_du: rev 945db55f2e6521d33d4f90bbb09179b0feba5e7a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java


> When killing app, RM releases app's resource before they are released by NM
> ---
>
> Key: YARN-4148
> URL: https://issues.apache.org/jira/browse/YARN-4148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jason Lowe
> Attachments: YARN-4148.001.patch, YARN-4148.002.patch, 
> YARN-4148.003.patch, YARN-4148.wip.patch, 
> free_in_scheduler_but_not_node_prototype-branch-2.7.patch
>
>
> When killing an app, the RM scheduler releases the app's resources as soon as 
> possible and might then allocate these resources to new requests. But the NM 
> has not released them at that time.
> The problem was found when we supported GPU as a resource (YARN-4122). Test 
> environment: an NM had 6 GPUs, app A used all 6 GPUs, and app B was requesting 
> 3 GPUs. We killed app A; the RM then released A's 6 GPUs and allocated 3 GPUs 
> to B. But when B tried to start a container on the NM, the NM found it didn't 
> have 3 GPUs to allocate because it had not yet released A's GPUs.
> I think the problem also exists for CPU/memory. It might cause OOM when 
> memory is overused.






[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM

2017-01-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813644#comment-15813644
 ] 

Junping Du commented on YARN-4148:
--

I just committed the 003 patch to trunk and branch-2. For branch-2.8, there are 
several conflicts. 
Hi [~jlowe], do we want this commit to land in branch-2.8? If so, can you 
put a patch here for branch-2.8? Thx!

> When killing app, RM releases app's resource before they are released by NM
> ---
>
> Key: YARN-4148
> URL: https://issues.apache.org/jira/browse/YARN-4148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jason Lowe
> Attachments: YARN-4148.001.patch, YARN-4148.002.patch, 
> YARN-4148.003.patch, YARN-4148.wip.patch, 
> free_in_scheduler_but_not_node_prototype-branch-2.7.patch
>
>
> When killing an app, the RM scheduler releases the app's resources as soon as 
> possible and might then allocate these resources to new requests. But the NM 
> has not released them at that time.
> The problem was found when we supported GPU as a resource (YARN-4122). Test 
> environment: an NM had 6 GPUs, app A used all 6 GPUs, and app B was requesting 
> 3 GPUs. We killed app A; the RM then released A's 6 GPUs and allocated 3 GPUs 
> to B. But when B tried to start a container on the NM, the NM found it didn't 
> have 3 GPUs to allocate because it had not yet released A's GPUs.
> I think the problem also exists for CPU/memory. It might cause OOM when 
> memory is overused.






[jira] [Updated] (YARN-6075) Yarn top for FairScheduler

2017-01-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-6075:
---
Component/s: (was: resourcemanager)

> Yarn top for FairScheduler
> --
>
> Key: YARN-6075
> URL: https://issues.apache.org/jira/browse/YARN-6075
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Prabhu Joseph
> Attachments: Yarn_Top_FairScheduler.png
>
>
> Yarn top output for FairScheduler shows empty values (see the attached output). We 
> need to make yarn top work with FairScheduler.






[jira] [Commented] (YARN-6062) nodemanager memory leak

2017-01-09 Thread gehaijiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813562#comment-15813562
 ] 

gehaijiang commented on YARN-6062:
--

The pid 6653 process was restarted. The cluster's NodeManagers have memory leaks.

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10 GB.
> The NodeManager heap is configured to 2 GB in yarn-env.sh:
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"






[jira] [Commented] (YARN-6062) nodemanager memory leak

2017-01-09 Thread gehaijiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813556#comment-15813556
 ] 

gehaijiang commented on YARN-6062:
--

java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10 GB.
> The NodeManager heap is configured to 2 GB in yarn-env.sh:
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"






[jira] [Commented] (YARN-6011) Add a new web service to list the files on a container in AHSWebService

2017-01-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813533#comment-15813533
 ] 

Junping Du commented on YARN-6011:
--

Thanks [~xgong] for delivering the patch. A couple of comments:
1. For generating the URI embedded in the response of getContainerLogsInfo, I saw 
some very similar code in other places, like getLogs(). Can we refactor the code 
a bit to reuse the same logic?

2. For getContainerLogsInfo(), if an app is not in the running or finished state, 
we will return a bad request here. However, I remember that in our ATS 
implementation, the RM after a restart could send a regressed application state 
event to ATS, such as an app-creation event for an app that was running before. 
Can you double check that ATS's app status won't regress? Otherwise, we shouldn't 
simply return a bad request.

3. For getContainerLogMeta(), I remember I had some earlier comments on 
refactoring the code (consolidating similar logic, especially the log reader) in 
previous JIRAs. How is that effort going? If it is not a short-term priority 
for you, please add a TODO here - maybe someone else reading this part of the 
code could help with it.

Otherwise it looks good to me.

> Add a new web service to list the files on a container in AHSWebService
> ---
>
> Key: YARN-6011
> URL: https://issues.apache.org/jira/browse/YARN-6011
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-6011.1.patch, YARN-6011.2.patch
>
>







[jira] [Commented] (YARN-5937) stop-yarn.sh is not able to gracefully stop node managers

2017-01-09 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813490#comment-15813490
 ] 

Weiwei Yang commented on YARN-5937:
---

Perfect, thank you [~Naganarasimha] :)

> stop-yarn.sh is not able to gracefully stop node managers
> -
>
> Key: YARN-5937
> URL: https://issues.apache.org/jira/browse/YARN-5937
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>  Labels: script
> Attachments: YARN-5937.01.patch, nm_shutdown.log
>
>
> stop-yarn.sh always gives following output
> {code}
> ./sbin/stop-yarn.sh
> Stopping resourcemanager
> Stopping nodemanagers
> : WARNING: nodemanager did not stop gracefully after 5 seconds: 
> Trying to kill with kill -9
> : ERROR: Unable to kill 18097
> {code}
> This is because the resource manager is stopped before the node managers: when 
> the shutdown hook manager tries to gracefully stop NM services, the NM needs to 
> unregister with the RM, and it times out because the NM cannot connect to the RM 
> (already stopped). See the log (stop RM then run kill ):
> {code}
> 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> ...
> 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 
> 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
>   at java.util.concurrent.FutureTask.get(FutureTask.java:205)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> ...
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
> ...
> 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown 
> forcefully.
> {code}
> The shutdown hook manager has a default timeout of 10s, so if the RM is stopped 
> before the NMs, they always take more than 10s to stop (in the Java code). However, 
> stop-yarn.sh only gives a 5s timeout, so the NM is always killed instead of stopped.
> It would make sense for this script to stop the NMs before the RM, in a graceful way.






[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy

2017-01-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813393#comment-15813393
 ] 

Sangjin Lee commented on YARN-5980:
---

Thanks [~vrushalic] for your patch! Most of the comments are minor (except the 
last one).

- l.184: it might be better if the URL were enclosed in parentheses: "for your 
setup (http://...)."
- l.191: "on standalone HBase" -> "on the standalone HBase setup"
- l.192: "they persist" -> "it persists"
- l.194: for consistency, let's use the fixed width font for hbase-site.xml: 
`hbase-site.xml`
- l.195: likewise, the hbase.rootdir -> the `hbase.rootdir` property
- l.196: likewise, hbase.cluster.distributed -> `hbase.cluster.distributed`
- l.212: let's add a period at the end
- l.234: Have we changed the coprocessor to a dynamic one? I don't think we 
made that change (yet)?


> Update documentation for single node hbase deploy
> -
>
> Key: YARN-5980
> URL: https://issues.apache.org/jira/browse/YARN-5980
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5980.001.patch, YARN-5980.002.patch, 
> YARN-5980.003.patch
>
>
> Per HBASE-17272, a single node hbase deployment (single jvm running daemons + 
> hdfs writes) will be added to hbase shortly. 
> We should update the timeline service documentation in the setup/deployment 
> context accordingly; this will help users who are a bit wary of hbase 
> deployments get started with the timeline service more easily.






[jira] [Commented] (YARN-6076) Backport YARN-4752 (FS preemption changes) to branch-2

2017-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813388#comment-15813388
 ] 

Hadoop QA commented on YARN-6076:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
22s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
14s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
25s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
33s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
37s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
32s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
23s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
41s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 46s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 13 new + 173 unchanged - 145 fixed = 186 total (was 318) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
25s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
21s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_111. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_111
 with JDK v1.8.0_111 generated 0 new + 913 unchanged - 8 fixed = 913 total (was 
921) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
42s{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_121. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 25s{color} 
| {color:red} 

[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy

2017-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813380#comment-15813380
 ] 

Hadoop QA commented on YARN-5980:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5980 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846457/YARN-5980.003.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux a619eb2f90fc 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 603cbcd |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14615/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Update documentation for single node hbase deploy
> -
>
> Key: YARN-5980
> URL: https://issues.apache.org/jira/browse/YARN-5980
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5980.001.patch, YARN-5980.002.patch, 
> YARN-5980.003.patch
>
>
> Per HBASE-17272, a single node hbase deployment (single jvm running daemons + 
> hdfs writes) will be added to hbase shortly. 
> We should update the timeline service documentation in the setup/deployment 
> context accordingly; this will help users who are a bit wary of hbase 
> deployments get started with the timeline service more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2017-01-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813375#comment-15813375
 ] 

Wangda Tan commented on YARN-5764:
--

Thanks [~devaraj.k] for updating the design doc and patch. Some questions/comments:

1) What is the benefit of manually specifying the NUMA node? Since this is 
potentially complex for the end user to specify, I think it's better to read the 
data directly from the OS.
2) Do the changes work on platforms other than Linux?
3) I'm not quite sure whether this could happen: with this patch, YARN will 
launch processes one by one on each NUMA node to bind memory/CPU. Is it possible 
that another process (outside of YARN) uses memory on a NUMA node, causing 
processes launched by YARN to fail to bind or run?
4) This patch uses hard binding (get the allocated resources on the specified 
node or fail); would it be better to use soft binding (prefer the node, but also 
accept other nodes)? I think soft binding should be the default behavior for 
NUMA support.

Thoughts?
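
For illustration, here is a minimal, hypothetical sketch (not taken from the 
attached patch) of how a container launch command could be wrapped with numactl: 
--cpunodebind/--membind corresponds to hard binding, while --preferred expresses 
the soft preference raised in point 4.

{code}
import java.util.ArrayList;
import java.util.List;

public class NumaCommandBuilder {
  /**
   * Prefixes a container launch command with numactl. Hard binding fails if
   * the node cannot satisfy the allocation; soft binding ("--preferred")
   * falls back to other NUMA nodes.
   */
  public static List<String> wrapWithNuma(List<String> command,
      int numaNode, boolean hardBinding) {
    List<String> wrapped = new ArrayList<>();
    wrapped.add("numactl");
    if (hardBinding) {
      wrapped.add("--cpunodebind=" + numaNode);
      wrapped.add("--membind=" + numaNode);
    } else {
      // Soft binding: prefer the node but allow allocation elsewhere.
      wrapped.add("--preferred=" + numaNode);
    }
    wrapped.addAll(command);
    return wrapped;
  }

  public static void main(String[] args) {
    List<String> launch = List.of("bash", "launch_container.sh");
    System.out.println(wrapWithNuma(launch, 0, false));
    // prints: [numactl, --preferred=0, bash, launch_container.sh]
  }
}
{code}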

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
> Attachments: NUMA Awareness for YARN Containers.pdf, 
> YARN-5764-v0.patch, YARN-5764-v1.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5980) Update documentation for single node hbase deploy

2017-01-09 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-5980:
-
Attachment: YARN-5980.003.patch


Uploading 003, which fixes the whitespace at the end of lines reported by 
Hadoop QA.

> Update documentation for single node hbase deploy
> -
>
> Key: YARN-5980
> URL: https://issues.apache.org/jira/browse/YARN-5980
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5980.001.patch, YARN-5980.002.patch, 
> YARN-5980.003.patch
>
>
> Per HBASE-17272, a single node hbase deployment (single jvm running daemons + 
> hdfs writes) will be added to hbase shortly. 
> We should update the timeline service documentation in the setup/deployment 
> context accordingly; this will help users who are a bit wary of hbase 
> deployments get started with the timeline service more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy

2017-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813313#comment-15813313
 ] 

Hadoop QA commented on YARN-5980:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5980 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846451/YARN-5980.002.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 217c144f598e 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 603cbcd |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/14613/artifact/patchprocess/whitespace-eol.txt
 |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14613/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Update documentation for single node hbase deploy
> -
>
> Key: YARN-5980
> URL: https://issues.apache.org/jira/browse/YARN-5980
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5980.001.patch, YARN-5980.002.patch
>
>
> Per HBASE-17272, a single node hbase deployment (single jvm running daemons + 
> hdfs writes) will be added to hbase shortly. 
> We should update the timeline service documentation in the setup/deployment 
> context accordingly; this will help users who are a bit wary of hbase 
> deployments get started with the timeline service more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813302#comment-15813302
 ] 

Junping Du commented on YARN-6072:
--

[~jianhe]'s comments are pretty persuasive. I will wait for this issue to get 
resolved before kicking off the 2.8.0 RC.
[~ajithshetty], as [~ka...@cloudera.com] mentioned above, please let us know 
your plan and we can help to take over if you have other priorities. Thanks!

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order
> # EmbeddedElector
> # AdminService
> During 

[jira] [Updated] (YARN-6022) Revert changes of AbstractResourceRequest

2017-01-09 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-6022:
---
Attachment: YARN-6022.branch-2.006.patch

Oops, I screwed up patch 5. Hopefully this one works better.

> Revert changes of AbstractResourceRequest
> -
>
> Key: YARN-6022
> URL: https://issues.apache.org/jira/browse/YARN-6022
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-6022.001.patch, YARN-6022.002.patch, 
> YARN-6022.003.patch, YARN-6022.004.patch, YARN-6022.005.patch, 
> YARN-6022.branch-2.005.patch, YARN-6022.branch-2.006.patch
>
>
> YARN-5774 added AbstractResourceRequest to make internal scheduler changes 
> easier, but this is not a correct approach: for example, with this change, we 
> would need to make AbstractResourceRequest public/stable, and end users could 
> use it like:
> {code}
> AbstractResourceRequest request = ...
> request.setCapability(...)
> {code}
> But AbstractResourceRequest should not be visible to applications at all. 
> We need to revert it from branch-2.8 / branch-2 / trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5980) Update documentation for single node hbase deploy

2017-01-09 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-5980:
-
Attachment: YARN-5980.002.patch

Uploading patch 002. This updates the hbase version and, I think, makes the 
steps a bit clearer.


> Update documentation for single node hbase deploy
> -
>
> Key: YARN-5980
> URL: https://issues.apache.org/jira/browse/YARN-5980
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5980.001.patch, YARN-5980.002.patch
>
>
> Per HBASE-17272, a single node hbase deployment (single jvm running daemons + 
> hdfs writes) will be added to hbase shortly. 
> We should update the timeline service documentation in the setup/deployment 
> context accordingly; this will help users who are a bit wary of hbase 
> deployments get started with the timeline service more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads

2017-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813265#comment-15813265
 ] 

Hadoop QA commented on YARN-6061:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 16s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 1 new + 
0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
21s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6061 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846437/YARN-6061.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux e4b5d55dccba 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 91bf504 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14611/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14611/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14611/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add a customized uncaughtexceptionhandler for critical threads
> --
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: 

[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2017-01-09 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813260#comment-15813260
 ] 

Daniel Templeton commented on YARN-5554:


Sorry.  Looks like I started repeating myself.  TMJ!  (Too Many JIRAs!)

Fine by me.  +1.  I'll commit when I get a chance.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.12.patch, YARN-5554.13.patch, YARN-5554.14.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> The moveApplicationAcrossQueues operation currently does not check the user's 
> permission on the target queue. This incorrectly allows a user to move 
> his/her own applications to a queue that the user has no access to.
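
For illustration only, here is a minimal, self-contained sketch of the kind of 
target-queue check the move operation is missing. The class and helper names 
(MoveAppAclCheck, hasAccess, checkMove) are hypothetical and are not the actual 
RM/scheduler API:

{code}
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only -- not the actual ResourceManager code. */
public class MoveAppAclCheck {
  enum QueueAcl { SUBMIT_APPLICATIONS, ADMINISTER_QUEUE }

  private final Map<String, Map<QueueAcl, Set<String>>> queueAcls;

  MoveAppAclCheck(Map<String, Map<QueueAcl, Set<String>>> queueAcls) {
    this.queueAcls = queueAcls;
  }

  boolean hasAccess(String user, QueueAcl acl, String queue) {
    return queueAcls
        .getOrDefault(queue, Map.of())
        .getOrDefault(acl, Set.of())
        .contains(user);
  }

  /** The missing piece: the caller must also be allowed to use the *target* queue. */
  void checkMove(String user, String targetQueue) {
    if (!hasAccess(user, QueueAcl.SUBMIT_APPLICATIONS, targetQueue)
        && !hasAccess(user, QueueAcl.ADMINISTER_QUEUE, targetQueue)) {
      throw new SecurityException(
          "User " + user + " cannot move applications to queue " + targetQueue);
    }
  }

  public static void main(String[] args) {
    MoveAppAclCheck check = new MoveAppAclCheck(
        Map.of("root.prod", Map.of(QueueAcl.SUBMIT_APPLICATIONS, Set.of("alice"))));
    check.checkMove("alice", "root.prod");   // allowed
    try {
      check.checkMove("bob", "root.prod");   // not allowed
    } catch (SecurityException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{code}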



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6076) Backport YARN-4752 (FS preemption changes) to branch-2

2017-01-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-6076:
---
Attachment: yarn-6076-branch-2.1.patch

> Backport YARN-4752 (FS preemption changes) to branch-2
> --
>
> Key: YARN-6076
> URL: https://issues.apache.org/jira/browse/YARN-6076
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-6076-branch-2.1.patch
>
>
> YARN-4752 was merged to trunk a while ago and has been stable. Creating this 
> JIRA to merge it into branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree

2017-01-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3427:
---
Target Version/s: 3.0.0-beta1  (was: )
Priority: Blocker  (was: Major)
Hadoop Flags: Incompatible change

> Remove deprecated methods from ResourceCalculatorProcessTree
> 
>
> Key: YARN-3427
> URL: https://issues.apache.org/jira/browse/YARN-3427
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>
> In 2.7, we made ResourceCalculatorProcessTree Public and exposed some 
> existing ill-formed methods as deprecated ones for use by Tez.
> We should remove them in 3.0.0, considering that the methods have been 
> deprecated for all the 2.x.y releases in which the class has been marked Public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-453) Optimize FS internal data structures

2017-01-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-453:
--
Assignee: (was: Karthik Kambatla)

> Optimize FS internal data structures
> 
>
> Key: YARN-453
> URL: https://issues.apache.org/jira/browse/YARN-453
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Karthik Kambatla
>
> FS uses lists to store internal queues and sorts them on every heartbeat, 
> leading to unnecessary scheduling overhead.
> A better choice of data structures should reduce this latency.
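
As a rough illustration of the idea (hypothetical SortedQueueCache class, not 
FairScheduler code), one simple option is to cache the sorted view and re-sort 
only when queue state has actually changed, instead of sorting the list on every 
heartbeat:

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only: cache the sorted queue view between heartbeats. */
public class SortedQueueCache<T> {
  private final List<T> queues = new ArrayList<>();
  private final Comparator<T> comparator;
  private List<T> sortedView = new ArrayList<>();
  private boolean dirty = true;

  public SortedQueueCache(Comparator<T> comparator) {
    this.comparator = comparator;
  }

  public synchronized void add(T queue) {
    queues.add(queue);
    dirty = true;
  }

  /** Call when a queue's usage/demand changes, instead of re-sorting eagerly. */
  public synchronized void markDirty() {
    dirty = true;
  }

  /** Re-sorts only if something changed since the last heartbeat. */
  public synchronized List<T> sortedQueues() {
    if (dirty) {
      sortedView = new ArrayList<>(queues);
      sortedView.sort(comparator);
      dirty = false;
    }
    return sortedView;
  }
}
{code}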



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-809) Enable better parallelism in the Fair Scheduler

2017-01-09 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-809:
--
Assignee: (was: Karthik Kambatla)

> Enable better parallelism in the Fair Scheduler
> ---
>
> Key: YARN-809
> URL: https://issues.apache.org/jira/browse/YARN-809
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sandy Ryza
>
> Currently, the Fair Scheduler is locked on pretty much every operation, node 
> updates, application additions and removals, every time the update thread 
> runs, and every time the RM queries it for information.  Most of this locking 
> is unnecessary, especially as only the core scheduling operations like 
> application additions, removals, and node updates need a consistent view of 
> scheduler state.
> We can probably increase parallelism by using concurrent data structures when 
> applicable, as well as keeping a slightly stale view to serve via the RM 
> APIs. 
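
A minimal sketch of the "slightly stale view" idea, using a hypothetical 
SchedulerView/Snapshot pair rather than the actual scheduler code: the update 
thread publishes an immutable snapshot, and RM queries read it without ever 
touching the scheduler lock:

{code}
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative sketch only: serve reads from a stale, immutable snapshot. */
public class SchedulerView {
  /** Immutable per-queue usage snapshot published by the update thread. */
  public static final class Snapshot {
    final Map<String, Long> usedMemoryPerQueue;
    Snapshot(Map<String, Long> usedMemoryPerQueue) {
      this.usedMemoryPerQueue = Map.copyOf(usedMemoryPerQueue);
    }
  }

  private final AtomicReference<Snapshot> current =
      new AtomicReference<>(new Snapshot(Map.of()));

  /** Called under the scheduler lock by the update thread, e.g. every 500 ms. */
  public void publish(Map<String, Long> usedMemoryPerQueue) {
    current.set(new Snapshot(usedMemoryPerQueue));
  }

  /** Called by RM web/REST queries; never contends with the scheduler lock. */
  public long getUsedMemory(String queue) {
    return current.get().usedMemoryPerQueue.getOrDefault(queue, 0L);
  }
}
{code}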



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM

2017-01-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813168#comment-15813168
 ] 

Junping Du commented on YARN-4148:
--

Yes, I agree that the two test failures are not related to the patch. Thanks 
[~jlowe] for reminding me. Committing it now.

> When killing app, RM releases app's resource before they are released by NM
> ---
>
> Key: YARN-4148
> URL: https://issues.apache.org/jira/browse/YARN-4148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jason Lowe
> Attachments: YARN-4148.001.patch, YARN-4148.002.patch, 
> YARN-4148.003.patch, YARN-4148.wip.patch, 
> free_in_scheduler_but_not_node_prototype-branch-2.7.patch
>
>
> When killing an app, the RM scheduler releases the app's resources as soon as 
> possible, and it might then allocate those resources to new requests. But the NM 
> has not released them at that time.
> The problem was found when we supported GPU as a resource (YARN-4122). Test 
> environment: an NM had 6 GPUs, app A used all 6 GPUs, and app B was requesting 3 
> GPUs. We killed app A, then the RM released A's 6 GPUs and allocated 3 GPUs to B. 
> But when B tried to start a container on the NM, the NM found it didn't have 3 
> GPUs to allocate because it had not yet released A's GPUs.
> I think the problem also exists for CPU/memory. It might cause OOM when 
> memory is overused.
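
One way to picture a fix is to keep "released by the RM" and "confirmed released 
by the NM" as separate states, and to reuse resources only after NM confirmation. 
The PendingReleaseTracker below is an illustrative, hypothetical sketch, not the 
actual scheduler code:

{code}
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: don't hand resources back to the scheduler
 *  until the NM confirms the container is really gone. */
public class PendingReleaseTracker {
  private final Map<String, Integer> pendingGpusByContainer = new HashMap<>();
  private int freeGpus;

  public PendingReleaseTracker(int totalGpus) {
    this.freeGpus = totalGpus;
  }

  /** Allocation only sees resources the NM has actually given back. */
  public synchronized boolean tryAllocate(int gpus) {
    if (gpus <= freeGpus) {
      freeGpus -= gpus;
      return true;
    }
    return false;
  }

  /** RM decided to kill the container; its resources are NOT yet reusable. */
  public synchronized void markReleasing(String containerId, int gpus) {
    pendingGpusByContainer.put(containerId, gpus);
  }

  /** NM heartbeat confirmed the container stopped; only now free the GPUs. */
  public synchronized void confirmReleased(String containerId) {
    Integer gpus = pendingGpusByContainer.remove(containerId);
    if (gpus != null) {
      freeGpus += gpus;
    }
  }
}
{code}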



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads

2017-01-09 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813135#comment-15813135
 ] 

Yufei Gu commented on YARN-6061:


Had an offline discussion with [~kasha]. We agreed to enlarge the scope to the 
whole YARN project. A new patch has been uploaded.

> Add a customized uncaughtexceptionhandler for critical threads
> --
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when a 
> runtime exception occurs inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.
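
A minimal sketch of such a handler (illustrative only, not the attached patch; a 
real daemon would go through its logging and exit utilities rather than 
System.err and Runtime.halt):

{code}
/** Illustrative sketch only -- not the actual YARN-6061 patch. */
public class CriticalThreadUncaughtExceptionHandler
    implements Thread.UncaughtExceptionHandler {

  @Override
  public void uncaughtException(Thread t, Throwable e) {
    // Log, then bring the daemon down instead of silently losing the thread.
    System.err.println("Critical thread " + t.getName() + " died: " + e);
    e.printStackTrace();
    Runtime.getRuntime().halt(-1);
  }

  public static void main(String[] args) throws InterruptedException {
    Thread updateThread = new Thread(() -> {
      throw new IllegalStateException("simulated scheduler failure");
    }, "FairSchedulerUpdateThread");
    updateThread.setUncaughtExceptionHandler(
        new CriticalThreadUncaughtExceptionHandler());
    updateThread.start();
    updateThread.join();  // the handler halts the JVM before this returns
  }
}
{code}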



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads

2017-01-09 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6061:
---
Attachment: YARN-6061.001.patch

> Add a customized uncaughtexceptionhandler for critical threads
> --
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6061.001.patch
>
>
> There are several threads in the fair scheduler. A thread will quit when a 
> runtime exception occurs inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for critical thread

2017-01-09 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6061:
---
Summary: Add a customized uncaughtexceptionhandler for critical thread  
(was: Add a customized uncaughtexceptionhandler for fair scheduler)

> Add a customized uncaughtexceptionhandler for critical thread
> -
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> There are several threads in the fair scheduler. A thread will quit when a 
> runtime exception occurs inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for critical threads

2017-01-09 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6061:
---
Summary: Add a customized uncaughtexceptionhandler for critical threads  
(was: Add a customized uncaughtexceptionhandler for critical thread)

> Add a customized uncaughtexceptionhandler for critical threads
> --
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> There are several threads in the fair scheduler. A thread will quit when a 
> runtime exception occurs inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for fair scheduler

2017-01-09 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6061:
---
Component/s: (was: fairscheduler)

> Add a customized uncaughtexceptionhandler for fair scheduler
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> There are several threads in the fair scheduler. A thread will quit when a 
> runtime exception occurs inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6061) Add a customized uncaughtexceptionhandler for fair scheduler

2017-01-09 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6061:
---
Labels:   (was: fairscheduler)

> Add a customized uncaughtexceptionhandler for fair scheduler
> 
>
> Key: YARN-6061
> URL: https://issues.apache.org/jira/browse/YARN-6061
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, yarn
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>
> There are several threads in the fair scheduler. A thread will quit when a 
> runtime exception occurs inside it. We should bring down the RM when that 
> happens; otherwise, there may be some weird behavior in the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM

2017-01-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813092#comment-15813092
 ] 

Jason Lowe commented on YARN-4148:
--

The unit test failures appear to be unrelated.  They pass for me locally with 
the patch applied, and there are JIRAs that are tracking those failures.  The 
TestDelegationTokenRenewer failure is being tracked by YARN-5816 and the 
TestRMRestart failure is tracked by YARN-5548.

Thanks for the review, [~djp]!  If you agree the failures are unrelated then 
feel free to commit, or I'll do so in a few days unless I hear otherwise.

> When killing app, RM releases app's resource before they are released by NM
> ---
>
> Key: YARN-4148
> URL: https://issues.apache.org/jira/browse/YARN-4148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jason Lowe
> Attachments: YARN-4148.001.patch, YARN-4148.002.patch, 
> YARN-4148.003.patch, YARN-4148.wip.patch, 
> free_in_scheduler_but_not_node_prototype-branch-2.7.patch
>
>
> When killing an app, the RM scheduler releases the app's resources as soon as 
> possible, and it might then allocate those resources to new requests. But the NM 
> has not released them at that time.
> The problem was found when we supported GPU as a resource (YARN-4122). Test 
> environment: an NM had 6 GPUs, app A used all 6 GPUs, and app B was requesting 3 
> GPUs. We killed app A, then the RM released A's 6 GPUs and allocated 3 GPUs to B. 
> But when B tried to start a container on the NM, the NM found it didn't have 3 
> GPUs to allocate because it had not yet released A's GPUs.
> I think the problem also exists for CPU/memory. It might cause OOM when 
> memory is overused.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813047#comment-15813047
 ] 

Karthik Kambatla edited comment on YARN-6072 at 1/9/17 10:32 PM:
-

My vote would be to play it safe and fix it in 2.8.0. I am happy to review the 
changes.

[~ajithshetty] - if you are unable to get to this in the next couple of days, 
please let me know so I can pick it up. 


was (Author: kasha):
My vote would be to safe and fix it in 2.8.0. I am happy to review the changes.

[~ajithshetty] - if you are unable to get to this in the next couple of days, 
please let me know so I can pick it up. 

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> 

[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813047#comment-15813047
 ] 

Karthik Kambatla commented on YARN-6072:


My vote would be to play it safe and fix it in 2.8.0. I am happy to review the changes.

[~ajithshetty] - if you are unable to get to this in the next couple of days, 
please let me know so I can pick it up. 

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order
> # EmbeddedElector
> # AdminService
> During resource manager service start() .EmbeddedElector starts 

[jira] [Updated] (YARN-5976) Update hbase version to 1.2 (removes phoenix dependencies)

2017-01-09 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-5976:
-
Summary: Update hbase version to 1.2 (removes phoenix dependencies)  (was: 
Update hbase version to 1.2)

> Update hbase version to 1.2 (removes phoenix dependencies)
> --
>
> Key: YARN-5976
> URL: https://issues.apache.org/jira/browse/YARN-5976
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-5355-merge-blocker
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5976-YARN-5355.004.patch, YARN-5976.001.wip.patch, 
> YARN-5976.002.wip.patch, YARN-5976.004.patch
>
>
> I believe phoenix now works with hbase 1.2. We should now upgrade timeline 
> service to use hbase 1.2 now. 
> And also update documentation in timelineservice to reflect that hbase mode 
> of all daemons in single jvm but writing to hdfs is supported. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6000) Make AllocationFileLoaderService.Listener public

2017-01-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812994#comment-15812994
 ] 

Sergey Shelukhin edited comment on YARN-6000 at 1/9/17 10:06 PM:
-

Thanks! As for the Hive requirements, see the code snippet above. We are using the 
listener because that seems to be the only way to get the updated value out. We 
just need to get the allocConf, which we use to get the queue placement policy, 
and then get the queue.


was (Author: sershe):
Thanks! As for Hive requirements, see the code snippet above. We are using the 
listener because that seems to be the only way to get the updated value out. We 
just need to get the allocConf/queue

> Make AllocationFileLoaderService.Listener public
> 
>
> Key: YARN-6000
> URL: https://issues.apache.org/jira/browse/YARN-6000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Tao Jie
>Assignee: Tao Jie
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-6000.001.patch
>
>
> We removed the public modifier of {{AllocationFileLoaderService.Listener}} in 
> YARN-4997 since it triggered a findbugs warning. However, this breaks Hive code in 
> {{FairSchedulerShim}}. 
> {code}
> AllocationFileLoaderService allocsLoader = new AllocationFileLoaderService();
> allocsLoader.init(conf);
> allocsLoader.setReloadListener(new AllocationFileLoaderService.Listener() 
> {
>   @Override
>   public void onReload(AllocationConfiguration allocs) {
> allocConf.set(allocs);
>   }
> });
> try {
>   allocsLoader.reloadAllocations();
> } catch (Exception ex) {
>   throw new IOException("Failed to load queue allocations", ex);
> }
> if (allocConf.get() == null) {
>   allocConf.set(new AllocationConfiguration(conf));
> }
> QueuePlacementPolicy queuePolicy = allocConf.get().getPlacementPolicy();
> if (queuePolicy != null) {
>   requestedQueue = queuePolicy.assignAppToQueue(requestedQueue, userName);
> {code}
> As a result we should set the modifier back to public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6000) Make AllocationFileLoaderService.Listener public

2017-01-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812994#comment-15812994
 ] 

Sergey Shelukhin commented on YARN-6000:


Thanks! As for Hive requirements, see the code snippet above. We are using the 
listener because that seems to be the only way to get the updated value out. We 
just need to get the allocConf/queue

> Make AllocationFileLoaderService.Listener public
> 
>
> Key: YARN-6000
> URL: https://issues.apache.org/jira/browse/YARN-6000
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Tao Jie
>Assignee: Tao Jie
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-6000.001.patch
>
>
> We removed the public modifier of {{AllocationFileLoaderService.Listener}} in 
> YARN-4997 since it triggered a findbugs warning. However, this breaks Hive code in 
> {{FairSchedulerShim}}. 
> {code}
> AllocationFileLoaderService allocsLoader = new AllocationFileLoaderService();
> allocsLoader.init(conf);
> allocsLoader.setReloadListener(new AllocationFileLoaderService.Listener() 
> {
>   @Override
>   public void onReload(AllocationConfiguration allocs) {
> allocConf.set(allocs);
>   }
> });
> try {
>   allocsLoader.reloadAllocations();
> } catch (Exception ex) {
>   throw new IOException("Failed to load queue allocations", ex);
> }
> if (allocConf.get() == null) {
>   allocConf.set(new AllocationConfiguration(conf));
> }
> QueuePlacementPolicy queuePolicy = allocConf.get().getPlacementPolicy();
> if (queuePolicy != null) {
>   requestedQueue = queuePolicy.assignAppToQueue(requestedQueue, userName);
> {code}
> As a result we should set the modifier back to public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml

2017-01-09 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812907#comment-15812907
 ] 

Yufei Gu commented on YARN-4022:


Hi [~forrestchen], are you still working on this? If not, may I take it over?

> queue not remove from webpage(/cluster/scheduler) when delete queue in 
> xxx-scheduler.xml
> 
>
> Key: YARN-4022
> URL: https://issues.apache.org/jira/browse/YARN-4022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: forrestchen
>  Labels: oct16-medium, scheduler
> Attachments: YARN-4022.001.patch, YARN-4022.002.patch, 
> YARN-4022.003.patch, YARN-4022.004.patch
>
>
> When I delete an existing queue by modifying the xxx-scheduler.xml, I can still 
> see the queue's information block in the webpage (/cluster/scheduler), though the 
> 'Min Resources' items all become zero and there is no 'Max Running 
> Applications' item.
> I can still submit an application to the deleted queue and the application 
> will run in the 'root.default' queue instead, but submitting to a non-existent 
> queue will cause an exception.
> My expectation is that the deleted queue will no longer be displayed in the webpage 
> and that submitting an application to it will act just as if the queue doesn't exist.
> PS: There's no application running in the queue I delete.
> Some related config in yarn-site.xml:
> {code}
> 
> yarn.scheduler.fair.user-as-default-queue
> false
> 
> 
> yarn.scheduler.fair.allow-undeclared-pools
> false
> 
> {code}
> a related question is here: 
> http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812903#comment-15812903
 ] 

Jian He commented on YARN-6072:
---

YARN-5709 actually affected the startup sequence. Before YARN-5709, the 
ActiveStandbyElector was created inside AdminService, so it was guaranteed that 
the server variable was instantiated before the ActiveStandbyElector started. 
After YARN-5709, this is no longer the case.
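
To illustrate the ordering hazard concretely, below is a minimal, self-contained sketch. The class and field names are hypothetical stand-ins, not the actual ResourceManager code; it only shows how a callback that fires before serviceStart() dereferences a still-null field, which matches the NPE in refreshServiceAcls in the log above.

{code}
// Hypothetical sketch of the startup-ordering hazard (not the real RM classes).
public class StartOrderSketch {

  static class AdminService {
    private Object rpcServer;                  // initialized only in serviceStart()

    void serviceStart() {
      rpcServer = new Object();                // stands in for the real RPC server
    }

    void refreshServiceAcls() {
      // NPE here if the election callback fires before serviceStart() has run
      System.out.println(rpcServer.toString());
    }
  }

  static class ElectorService {
    private final AdminService admin;
    ElectorService(AdminService admin) { this.admin = admin; }

    void becomeActive() {
      admin.refreshServiceAcls();              // mirrors transitionToActive() -> refreshAll()
    }
  }

  public static void main(String[] args) {
    AdminService admin = new AdminService();
    ElectorService elector = new ElectorService(admin);
    try {
      elector.becomeActive();                  // post-YARN-5709 ordering: callback before serviceStart()
    } catch (NullPointerException e) {
      System.out.println("NPE: callback ran before serviceStart() initialized the server");
    }
    admin.serviceStart();                      // pre-YARN-5709 ordering would have done this first
    elector.becomeActive();                    // now succeeds
  }
}
{code}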

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in following order
> # EmbeddedElector
> # AdminService
> During 

[jira] [Commented] (YARN-6057) yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior when a request is out of bounds

2017-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812832#comment-15812832
 ] 

Hadoop QA commented on YARN-6057:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
30s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6057 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846407/YARN-6057.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  |
| uname | Linux 0ed77c2a909e 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 91bf504 |
| Default Java | 1.8.0_111 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14610/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14610/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior 
> when a request is out of bounds
> -
>
> Key: YARN-6057
> URL: https://issues.apache.org/jira/browse/YARN-6057
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Julia Sommer
>Priority: Minor
> Attachments: YARN-6057.001.patch, YARN-6057.002.patch
>
>
> {code}
>   
> The minimum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests lower than this will throw a
> InvalidResourceRequestException.
> yarn.scheduler.minimum-allocation-vcores
> 1
>   
> {code}
> *Requests lower than this will throw a   

[jira] [Updated] (YARN-6057) yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior when a request is out of bounds

2017-01-09 Thread Julia Sommer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julia Sommer updated YARN-6057:
---
Attachment: YARN-6057.002.patch

Corrected version attached. Thanks!

> yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior 
> when a request is out of bounds
> -
>
> Key: YARN-6057
> URL: https://issues.apache.org/jira/browse/YARN-6057
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Julia Sommer
>Priority: Minor
> Attachments: YARN-6057.001.patch, YARN-6057.002.patch
>
>
> {code}
>   
> The minimum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests lower than this will throw a
> InvalidResourceRequestException.
> yarn.scheduler.minimum-allocation-vcores
> 1
>   
> {code}
> *Requests lower than this will throw a   InvalidResourceRequestException.* 
> Only in the case of maximum allocation vcores and memory is an 
> InvalidResourceRequestException thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6077) /bin/bash path is hardcoded in node manager

2017-01-09 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-6077:
-
Description: 
We need a configuration entry similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL to 
support multiple environments like FreeBSD.


  was:
There should be a configuration similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL



> /bin/bash path is hardcoded in node manager
> ---
>
> Key: YARN-6077
> URL: https://issues.apache.org/jira/browse/YARN-6077
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>
> We need a configuration entry similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL 
> to support multiple environments like FreeBSD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-01-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812737#comment-15812737
 ] 

Eric Payne commented on YARN-5889:
--

[~sunilg],
bq. However, I do think {{reComputeUserLimits}} needs to be modified to update 
{{preComputedUserLimit}} when recomputing user limits. That is not done 
anywhere in the current patch.
Actually, we may want to have separate methods for {{reComputeUserLimits}} and 
{{reComputeActiveUserLimits}}.
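
As an illustration only, here is a hypothetical sketch of that split: a precomputed user-limit cache updated outside the allocation path, with one entry point that recomputes limits for all users and a cheaper one that touches only active users. The class, method bodies, and the limit formula are placeholders, not the actual CapacityScheduler code or the YARN-5889 patch.

{code}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of precomputed user limits (not the real CapacityScheduler code).
public class UserLimitCacheSketch {
  private final Map<String, Long> preComputedUserLimit = new ConcurrentHashMap<>();

  // Recompute limits for every known user, e.g. when queue capacity or config changes.
  public void reComputeUserLimits(Set<String> allUsers, long queueCapacity) {
    for (String user : allUsers) {
      preComputedUserLimit.put(user, computeLimit(allUsers.size(), queueCapacity));
    }
  }

  // Cheaper recompute that only touches users with running or pending apps.
  public void reComputeActiveUserLimits(Set<String> activeUsers, long queueCapacity) {
    for (String user : activeUsers) {
      preComputedUserLimit.put(user, computeLimit(activeUsers.size(), queueCapacity));
    }
  }

  // Placeholder for the real user-limit formula.
  private long computeLimit(int userCount, long queueCapacity) {
    return userCount == 0 ? queueCapacity : queueCapacity / userCount;
  }

  // The heartbeat/allocation path only reads the cache; no recomputation under the lock.
  public long getUserLimit(String user, long queueCapacity) {
    return preComputedUserLimit.getOrDefault(user, queueCapacity);
  }
}
{code}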

> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this tickets is focussing on moving 
> user-limit calculation out of heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6077) /bin/bash path is hardcoded in node manager

2017-01-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812693#comment-15812693
 ] 

Miklos Szegedi commented on YARN-6077:
--

This bug was opened based on the discussion with [~aw] in YARN-6060.

> /bin/bash path is hardcoded in node manager
> ---
>
> Key: YARN-6077
> URL: https://issues.apache.org/jira/browse/YARN-6077
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>
> There should be a configuration similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812689#comment-15812689
 ] 

Miklos Szegedi commented on YARN-6060:
--

I opened HADOOP-13963 and YARN-6077. Since I am only familiar with YARN, do you 
mind if I pick up YARN-6077?

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812676#comment-15812676
 ] 

Jian He commented on YARN-5995:
---

Actually, this would be most useful as a time-series metric that an external 
framework can use to show the values over time - to show when the RM incurs high 
write latencies, since we always do postmortem analysis. 

If so, we can simply output the absolute value of 'time cost for each store op' 
or 'amount of data written for each op', and an external tool can use these 
metrics to plot them over time.  

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6057) yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior when a request is out of bounds

2017-01-09 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-6057:
---
Summary: yarn.scheduler.minimum-allocation-* descriptions are incorrect 
about behavior when a request is out of bounds  (was: 
yarn.scheduler.minimum-allocation-* and yarn.scheduler.maximum-allocation-* 
descriptions are incorrect about behavior when a request is out of bounds)

> yarn.scheduler.minimum-allocation-* descriptions are incorrect about behavior 
> when a request is out of bounds
> -
>
> Key: YARN-6057
> URL: https://issues.apache.org/jira/browse/YARN-6057
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Julia Sommer
>Priority: Minor
> Attachments: YARN-6057.001.patch
>
>
> {code}
>   
> The minimum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests lower than this will throw a
> InvalidResourceRequestException.
> yarn.scheduler.minimum-allocation-vcores
> 1
>   
> {code}
> *Requests lower than this will throw a   InvalidResourceRequestException.* 
> Only in the case of maximum allocation vcores and memory is an 
> InvalidResourceRequestException thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6077) /bin/bash path is hardcoded in node manager

2017-01-09 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6077:


 Summary: /bin/bash path is hardcoded in node manager
 Key: YARN-6077
 URL: https://issues.apache.org/jira/browse/YARN-6077
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


There should be a configuration similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6057) yarn.scheduler.minimum-allocation-* and yarn.scheduler.maximum-allocation-* descriptions are incorrect about behavior when a request is out of bounds

2017-01-09 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812637#comment-15812637
 ] 

Daniel Templeton commented on YARN-6057:


That's my bad.  I foolishly assumed that the max and min would be used 
symmetrically and didn't track down use of the max value independently. :)  
[~Juliasommer], can you please drop the changes to the maximum values and post 
a new patch?

> yarn.scheduler.minimum-allocation-* and yarn.scheduler.maximum-allocation-* 
> descriptions are incorrect about behavior when a request is out of bounds
> -
>
> Key: YARN-6057
> URL: https://issues.apache.org/jira/browse/YARN-6057
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Julia Sommer
>Priority: Minor
> Attachments: YARN-6057.001.patch
>
>
> {code}
>   
> The minimum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests lower than this will throw a
> InvalidResourceRequestException.
> yarn.scheduler.minimum-allocation-vcores
> 1
>   
> {code}
> *Requests lower than this will throw a   InvalidResourceRequestException.* 
> Only in the case of maximum allocation vcores and memory is an 
> InvalidResourceRequestException thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812626#comment-15812626
 ] 

Naganarasimha G R commented on YARN-6072:
-

+1 to unblock for 2.8.0

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in following order
> # EmbeddedElector
> # AdminService
> During resource manager service start() .EmbeddedElector starts first and 
> invokes  {{AdminService#refreshAll()}} but {{AdminService#serviceStart()}} 
> happens after {{ActiveStandbyElectorBasedElectorService}} service start is 
> 

[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812620#comment-15812620
 ] 

Jian He commented on YARN-5995:
---

As said earlier, read metrics won't be that useful, as reads only happen at RM 
startup to load the data. It's a one-time value which does not require metrics.

IMO, we need to think about how the metrics can actually be used for 
performance analysis, that is, how much the store operation can affect the RM's 
execution, i.e. how much delay it can incur. Metrics like data written per 
second look more like measuring ZK throughput, which may not be that useful. 
I think what we need is to surface the time spent on each write operation. With 
that in mind, we may have (a rough sketch follows below): 
1) a Histogram of the time spent on each write op 
2) the total number of write operations 

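To make that concrete, here is a minimal sketch - not the actual YARN-5995 patch - of recording "time spent per write op" and "total number of write ops" with the hadoop-metrics2 library, so an external tool can scrape and plot the values over time. The class name, metric names, and the wrapping approach are illustrative assumptions only.

{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;
import org.apache.hadoop.metrics2.lib.MutableRate;
import org.apache.hadoop.util.Time;

// Illustrative sketch of per-write-op metrics for the state store (names are made up).
public class RMStateStoreOpMetricsSketch {
  private final MetricsRegistry registry = new MetricsRegistry("RMStateStoreOps");

  // Total number of write (store/update/remove) operations.
  private final MutableCounterLong numWriteOps =
      registry.newCounter("NumStateStoreWriteOps", "Total state-store write ops", 0L);

  // Count and average of per-op latency.
  private final MutableRate writeOpTime =
      registry.newRate("StateStoreWriteOpTime", "Time spent per write op (ms)");

  // Histogram-like latency quantiles over a rolling 60-second window.
  private final MutableQuantiles writeOpLatency =
      registry.newQuantiles("StateStoreWriteLatency",
          "Write op latency", "ops", "latencyMs", 60);

  // Wrap a store operation and record how long it took.
  public void recordWriteOp(Runnable storeOp) {
    long start = Time.monotonicNow();
    try {
      storeOp.run();
    } finally {
      long elapsedMs = Time.monotonicNow() - start;
      numWriteOps.incr();
      writeOpTime.add(elapsedMs);
      writeOpLatency.add(elapsedMs);
    }
  }
}
{code}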

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5937) stop-yarn.sh is not able to gracefully stop node managers

2017-01-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812614#comment-15812614
 ] 

Naganarasimha G R commented on YARN-5937:
-

Thanks [~cheersyang],
Sorry for the delay; I just verified against trunk, and the failure was happening 
due to my local trunk code. Your approach is fine; I will commit it shortly.

> stop-yarn.sh is not able to gracefully stop node managers
> -
>
> Key: YARN-5937
> URL: https://issues.apache.org/jira/browse/YARN-5937
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>  Labels: script
> Attachments: YARN-5937.01.patch, nm_shutdown.log
>
>
> stop-yarn.sh always gives following output
> {code}
> ./sbin/stop-yarn.sh
> Stopping resourcemanager
> Stopping nodemanagers
> : WARNING: nodemanager did not stop gracefully after 5 seconds: 
> Trying to kill with kill -9
> : ERROR: Unable to kill 18097
> {code}
> this was because the resource manager is stopped before the node managers: when 
> the shutdown hook manager tries to gracefully stop NM services, the NM needs to 
> unregister with the RM, and it times out because the NM cannot connect to the RM 
> (already stopped). See the log (stop RM, then run kill):
> {code}
> 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> ...
> 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 
> 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
>   at java.util.concurrent.FutureTask.get(FutureTask.java:205)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> ...
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
> ...
> 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown 
> forcefully.
> {code}
> the shutdown hook manager has a default timeout of 10s, so if the RM is stopped 
> before the NMs, they always take more than 10s to stop (in the Java code). However, 
> stop-yarn.sh only gives a 5s timeout, so the NM is always killed instead of stopped.
> It would make sense for this script to stop the NMs before the RM, in a graceful way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6076) Backport YARN-4752 (FS preemption changes) to branch-2

2017-01-09 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-6076:
--

 Summary: Backport YARN-4752 (FS preemption changes) to branch-2
 Key: YARN-6076
 URL: https://issues.apache.org/jira/browse/YARN-6076
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.8.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


YARN-4752 was merged to trunk a while ago, and has been stable. Creating this 
JIRA to merge it to branch-2. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5556) Support for deleting queues without requiring a RM restart

2017-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812579#comment-15812579
 ] 

Hadoop QA commented on YARN-5556:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 22s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 207 unchanged - 2 fixed = 213 total (was 209) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 21s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5556 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846311/YARN-5556.v2.005.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 452e00e08ed0 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 287d3d6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14608/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14608/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14608/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812546#comment-15812546
 ] 

Junping Du commented on YARN-6072:
--

bq. YARN-5333 and YARN-5988 are not available in 2.8.0. So issue shouldn't 
happen in 2.8.0
I see. Sounds like YARN-5333 is the root cause. However, someone said YARN-5709 
could be related, but from my quick check it doesn't affect the sequence of 
service start. [~ka...@cloudera.com] and [~jianhe], can you confirm YARN-5709 
is not related? If so, we can drop 2.8.0 from the affected and target versions 
to unblock our 2.8.0 RC.

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> 

[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812539#comment-15812539
 ] 

zhangyubiao commented on YARN-5995:
---

Thanks, [~sunilg].

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5188) FairScheduler performance bug

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812533#comment-15812533
 ] 

zhangyubiao edited comment on YARN-5188 at 1/9/17 6:56 PM:
---

[~chenfolin], is it OK for me to take over this JIRA?


was (Author: piaoyu zhang):
@ChenFolin it is ok for me to take over this JIRA?

> FairScheduler performance bug
> -
>
> Key: YARN-5188
> URL: https://issues.apache.org/jira/browse/YARN-5188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Attachments: YARN-5188-1.patch
>
>
> My Hadoop cluster has recently encountered a performance problem. Details as 
> follows.
> There are two points which can cause this performance issue.
> 1: Application sort before assigning a container in FSLeafQueue. TreeSet is not 
> the best choice; why not keep the set ordered, so that we can use binary search 
> to keep it ordered when an application's resource usage changes?
> 2: Queue sort and assignContainerPreCheck lead to computing every leaf queue's 
> resource usage. Why can't we store the leaf queue usage in memory and update it 
> when an assign-container or release-container operation happens?
>
> The efficiency of assigning containers in the ResourceManager may fall 
> when the number of running and pending applications grows. In fact the 
> cluster has too much PendingMB or PendingVcore, and the cluster's 
> current utilization rate may be below 20%.
> I checked the ResourceManager logs and found that every container 
> assignment may cost 5 ~ 10 ms, but only 0 ~ 1 ms at usual times.
>
> I used TestFairScheduler to reproduce the scene:
>
> Just one queue: root.default
> 10240 apps.
>
> assign container avg time: 6753.9 us ( 6.7539 ms )
> apps sort time (FSLeafQueue : Collections.sort(runnableApps, 
> comparator); ): 4657.01 us ( 4.657 ms )
> compute LeafQueue Resource usage : 905.171 us ( 0.905171 ms )
>
> With just root.default, one assign-container op contains: ( one apps 
> sort op ) + 2 * ( compute leafqueue usage op )
> According to the above, I think the assign container op has a 
> performance problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5188) FairScheduler performance bug

2017-01-09 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812533#comment-15812533
 ] 

zhangyubiao commented on YARN-5188:
---

@ChenFolin, is it OK for me to take over this JIRA?

> FairScheduler performance bug
> -
>
> Key: YARN-5188
> URL: https://issues.apache.org/jira/browse/YARN-5188
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Attachments: YARN-5188-1.patch
>
>
> My Hadoop cluster has recently encountered a performance problem. Details as 
> follows.
> There are two points which can cause this performance issue.
> 1: Application sort before assigning a container in FSLeafQueue. TreeSet is not 
> the best choice; why not keep the set ordered, so that we can use binary search 
> to keep it ordered when an application's resource usage changes?
> 2: Queue sort and assignContainerPreCheck lead to computing every leaf queue's 
> resource usage. Why can't we store the leaf queue usage in memory and update it 
> when an assign-container or release-container operation happens?
>
> The efficiency of assigning containers in the ResourceManager may fall 
> when the number of running and pending applications grows. In fact the 
> cluster has too much PendingMB or PendingVcore, and the cluster's 
> current utilization rate may be below 20%.
> I checked the ResourceManager logs and found that every container 
> assignment may cost 5 ~ 10 ms, but only 0 ~ 1 ms at usual times.
>
> I used TestFairScheduler to reproduce the scene:
>
> Just one queue: root.default
> 10240 apps.
>
> assign container avg time: 6753.9 us ( 6.7539 ms )
> apps sort time (FSLeafQueue : Collections.sort(runnableApps, 
> comparator); ): 4657.01 us ( 4.657 ms )
> compute LeafQueue Resource usage : 905.171 us ( 0.905171 ms )
>
> With just root.default, one assign-container op contains: ( one apps 
> sort op ) + 2 * ( compute leafqueue usage op )
> According to the above, I think the assign container op has a 
> performance problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5980) Update documentation for single node hbase deploy

2017-01-09 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812517#comment-15812517
 ] 

Vrushali C commented on YARN-5980:
--

Thanks [~varun_saxena] and [~sjlee0], yes, I will update the patch. It's actually 
outdated, since we now use HBase 1.2.4. I will upload another patch 
shortly. 

> Update documentation for single node hbase deploy
> -
>
> Key: YARN-5980
> URL: https://issues.apache.org/jira/browse/YARN-5980
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5980.001.patch
>
>
> Per HBASE-17272, a single node hbase deployment (single jvm running daemons + 
> hdfs writes) will be added to hbase shortly. 
> We should update the timeline service documentation in the setup/deployment 
> context accordingly; this will help users who are a bit wary of hbase 
> deployments get started with the timeline service more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.

2017-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812498#comment-15812498
 ] 

Hadoop QA commented on YARN-6054:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  2m 43s{color} 
| {color:red} hadoop-yarn-server-applicationhistoryservice in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.timeline.webapp.TestTimelineWebServices |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6054 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846380/YARN-6054.03.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux a7d68c595185 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 287d3d6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14609/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14609/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14609/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TimelineServer fails to start when some LevelDb state files are missing.
> 
>
> Key: 

[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812488#comment-15812488
 ] 

Allen Wittenauer commented on YARN-6060:


bq. should we create a JIRA to make /bin/bash configurable in 
UnixShellScriptBuilder?

Yes, please. /bin/bash is hard-coded in a few places in the Java code and they 
should all be changed to either be /usr/bin/env bash or pull from a system 
property/configuration entry so that the shell code can define where exactly 
bash is located on startup.
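
As a rough sketch of that suggestion - with a hypothetical property name, since no such configuration key exists yet - the interpreter path could be resolved from configuration with /bin/bash as the fallback:

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch only; a real key/default would need to be defined in YarnConfiguration.
public class ShellPathSketch {
  public static final String NM_CONTAINER_SHELL_PATH =
      "yarn.nodemanager.container.shell-path";                 // made-up property name
  public static final String DEFAULT_CONTAINER_SHELL_PATH = "/bin/bash";

  // Build the command line used to launch a container script.
  public static String getShellCommandLine(Configuration conf, String scriptPath) {
    String shell = conf.getTrimmed(NM_CONTAINER_SHELL_PATH, DEFAULT_CONTAINER_SHELL_PATH);
    return shell + " \"" + scriptPath + "\"";
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set(NM_CONTAINER_SHELL_PATH, "/usr/local/bin/bash");  // e.g. a FreeBSD install
    System.out.println(getShellCommandLine(conf, "/tmp/launch_container.sh"));
  }
}
{code}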

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.

2017-01-09 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812459#comment-15812459
 ] 

Naganarasimha G R commented on YARN-6054:
-

Thanks for the patch [~raviprakashu], 
bq. Also, as pointed out by Jason, (e.g. in the case of NM) graceful 
degradation would be a very hard thing to achieve. More likely, the state is 
corrupt and will cause undefined behavior.
Agree, but maybe we can provide some kind of tool and a set of steps that can be 
taken to overcome it, as we faced it once too. But I agree it's not within this 
JIRA's scope!
The changes look good; I will wait for the Jenkins report and, if there are no 
further comments, will commit it tomorrow.

> TimelineServer fails to start when some LevelDb state files are missing.
> 
>
> Key: YARN-6054
> URL: https://issues.apache.org/jira/browse/YARN-6054
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: YARN-6054.01.patch, YARN-6054.02.patch, 
> YARN-6054.03.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>  failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
>  Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 9 missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-09 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi resolved YARN-6060.
--
Resolution: Won't Fix
  Assignee: (was: Miklos Szegedi)

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-01-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812451#comment-15812451
 ] 

Wangda Tan commented on YARN-5889:
--

bq. Let's please do these as separate JIRAs. We are extremely anxious to move 
this JIRA forward since it is blocking YARN-2113 (user limit-based intra-queue 
preemption).
I understand; however, I prefer to do the refactoring together with the patch. 
If we don't actively refactor to keep a clean code structure while making major 
behavior changes, it will cause a lot of trouble to maintain the code and add 
new functionality. I'm OK with moving other changes, like the partition-related 
ones, to a separate JIRA if they need considerable effort. 

> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket is focusing on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.

2017-01-09 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated YARN-6054:
---
Attachment: YARN-6054.03.patch

Here's a patch with the improvements suggested by Naganarasimha.

> TimelineServer fails to start when some LevelDb state files are missing.
> 
>
> Key: YARN-6054
> URL: https://issues.apache.org/jira/browse/YARN-6054
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: YARN-6054.01.patch, YARN-6054.02.patch, 
> YARN-6054.03.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>  failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
>  Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 9 missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.

2017-01-09 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812432#comment-15812432
 ] 

Ravi Prakash commented on YARN-6054:


Thanks Naganarasimha for your careful review! As I posted in the first comment, 
the repair did indeed fix the issue for us (we had a production incident). As 
I'm sure you'll understand, we can't post the leveldb files publicly.
# I feel this JIRA is very specific to the TimelineServer so I am hesitant to 
include other daemons. Also, as pointed out by Jason, (e.g. in the case of NM) 
graceful degradation would be a very hard thing to achieve. More likely, the 
state is corrupt and will cause undefined behavior.
# Fair point. Will do.
# Great idea. Will do.

> TimelineServer fails to start when some LevelDb state files are missing.
> 
>
> Key: YARN-6054
> URL: https://issues.apache.org/jira/browse/YARN-6054
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Ravi Prakash
>Assignee: Ravi Prakash
> Attachments: YARN-6054.01.patch, YARN-6054.02.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start 
> because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>  failed in state INITED
> ; cause: org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelines
> erver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: /timelineserver/lev
> eldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
>  Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: 
> org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 
> missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: 
> Corruption: 9 missing files; e.g.: 
> /timelineserver/leveldb-timeline-store.ldb/127897.sst
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the 
> TimelineServer should have graceful degradation instead of failing to start 
> at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812382#comment-15812382
 ] 

Miklos Szegedi commented on YARN-6060:
--

Thank you, [~vvasudev], [~aw] and [~templedf] for the comments. Since the error 
is a configuration error, I agree we should cancel the patch.
[~aw], should we create a JIRA to make /bin/bash configurable in 
UnixShellScriptBuilder?

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6064) Support fromId for flowRuns and flow/flowRun apps REST API's

2017-01-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812379#comment-15812379
 ] 

Sangjin Lee commented on YARN-6064:
---

Thanks [~rohithsharma] for the patch!

(ApplicationRowKeyPrefix.java)
- I'm not sure if this change is necessary. By providing the app id, we're 
specifying a complete row key, not a partial prefix. And I think adding this 
constructor for the prefix makes things somewhat confusing. We should simply 
use the {{ApplicationRowKey}} instance instead to handle the fromId.

(ApplicationEntityReader.java)
- as mentioned above, for the start row let's use something like this:
{code}
  ApplicationRowKey startRow = new ApplicationRowKey(
  context.getClusterId(), context.getUserId(), context.getFlowName(),
  flowRunId, getFilters().getFromId());

  // set start row
  scan.setStartRow(startRow.getRowKey());
{code}

(FlowRunRowKeyPrefix.java)
- the same comment applies here

(FlowRunEntityReader.java)
- the same comment applies here

In addition, I think many of the checkstyle, javadoc, and whitespace issues are 
actionable and easy to fix, including the following:
{noformat}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1005:
  verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is 
followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1006:
  new String[] { "flow1", "flow_name", "flow_name2" });:25: '{' is 
followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1011:
  verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is 
followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1012:
  new String[] { "flow1", "flow_name", "flow_name2" });:25: '{' is 
followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1016:
  verifyFlowEntites(client, uri, 1, new int[] { 3 },:52: '{' is followed by 
whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1017:
  new String[] { "flow1" });:25: '{' is followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1027:
  verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is 
followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1028:
  new String[] { "flow1", "flow_name", "flow_name2" });:25: '{' is 
followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1038:
  verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is 
followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1039:
  new String[] { "flow1", "flow_name", "flow_name2" });:25: '{' is 
followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1044:
  verifyFlowEntites(client, uri, 3, new int[] { 3, 2, 1 },:52: '{' is 
followed by whitespace.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderWebServicesHBaseStorage.java:1045:
  new String[] { "flow1", "flow_name", "flow_name2" 

[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-09 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812321#comment-15812321
 ] 

Bibin A Chundatt commented on YARN-6072:


+1 from my side too. I tried the same on my local cluster and it seems to be 
working fine.

[~ajithshetty]
In addition to changing the order, please also update the logging and the 
exception thrown. Currently we are losing the stack trace.

{code}
@@ -708,7 +708,7 @@ void refreshAll() throws ServiceFailedException {
   }
   refreshClusterMaxPriority();
 } catch (Exception ex) {
+ LOG.error(ex);
-  throw new ServiceFailedException(ex.getMessage());
+  throw new ServiceFailedException(ex.getMessage(), ex);
 }

{code}



> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> 

[jira] [Commented] (YARN-6074) FlowRunEntity does not deserialize long values correctly

2017-01-09 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812308#comment-15812308
 ] 

Sangjin Lee commented on YARN-6074:
---

Thanks for catching this [~rohithsharma].
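
As a rough illustration of one way to avoid the cast problem described below (a 
sketch only, not necessarily what the attached patch does), the getters could go 
through {{Number}} instead of casting directly to {{Long}}:

{code}
// Sketch: treat the stored value as a Number so that Integer-sized values
// deserialized from JSON still convert to long safely.
public long getRunId() {
  Object runId = getInfo().get(FLOW_RUN_ID_INFO_KEY);
  return runId == null ? 0L : ((Number) runId).longValue();
}

public long getMaxEndTime() {
  Object time = getInfo().get(FLOW_RUN_END_TIME);
  return time == null ? 0L : ((Number) time).longValue();
}
{code}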

> FlowRunEntity does not deserialize long values correctly
> 
>
> Key: YARN-6074
> URL: https://issues.apache.org/jira/browse/YARN-6074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 3.0.0-alpha2, YARN-5355, YARN-5355-branch-2
>
> Attachments: YARN-6074.patch
>
>
> I see that the FlowRunEntity methods *getRunId()* and *getMaxEndTime()* do not 
> deserialize values safely, which causes a class cast exception depending on the 
> number.
> {code}
>   public long getRunId() {
> Object runId = getInfo().get(FLOW_RUN_ID_INFO_KEY);
> return runId == null ? 0L : (Long) runId;
>   }
> {code} 
> and 
> {code}
>   public long getMaxEndTime() {
> Object time = getInfo().get(FLOW_RUN_END_TIME);
> return time == null ? 0L : (Long)time;
>   }
> {code} 
> The reason for the class cast exception is that JSON has a single Number data 
> type covering all Java primitive numeric types. So, if the number is within the 
> Integer range, the Object is converted to an Integer, which then fails to cast to Long. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-01-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812251#comment-15812251
 ] 

Eric Payne commented on YARN-5889:
--

Thanks [~leftnoteasy] for your review and comments.

bq. If everybody agrees with approach in my comment
I agree. I think [~jlowe] summed it up well when he said your "... proposal ... 
preserves the FIFO/priority behavior at least up through usage < MULP and then 
becomes fair once a user is beyond MULP."

{quote}
And in addition, we can add a UsersManager to LQ
...
And for last, we can look at logics for #active-users per partition, now we 
have an identical #active-users for all partition
{quote}
Let's please do these as separate JIRAs. We are extremely anxious to move this 
JIRA forward since it is blocking YARN-2113 (user limit-based intra-queue 
preemption).

> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket is focusing on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3955) Support for application priority ACLs in queues of CapacityScheduler

2017-01-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812228#comment-15812228
 ] 

Hudson commented on YARN-3955:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11090 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11090/])
YARN-3955. Support for application priority ACLs in queues of (wangda: rev 
287d3d6804a869723ae36605a3c2d2b3eae3941e)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/AppPriorityACLsManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AccessType.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ACLsTestBase.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AppPriorityACLGroup.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriorityACLs.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriority.java
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AppPriorityACLConfigurationParser.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationPriorityACLConfiguration.java


> Support for application priority ACLs in queues of CapacityScheduler
> 
>
> Key: YARN-3955
> URL: https://issues.apache.org/jira/browse/YARN-3955
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.9.0, 

[jira] [Commented] (YARN-6062) nodemanager memory leak

2017-01-09 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812206#comment-15812206
 ] 

Bibin A Chundatt commented on YARN-6062:


[~gehaijiang]
Could you provide the Java version used?

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10G.
> The NodeManager yarn-env.sh configuration (heap set to 2G):
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3955) Support for application priority ACLs in queues of CapacityScheduler

2017-01-09 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3955:
-
Summary: Support for application priority ACLs in queues of 
CapacityScheduler  (was: Support for priority ACLs in CapacityScheduler)

> Support for application priority ACLs in queues of CapacityScheduler
> 
>
> Key: YARN-3955
> URL: https://issues.apache.org/jira/browse/YARN-3955
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: ApplicationPriority-ACL.pdf, 
> ApplicationPriority-ACLs-v2.pdf, YARN-3955.0001.patch, YARN-3955.0002.patch, 
> YARN-3955.0003.patch, YARN-3955.0004.patch, YARN-3955.0005.patch, 
> YARN-3955.0006.patch, YARN-3955.0007.patch, YARN-3955.0008.patch, 
> YARN-3955.0009.patch, YARN-3955.0010.patch, YARN-3955.v0.patch, 
> YARN-3955.v1.patch, YARN-3955.wip1.patch
>
>
> Support will be added for User-level access permission to use different 
> application-priorities. This is to avoid situations where all users try 
> running max priority in the cluster and thus degrading the value of 
> priorities.
> Access Control Lists can be set per priority level within each queue. Below 
> is an example configuration that can be added in capacity scheduler 
> configuration
> file for each Queue level.
> yarn.scheduler.capacity.root...acl=user1,user2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-01-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812187#comment-15812187
 ] 

Wangda Tan commented on YARN-5889:
--

Thanks [~sunilg] for providing the patch and [~eepayne] for the reviews.

I quickly scanned the patch.

If everybody agrees with the approach in my comment: 
https://issues.apache.org/jira/browse/YARN-5889?focusedCommentId=15749129=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15749129

{code} 
active-user-limit = max(
   resource-used-by-active-users / #active-users,
   queue-capacity * MULP
)
{code} 

We can:
1) Update {{resource-used-by-active-users}} when any user enters/exits the set of 
active users.
2) Update #active-users (using an approach like the ActiveUsersManager one)

Both of #1/#2 can be done within the same thread, and since we will have an 
identical UL for all the active users, I think this can be done without adding 
a new thread. Please correct me if I'm wrong.
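
A minimal sketch of that flow (names are illustrative and resources are flattened 
to a single long for brevity; the scheduler itself works with Resource objects and 
a ResourceCalculator):

{code}
// Sketch: recompute the shared active-user limit only from state that is updated
// when a user enters or exits the active set.
class ActiveUserLimitSketch {
  private long usedByActiveUsers;         // updated when any user enters/exits the active set
  private int activeUsers;                // updated at the same points
  private long queueCapacity;
  private float minimumUserLimitPercent;  // MULP, in the range 0.0f - 1.0f

  synchronized void userActivated(long userUsage) {
    usedByActiveUsers += userUsage;
    activeUsers++;
  }

  synchronized void userDeactivated(long userUsage) {
    usedByActiveUsers -= userUsage;
    activeUsers--;
  }

  // active-user-limit = max(resource-used-by-active-users / #active-users,
  //                         queue-capacity * MULP)
  synchronized long activeUserLimit() {
    long fairShare = activeUsers == 0 ? 0 : usedByActiveUsers / activeUsers;
    long mulpShare = (long) (queueCapacity * minimumUserLimitPercent);
    return Math.max(fairShare, mulpShare);
  }
}
{code}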

It looks like Sunil and Eric have different implementation suggestions; I have 
looked at the details of the two approaches. However, I think the goals are:
a. Always be able to get an up-to-date UL.
b. Avoid re-computation of the UL as much as possible.
Any approach that achieves the above goals is good to me.

And in addition, we can add a UsersManager to LQ to manage all user-related 
information, such as user-limit / active-user-limit / #active-users. 
Ideally, LQ/PCPP should get activeUserLimit / userLimit from the UsersManager.

And lastly, we can look at the logic for #active-users per partition; right now we 
have an identical #active-users for all partitions, which causes some problems. 
Since we will be making major changes to the logic around this, it could be a good 
chance to fix that problem as well.

> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket is focusing on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6017) node manager physical memory leak

2017-01-09 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812167#comment-15812167
 ] 

Bibin A Chundatt edited comment on YARN-6017 at 1/9/17 4:36 PM:


[~chenrongwei]

This could be because of a JDK bug. I found the following link 
[JDK-8054841|http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8054841] 
while reading around.
As per the defect description, {{ProcessBuilder leaks native memory}}, and 
{{Shell.java}} does use {{ProcessBuilder}} for all child process launches.
{quote}
For the Oracle JDK, it does not appear to be present in 7u45, but does appear 
to be present in 7u55.
{quote}
It looks like the issue is present in 7u65. If possible, could you try changing 
the JRE version?

[GITHUB|https://github.com/cprice404/jvm-processbuilder-leak/blob/master/README.md]
{quote}
For Oracle JDK, the leak does not appear to be present in 7u45, but does appear 
to be present in 7u55. (I believe, though I have less data, that the leak is 
not present in Oracle 7u51. The leak definitely appears to be present in Oracle 
7u65 and 7u67 as well.) For OpenJDK, the leak does not appear to be present in 
7u55, but does appear to be present in 7u65.
{quote}


was (Author: bibinchundatt):
[~chenrongwei]

This could be because of a JDK bug. I found the following link 
[JDK-8054841|http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8054841] 
while reading around.
As per the defect description, {{ProcessBuilder leaks native memory}}, and 
{{Shell.java}} does use {{ProcessBuilder}} for all child process launches.
{quote}
For the Oracle JDK, it does not appear to be present in 7u45, but does appear 
to be present in 7u55.
{quote}
It looks like the issue is present in 7u65. If possible, could you try changing 
the JRE version?

> node manager physical memory leak
> -
>
> Key: YARN-6017
> URL: https://issues.apache.org/jira/browse/YARN-6017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
> Environment: OS:
> Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> jvm:
> java version "1.7.0_65"
> Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>Reporter: chenrongwei
> Attachments: 31169_smaps.txt, 31169_smaps.txt
>
>
> In our production environment, the node manager's JVM memory has been set to 
> '-Xmx2048m', but we noticed that after running for a long time the process's actual 
> physical memory size had reached 12g (we got this value from the top 
> command, as shown below).
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 31169 data  20   0 13.2g  12g 6092 S 16.9 13.0  49183:13 java
> 31169:   /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m 
> -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dhadoop.log.file=yarn-data-nodemanager.log 
> -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data 
> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA 
> -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native 
> -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M 
> -XX:+UseC
> Address   Kbytes Mode  Offset   DeviceMapping
> 0040   4 r-x--  008:1 java
> 0060   4 rw---  008:1 java
> 00601000 10094936 rw---  000:0   [ anon ]
> 00077000 2228224 rw---  000:0   [ anon ]
> 0007f800  131072 rw---  000:0   [ anon ]
> 00325ee0 128 r-x--  008:1 ld-2.12.so
> 00325f01f000   4 r 0001f000 008:1 ld-2.12.so
> 00325f02   4 rw--- 0002 008:1 ld-2.12.so
> 00325f021000   4 rw---  000:0   [ anon ]
> 00325f201576 r-x--  008:1 libc-2.12.so
> 00325f38a0002048 - 0018a000 008:1 libc-2.12.so
> 00325f58a000  16 r 0018a000 008:1 libc-2.12.so
> 00325f58e000   4 rw--- 0018e000 008:1 libc-2.12.so
> 00325f58f000  20 rw---  000:0   [ anon ]
> 00325f60  92 r-x--  008:1 libpthread-2.12.so
> 00325f6170002048 - 00017000 008:1 libpthread-2.12.so
> 00325f817000   4 r 00017000 008:1 libpthread-2.12.so
> 00325f818000   4 rw--- 00018000 008:1 libpthread-2.12.so
> 00325f819000  16 rw---  000:0   [ anon ]
> 00325fa0   8 r-x--  008:1 libdl-2.12.so
> 00325fa020002048 

[jira] [Commented] (YARN-6017) node manager physical memory leak

2017-01-09 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812167#comment-15812167
 ] 

Bibin A Chundatt commented on YARN-6017:


[~chenrongwei]

This could be because of a JDK bug. I found the following link 
[JDK-8054841|http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8054841] 
while reading around.
As per the defect description, {{ProcessBuilder leaks native memory}}, and 
{{Shell.java}} does use {{ProcessBuilder}} for all child process launches.
{quote}
For the Oracle JDK, it does not appear to be present in 7u45, but does appear 
to be present in 7u55.
{quote}
It looks like the issue is present in 7u65. If possible, could you try changing 
the JRE version?

> node manager physical memory leak
> -
>
> Key: YARN-6017
> URL: https://issues.apache.org/jira/browse/YARN-6017
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
> Environment: OS:
> Linux guomai124041 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> jvm:
> java version "1.7.0_65"
> Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>Reporter: chenrongwei
> Attachments: 31169_smaps.txt, 31169_smaps.txt
>
>
> In our production environment, the node manager's JVM memory has been set to 
> '-Xmx2048m', but we noticed that after running for a long time the process's actual 
> physical memory size had reached 12g (we got this value from the top 
> command, as shown below).
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 31169 data  20   0 13.2g  12g 6092 S 16.9 13.0  49183:13 java
> 31169:   /usr/local/jdk/bin/java -Dproc_nodemanager -Xmx2048m 
> -Dhadoop.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dyarn.log.dir=/home/data/programs/apache-hadoop-2.7.1/logs 
> -Dhadoop.log.file=yarn-data-nodemanager.log 
> -Dyarn.log.file=yarn-data-nodemanager.log -Dyarn.home.dir= -Dyarn.id.str=data 
> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA 
> -Djava.library.path=/home/data/programs/apache-hadoop-2.7.1/lib/native 
> -Dyarn.policy.file=hadoop-policy.xml -XX:PermSize=128M -XX:MaxPermSize=256M 
> -XX:+UseC
> Address   Kbytes Mode  Offset   DeviceMapping
> 0040   4 r-x--  008:1 java
> 0060   4 rw---  008:1 java
> 00601000 10094936 rw---  000:0   [ anon ]
> 00077000 2228224 rw---  000:0   [ anon ]
> 0007f800  131072 rw---  000:0   [ anon ]
> 00325ee0 128 r-x--  008:1 ld-2.12.so
> 00325f01f000   4 r 0001f000 008:1 ld-2.12.so
> 00325f02   4 rw--- 0002 008:1 ld-2.12.so
> 00325f021000   4 rw---  000:0   [ anon ]
> 00325f201576 r-x--  008:1 libc-2.12.so
> 00325f38a0002048 - 0018a000 008:1 libc-2.12.so
> 00325f58a000  16 r 0018a000 008:1 libc-2.12.so
> 00325f58e000   4 rw--- 0018e000 008:1 libc-2.12.so
> 00325f58f000  20 rw---  000:0   [ anon ]
> 00325f60  92 r-x--  008:1 libpthread-2.12.so
> 00325f6170002048 - 00017000 008:1 libpthread-2.12.so
> 00325f817000   4 r 00017000 008:1 libpthread-2.12.so
> 00325f818000   4 rw--- 00018000 008:1 libpthread-2.12.so
> 00325f819000  16 rw---  000:0   [ anon ]
> 00325fa0   8 r-x--  008:1 libdl-2.12.so
> 00325fa020002048 - 2000 008:1 libdl-2.12.so
> 00325fc02000   4 r 2000 008:1 libdl-2.12.so
> 00325fc03000   4 rw--- 3000 008:1 libdl-2.12.so
> 00325fe0  28 r-x--  008:1 librt-2.12.so
> 00325fe070002044 - 7000 008:1 librt-2.12.so
> 003260006000   4 r 6000 008:1 librt-2.12.so
> 003260007000   4 rw--- 7000 008:1 librt-2.12.so
> 00326020 524 r-x--  008:1 libm-2.12.so
> 0032602830002044 - 00083000 008:1 libm-2.12.so
> 003260482000   4 r 00082000 008:1 libm-2.12.so
> 003260483000   4 rw--- 00083000 008:1 libm-2.12.so
> 00326120  88 r-x--  008:1 libresolv-2.12.so
> 0032612160002048 - 00016000 008:1 libresolv-2.12.so
> 003261416000   4 r 00016000 008:1 libresolv-2.12.so
> 003261417000   4 rw--- 00017000 

[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-01-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812128#comment-15812128
 ] 

Eric Payne commented on YARN-5889:
--

bq. Since number of active users are changed, we need to recalculate all active 
users limits, correct?
Correct. Comparing each user's "cached recompute count" value against the 
queue's value should trigger that to happen when {{getComputedActiveUserLimit}} 
or {{getComputedUserLimit}} is called for each specific user.

bq. I need to have 2 cacheLimit in user data structure (one for active user and 
another for all users).
Are you asking if we need two versions of {{cachedRecalcULCount}} in the 
{{User}} class? If that's the question, then, no, I don't think so. The queue 
value will change for all of the conditions outlined in [~jlowe]'s [comment 
(above)|https://issues.apache.org/jira/browse/YARN-5889?focusedCommentId=15745552=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15745552],
 and that will trigger the recalculation.

However, I do think {{reComputeUserLimits}} needs to be modified to update 
{{preComputedUserLimit.get}} when recomputing user limits. That is not done 
anywhere in the current patch.
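
A small sketch of that lazy-recompute idea (class and method names are 
hypothetical, not the patch's API):

{code}
import java.util.concurrent.atomic.AtomicLong;

// The queue bumps a counter whenever user limits become stale; each user recomputes
// lazily the next time its limit is requested with an out-of-date cached count.
class UserLimitCache {
  private final AtomicLong queueRecalcCount = new AtomicLong();

  // Called on user activation/deactivation, container allocation/release, etc.
  void invalidateUserLimits() {
    queueRecalcCount.incrementAndGet();
  }

  class User {
    private long cachedRecalcCount = -1;
    private long cachedUserLimit;

    long getComputedUserLimit() {
      long current = queueRecalcCount.get();
      if (cachedRecalcCount != current) {
        cachedUserLimit = recomputeUserLimit();  // expensive path, taken only when stale
        cachedRecalcCount = current;
      }
      return cachedUserLimit;
    }
  }

  private long recomputeUserLimit() {
    return 0L;  // placeholder for the real user-limit calculation
  }
}
{code}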

> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket is focusing on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2017-01-09 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811877#comment-15811877
 ] 

Sunil G commented on YARN-5995:
---

A few suggestions here:
||Op Name||Metric||
|Total number of write operations|MutableCounter|
|Total number of read operations|MutableCounter|
|Write ops per second over the past 1, 5, and 15 minutes|Histogram|
|Read ops per second over the past 1, 5, and 15 minutes|Histogram|
|Amount of data written per second/minute|Histogram|
|Amount of data read per second/minute|Histogram|

[~jianhe] and [~templedf], could you also please take a look at the proposed 
metric choices?
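
A rough sketch of how such metrics could be wired up (the class and metric names 
are hypothetical, and metrics2's MutableQuantiles is used here merely as a 
stand-in for the Histogram rows in the table above):

{code}
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;

@Metrics(about = "RM state store operation metrics", context = "yarn")
public class RMStateStoreOpMetrics {
  private final MetricsRegistry registry = new MetricsRegistry("RMStateStoreOpMetrics");

  private final MutableCounterLong numWriteOps =
      registry.newCounter("NumWriteOps", "Total state-store write operations", 0L);
  private final MutableCounterLong numReadOps =
      registry.newCounter("NumReadOps", "Total state-store read operations", 0L);
  // 60-second rollover interval for the write-latency quantiles.
  private final MutableQuantiles writeLatency = registry.newQuantiles(
      "WriteLatency", "State-store write latency", "ops", "latencyMs", 60);

  public static RMStateStoreOpMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "RMStateStoreOpMetrics", "RM state store operation metrics",
        new RMStateStoreOpMetrics());
  }

  public void writeCompleted(long latencyMs) {
    numWriteOps.incr();
    writeLatency.add(latencyMs);
  }

  public void readCompleted() {
    numReadOps.incr();
  }
}
{code}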

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6064) Support fromId for flowRuns and flow/flowRun apps REST API's

2017-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811871#comment-15811871
 ] 

Hadoop QA commented on YARN-6064:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
40s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
30s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
39s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
33s{color} | {color:green} YARN-5355 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} YARN-5355 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 19 new 
+ 23 unchanged - 6 fixed = 42 total (was 29) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
15s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m  
0s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-tests in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | YARN-6064 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846324/YARN-6064-YARN-5355.0002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  

[jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

2017-01-09 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811817#comment-15811817
 ] 

Sunil G commented on YARN-5889:
---

Thanks [~eepayne] for the detailed explanation.

I was also having a similar thought. Just one point to clarify here:
bq. if number of active users has increased or decreased, all active users in 
preComputedActiveUserLimit are invalided, and not just the one that was 
activated/deactivated. This requires recalculation for other users when it is 
not necessary.

Since the number of active users has changed, we need to recalculate all active 
users' limits, correct? Because we divide total-resource-used-by-active-users 
by the active-user count. In the proposed patch as well, the cached limit can 
differ from what the actual user count would give when we query the user-limit 
for that user. In my patch, I cleared the whole map because of that. Could you 
please help to elaborate a little more?

I also feel cachedLimit makes the code simpler, hence no issue in making the 
change. However, I would need 2 cachedLimit values in the user data structure 
(one for active users and another for all users). Is my thinking in line with 
yours? Please help to clarify. Thank you.


> Improve user-limit calculation in capacity scheduler
> 
>
> Key: YARN-5889
> URL: https://issues.apache.org/jira/browse/YARN-5889
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.v0.patch, YARN-5889.v1.patch, 
> YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with 
> a write lock. To improve performance, this ticket is focusing on moving the 
> user-limit calculation out of the heartbeat allocation flow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


