[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281194#comment-15281194
 ] 

Hadoop QA commented on YARN-2888:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
51s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 42s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 37s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_91 with JDK v1.8.0_91 
generated 1 new + 22 unchanged - 0 fixed = 23 total (was 22) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 42s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_95 with JDK v1.7.0_95 
generated 1 new + 25 unchanged - 0 fixed = 26 total (was 25) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 43s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 16 new + 
467 unchanged - 66 fixed = 483 total (was 533) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 19s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 59s 
{color} | 

[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-05-11 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4002:

Attachment: YARN-4002-rwlock-v3-rebase.patch

Rebased the v3 patch to trigger Jenkins.
[~leftnoteasy], can you look into the patch, please?

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock-v2.patch, YARN-4002-rwlock-v2.patch, 
> YARN-4002-rwlock-v3-rebase.patch, YARN-4002-rwlock-v3.patch, 
> YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think is 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes". All RPC threads handling node heartbeats are 
> only readers, so an RWLock could be used to allow concurrent access by the 
> RPC threads.
> Second, since the fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the 
> reader-side lock could simply be skipped.
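For illustration, a minimal sketch of the two approaches described above; the 
class and field names are hypothetical, not the actual HostsFileReader code. 
Note that plain reference assignment is atomic in Java, but the field should 
be {{volatile}} so readers see the newly published reference.

{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical host list: read by many heartbeat handlers, replaced
// wholesale on "refresh nodes".
public class HostList {
  // Option 1: read-write lock -- many concurrent readers, one writer.
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private Set<String> includes = new HashSet<>();

  public boolean isValidNodeLocked(String host) {
    lock.readLock().lock();
    try {
      return includes.contains(host);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void refreshLocked(Set<String> newIncludes) {
    lock.writeLock().lock();
    try {
      includes = new HashSet<>(newIncludes);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Option 2: the writer publishes a new immutable set via a single
  // volatile reference assignment; readers take no lock at all.
  private volatile Set<String> includesRef = Collections.emptySet();

  public boolean isValidNodeLockless(String host) {
    return includesRef.contains(host);
  }

  public void refreshLockless(Set<String> newIncludes) {
    includesRef = Collections.unmodifiableSet(new HashSet<>(newIncludes));
  }
}
{code}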



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5072) Support comma separated list of includes and excludes files

2016-05-11 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281157#comment-15281157
 ] 

Ming Ma commented on YARN-5072:
---

Thanks [~raviprak]. I have updated the description based on your input.

> Support comma separated list of includes and excludes files
> ---
>
> Key: YARN-5072
> URL: https://issues.apache.org/jira/browse/YARN-5072
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> When a YARN cluster shares the same hosts as the underlying HDFS cluster, we 
> have {{yarn.resourcemanager.nodes.include-path}} point to the same file, or a 
> symlink of the {{dfs.hosts}} file used by HDFS, to make administration easier.
> If we want to set up a YARN cluster that runs on the combined hosts of 
> several HDFS clusters, {{yarn.resourcemanager.nodes.include-path}} should be 
> able to point to a list of files, each of which belongs to one HDFS cluster.
> For backward compatibility, it seems OK to continue reusing 
> {{yarn.resourcemanager.nodes.include-path}}, as long as it can still take a 
> single file. 
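As a rough sketch of the proposed behavior (the method and file handling are 
assumptions, not the actual patch): splitting the configured value on commas 
makes today's single-file format the one-element case, so backward 
compatibility falls out naturally.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reader: union of hosts from one or more include files
// named in a single, possibly comma-separated, configuration value.
public class IncludeFiles {
  public static Set<String> readHosts(String includePathValue)
      throws IOException {
    Set<String> hosts = new HashSet<>();
    for (String path : includePathValue.split(",")) {
      path = path.trim();
      if (path.isEmpty()) {
        continue;
      }
      for (String line : Files.readAllLines(Paths.get(path))) {
        line = line.trim();
        if (!line.isEmpty() && !line.startsWith("#")) {
          hosts.add(line);
        }
      }
    }
    return hosts;
  }
}
{code}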



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2684) FairScheduler: When failing an application due to changes in queue config or placement policy, indicate the cause.

2016-05-11 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-2684:

Target Version/s: 2.9.0  (was: 2.8.0)

> FairScheduler: When failing an application due to changes in queue config or 
> placement policy, indicate the cause.
> --
>
> Key: YARN-2684
> URL: https://issues.apache.org/jira/browse/YARN-2684
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: 0001-YARN-2684.patch, 0002-YARN-2684.patch
>
>
> YARN-2308 fixes this issue for CS, this JIRA is to fix it for FS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2684) FairScheduler: When failing an application due to changes in queue config or placement policy, indicate the cause.

2016-05-11 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281152#comment-15281152
 ] 

Rohith Sharma K S commented on YARN-2684:
-

This requires more discussion; I will move this to 2.9.

> FairScheduler: When failing an application due to changes in queue config or 
> placement policy, indicate the cause.
> --
>
> Key: YARN-2684
> URL: https://issues.apache.org/jira/browse/YARN-2684
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: 0001-YARN-2684.patch, 0002-YARN-2684.patch
>
>
> YARN-2308 fixes this issue for CS, this JIRA is to fix it for FS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4882) Change the log level to DEBUG for recovering completed applications

2016-05-11 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281150#comment-15281150
 ] 

Rohith Sharma K S commented on YARN-4882:
-

One thought I had behind a separate log file: after changing the log level to 
DEBUG, there is no way to identify which application's recovery failed, e.g. 
if one application's state got corrupted. We need at least the application id 
whose recovery failed, so that an admin can remove that application from the 
state store and bring the RM cluster back up. 
 The difficult part is that the recovery flow is the same for both completed 
and running applications, so changing logs to DEBUG impacts both. We should be 
very cautious about which logs are changed to DEBUG.
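To make the trade-off concrete, a sketch of the kind of guard being discussed; 
the logger usage is illustrative (slf4j here for brevity) and the method is 
hypothetical, not the actual RMAppImpl code:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical recovery logging: demote successful recovery of
// completed apps to DEBUG, but keep failures at ERROR with the
// application id so an admin can remove the corrupt entry from the
// state store and bring the RM up.
public class RecoveryLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(RecoveryLogging.class);

  void recoverApp(String appId, boolean completed) {
    try {
      // ... actual recovery work would happen here ...
      if (completed) {
        LOG.debug("Recovered completed app {}", appId);
      } else {
        LOG.info("Recovered running app {}", appId);
      }
    } catch (RuntimeException e) {
      LOG.error("Recovery failed for app {}", appId, e);
      throw e;
    }
  }
}
{code}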

> Change the log level to DEBUG for recovering completed applications
> ---
>
> Key: YARN-4882
> URL: https://issues.apache.org/jira/browse/YARN-4882
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Daniel Templeton
>
> I think recovering completed applications need not be logged at INFO; it can 
> be made DEBUG. The problem seen on a large cluster is that if any issue 
> happens during RM start-up and the RM keeps switching, the RM logs are filled 
> mostly with recovering applications.
> Six lines are logged per application, as shown in the logs below, and the RM 
> default for max-completed applications is 10K. So each switch adds 
> 10K*6=60K lines, which I feel is not useful.
> {noformat}
> 2016-03-01 10:20:59,077 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Default priority 
> level is set to application:application_1456298208485_21507
> 2016-03-01 10:20:59,094 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering 
> app: application_1456298208485_21507 with 1 attempts and final state = 
> FINISHED
> 2016-03-01 10:20:59,100 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Recovering attempt: appattempt_1456298208485_21507_01 with final state: 
> FINISHED
> 2016-03-01 10:20:59,107 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1456298208485_21507_01 State change from NEW to FINISHED
> 2016-03-01 10:20:59,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
> application_1456298208485_21507 State change from NEW to FINISHED
> 2016-03-01 10:20:59,112 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=rohith   
> OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
> RESULT=SUCCESS  APPID=application_1456298208485_21507
> {noformat}
> The main problem is that important information from before the RM became 
> unstable is missing from the logs. Even if log rollback keeps 50 or 100 
> files, in a short period all of them are rolled out, and what remains is 
> only RM switching information, mostly recovering applications!
> I suggest that at least completed-application recovery should be logged at 
> DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3344) procfs stat file is not in the expected format warning

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281145#comment-15281145
 ] 

Hadoop QA commented on YARN-3344:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 34s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: patch 
generated 4 new + 153 unchanged - 7 fixed = 157 total (was 160) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 10s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 58s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 50s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12803545/YARN-3344.07.patch |
| JIRA Issue | YARN-3344 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux bfe6eba973fb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-05-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281140#comment-15281140
 ] 

Xuan Gong commented on YARN-4577:
-

Thanks for the review. Uploaded a new patch which makes 
AuxiliaryServiceWithCustomClassLoader final.

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, 
> YARN-4577.20160428.patch, YARN-4577.20160509.patch, YARN-4577.20160510.patch, 
> YARN-4577.20160511.1.patch, YARN-4577.20160511.patch, YARN-4577.3.patch, 
> YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, 
> YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of a plugin 
> are present on the classpath, there is no control over which version 
> actually gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the 
> NM, the auxiliary service, or both.
> The solution could be to instantiate aux services using a classloader that 
> is different from the system classloader.
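A minimal sketch of the idea, assuming a plain parent-first 
{{URLClassLoader}}; the patch's AuxiliaryServiceWithCustomClassLoader may well 
differ (e.g., a parent-last loader so the service's own jars win):

{code:java}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical loader: instantiate an aux service from its own jar so
// its dependencies do not mix with the NM's system classpath.
public class AuxServiceLoader {
  public static Object load(String jarPath, String className)
      throws Exception {
    URL[] urls = { new File(jarPath).toURI().toURL() };
    // The loader is intentionally kept open for the lifetime of the
    // service, since the service may load more classes later.
    URLClassLoader loader =
        new URLClassLoader(urls, AuxServiceLoader.class.getClassLoader());
    Class<?> clazz = Class.forName(className, true, loader);
    return clazz.getDeclaredConstructor().newInstance();
  }
}
{code}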



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-05-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4577:

Attachment: YARN-4577.20160511.1.patch

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, 
> YARN-4577.20160428.patch, YARN-4577.20160509.patch, YARN-4577.20160510.patch, 
> YARN-4577.20160511.1.patch, YARN-4577.20160511.patch, YARN-4577.3.patch, 
> YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.5.patch, 
> YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of a plugin 
> are present on the classpath, there is no control over which version 
> actually gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the 
> NM, the auxiliary service, or both.
> The solution could be to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-05-11 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281133#comment-15281133
 ] 

Varun Vasudev commented on YARN-4577:
-

Agree with [~sjlee0]; [~xgong], can you please fix the checkstyle issue so we 
can commit it?

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, 
> YARN-4577.20160428.patch, YARN-4577.20160509.patch, YARN-4577.20160510.patch, 
> YARN-4577.20160511.patch, YARN-4577.3.patch, YARN-4577.3.rebase.patch, 
> YARN-4577.4.patch, YARN-4577.5.patch, YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of a plugin 
> are present on the classpath, there is no control over which version 
> actually gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the 
> NM, the auxiliary service, or both.
> The solution could be to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5049) Extend NMStateStore to save queued container information

2016-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281128#comment-15281128
 ] 

Hudson commented on YARN-5049:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9749 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9749/])
YARN-5049. Extend NMStateStore to save queued container information. (arun 
suresh: rev d464f4d1c4dec483852fc8c0496787cba0af8f57)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/queuing/QueuingContainerManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java


> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0
>
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.
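A sketch of the store contract this implies; the method names are a guess, not 
necessarily what the patch adds to NMStateStoreService:

{code:java}
// Hypothetical contract for persisting queued containers across NM
// restarts.
public interface QueuedContainerStore {
  // Persist a container's start request when it enters the NM queue.
  void storeQueuedContainer(String containerId, byte[] startRequest);

  // Drop the record once the queued container starts execution.
  void removeQueuedContainer(String containerId);
}
{code}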



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4995) FairScheduler: Display per-queue demand on the scheduler page

2016-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281129#comment-15281129
 ] 

Hudson commented on YARN-4995:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9749 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9749/])
YARN-4995. FairScheduler: Display per-queue demand on the scheduler (kasha: rev 
4b4e4c6ba83bc5c41d7bb69bb2483bcfe894a260)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java


> FairScheduler: Display per-queue demand on the scheduler page
> -
>
> Key: YARN-4995
> URL: https://issues.apache.org/jira/browse/YARN-4995
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4995.001.patch, YARN-4995.002.patch, 
> demo_screenshot.png
>
>
> For now there is no demand-resource information for queues on the scheduler 
> page.
> Using only used-resource information, it is hard to judge whether a queue is 
> needy (demand > used, but the cluster has no available resource). And 
> without demand-resource information, adjusting a queue's min/max resources 
> is not accurate. 
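The neediness test described above reduces to a simple predicate; a sketch 
with hypothetical names:

{code:java}
// Hypothetical check: a queue is needy when it demands more than it
// uses while the cluster has nothing left to hand out.
public class QueueDemand {
  public static boolean isNeedy(long demand, long used,
                                long clusterAvailable) {
    return demand > used && clusterAvailable == 0;
  }
}
{code}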



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4484) Available Resource calculation for a queue is not correct when used with labels

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281115#comment-15281115
 ] 

Hadoop QA commented on YARN-4484:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 14 new + 96 unchanged - 0 fixed = 110 total (was 96) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 36m 35s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 36m 6s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 34s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Commented] (YARN-5049) Extend NMStateStore to save queued container information

2016-05-11 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281108#comment-15281108
 ] 

Konstantinos Karanasos commented on YARN-5049:
--

Jenkins just completed. There is a single checkstyle issue that is not related 
to my code, so I think we are good.

> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers

2016-05-11 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281107#comment-15281107
 ] 

Arun Suresh commented on YARN-3997:
---

[~leftnoteasy], please go ahead and move it to 2.9

> An Application requesting multiple core containers can't preempt running 
> application made of single core containers
> ---
>
> Key: YARN-3997
> URL: https://issues.apache.org/jira/browse/YARN-3997
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.7.1
> Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines
>Reporter: Dan Shechter
>Assignee: Arun Suresh
>Priority: Critical
>
> When our cluster is configured with preemption and is fully loaded with an 
> application consuming 1-core containers, it will not kill off these 
> containers when a new application kicks in requesting containers larger than 
> one core, for example 4-core containers.
> When the "second" application requests 1-core containers as well, preemption 
> proceeds as planned and everything works properly.
> My assumption is that the fair scheduler, while recognizing it needs to kill 
> off some container to make room for the new application, fails to find a 
> SINGLE container satisfying the request for a 4-core container (since all 
> existing containers are 1-core containers), and isn't "smart" enough to 
> realize it needs to kill off 4 single-core containers (in this case) on a 
> single node for the new application to be able to proceed.
> The exhibited effect is that the new application hangs indefinitely and 
> never gets the resources it requires.
> This can easily be replicated with any YARN application.
> Our "go-to" scenario in this case is running pyspark with 1-core executors 
> (containers) while trying to launch the h2o.ai framework, which insists on 
> having at least 4 cores per container.
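A sketch of the missing aggregation step the reporter describes, with 
hypothetical names; the real FairScheduler preemption code is of course more 
involved:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical picker: accumulate enough small containers on one node
// to free the vcores a pending large request needs.
public class PreemptionPick {
  public static List<Integer> pickContainers(int[] containerVcores,
                                             int neededVcores) {
    List<Integer> picked = new ArrayList<>();
    int freed = 0;
    for (int i = 0; i < containerVcores.length && freed < neededVcores; i++) {
      picked.add(i);                 // index of a container to preempt
      freed += containerVcores[i];
    }
    // An empty result means this node cannot satisfy the request even
    // if every container on it were preempted.
    return freed >= neededVcores ? picked : Collections.emptyList();
  }
}
{code}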



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5049) Extend NMStateStore to save queued container information

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281101#comment-15281101
 ] 

Hadoop QA commented on YARN-5049:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 patch generated 1 new + 153 unchanged - 0 fixed = 154 total (was 153) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 32s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 57s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 11s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12803498/YARN-5049.003.patch |
| JIRA Issue | YARN-5049 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8e9ab3dbc585 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2016-05-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281097#comment-15281097
 ] 

Karthik Kambatla commented on YARN-4971:


I must be missing something, but I can't figure out why not setting the 
variable helps here. If I understand the code correctly, the individual 
variables {{clientBindAddress}} and {{masterServiceAddress}} are used only in 
tests and in the one other place in {{DelegationTokenRenewer}} that Daniel 
pointed out. 

Both ClientRMService and ApplicationMasterService are part of RMActiveServices. 
On transition to standby, both services are inited again to be started when the 
RM transitions back to active. This code path, in theory at least, shouldn't be 
different from the first time around. 

Am I missing something or misreading the code? 

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-4971.1.patch
>
>
> If the RM has {{yarn.resourcemanager.bind-host}} set to 0.0.0.0, binding to 
> the wildcard works as expected the first time the service becomes active. If 
> the service has transitioned from active to standby and then becomes active 
> again after a failover, the service binds to only one of the IP addresses.
> There is a difference between the services inside the RM: it only seems to 
> happen for the services listening on ports 8030 and 8032
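For reference, a sketch of the intended binding behavior, with hypothetical 
names; the point is that the wildcard address must be re-derived from 
configuration on every (re)initialization rather than cached from a previous 
activation:

{code:java}
import java.net.InetSocketAddress;

// Hypothetical helper: rebuild the listen address from configuration
// each time the service starts.
public class BindAddress {
  public static InetSocketAddress listenAddress(String bindHost, int port) {
    return "0.0.0.0".equals(bindHost)
        ? new InetSocketAddress(port)           // wildcard, all interfaces
        : new InetSocketAddress(bindHost, port);
  }
}
{code}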



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4969) Fix more loggings in CapacityScheduler

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281068#comment-15281068
 ] 

Hadoop QA commented on YARN-4969:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 212 unchanged - 0 fixed = 213 total (was 212) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 35m 23s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 35m 57s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 92m 11s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder 
|
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
| JDK v1.7.0_95 Failed junit tests | 

[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-05-11 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281059#comment-15281059
 ] 

Konstantinos Karanasos commented on YARN-2888:
--

Thanks, [~asuresh]!

I went over the patch again. +1 from me for the current version.

> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch, 
> YARN-2888.005.patch, YARN-2888.006.patch, YARN-2888.007.patch, 
> YARN-2888.008.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature 
> of the scheduling decisions or due to having a stale image of the system) 
> may lead to an imbalance in the waiting times of the NM container queues. 
> This can in turn have an impact on job execution times and cluster 
> utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to 
> less-loaded ones.
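As a sketch of the corrective idea (the names and the threshold policy are 
hypothetical, not the patch itself): queues whose length exceeds the cluster 
average by some slack shed the excess, which can then be re-queued at the 
shortest queues.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical rebalancer: decide how many queued requests each
// overloaded NM should shed.
public class QueueRebalancer {
  public static Map<String, Integer> toShed(
      Map<String, Integer> queueLengths, double slack) {
    double avg = queueLengths.values().stream()
        .mapToInt(Integer::intValue).average().orElse(0);
    int threshold = (int) Math.ceil(avg * (1 + slack));
    Map<String, Integer> shed = new HashMap<>();
    for (Map.Entry<String, Integer> e : queueLengths.entrySet()) {
      if (e.getValue() > threshold) {
        shed.put(e.getKey(), e.getValue() - threshold);
      }
    }
    return shed; // node -> number of queued requests to move elsewhere
  }
}
{code}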



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1815) RM should record final state for unmanaged AMs

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1815:
--
Target Version/s: 2.8.0  (was: 2.9.0)

> RM  should record final state for unmanaged AMs
> ---
>
> Key: YARN-1815
> URL: https://issues.apache.org/jira/browse/YARN-1815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Subru Krishnan
>Priority: Critical
> Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
> yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM  should record final state for unmanaged AMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4995) FairScheduler: Display per-queue demand on the scheduler page

2016-05-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4995:
---
Summary: FairScheduler: Display per-queue demand on the scheduler page  
(was: FairScheduler: Display demand resource for queues on the scheduler page)

> FairScheduler: Display per-queue demand on the scheduler page
> -
>
> Key: YARN-4995
> URL: https://issues.apache.org/jira/browse/YARN-4995
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Attachments: YARN-4995.001.patch, YARN-4995.002.patch, 
> demo_screenshot.png
>
>
> For now there is no demand-resource information for queues on the scheduler 
> page.
> Using only used-resource information, it is hard to judge whether a queue is 
> needy (demand > used, but the cluster has no available resource). And 
> without demand-resource information, adjusting a queue's min/max resources 
> is not accurate. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4995) FairScheduler: Display demand resource for queues on the scheduler page

2016-05-11 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4995:
---
Summary: FairScheduler: Display demand resource for queues on the scheduler 
page  (was: Fair Scheduler: Display demand resource for queues on the scheduler 
page)

> FairScheduler: Display demand resource for queues on the scheduler page
> ---
>
> Key: YARN-4995
> URL: https://issues.apache.org/jira/browse/YARN-4995
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Attachments: YARN-4995.001.patch, YARN-4995.002.patch, 
> demo_screenshot.png
>
>
> For now there is no demand-resource information for queues on the scheduler 
> page.
> Using only used-resource information, it is hard to judge whether a queue is 
> needy (demand > used, but the cluster has no available resource). And 
> without demand-resource information, adjusting a queue's min/max resources 
> is not accurate. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1815) RM should record final state for unmanaged AMs

2016-05-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281057#comment-15281057
 ] 

Jian He commented on YARN-1815:
---

I see; I thought there was no progress since no patch was uploaded. Could you 
post the patch? I can help with the review and get it in for 2.8.

> RM  should record final state for unmanaged AMs
> ---
>
> Key: YARN-1815
> URL: https://issues.apache.org/jira/browse/YARN-1815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Subru Krishnan
>Priority: Critical
> Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
> yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM  should record final state for unmanaged AMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2684) FairScheduler: When failing an application due to changes in queue config or placement policy, indicate the cause.

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281049#comment-15281049
 ] 

Wangda Tan commented on YARN-2684:
--

[~rohithsharma]/[~kasha], do you have bandwidth to finish this soon? May I 
move this to 2.9 if you don't?

> FairScheduler: When failing an application due to changes in queue config or 
> placement policy, indicate the cause.
> --
>
> Key: YARN-2684
> URL: https://issues.apache.org/jira/browse/YARN-2684
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: 0001-YARN-2684.patch, 0002-YARN-2684.patch
>
>
> YARN-2308 fixes this issue for CS, this JIRA is to fix it for FS. 






[jira] [Commented] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenewer

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281048#comment-15281048
 ] 

Wangda Tan commented on YARN-2919:
--

[~Naganarasimha], do you have the bandwidth to finish this soon? May I move 
this to 2.9 if you don't?

> Potential race between renew and cancel in DelegationTokenRenewer 
> -
>
> Key: YARN-2919
> URL: https://issues.apache.org/jira/browse/YARN-2919
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-2919.20141209-1.patch
>
>
> YARN-2874 fixes a deadlock in DelegationTokenRenewer, but there is still a 
> race because of which a renewal in flight isn't interrupted by a cancel. 






[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281047#comment-15281047
 ] 

Wangda Tan commented on YARN-3153:
--

Moving this to the 3.0.0 release.

> Capacity Scheduler max AM resource limit for queues is defined as percentage 
> but used as ratio
> --
>
> Key: YARN-3153
> URL: https://issues.apache.org/jira/browse/YARN-3153
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
>
> The existing Capacity Scheduler can limit the maximum number of applications 
> running within a queue. The config is 
> yarn.scheduler.capacity.maximum-am-resource-percent, but it is actually used 
> as a "ratio": the implementation assumes the input will be in \[0,1\]. So a 
> user can currently specify it up to 100, which lets AMs use 100x of the 
> queue capacity. We should fix that.
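For illustration only (no code is attached to this issue yet), a minimal sketch 
of the kind of range check that would enforce the \[0,1\] assumption when the 
config is read; the class name is hypothetical:
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical validation sketch: fail fast when the "percent" config is
// outside the [0,1] ratio range the scheduler actually assumes.
class AmLimitValidator {
  static final String KEY =
      "yarn.scheduler.capacity.maximum-am-resource-percent";

  static float getMaxAmResourceRatio(Configuration conf) {
    float ratio = conf.getFloat(KEY, 0.1f);  // 0.1 is the documented default
    if (ratio < 0.0f || ratio > 1.0f) {
      throw new IllegalArgumentException(
          KEY + " must be in [0,1], got " + ratio);
    }
    return ratio;
  }
}
{code}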






[jira] [Updated] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio

2016-05-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3153:
-
Target Version/s: 3.0.0  (was: 2.8.0)

> Capacity Scheduler max AM resource limit for queues is defined as percentage 
> but used as ratio
> --
>
> Key: YARN-3153
> URL: https://issues.apache.org/jira/browse/YARN-3153
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
>
> The existing Capacity Scheduler can limit the maximum number of applications 
> running within a queue. The config is 
> yarn.scheduler.capacity.maximum-am-resource-percent, but it is actually used 
> as a "ratio": the implementation assumes the input will be in \[0,1\]. So a 
> user can currently specify it up to 100, which lets AMs use 100x of the 
> queue capacity. We should fix that.






[jira] [Commented] (YARN-1815) RM should record final state for unmanaged AMs

2016-05-11 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281045#comment-15281045
 ] 

Subru Krishnan commented on YARN-1815:
--

[~jianhe], there are actually two problems I noticed while working on this:
  * With work-preserving restart in place, the UAM works across RM restarts, 
but all its running containers are killed during recovery. I have a fix for 
this. I tested it with [~ellenfkh], and the UAM works across RM restarts 
_without_ losing any work.
  * We then hit the issue of the UAM final state not being recorded, as 
subsequent RM failovers bring the unmanaged apps back to the _ACCEPTED_ state 
even though they had _COMPLETED_ in the past.

> RM should record final state for unmanaged AMs
> ---
>
> Key: YARN-1815
> URL: https://issues.apache.org/jira/browse/YARN-1815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Subru Krishnan
>Priority: Critical
> Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
> yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM should record final state for unmanaged AMs






[jira] [Commented] (YARN-3997) An Application requesting multiple core containers can't preempt running application made of single core containers

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281043#comment-15281043
 ] 

Wangda Tan commented on YARN-3997:
--

[~asuresh], do you plan to finish this soon? May I move this to the 2.9 
release if you don't have the bandwidth?

> An Application requesting multiple core containers can't preempt running 
> application made of single core containers
> ---
>
> Key: YARN-3997
> URL: https://issues.apache.org/jira/browse/YARN-3997
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.7.1
> Environment: Ubuntu 14.04, Hadoop 2.7.1, Physical Machines
>Reporter: Dan Shechter
>Assignee: Arun Suresh
>Priority: Critical
>
> When our cluster is configured with preemption and is fully loaded with an 
> application consuming 1-core containers, it will not kill off these 
> containers when a new application kicks in requesting containers with a size 
> > 1, for example 4-core containers.
> When the "second" application attempts to use 1-core containers as well, 
> preemption proceeds as planned and everything works properly.
> It is my assumption that the fair scheduler, while recognizing it needs to 
> kill off some containers to make room for the new application, fails to find 
> a SINGLE container satisfying the request for a 4-core container (since all 
> existing containers are 1-core containers), and isn't "smart" enough to 
> realize it needs to kill off 4 single-core containers (in this case) on a 
> single node for the new application to be able to proceed...
> The exhibited effect is that the new application hangs indefinitely and 
> never gets the resources it requires.
> This can easily be replicated with any YARN application.
> Our "goto" scenario in this case is running pyspark with 1-core executors 
> (containers) while trying to launch the h2o.ai framework, which INSISTS on 
> having at least 4 cores per container.
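For illustration of the logic the reporter believes is missing (a sketch with 
hypothetical types, not FairScheduler code): accumulate enough single-core 
victims on one node to satisfy the large request:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical types and logic; FairScheduler's real preemption code differs.
class ContainerInfo {
  final int cores;
  ContainerInfo(int cores) { this.cores = cores; }
}

class VictimPicker {
  // Accumulate small victims on a single node until the large request fits.
  static List<ContainerInfo> pickVictimsOnNode(List<ContainerInfo> running,
                                               int requestedCores) {
    List<ContainerInfo> victims = new ArrayList<>();
    int freed = 0;
    for (ContainerInfo c : running) {   // containers on the same node
      victims.add(c);
      freed += c.cores;
      if (freed >= requestedCores) {
        return victims;                 // enough capacity reclaimed here
      }
    }
    return Collections.emptyList();     // this node cannot satisfy the request
  }
}
{code}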






[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281041#comment-15281041
 ] 

Wangda Tan commented on YARN-4002:
--

[~rohithsharma], is there any update on the latest patch? Cleaning up 2.8.0 
tickets now. 

Thanks,

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock-v2.patch, YARN-4002-rwlock-v2.patch, 
> YARN-4002-rwlock-v3.patch, YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think is 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So an RWLock could be used to allow concurrent access by RPC 
> threads.
> Second, since the fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the 
> reader-side lock could just be skipped.
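A minimal sketch of the second approach, using a simplified stand-in for 
HostsFileReader (one assumption: the fields are made volatile, since reference 
assignment is atomic but cross-thread visibility still needs volatile):
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for HostsFileReader: readers take no lock because each
// refresh publishes a brand-new immutable set via an atomic reference write.
class HostsView {
  private volatile Set<String> includes = Collections.emptySet();
  private volatile Set<String> excludes = Collections.emptySet();

  // Called only on "refresh nodes"; builds new sets, then swaps references.
  void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    this.includes = Collections.unmodifiableSet(new HashSet<>(newIncludes));
    this.excludes = Collections.unmodifiableSet(new HashSet<>(newExcludes));
  }

  // Hot path for heartbeat RPC threads: plain volatile reads, no lock.
  boolean isValidNode(String host) {
    Set<String> inc = includes;  // read each reference once for consistency
    Set<String> exc = excludes;
    return (inc.isEmpty() || inc.contains(host)) && !exc.contains(host);
  }
}
{code}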






[jira] [Commented] (YARN-4995) Fair Scheduler: Display demand resource for queues on the scheduler page

2016-05-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281038#comment-15281038
 ] 

Karthik Kambatla commented on YARN-4995:


Cool. Looks good per my previous review. Checking this in. 

> Fair Scheduler: Display demand resource for queues on the scheduler page
> 
>
> Key: YARN-4995
> URL: https://issues.apache.org/jira/browse/YARN-4995
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Attachments: YARN-4995.001.patch, YARN-4995.002.patch, 
> demo_screenshot.png
>
>
> For now there is no demand resource information for queues on the scheduler 
> page. 
> Using only the used resource information, it is hard to judge whether a 
> queue is needy (demand > used, but the cluster has no available resources). 
> And without demand resource information, adjusting the min/max resources for 
> a queue is not accurate. 






[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281035#comment-15281035
 ] 

Wangda Tan commented on YARN-4963:
--

Thanks [~nroberts], a few comments:

1) Configuration name:
How about calling it: 
yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments?
offswitch-per-node-limit is a little confusing to me. And in the future we can 
add other limits under per-node-heartbeat if needed.

2) We may only need to add getOffSwitchNodeLimit to ParentQueue (instead of 
adding it to AbstractCSQueue).

3) (Minor) Logic in ParentQueue:
Add an isDebugEnabled check around:
{code}
LOG.debug("Not assigning more than " + getOffSwitchNodeLimit() +
    " off-switch containers," +
{code} 

And it's better to print offswitchCount with the debug log (a sketch follows 
below).

Thoughts?
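For reference, the guarded form would look roughly like this; a sketch 
completing the truncated fragment above, with the exact message text being a 
guess:
{code}
// Guarded form: no string concatenation unless debug logging is enabled,
// and offswitchCount is included in the message (wording is illustrative).
if (LOG.isDebugEnabled()) {
  LOG.debug("Not assigning more than " + getOffSwitchNodeLimit()
      + " off-switch containers per node heartbeat; already assigned: "
      + offswitchCount);
}
{code}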

> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch, YARN-4963.002.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non-MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.






[jira] [Updated] (YARN-2506) TimelineClient should NOT be in yarn-common project

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2506:
--
Target Version/s: 3.0.0  (was: 2.8.0)

bq. Why not do this in trunk?
Having two diverged code bases to maintain is not helpful. This is not a 
simple move, because of module cyclic dependencies. 
bq. Seems like something useful as part of 3.x 
Sure, keep it for 3.x then.

> TimelineClient should NOT be in yarn-common project
> ---
>
> Key: YARN-2506
> URL: https://issues.apache.org/jira/browse/YARN-2506
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>Priority: Critical
>
> YARN-2298 incorrectly moved TimelineClient to the yarn-common project. It 
> doesn't belong there; we should move it back to the yarn-client module.






[jira] [Reopened] (YARN-2506) TimelineClient should NOT be in yarn-common project

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reopened YARN-2506:
---

> TimelineClient should NOT be in yarn-common project
> ---
>
> Key: YARN-2506
> URL: https://issues.apache.org/jira/browse/YARN-2506
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>Priority: Critical
>
> YARN-2298 incorrectly moved TimelineClient to the yarn-common project. It 
> doesn't belong there; we should move it back to the yarn-client module.






[jira] [Updated] (YARN-3344) procfs stat file is not in the expected format warning

2016-05-11 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3344:

Attachment: YARN-3344.07.patch

Nice catch, [~templedf]. Updated the patch to address the comment.

> procfs stat file is not in the expected format warning
> --
>
> Key: YARN-3344
> URL: https://issues.apache.org/jira/browse/YARN-3344
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jon Bringhurst
>Assignee: Akira AJISAKA
> Attachments: YARN-3344-trunk.005.patch, YARN-3344.06.patch, 
> YARN-3344.07.patch
>
>
> Although this doesn't appear to be causing any functional issues, it is 
> spamming our log files quite a bit. :)
> It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
> /proc/<pid>/stat files.
> Here's the error I'm seeing:
> {noformat}
> "source_host": "asdf",
> "method": "constructProcessInfo",
> "level": "WARN",
> "message": "Unexpected: procfs stat file is not in the expected format 
> for process with pid 6953"
> "file": "ProcfsBasedProcessTree.java",
> "line_number": "514",
> "class": "org.apache.hadoop.yarn.util.ProcfsBasedProcessTree",
> {noformat}
> And here's the basic info on process with pid 6953:
> {noformat}
> [asdf ~]$ cat /proc/6953/stat
> 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 
> 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 
> 2 18446744073709551615 0 0 17 13 0 0 0 0 0
> [asdf ~]$ ps aux|grep 6953
> root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
> /export/apps/salt/minion-scripts/module-sync.py
> jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
> [asdf ~]$ 
> {noformat}
> This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5.
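The stat line above shows the failure mode: the comm field "(python2.6 /expo)" 
itself contains a space, which a purely whitespace-based regex cannot handle. A 
sketch of a tolerant parse (not necessarily what the attached patches do) 
brackets the comm field between the first '(' and the last ')':
{code}
// Sketch: tolerate spaces inside the comm field of /proc/<pid>/stat,
// e.g. "6953 (python2.6 /expo) S 1871 ...".
class ProcStatParser {
  static String[] parse(String line) {
    int open = line.indexOf('(');
    int close = line.lastIndexOf(')');  // the last ')' ends the comm field
    String pid = line.substring(0, open).trim();
    String comm = line.substring(open + 1, close);
    String[] rest = line.substring(close + 1).trim().split("\\s+");
    String[] fields = new String[rest.length + 2];
    fields[0] = pid;    // process id
    fields[1] = comm;   // command name, may contain spaces
    System.arraycopy(rest, 0, fields, 2, rest.length);
    return fields;      // fields[2] is the state, fields[3] is ppid, ...
  }
}
{code}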






[jira] [Updated] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-05-11 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-2888:
--
Attachment: YARN-2888.008.patch

Updating patch with some minor renaming changes suggested by [~kkaranasos]

> Corrective mechanisms for rebalancing NM container queues
> -
>
> Key: YARN-2888
> URL: https://issues.apache.org/jira/browse/YARN-2888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2888-yarn-2877.001.patch, 
> YARN-2888-yarn-2877.002.patch, YARN-2888.003.patch, YARN-2888.004.patch, 
> YARN-2888.005.patch, YARN-2888.006.patch, YARN-2888.007.patch, 
> YARN-2888.008.patch
>
>
> Bad queuing decisions by the LocalRMs (e.g., due to the distributed nature of 
> the scheduling decisions or due to having a stale image of the system) may 
> lead to an imbalance in the waiting times of the NM container queues. This 
> can in turn have an impact on job execution times and cluster utilization.
> To this end, we introduce corrective mechanisms that may remove (whenever 
> needed) container requests from overloaded queues, adding them to less-loaded 
> ones.
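For illustration (hypothetical types, not the classes in the attached patches), 
a minimal sketch of one such corrective pass over NM container queues:
{code}
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of one corrective pass over NM container queues.
class QueueRebalancer {
  // Move queued requests from the longest queue to the shortest until the
  // two differ by at most one request.
  static <T> void rebalance(List<Deque<T>> queues) {
    if (queues.size() < 2) {
      return;
    }
    Deque<T> longest = queues.get(0);
    Deque<T> shortest = queues.get(0);
    for (Deque<T> q : queues) {
      if (q.size() > longest.size()) longest = q;
      if (q.size() < shortest.size()) shortest = q;
    }
    while (longest.size() - shortest.size() > 1) {
      shortest.addLast(longest.pollLast());  // move the newest request
    }
  }
}
{code}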






[jira] [Commented] (YARN-5074) RM cycles through container ids for an app that is waiting for resources.

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281019#comment-15281019
 ] 

Wangda Tan commented on YARN-5074:
--

CC: [~jlowe]/[~nroberts]

> RM cycles through container ids for an app that is waiting for resources. 
> --
>
> Key: YARN-5074
> URL: https://issues.apache.org/jira/browse/YARN-5074
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Sidharta Seethana
> Attachments: YARN-5074-test-case.patch
>
>
> /cc [~wangda], [~vinodkv]
> This was observed on a cluster running a 2.7.x build. Here is the scenario:
> 1. A YARN cluster has applications running that almost entirely consume the 
> cluster, with little available resources.
> 2. A new app is submitted - the resources required for the AM exceed what is 
> available in the cluster. The app stays in the 'ACCEPTED' state till 
> resources are available.
> 3. Once resources are available and the AM container comes up, the AM 
> container has an id that indicates that the RM has been cycling through 
> containers. There are no errors in the logs of any kind. One example id for 
> such an AM container is container_e3788_1462916288781_0012_01_000302. This 
> indicates that while the app was in the 'ACCEPTED' state, the RM cycled 
> through 301 containers. 






[jira] [Updated] (YARN-5074) RM cycles through container ids for an app that is waiting for resources.

2016-05-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5074:
-
Attachment: YARN-5074-test-case.patch

> RM cycles through container ids for an app that is waiting for resources. 
> --
>
> Key: YARN-5074
> URL: https://issues.apache.org/jira/browse/YARN-5074
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Sidharta Seethana
> Attachments: YARN-5074-test-case.patch
>
>
> /cc [~wangda], [~vinodkv]
> This was observed on a cluster running a 2.7.x build. Here is the scenario:
> 1. A YARN cluster has applications running that almost entirely consume the 
> cluster, with little available resources.
> 2. A new app is submitted - the resources required for the AM exceed what is 
> available in the cluster. The app stays in the 'ACCEPTED' state till 
> resources are available.
> 3. Once resources are available and the AM container comes up, the AM 
> container has an id that indicates that the RM has been cycling through 
> containers. There are no errors in the logs of any kind. One example id for 
> such an AM container is container_e3788_1462916288781_0012_01_000302. This 
> indicates that while the app was in the 'ACCEPTED' state, the RM cycled 
> through 301 containers. 






[jira] [Commented] (YARN-5074) RM cycles through container ids for an app that is waiting for resources.

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281017#comment-15281017
 ] 

Wangda Tan commented on YARN-5074:
--

Thanks [~sidharta-s] for reporting this issue.  

This happens when:

- The cluster has multiple nodes (# >= 2)
- App1 takes up almost all of the cluster
- The AM request of app2 can be reserved but cannot get allocated
- App2 gets its resource from a node other than the reserved node (i.e., 
reservation cancellation happens). App2 can then get a container id with a 
number > 1.

From what I can see, there are two issues where container ids can be skipped 
when working with reservation-continuous-looking:

*Issue #1: multiple container ids will be skipped*
In LeafQueue#assignContainer 
{code}
// Create the container if necessary
Container container =
    getContainer(rmContainer, application, node, capability, priority);
{code}

the getContainer call above happens before a container is successfully 
allocated or reserved.

So if LeafQueue's relaxed checks considered reserved resources, it is possible 
that an unnecessary getContainer call happens.

This issue only exists in branch-2.7. Branch-2.8/branch-2/trunk will not create 
a container id unless it allocates or reserves a new container.

*Issue #2: a single container id will be skipped*
This issue exists in both branch-2.7 and branch-2.8+.

When one container (c1) is reserved at host1, and the reservation is later 
cancelled to allocate another container (c2) at a different host, the 
container id of c1 will be skipped.

Uploading a demo test to reproduce this issue in branch-2.7.
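For illustration of the fix direction for Issue #1 (hypothetical names, not RM 
code): container ids come from a monotonically increasing counter within an 
application attempt, so an id should be drawn only after the 
allocate-or-reserve decision succeeds:
{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: ids must be drawn only after the allocate-or-reserve
// decision succeeds; a failed scheduling attempt must not touch the counter.
class ContainerIdAllocator {
  private final AtomicLong nextId = new AtomicLong(1);

  Long tryAssign(boolean allocatedOrReserved) {
    if (!allocatedOrReserved) {
      return null;  // no id consumed, so the visible sequence has no gap
    }
    return nextId.getAndIncrement();
  }
}
{code}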

> RM cycles through container ids for an app that is waiting for resources. 
> --
>
> Key: YARN-5074
> URL: https://issues.apache.org/jira/browse/YARN-5074
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Sidharta Seethana
> Attachments: YARN-5074-test-case.patch
>
>
> /cc [~wangda], [~vinodkv]
> This was observed on a cluster running a 2.7.x build. Here is the scenario:
> 1. A YARN cluster has applications running that almost entirely consume the 
> cluster, with little available resources.
> 2. A new app is submitted - the resources required for the AM exceed what is 
> available in the cluster. The app stays in the 'ACCEPTED' state till 
> resources are available.
> 3. Once resources are available and the AM container comes up, the AM 
> container has an id that indicates that the RM has been cycling through 
> containers. There are no errors in the logs of any kind. One example id for 
> such an AM container is container_e3788_1462916288781_0012_01_000302. This 
> indicates that while the app was in the 'ACCEPTED' state, the RM cycled 
> through 301 containers. 






[jira] [Updated] (YARN-867) Isolation of failures in aux services

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-867:
-
Target Version/s: 2.9.0  (was: 2.8.0)
Priority: Major  (was: Critical)

This is an improvement. It is unlikely this will get done in time; moving it 
out.

> Isolation of failures in aux services 
> --
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
> Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
> YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
> YARN-867.sampleCode.2.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a 
> service. For example, sending data to the ShuffleService such that it results 
> in any non-IOException will cause the NM's async dispatcher to exit, as the 
> service's INIT APP event is not handled properly. 
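A minimal sketch of the isolation being asked for (hypothetical interfaces, not 
the NM's actual aux-service plumbing): fence every service callback so an 
unexpected exception is contained instead of reaching the async dispatcher:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical shape; the NM's real aux-service plumbing differs.
interface AuxService {
  String getName();
  void handle(Object event) throws Exception;  // e.g. the INIT APP event
}

class IsolatingDispatcher {
  private static final Log LOG = LogFactory.getLog(IsolatingDispatcher.class);

  // Contain anything a service throws so the async dispatcher never exits
  // because of one misbehaving aux service.
  void dispatch(AuxService service, Object event) {
    try {
      service.handle(event);
    } catch (Throwable t) {
      LOG.error("Aux service " + service.getName() + " failed handling "
          + event + "; isolating the failure", t);
      // Optionally mark the service unhealthy here instead of crashing.
    }
  }
}
{code}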






[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-05-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281003#comment-15281003
 ] 

Sangjin Lee commented on YARN-4577:
---

I think it's quite close. The hadoop-common test failure is unrelated. The 
checkstyle issue should be trivial to fix. Hi [~xgong], could you please fix 
the checkstyle issue? Then I think it's good to go.

[~vvasudev], let me know if you're OK with that.

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, 
> YARN-4577.20160428.patch, YARN-4577.20160509.patch, YARN-4577.20160510.patch, 
> YARN-4577.20160511.patch, YARN-4577.3.patch, YARN-4577.3.rebase.patch, 
> YARN-4577.4.patch, YARN-4577.5.patch, YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of the 
> plugin are present on the classpath, there is no control over which version 
> actually gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be to instantiate aux services using a classloader that 
> is different from the system classloader.
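A minimal sketch of that solution in plain Java (illustrative only, not the 
attached patches): load the service class from its own jar through a dedicated 
URLClassLoader:
{code}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: give each aux service its own classloader rooted at its own jar,
// so its dependencies never have to sit on the NM's system classpath.
class AuxServiceLoader {
  static Object load(File serviceJar, String className) throws Exception {
    URL[] urls = { serviceJar.toURI().toURL() };
    // Parent is the NM's loader, so YARN APIs resolve; the service's own
    // classes and dependencies resolve from its jar.
    ClassLoader loader = new URLClassLoader(urls,
        AuxServiceLoader.class.getClassLoader());
    Class<?> clazz = Class.forName(className, true, loader);
    return clazz.getDeclaredConstructor().newInstance();
  }
}
{code}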






[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280993#comment-15280993
 ] 

Hadoop QA commented on YARN-4577:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 2s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 47s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 29s 
{color} | {color:red} root: patch generated 1 new + 316 unchanged - 2 fixed = 
317 total (was 318) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 21s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 4s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 56s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 45s 
{color} | {color:green} 

[jira] [Commented] (YARN-2506) TimelineClient should NOT be in yarn-common project

2016-05-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280992#comment-15280992
 ] 

Hitesh Shah commented on YARN-2506:
---

Why not do this in trunk? 

> TimelineClient should NOT be in yarn-common project
> ---
>
> Key: YARN-2506
> URL: https://issues.apache.org/jira/browse/YARN-2506
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>Priority: Critical
>
> YARN-2298 incorrectly moved TimelineClient to the yarn-common project. It 
> doesn't belong there; we should move it back to the yarn-client module.






[jira] [Commented] (YARN-2506) TimelineClient should NOT be in yarn-common project

2016-05-11 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280994#comment-15280994
 ] 

Hitesh Shah commented on YARN-2506:
---

Seems like something useful as part of 3.x, given that the client library is 
meant to be part of yarn-client-api.

> TimelineClient should NOT be in yarn-common project
> ---
>
> Key: YARN-2506
> URL: https://issues.apache.org/jira/browse/YARN-2506
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>Priority: Critical
>
> YARN-2298 incorrectly moved TimelineClient to the yarn-common project. It 
> doesn't belong there; we should move it back to the yarn-client module.






[jira] [Updated] (YARN-5075) Fix findbugs warning in hadoop-yarn-common module

2016-05-11 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-5075:

Attachment: findbugs.html

Attaching an html file with the details.

> Fix findbugs warning in hadoop-yarn-common module
> -
>
> Key: YARN-5075
> URL: https://issues.apache.org/jira/browse/YARN-5075
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akira AJISAKA
> Attachments: findbugs.html
>
>







[jira] [Created] (YARN-5075) Fix findbugs warning in hadoop-yarn-common module

2016-05-11 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created YARN-5075:
---

 Summary: Fix findbugs warning in hadoop-yarn-common module
 Key: YARN-5075
 URL: https://issues.apache.org/jira/browse/YARN-5075
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Akira AJISAKA









[jira] [Commented] (YARN-5024) TestContainerResourceUsage#testUsageAfterAMRestartWithMultipleContainers random failure

2016-05-11 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280920#comment-15280920
 ] 

Daniel Templeton commented on YARN-5024:


[~bibinchundatt], thanks for the patch.  I have a few comments:

# Looks like your {{complete}} parameter is unused in the latest patch.
# It would also be nice to have the sleep time be shorter.  Something more like 
100ms.  If you're looping, there's no penalty for a short sleep (see the sketch 
after this comment).
# If at all possible, I'd rather see you use the existing {{MockRM}} methods.  
Duplicated code is the devil's playground.

I timed the {{rm.waitForState(node, am1ContainerID, RMContainerState.COMPLETED, 
30 * 1000)}} call in {{testResourceRequestRecoveryToTheRightAppAttempt}}, and 
it only took 101 ms for me.  Sounds like there's something different about your 
configuration.  I've had odd issues like that before that were solved by a 
reboot...
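For illustration of the loop suggested in point 2 (a generic sketch, not 
MockRM's actual waitForState implementation):
{code}
import java.util.function.BooleanSupplier;

// Sketch: poll every 100 ms up to a deadline; the loop exits as soon as the
// condition holds, so the short sleep costs nothing.
class WaitUtil {
  static boolean waitFor(BooleanSupplier condition, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;  // timed out
      }
      Thread.sleep(100);  // ~100 ms granularity, as suggested above
    }
    return true;
  }
}
{code}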

> TestContainerResourceUsage#testUsageAfterAMRestartWithMultipleContainers 
> random failure
> ---
>
> Key: YARN-5024
> URL: https://issues.apache.org/jira/browse/YARN-5024
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-5024.patch, 0002-YARN-5024.patch, 
> 0003-YARN-5024.patch
>
>
> Random Testcase failure for 
> {{TestContainerResourceUsage#testUsageAfterAMRestartWithMultipleContainers}}
> {noformat}
> java.lang.AssertionError: Unexcpected MemorySeconds value 
> expected:<-1497214794931> but was:<1913>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.amRestartTests(TestContainerResourceUsage.java:395)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers(TestContainerResourceUsage.java:252)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}






[jira] [Commented] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes

2016-05-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280917#comment-15280917
 ] 

Jian He commented on YARN-4006:
---

[~gss2002], could you please describe what the current problem is and how your 
patch resolves the problem?

> YARN ATS Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
>Assignee: Greg Senia
>Priority: Blocker
> Attachments: YARN-4006-branch-trunk.patch, 
> YARN-4006-branch2.6.0.patch, sample-ats-alt-auth.patch
>
>
> When attempting to use the Hadoop alternate authentication classes, they do 
> not exactly work with what was built in YARN-1935.
> I went ahead and made the following changes to support using a custom 
> AltKerberos DelegationToken class.
> Changes to: TimelineAuthenticationFilterInitializer.class
> {code}
> String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: " + authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>       PseudoDelegationTokenAuthenticationHandler.class.getName());
>   LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE) ||
>     (UserGroupInformation.isSecurityEnabled() &&
>      conf.get("hadoop.security.authentication")
>          .equals(KerberosAuthenticationHandler.TYPE))) {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE, authType);
>     LOG.info("AuthType: " + authType);
>   } else {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>         KerberosDelegationTokenAuthenticationHandler.class.getName());
>     LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   }
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>       filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
>     try {
>       principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
>     } catch (IOException ex) {
>       throw new RuntimeException(
>           "Could not resolve Kerberos principal name: " + ex.toString(), ex);
>     }
>     filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL, principal);
>   }
> }
> {code}






[jira] [Commented] (YARN-5070) upgrade HBase version for first merge

2016-05-11 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280916#comment-15280916
 ] 

Sangjin Lee commented on YARN-5070:
---

Thanks [~enis] for the info. We're keenly interested in getting HBASE-13706, 
but it is committed only to 1.2. I'm lobbying for its inclusion in 1.1.x.

> upgrade HBase version for first merge
> -
>
> Key: YARN-5070
> URL: https://issues.apache.org/jira/browse/YARN-5070
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>
> Currently we set the HBase version for the timeline service storage to 1.0.1. 
> This is a fairly old version, and there are reasons to upgrade to a newer 
> version. We should upgrade it.






[jira] [Commented] (YARN-4866) FairScheduler: AMs can consume all vcores leading to a livelock when using FAIR policy

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280912#comment-15280912
 ] 

Hadoop QA commented on YARN-4866:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 212 unchanged - 1 fixed = 213 total (was 213) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 37m 33s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 36m 9s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 96m 14s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
| JDK v1.7.0_95 Failed junit tests | 

[jira] [Resolved] (YARN-3760) Log aggregation failures

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-3760.
---
Resolution: Cannot Reproduce

Without any logs, I'm not sure we can debug this further; closing.

> Log aggregation failures 
> -
>
> Key: YARN-3760
> URL: https://issues.apache.org/jira/browse/YARN-3760
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> The aggregated log file does not appear to be properly closed when writes 
> fail.  This leaves a lease renewer active in the NM that spams the NN with 
> lease renewals.  If the token is marked not to be cancelled, the renewals 
> appear to continue until the token expires.  If the token is cancelled, the 
> periodic renew spam turns into a flood of failed connections until the lease 
> renewer gives up.






[jira] [Resolved] (YARN-2506) TimelineClient should NOT be in yarn-common project

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-2506.
---
Resolution: Won't Fix

I think this code is too old to move now. Closing.

> TimelineClient should NOT be in yarn-common project
> ---
>
> Key: YARN-2506
> URL: https://issues.apache.org/jira/browse/YARN-2506
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>Priority: Critical
>
> YARN-2298 incorrectly moved TimelineClient to the yarn-common project. It 
> doesn't belong there; we should move it back to the yarn-client module.






[jira] [Created] (YARN-5074) RM cycles through container ids for an app that is waiting for resources.

2016-05-11 Thread Sidharta Seethana (JIRA)
Sidharta Seethana created YARN-5074:
---

 Summary: RM cycles through container ids for an app that is 
waiting for resources. 
 Key: YARN-5074
 URL: https://issues.apache.org/jira/browse/YARN-5074
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.2
Reporter: Sidharta Seethana


/cc [~wangda], [~vinodkv]

This was observed on a cluster running a 2.7.x build. Here is the scenario:

1. A YARN cluster has applications running that almost entirely consume the 
cluster, with little available resources.
2. A new app is submitted - the resources required for the AM exceed what is 
available in the cluster. The app stays in the 'ACCEPTED' state till resources 
are available.
3. Once resources are available and the AM container comes up, the AM container 
has an id that indicates that the RM has been cycling through containers. There 
are no errors in the logs of any kind. One example id for such an AM container 
is container_e3788_1462916288781_0012_01_000302. This indicates that while 
the app was in the 'ACCEPTED' state, the RM cycled through 301 containers. 










[jira] [Updated] (YARN-2836) RM behaviour on token renewal failures is broken

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2836:
--
Target Version/s: 2.9.0  (was: 2.8.0)
Priority: Major  (was: Blocker)

> RM behaviour on token renewal failures is broken
> 
>
> Key: YARN-2836
> URL: https://issues.apache.org/jira/browse/YARN-2836
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Found this while reviewing YARN-2834.
> We now completely ignore token renewal failures. For things like Timeline 
> tokens, which are automatically obtained whether the app needs them or not 
> (we should fix this to be user driven), we can ignore failures. But for HDFS 
> tokens etc., ignoring failures is bad because it (1) wastes resources, as AMs 
> will continue and eventually fail, and (2) the app doesn't know what 
> happened when it eventually fails.






[jira] [Updated] (YARN-4484) Available Resource calculation for a queue is not correct when used with labels

2016-05-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4484:
-
Target Version/s: 2.8.0

> Available Resource calculation for a queue is not correct when used with 
> labels
> ---
>
> Key: YARN-4484
> URL: https://issues.apache.org/jira/browse/YARN-4484
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4484.patch, 0002-YARN-4484.patch, 
> 0003-YARN-4484-v2.patch, 0003-YARN-4484.patch, 0004-YARN-4484.patch, 
> 0005-YARN-4484-rebased.patch, 0005-YARN-4484.patch
>
>
> To calculate the available resource for a queue, we have to get the total 
> resource allocated for all labels in the queue and compare it to its usage. 
> Also address the comments given in 
> [YARN-4304-comments|https://issues.apache.org/jira/browse/YARN-4304?focusedCommentId=15064874=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15064874
>  ] given by [~leftnoteasy] for the same.
> ClusterMetrics-related issues will also get handled once we fix this.






[jira] [Comment Edited] (YARN-4484) Available Resource calculation for a queue is not correct when used with labels

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280860#comment-15280860
 ] 

Wangda Tan edited comment on YARN-4484 at 5/11/16 9:29 PM:
---

Uploaded patch to latest trunk to kick Jenkins.


was (Author: leftnoteasy):
Uploaded patch to latest trunk to tick Jenkins.

> Available Resource calculation for a queue is not correct when used with 
> labels
> ---
>
> Key: YARN-4484
> URL: https://issues.apache.org/jira/browse/YARN-4484
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4484.patch, 0002-YARN-4484.patch, 
> 0003-YARN-4484-v2.patch, 0003-YARN-4484.patch, 0004-YARN-4484.patch, 
> 0005-YARN-4484-rebased.patch, 0005-YARN-4484.patch
>
>
> To calculate the available resource for a queue, we have to get the total 
> resource allocated for all labels in the queue and compare it to its usage. 
> Also address the comments given in 
> [YARN-4304-comments|https://issues.apache.org/jira/browse/YARN-4304?focusedCommentId=15064874=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15064874
>  ] given by [~leftnoteasy] for the same.
> ClusterMetrics-related issues will also get handled once we fix this.






[jira] [Updated] (YARN-4484) Available Resource calculation for a queue is not correct when used with labels

2016-05-11 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4484:
-
Attachment: 0005-YARN-4484-rebased.patch

Uploaded patch to latest trunk to kick Jenkins.

> Available Resource calculation for a queue is not correct when used with 
> labels
> ---
>
> Key: YARN-4484
> URL: https://issues.apache.org/jira/browse/YARN-4484
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4484.patch, 0002-YARN-4484.patch, 
> 0003-YARN-4484-v2.patch, 0003-YARN-4484.patch, 0004-YARN-4484.patch, 
> 0005-YARN-4484-rebased.patch, 0005-YARN-4484.patch
>
>
> To calculate the available resource for a queue, we have to get the total 
> resource allocated for all labels in the queue and compare it to its usage. 
> Also address the comments given in 
> [YARN-4304-comments|https://issues.apache.org/jira/browse/YARN-4304?focusedCommentId=15064874=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15064874
>  ] given by [~leftnoteasy] for the same.
> ClusterMetrics-related issues will also get handled once we fix this.






[jira] [Updated] (YARN-1946) need Public interface for WebAppUtils.getProxyHostAndPort

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1946:
--
Target Version/s: 2.9.0  (was: 2.8.0)
Priority: Major  (was: Critical)

> need Public interface for WebAppUtils.getProxyHostAndPort
> -
>
> Key: YARN-1946
> URL: https://issues.apache.org/jira/browse/YARN-1946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, webapp
>Affects Versions: 2.4.0
>Reporter: Thomas Graves
>
> ApplicationMasters are supposed to go through the ResourceManager web app 
> proxy if they have web UIs, so they are properly secured.  There is currently 
> no public interface for Application Masters to conveniently get the proxy 
> host and port.  There is a function in WebAppUtils, but that class is 
> private.  
> We should provide this as a utility since any properly written AM will need 
> to do this.






[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2016-05-11 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280855#comment-15280855
 ] 

Yufei Gu commented on YARN-4090:


Another minor nit: there are two spaces between "synchronized" and "void" in 
{{public synchronized  void incResourceUsage(Resource res)}}. 


> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, 
> YARN-4090.001.patch, sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.






[jira] [Updated] (YARN-1815) RM should record final state for unmanaged AMs

2016-05-11 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1815:
--
Target Version/s: 2.9.0  (was: 2.8.0)
 Description: RM  should record final state for unmanaged AMs  (was: RM 
doesn't recover unmanaged AMs into its memory after restart)
 Summary: RM  should record final state for unmanaged AMs  (was: RM 
doesn't recover unmanaged AMs into its memory after restart)

Moving to 2.9. With work-preserving restart in place, I think unmanaged AMs will 
work across RM restart; it is just that the final state is not recorded. 

The title was incorrect: the RM does reload unmanaged AMs after restart. 
Changed the title.

> RM  should record final state for unmanaged AMs
> ---
>
> Key: YARN-1815
> URL: https://issues.apache.org/jira/browse/YARN-1815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Subru Krishnan
>Priority: Critical
> Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
> yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM  should record final state for unmanaged AMs






[jira] [Commented] (YARN-5046) [Umbrella] Refactor scheduler code

2016-05-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280842#comment-15280842
 ] 

Karthik Kambatla commented on YARN-5046:


Thanks for picking this up, [~rchiang]. There are definitely various parts of 
the scheduler that could be moved to AbstractYarnScheduler. 

> [Umbrella] Refactor scheduler code
> --
>
> Key: YARN-5046
> URL: https://issues.apache.org/jira/browse/YARN-5046
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler, fairscheduler, resourcemanager, 
> scheduler
>Affects Versions: 3.0.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: technical_debt
>
> At this point in time, there are several places where code common to the 
> schedulers can be moved from one or more of the schedulers into 
> AbstractYarnScheduler or a related interface.
> Creating this umbrella JIRA to track this refactoring.  In general, it is 
> preferable to create a subtask JIRA on a per-method basis.
> This may need some coordination with [YARN-3091  \[Umbrella\] Improve and fix 
> locks of RM scheduler|https://issues.apache.org/jira/browse/YARN-3091].






[jira] [Commented] (YARN-5053) More informative diagnostics when applications killed by a user

2016-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280835#comment-15280835
 ] 

Jason Lowe commented on YARN-5053:
--

+1 lgtm.  Will commit this tomorrow if there are no objections.

> More informative diagnostics when applications killed by a user
> ---
>
> Key: YARN-5053
> URL: https://issues.apache.org/jira/browse/YARN-5053
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Eric Badger
> Attachments: YARN-5053.001.patch, YARN-5053.002.patch, 
> YARN-5053.003.patch, YARN-5053.004.patch, YARN-5053.005.patch
>
>
> When an application kill request is processed by the ClientRMService it sets 
> the diagnostics to "Application killed by user".  It would be nice to report 
> the user and host that issued the kill request in the app diagnostics so it 
> is clear where the kill originated.






[jira] [Commented] (YARN-5070) upgrade HBase version for first merge

2016-05-11 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280808#comment-15280808
 ] 

Enis Soztutar commented on YARN-5070:
-

Phoenix will support HBase 1.2.0 only in the upcoming 4.8.0 release. Right now it 
works with 1.1.x. HBase 1.1.x is also a pretty stable code base, so you could use 
that if you do not want to wait for the Phoenix release. 

> upgrade HBase version for first merge
> -
>
> Key: YARN-5070
> URL: https://issues.apache.org/jira/browse/YARN-5070
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
>
> Currently we set the HBase version for the timeline service storage to 1.0.1. 
> This is a fairly old version, and there are reasons to upgrade to a newer 
> version. We should upgrade it.






[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2016-05-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280807#comment-15280807
 ] 

Karthik Kambatla commented on YARN-4599:


Yes, I meant containers. 

My proposal was to have a single parent cgroup for all YARN containers. When 
that cgroup goes over the hard limit, all containers are paused. The NM should 
then identify a victim for preemption; it could use different heuristics to 
prioritize among the containers that are over their individual limits. 
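
For illustration, a minimal sketch of the cgroup v1 knob this proposal relies 
on: writing {{1}} to {{memory.oom_control}} disables the kernel OOM killer for 
the group, so over-limit tasks are paused rather than killed. The parent-cgroup 
path below is an assumption for the sketch, not actual NM configuration.

{code}
// Hedged sketch, not NM code: disable the kernel OOM killer on an assumed
// parent cgroup that would hold all YARN containers (cgroup v1).
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class OomControlSketch {
  public static void main(String[] args) throws IOException {
    // Assumed mount point and hierarchy; real clusters may differ.
    String yarnCgroup = "/sys/fs/cgroup/memory/hadoop-yarn";
    // Writing "1" disables the OOM killer: tasks that exceed the limit are
    // paused (the group reports under_oom 1) until the NM frees memory by
    // preempting a victim container.
    Files.write(Paths.get(yarnCgroup, "memory.oom_control"),
        "1".getBytes(StandardCharsets.UTF_8));
  }
}
{code}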

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.






[jira] [Commented] (YARN-5053) More informative diagnostics when applications killed by a user

2016-05-11 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280787#comment-15280787
 ] 

Sunil G commented on YARN-5053:
---

Thanks for the clarification. It makes sense to me.

> More informative diagnostics when applications killed by a user
> ---
>
> Key: YARN-5053
> URL: https://issues.apache.org/jira/browse/YARN-5053
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Eric Badger
> Attachments: YARN-5053.001.patch, YARN-5053.002.patch, 
> YARN-5053.003.patch, YARN-5053.004.patch, YARN-5053.005.patch
>
>
> When an application kill request is processed by the ClientRMService it sets 
> the diagnostics to "Application killed by user".  It would be nice to report 
> the user and host that issued the kill request in the app diagnostics so it 
> is clear where the kill originated.






[jira] [Updated] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM

2016-05-11 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4515:
--
Attachment: YARN-4515-YARN-3368.2.patch

Uploading a rebased patch. This is still combined with YARN-5000 for ease of 
testing. Will attach an independent patch after a round of test/review.

> [YARN-3368] Support hosting web UI framework inside YARN RM
> ---
>
> Key: YARN-4515
> URL: https://issues.apache.org/jira/browse/YARN-4515
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4515.patch, YARN-4515-YARN-3368.1.patch, 
> YARN-4515-YARN-3368.2.patch, preliminary-YARN-4515-host_rm_web_ui_v2.patch
>
>
> Currently it can be only launched outside of YARN, we should make it runnable 
> inside YARN for easier testing and we should have a configuration to 
> enable/disable it.






[jira] [Commented] (YARN-4635) Add global blacklist tracking for AM container failure.

2016-05-11 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280745#comment-15280745
 ] 

Jian He commented on YARN-4635:
---

I guess this is a bit late for 2.8; let's move it to 2.9?

> Add global blacklist tracking for AM container failure.
> ---
>
> Key: YARN-4635
> URL: https://issues.apache.org/jira/browse/YARN-4635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4635-v2.patch, YARN-4635.patch
>
>
> We need a global blacklist, in addition to each app's blacklist, to track AM 
> container failures that have a global effect. That means we need to 
> differentiate whether a non-succeeded ContainerExitStatus originates from the 
> NM or is more related to the app. 
> For more details, please refer to the document in YARN-4576.






[jira] [Commented] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280731#comment-15280731
 ] 

Jason Lowe commented on YARN-4325:
--

Thanks, Junping!  The test failure is related.  In addition to the javac 
warning that should be cleaned up, it looks like there's an unlikely code path 
in NonAggregatingLogHandler where, if we fail to look up the appId, it 
doesn't respond to the APPLICATION_FINISHED event.

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl.PNG, YARN-4325-v1.1.patch, 
> YARN-4325-v1.patch, YARN-4325-v2.patch, YARN-4325.patch
>
>
> From a long-running cluster, we found tens of thousands of stale apps still 
> being recovered during NM restart recovery. 
> After investigating, there are three issues that cause app state to leak in 
> the NM state-store:
> 1. APPLICATION_LOG_HANDLING_FAILED is not handled by removing the app from 
> the NMStateStore.
> 2. The APPLICATION_LOG_HANDLING_FAILED event is not sent when the 
> aggregator's doAppLogAggregation() hits an exception.
> 3. Only an Application in FINISHED status that receives APPLICATION_LOG_FINISHED 
> has a transition to remove the app from the NM state store. An Application in 
> another status, like APPLICATION_RESOURCES_CLEANUP, will ignore the event and 
> later forget to remove the app from the NM state store even after the app 
> finishes.






[jira] [Commented] (YARN-5053) More informative diagnostics when applications killed by a user

2016-05-11 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280729#comment-15280729
 ] 

Daniel Templeton commented on YARN-5053:


I can answer that one.  He's referring to {{org.apache.hadoop.ipc.Server}}.

> More informative diagnostics when applications killed by a user
> ---
>
> Key: YARN-5053
> URL: https://issues.apache.org/jira/browse/YARN-5053
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Eric Badger
> Attachments: YARN-5053.001.patch, YARN-5053.002.patch, 
> YARN-5053.003.patch, YARN-5053.004.patch, YARN-5053.005.patch
>
>
> When an application kill request is processed by the ClientRMService it sets 
> the diagnostics to "Application killed by user".  It would be nice to report 
> the user and host that issued the kill request in the app diagnostics so it 
> is clear where the kill originated.






[jira] [Commented] (YARN-5053) More informative diagnostics when applications killed by a user

2016-05-11 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280724#comment-15280724
 ] 

Eric Badger commented on YARN-5053:
---

[~sunilg], my intent is to print out the address from which the kill request 
came. However, I could definitely be wrong about which function I need to use to 
achieve that. I was mirroring the diagnostics handling from MRClientService, 
which calls Server.getRemoteAddress(). Is this incorrect? 
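
For context, a hedged sketch of the pattern being discussed, assuming it runs 
inside an RPC handler so the per-call context of {{org.apache.hadoop.ipc.Server}} 
is populated; {{applicationId}} and {{callerUGI}} are illustrative placeholders.

{code}
// Illustrative fragment, not the actual ClientRMService patch.
String message = "Application " + applicationId + " killed by user "
    + callerUGI.getShortUserName();
// Server.getRemoteAddress() reads a thread-local set for the current RPC
// call and returns null when invoked outside an RPC handler.
String remoteAddress = Server.getRemoteAddress();
if (remoteAddress != null) {
  message = message.concat(" at " + remoteAddress);
}
{code}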

> More informative diagnostics when applications killed by a user
> ---
>
> Key: YARN-5053
> URL: https://issues.apache.org/jira/browse/YARN-5053
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Eric Badger
> Attachments: YARN-5053.001.patch, YARN-5053.002.patch, 
> YARN-5053.003.patch, YARN-5053.004.patch, YARN-5053.005.patch
>
>
> When an application kill request is processed by the ClientRMService it sets 
> the diagnostics to "Application killed by user".  It would be nice to report 
> the user and host that issued the kill request in the app diagnostics so it 
> is clear where the kill originated.






[jira] [Updated] (YARN-4515) [YARN-3368] Support hosting web UI framework inside YARN RM

2016-05-11 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4515:
--
Attachment: YARN-4515-YARN-3368.1.patch

A combined patch with YARN-5000. This will help to test the UI overall. But I 
found that it is not rebased to the latest branch (YARN-5019 url hyphen fix). I 
will now work on rebasing it.
[~leftnoteasy], please help to check this without YARN-5019 if you want to test 
immediately. I will upload a clean patch in a few hours.

> [YARN-3368] Support hosting web UI framework inside YARN RM
> ---
>
> Key: YARN-4515
> URL: https://issues.apache.org/jira/browse/YARN-4515
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4515.patch, YARN-4515-YARN-3368.1.patch, 
> preliminary-YARN-4515-host_rm_web_ui_v2.patch
>
>
> Currently it can be only launched outside of YARN, we should make it runnable 
> inside YARN for easier testing and we should have a configuration to 
> enable/disable it.






[jira] [Commented] (YARN-5049) Extend NMStateStore to save queued container information

2016-05-11 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280707#comment-15280707
 ] 

Arun Suresh commented on YARN-5049:
---

Looks good..
+1 pending Jenkins..

> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.






[jira] [Updated] (YARN-5049) Extend NMStateStore to save queued container information

2016-05-11 Thread Konstantinos Karanasos (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-5049:
-
Attachment: YARN-5049.003.patch

Thanks for the review, [~asuresh]!

Fixed the VERSION problem you mentioned and am attaching a new version of the patch.

> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.






[jira] [Commented] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280689#comment-15280689
 ] 

Hadoop QA commented on YARN-4325:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 2m 38s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_91
 with JDK v1.8.0_91 generated 1 new + 15 unchanged - 0 fixed = 16 total (was 
15) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 4s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.7.0_95
 with JDK v1.7.0_95 generated 1 new + 17 unchanged - 0 fixed = 18 total (was 
17) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 41s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 51s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m 0s {color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | 

[jira] [Commented] (YARN-5053) More informative diagnostics when applications killed by a user

2016-05-11 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280669#comment-15280669
 ] 

Sunil G commented on YARN-5053:
---

Hi [~ebadger],
I have one doubt here; please correct me if I am wrong.
{code}
if (null != Server.getRemoteAddress()) {
  message = message.concat(" at " + Server.getRemoteAddress());
}
{code}
Are you intending to refer to {{getServer()}} or {{this.server}}?

> More informative diagnostics when applications killed by a user
> ---
>
> Key: YARN-5053
> URL: https://issues.apache.org/jira/browse/YARN-5053
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Eric Badger
> Attachments: YARN-5053.001.patch, YARN-5053.002.patch, 
> YARN-5053.003.patch, YARN-5053.004.patch, YARN-5053.005.patch
>
>
> When an application kill request is processed by the ClientRMService it sets 
> the diagnostics to "Application killed by user".  It would be nice to report 
> the user and host that issued the kill request in the app diagnostics so it 
> is clear where the kill originated.






[jira] [Commented] (YARN-3344) procfs stat file is not in the expected format warning

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280663#comment-15280663
 ] 

Hadoop QA commented on YARN-3344:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 8s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: patch 
generated 4 new + 153 unchanged - 7 fixed = 157 total (was 160) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 4s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 2s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12788521/YARN-3344.06.patch |
| JIRA Issue | YARN-3344 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8e2c8b2af1d6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2016-05-11 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280653#comment-15280653
 ] 

Yufei Gu commented on YARN-4090:


Thanks, [~xinxianyin]! Looks really good. All three previous test failures are
solved. The override of {{move()}} is a reasonable solution. 

Minor nit: do you mean "bring down" or "decrease" when you said "write
down" in this comment? 

  // do not decResource when the container exited in the preemptionMap
  // before because we have written down the resource when adding the
  // container to preemptionMap in this#addPreemption.
  
[~kasha], wanna take a look?

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
> Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, 
> YARN-4090.001.patch, sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.






[jira] [Commented] (YARN-5029) RM needs to send update event with YarnApplicationState as Running to ATS/AHS

2016-05-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280620#comment-15280620
 ] 

Hudson commented on YARN-5029:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9746 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9746/])
YARN-5029. RM needs to send update event with YarnApplicationState as 
(junping_du: rev 39f2bac38b111f90d3402229201cdb4315f5d4f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicaitonStateUpdatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java


> RM needs to send update event with YarnApplicationState as Running to ATS/AHS
> -
>
> Key: YARN-5029
> URL: https://issues.apache.org/jira/browse/YARN-5029
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: YARN-5029.1.patch, YARN-5029.2.patch
>
>
> Right now, an Application in AHS/ATS is always in ACCEPTED state until the 
> application finishes/fails/is killed. This is because the RM did not send any 
> other YarnApplicationState information, except FINISHED/FAILED/KILLED, to 
> ATS.  






[jira] [Commented] (YARN-5053) More informative diagnostics when applications killed by a user

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280614#comment-15280614
 ] 

Hadoop QA commented on YARN-5053:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 37m 0s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 36m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 2s {color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Updated] (YARN-5072) Support comma separated list of includes and excludes files

2016-05-11 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-5072:
--
Description: 
When a yarn cluster shares the same hosts as the underlying HDFS cluster, we 
have {{yarn.resourcemanager.nodes.include-path}} point to the same file or 
symlink of the {{dfs.hosts}} file used by HDFS to make admin easier.

If we want to set up a yarn cluster to run on the same hosts of several HDFS 
clusters combined, it means {{yarn.resourcemanager.nodes.include-path}} should 
be able to point to a list of files each of which belongs to one HDFS cluster.

For backward compatibility, it seems OK to continue to reuse 
{{yarn.resourcemanager.nodes.include-path}} as long as it can still take a 
single file. 

  was:
Normally a yarn cluster shares the same hosts as the underlying HDFS cluster. 
To make admin easier, we have {{yarn.resourcemanager.nodes.include-path}} point 
to the same file or symlink of the {{dfs.hosts}} file used by HDFS.

If we want to set up a yarn cluster to run on the same hosts of several HDFS 
clusters combined, it means {{yarn.resourcemanager.nodes.include-path}} should 
be able to point to a list of files each of which belongs to one HDFS cluster.

Backward compatibility, it seems ok to continue to reuse 
{{yarn.resourcemanager.nodes.include-path}} as long as it can still take a 
single file. 


> Support comma separated list of includes and excludes files
> ---
>
> Key: YARN-5072
> URL: https://issues.apache.org/jira/browse/YARN-5072
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> When a yarn cluster shares the same hosts as the underlying HDFS cluster, we 
> have {{yarn.resourcemanager.nodes.include-path}} point to the same file or 
> symlink of the {{dfs.hosts}} file used by HDFS to make admin easier.
> If we want to set up a yarn cluster to run on the same hosts of several HDFS 
> clusters combined, it means {{yarn.resourcemanager.nodes.include-path}} 
> should be able to point to a list of files each of which belongs to one HDFS 
> cluster.
> For backward compatibility, it seems OK to continue to reuse 
> {{yarn.resourcemanager.nodes.include-path}} as long as it can still take a 
> single file. 
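
For illustration, a hedged sketch of what the proposed usage might look like in 
yarn-site.xml; the multi-file value is the proposal here (not an existing 
feature), and the file paths are invented examples.

{code}
<!-- Hypothetical sketch of the proposed comma-separated form: one include
     file per underlying HDFS cluster. -->
<property>
  <name>yarn.resourcemanager.nodes.include-path</name>
  <value>/etc/hadoop/dfs.hosts.cluster1,/etc/hadoop/dfs.hosts.cluster2</value>
</property>
{code}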






[jira] [Commented] (YARN-5072) Support comma separated list of includes and excludes files

2016-05-11 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280599#comment-15280599
 ] 

Ravi Prakash commented on YARN-5072:


Just fyi, not all of us run with the same nodes for HDFS and YARN.

> Support comma separated list of includes and excludes files
> ---
>
> Key: YARN-5072
> URL: https://issues.apache.org/jira/browse/YARN-5072
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>
> Normally a yarn cluster shares the same hosts as the underlying HDFS cluster. 
> To make admin easier, we have {{yarn.resourcemanager.nodes.include-path}} 
> point to the same file or symlink of the {{dfs.hosts}} file used by HDFS.
> If we want to set up a yarn cluster to run on the same hosts of several HDFS 
> clusters combined, it means {{yarn.resourcemanager.nodes.include-path}} 
> should be able to point to a list of files each of which belongs to one HDFS 
> cluster.
> For backward compatibility, it seems OK to continue to reuse 
> {{yarn.resourcemanager.nodes.include-path}} as long as it can still take a 
> single file. 






[jira] [Commented] (YARN-2888) Corrective mechanisms for rebalancing NM container queues

2016-05-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280587#comment-15280587
 ] 

Hadoop QA commented on YARN-2888:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 12s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 58s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
54s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 45s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 45s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_91 with JDK v1.8.0_91 
generated 1 new + 22 unchanged - 0 fixed = 23 total (was 22) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 52s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_95 with JDK v1.7.0_95 
generated 1 new + 25 unchanged - 0 fixed = 26 total (was 25) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 12 new + 
468 unchanged - 65 fixed = 480 total (was 533) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 17s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 

[jira] [Updated] (YARN-4866) FairScheduler: AMs can consume all vcores leading to a livelock when using FAIR policy

2016-05-11 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-4866:
---
Attachment: YARN-4866.003.patch

> FairScheduler: AMs can consume all vcores leading to a livelock when using 
> FAIR policy
> --
>
> Key: YARN-4866
> URL: https://issues.apache.org/jira/browse/YARN-4866
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Attachments: YARN-4866.001.patch, YARN-4866.002.patch, 
> YARN-4866.003.patch
>
>
> The maxAMShare uses the queue's policy for enforcing limits. When using FAIR 
> policy, this considers only memory. If there are fewer vcores on the cluster, 
> the AMs can end up taking all the vcores leading to a livelock. 
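
As a schematic illustration of the fix direction (checking the AM share on every 
resource dimension instead of memory alone), a hedged sketch; the method and all 
parameter names are assumptions, not FairScheduler code.

{code}
// Hedged sketch, not FairScheduler code: admit a new AM only if it fits
// under maxAMShare on both memory and vcores.
boolean fitsAmShare(long fairShareMb, int fairShareVcores, double maxAMShare,
    long amUsedMb, int amUsedVcores, long amAskMb, int amAskVcores) {
  boolean memoryOk = amUsedMb + amAskMb <= fairShareMb * maxAMShare;
  boolean vcoresOk = amUsedVcores + amAskVcores <= fairShareVcores * maxAMShare;
  return memoryOk && vcoresOk;
}
{code}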






[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-05-11 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280472#comment-15280472
 ] 

Xuan Gong commented on YARN-4577:
-

[~vvasudev], [~sjlee0]
Sorry about that. I have uploaded a new patch for this.

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, 
> YARN-4577.20160428.patch, YARN-4577.20160509.patch, YARN-4577.20160510.patch, 
> YARN-4577.20160511.patch, YARN-4577.3.patch, YARN-4577.3.rebase.patch, 
> YARN-4577.4.patch, YARN-4577.5.patch, YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. Or if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be to instantiate aux services using a classloader that 
> is different from the system classloader.
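
For illustration, a minimal sketch of the general technique the description 
points at (a separate classloader per aux-service jar). The class and method 
names are invented for this sketch and are not the NM implementation.

{code}
// Hedged sketch: instantiate an aux service from its own jar through a
// separate URLClassLoader rather than the NM's system classloader.
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class AuxServiceLoaderSketch {
  public static Object loadService(String jarPath, String className)
      throws Exception {
    URL[] urls = { new File(jarPath).toURI().toURL() };
    // Note: URLClassLoader delegates to the parent first; fully isolating
    // conflicting dependencies would need a child-first classloader.
    ClassLoader loader = new URLClassLoader(
        urls, AuxServiceLoaderSketch.class.getClassLoader());
    Class<?> clazz = Class.forName(className, true, loader);
    return clazz.getDeclaredConstructor().newInstance();
  }
}
{code}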






[jira] [Updated] (YARN-4866) FairScheduler: AMs can consume all vcores leading to a livelock when using FAIR policy

2016-05-11 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-4866:
---
Attachment: (was: YARN-4866.003.patch)

> FairScheduler: AMs can consume all vcores leading to a livelock when using 
> FAIR policy
> --
>
> Key: YARN-4866
> URL: https://issues.apache.org/jira/browse/YARN-4866
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Attachments: YARN-4866.001.patch, YARN-4866.002.patch
>
>
> The maxAMShare uses the queue's policy for enforcing limits. When using FAIR 
> policy, this considers only memory. If there are fewer vcores on the cluster, 
> the AMs can end up taking all the vcores leading to a livelock. 






[jira] [Updated] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-05-11 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4577:

Attachment: YARN-4577.20160511.patch

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch, 
> YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, 
> YARN-4577.20160428.patch, YARN-4577.20160509.patch, YARN-4577.20160510.patch, 
> YARN-4577.20160511.patch, YARN-4577.3.patch, YARN-4577.3.rebase.patch, 
> YARN-4577.4.patch, YARN-4577.5.patch, YARN-4577.poc.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of the plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. And if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> A solution could be to instantiate aux services using a classloader that 
> is separate from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4866) FairScheduler: AMs can consume all vcores leading to a livelock when using FAIR policy

2016-05-11 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-4866:
---
Attachment: YARN-4866.003.patch

> FairScheduler: AMs can consume all vcores leading to a livelock when using 
> FAIR policy
> --
>
> Key: YARN-4866
> URL: https://issues.apache.org/jira/browse/YARN-4866
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
> Attachments: YARN-4866.001.patch, YARN-4866.002.patch, 
> YARN-4866.003.patch
>
>
> The maxAMShare check uses the queue's policy for enforcing limits. When using 
> the FAIR policy, this considers only memory. If vcores are relatively scarce on 
> the cluster, the AMs can end up taking all the vcores, leading to a livelock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-05-11 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Attachment: YARN-4325-v2.patch

Okay. Your proposal sounds cleaner.
I have updated the v2 patch. Can you take a look again? Thanks!

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl.PNG, YARN-4325-v1.1.patch, 
> YARN-4325-v1.patch, YARN-4325-v2.patch, YARN-4325.patch
>
>
> On a long-running cluster, we found tens of thousands of stale apps still 
> being recovered during NM restart recovery. 
> After investigating, we found three issues that cause app state to leak in the 
> NM state-store:
> 1. APPLICATION_LOG_HANDLING_FAILED is not handled by removing the app from the 
> NMStateStore.
> 2. The APPLICATION_LOG_HANDLING_FAILED event is not sent when the aggregator's 
> doAppLogAggregation() hits an exception.
> 3. Only an application in FINISHED status that receives APPLICATION_LOG_FINISHED 
> has a transition to remove the app from the NM state store. An application in 
> another status - like APPLICATION_RESOURCES_CLEANUP - will ignore the event and 
> the app is never removed from the NM state store, even after it finishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5053) More informative diagnostics when applications killed by a user

2016-05-11 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5053:
--
Attachment: YARN-5053.005.patch

[~templedf], the last few patches have been small fixes. One was a checkstyle 
fix, the other a findbugs fix. This patch has the changes that you suggested. 

> More informative diagnostics when applications killed by a user
> ---
>
> Key: YARN-5053
> URL: https://issues.apache.org/jira/browse/YARN-5053
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Eric Badger
> Attachments: YARN-5053.001.patch, YARN-5053.002.patch, 
> YARN-5053.003.patch, YARN-5053.004.patch, YARN-5053.005.patch
>
>
> When an application kill request is processed by the ClientRMService it sets 
> the diagnostics to "Application killed by user".  It would be nice to report 
> the user and host that issued the kill request in the app diagnostics so it 
> is clear where the kill originated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3344) procfs stat file is not in the expected format warning

2016-05-11 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280359#comment-15280359
 ] 

Daniel Templeton commented on YARN-3344:


Looks good to me in general.  Thanks for picking the torch back up, [~ajisakaa].

In the regex, should we tighten the {{.}} to {{[^)]}}?  Since there is only one 
close paren in the format, {{.}} is technically fine, but I find it generally 
better to avoid {{.}} unless you really need it.
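For reference, a hedged sketch of the tightened pattern (not the exact 
ProcfsBasedProcessTree regex) against the stat line quoted below:
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// the comm field "(python2.6 /expo)" may contain spaces, so match it with
// [^)] -- which stops at the single closing paren -- rather than a bare "."
Pattern stat = Pattern.compile("^([0-9-]+)\\s\\(([^)]+)\\)\\s([A-Z])\\s(.*)");
Matcher m = stat.matcher("6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496");
if (m.matches()) {
  System.out.println("pid=" + m.group(1) + " comm=" + m.group(2));
}
{code}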


> procfs stat file is not in the expected format warning
> --
>
> Key: YARN-3344
> URL: https://issues.apache.org/jira/browse/YARN-3344
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jon Bringhurst
>Assignee: Akira AJISAKA
> Attachments: YARN-3344-trunk.005.patch, YARN-3344.06.patch
>
>
> Although this doesn't appear to be causing any functional issues, it is 
> spamming our log files quite a bit. :)
> It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
> /proc/<pid>/stat files.
> Here's the error I'm seeing:
> {noformat}
> "source_host": "asdf",
> "method": "constructProcessInfo",
> "level": "WARN",
> "message": "Unexpected: procfs stat file is not in the expected format 
> for process with pid 6953"
> "file": "ProcfsBasedProcessTree.java",
> "line_number": "514",
> "class": "org.apache.hadoop.yarn.util.ProcfsBasedProcessTree",
> {noformat}
> And here's the basic info on process with pid 6953:
> {noformat}
> [asdf ~]$ cat /proc/6953/stat
> 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 
> 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 
> 2 18446744073709551615 0 0 17 13 0 0 0 0 0
> [asdf ~]$ ps aux|grep 6953
> root  6953  0.0  0.0 200484 23424 ?S21:44   0:00 python2.6 
> /export/apps/salt/minion-scripts/module-sync.py
> jbringhu 13481  0.0  0.0 105312   872 pts/0S+   22:13   0:00 grep -i 6953
> [asdf ~]$ 
> {noformat}
> This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5039) Applications ACCEPTED but not starting

2016-05-11 Thread Miles Crawford (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miles Crawford resolved YARN-5039.
--
Resolution: Not A Bug

> Applications ACCEPTED but not starting
> --
>
> Key: YARN-5039
> URL: https://issues.apache.org/jira/browse/YARN-5039
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Miles Crawford
> Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 
> 2016-05-04 at 2.41.22 PM.png, capacity-scheduler-at-debug.log.gz, 
> queue-config.log, resource-manager-application-starts.log.gz, 
> whole-scheduler-at-debug.log.gz, 
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized cluster, they 
> sit, unable to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the 
> resourcemanager logs show that scheduling is being skipped. The scheduling is 
> skipped because the application itself has reserved the node? I'm not sure 
> how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting

2016-05-11 Thread Miles Crawford (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280357#comment-15280357
 ] 

Miles Crawford commented on YARN-5039:
--

Okay, wow that is wonderful to know. We have a highly variable workload and 
make almost no use of HDFS, so we use a tiny number of CORE nodes and remove 
TASK nodes when things are idle to save on costs. We do not use spot instances 
at all (because of https://issues.apache.org/jira/browse/SPARK-14209)

I cannot seem to find any mention of this behavior in the EMR documentation, so 
it's a bit of a blindside.

Additionally, the Node Labels page of the Hadoop UI does not distinguish 
between spot and task, so I wasn't even aware labeling was going on.

I guess things are working as designed, so I'm sorry to take up all your time. 
Thanks very much for helping. I think I'll follow up with AWS and request a 
documentation fix.

> Applications ACCEPTED but not starting
> --
>
> Key: YARN-5039
> URL: https://issues.apache.org/jira/browse/YARN-5039
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Miles Crawford
> Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 
> 2016-05-04 at 2.41.22 PM.png, capacity-scheduler-at-debug.log.gz, 
> queue-config.log, resource-manager-application-starts.log.gz, 
> whole-scheduler-at-debug.log.gz, 
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized cluster, they 
> sit, unable to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the 
> resourcemanager logs show that scheduling is being skipped. The scheduling is 
> skipped because the application itself has reserved the node? I'm not sure 
> how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> 

[jira] [Commented] (YARN-4996) Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or better yet parameterized

2016-05-11 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280355#comment-15280355
 ] 

Daniel Templeton commented on YARN-4996:


Do you think you'd be willing to move the scheduler creation code into a method 
in the {{ParameterizedSchedulerTestBase}} class?  I'm concerned that if someone 
decides to add another scheduler type later, the code in this test class might 
not be updated.
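Something along these lines, perhaps (the method itself is an assumption, just 
to illustrate the shape):
{code}
// in ParameterizedSchedulerTestBase, so a future scheduler type only needs
// to be wired up in one place instead of in each test class
protected void configureScheduler(YarnConfiguration conf, SchedulerType type) {
  switch (type) {
    case CAPACITY:
      conf.set(YarnConfiguration.RM_SCHEDULER,
          CapacityScheduler.class.getName());
      break;
    case FAIR:
      conf.set(YarnConfiguration.RM_SCHEDULER,
          FairScheduler.class.getName());
      break;
    default:
      throw new IllegalArgumentException("Unhandled scheduler type: " + type);
  }
}
{code}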

> Make TestNMReconnect.testCompareRMNodeAfterReconnect() scheduler agnostic, or 
> better yet parameterized
> --
>
> Key: YARN-4996
> URL: https://issues.apache.org/jira/browse/YARN-4996
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, test
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Kai Sasaki
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-4996.01.patch, YARN-4996.02.patch, 
> YARN-4996.03.patch
>
>
> The test tests only the capacity scheduler.  It should also test fair 
> scheduler.  At a bare minimum, it should use the default scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5053) More informative diagnostics when applications killed by a user

2016-05-11 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280346#comment-15280346
 ] 

Daniel Templeton commented on YARN-5053:


Maybe I'm going blind, but I don't see any difference in the last three patches.

Also, sorry to renege, but apparently I was more jetlagged than I thought.  I 
see two things to fix:

# In the string declaration, the second line should only be indented 4 spaces
# Instead of {{String.concat()}}, can we just use {{+=}}? For example:
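(Illustrative only -- the variable names below are assumptions, not the actual 
ClientRMService code.)
{code}
String diagnostics = "Application " + applicationId + " killed by user "
    + callerUGI.getShortUserName();     // continuation indented 4 spaces

// instead of diagnostics = diagnostics.concat(" at " + remoteAddress):
diagnostics += " at " + remoteAddress;
{code}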

> More informative diagnostics when applications killed by a user
> ---
>
> Key: YARN-5053
> URL: https://issues.apache.org/jira/browse/YARN-5053
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Eric Badger
> Attachments: YARN-5053.001.patch, YARN-5053.002.patch, 
> YARN-5053.003.patch, YARN-5053.004.patch
>
>
> When an application kill request is processed by the ClientRMService it sets 
> the diagnostics to "Application killed by user".  It would be nice to report 
> the user and host that issued the kill request in the app diagnostics so it 
> is clear where the kill originated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5069) TestFifoScheduler.testResourceOverCommit race condition

2016-05-11 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280334#comment-15280334
 ] 

Eric Badger commented on YARN-5069:
---

[~eepayne], you committed [YARN-4556], which is similar to this. Can you review 
this patch? 

> TestFifoScheduler.testResourceOverCommit race condition
> ---
>
> Key: YARN-5069
> URL: https://issues.apache.org/jira/browse/YARN-5069
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-5069-b2.7.001.patch, YARN-5069.001.patch
>
>
> There is a race condition between updating the node resources and the node 
> report becoming available. If the update takes too long, the report will be 
> set to null and we will get an NPE when checking the report's available 
> resources. 
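A hedged sketch of one way a test could tolerate that window ({{scheduler}}, 
{{nm}}, and {{expectedMemory}} are assumptions, not the patch itself): poll for 
the updated report rather than reading it exactly once.
{code}
SchedulerNodeReport report = scheduler.getNodeReport(nm.getNodeId());
long deadline = System.currentTimeMillis() + 5000;
while ((report == null
        || report.getAvailableResource().getMemory() != expectedMemory)
    && System.currentTimeMillis() < deadline) {
  Thread.sleep(50);   // give the resource update time to land
  report = scheduler.getNodeReport(nm.getNodeId());
}
Assert.assertNotNull("node report never became available", report);
{code}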



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting

2016-05-11 Thread Daniel Zhi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280259#comment-15280259
 ] 

Daniel Zhi commented on YARN-5039:
--

I am OOO so I didn't really dive deep. EMR has logic to schedule application 
masters only on CORE nodes by default (to avoid losing the AM to SPOT instance 
termination if the customer uses SPOT for TASK nodes, etc.). I guess this might 
delay the start of an application if all CORE slots are occupied.

This behavior can be changed with "yarn.app.mapreduce.am.labels", which should 
have the default value "CORE" but can be customized to "CORE,TASK" to allow the 
MRAppMaster on TASK nodes.
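If that default-value claim holds, the override would look something like this 
(the property name and values are taken from the comment above; where the file 
lives in an EMR configuration is an assumption):
{code:xml}
<!-- EMR-specific: also allow MapReduce AMs on TASK nodes -->
<property>
  <name>yarn.app.mapreduce.am.labels</name>
  <value>CORE,TASK</value>
</property>
{code}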

> Applications ACCEPTED but not starting
> --
>
> Key: YARN-5039
> URL: https://issues.apache.org/jira/browse/YARN-5039
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Miles Crawford
> Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 
> 2016-05-04 at 2.41.22 PM.png, capacity-scheduler-at-debug.log.gz, 
> queue-config.log, resource-manager-application-starts.log.gz, 
> whole-scheduler-at-debug.log.gz, 
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized cluster, they 
> sit, unable to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the 
> resourcemanager logs show that scheduling is being skipped. The scheduling is 
> skipped because the application itself has reserved the node? I'm not sure 
> how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:22,316 INFO 
> 

[jira] [Commented] (YARN-4913) Yarn logs should take a -out option to write to a directory

2016-05-11 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280206#comment-15280206
 ] 

Varun Vasudev commented on YARN-4913:
-

[~xgong] - the latest patch is not entirely there. It creates a file per node, 
but what's really required is a file per container: create a directory per 
node and, within each node's directory, write each container's logs to an 
individual file. Did that make sense?
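In other words, the expected layout would be roughly this (all names below are 
illustrative):
{noformat}
out-dir/                          <- the value passed to -out
|-- <node-id-1>/
|   |-- container_..._000001      <- one file per container
|   `-- container_..._000002
`-- <node-id-2>/
    `-- container_..._000003
{noformat}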

> Yarn logs should take a -out option to write to a directory
> ---
>
> Key: YARN-4913
> URL: https://issues.apache.org/jira/browse/YARN-4913
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4913.1.patch, YARN-4913.2.patch, YARN-4913.3.patch, 
> YARN-4913.4.patch, YARN-4913.5.1.patch, YARN-4913.5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4882) Change the log level to DEBUG for recovering completed applications

2016-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280214#comment-15280214
 ] 

Jason Lowe commented on YARN-4882:
--

The main motivation for proposing a separate logger is to allow finer control 
in cases where someone wants to see this recovery output but not enable 
DEBUG/TRACE for the entire module.  We did something similar in the MapReduce 
ShuffleHandler so we could audit shuffle transfers without enabling debug for 
all ShuffleHandler operations (like verifying HTTP headers, etc.).  Admins can 
then configure it to work like it does today (i.e.: enable DEBUG/TRACE on the 
new logger and leave it going to the same log file as everything else), use a 
separate log file for that logger, or leave it disabled.

IIRC recovered completed applications are re-logged to the RM audit logger, so 
we may not really need another log at least for the completed case.  I'm OK if 
we want to lower successful recoveries to the DEBUG/TRACE level even without a 
separate logger for them.
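A sketch of the separate-logger idea (the {{.recovery}} logger name is an 
assumption, not an existing logger):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// a dedicated logger that admins can tune independently of the class logger
private static final Log RECOVERY_LOG =
    LogFactory.getLog(RMAppManager.class.getName() + ".recovery");

// e.g. in log4j.properties, without enabling DEBUG for the whole module:
// log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recovery=DEBUG
{code}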


> Change the log level to DEBUG for recovering completed applications
> ---
>
> Key: YARN-4882
> URL: https://issues.apache.org/jira/browse/YARN-4882
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Daniel Templeton
>
> I think recovering completed applications does not need to be logged at INFO; 
> it can be logged at DEBUG instead.  The problem seen on a large cluster is that 
> if any issue happens during RM start-up and the RM keeps switching, the RM logs 
> are filled mostly with recovering applications. 
> Six lines are logged per application, as shown in the logs below, and the RM 
> default for max-completed applications is 10K, so each switch adds 10K*6=60K 
> lines, which I feel is not useful.
> {noformat}
> 2016-03-01 10:20:59,077 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Default priority 
> level is set to application:application_1456298208485_21507
> 2016-03-01 10:20:59,094 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering 
> app: application_1456298208485_21507 with 1 attempts and final state = 
> FINISHED
> 2016-03-01 10:20:59,100 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Recovering attempt: appattempt_1456298208485_21507_01 with final state: 
> FINISHED
> 2016-03-01 10:20:59,107 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1456298208485_21507_01 State change from NEW to FINISHED
> 2016-03-01 10:20:59,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
> application_1456298208485_21507 State change from NEW to FINISHED
> 2016-03-01 10:20:59,112 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=rohith   
> OPERATION=Application Finished - Succeeded  TARGET=RMAppManager 
> RESULT=SUCCESS  APPID=application_1456298208485_21507
> {noformat}
> The main problem is that important information from before the RM became 
> unstable goes missing from the logs. Even if log rollback keeps 50 or 100 
> files, in a short period they will all be rolled out, and what remains is only 
> RM switching information - mostly recovering applications. 
> I suggest that at least completed-application recovery should be logged at DEBUG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting

2016-05-11 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280191#comment-15280191
 ] 

Nathan Roberts commented on YARN-5039:
--

Thanks [~milesc]. This seems to be an Amazon EMR thing (unless I'm 
misunderstanding the log messages). 

Here are the important pieces:

Every time the scheduler tries to schedule on a node with sufficient room, 
it bails out, claiming the node is not the right type of EMR node:
{noformat}
# egrep -i "node being looked for|is excluded" whole-scheduler-at-debug.log
2016-05-11 00:55:46,818 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
 (ResourceManager Event Processor): Node being looked for scheduling 
ip-10-12-40-239.us-west-2.compute.internal:8041 availableResource: 

2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerAppUtils 
(ResourceManager Event Processor): node 
ip-10-12-40-239.us-west-2.compute.internal with emrlabel:TASK is excluded to 
request with emrLabel:MASTER,CORE
{noformat}

And below you see it considering the 0041 application; everything looks 
promising until the node is excluded. This is an EMR-specific check, which is 
why it wasn't making much sense how this could happen.
{noformat}
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): pre-assignContainers for application 
application_1462722347496_0041
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): User limit computation for ai2service in 
queue default userLimit=100 userLimitFactor=1.0 required:  consumed:  limit:  
queueCapacity:  qconsumed:  currentCapacity:  activeUsers: 1 
clusterCapacity: 
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt
 (ResourceManager Event Processor): showRequests: 
application=application_1462722347496_0041 headRoom= 
currentConsumption=0
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt
 (ResourceManager Event Processor): showRequests: 
application=application_1462722347496_0041 request={Priority: 0, Capability: 
, # Containers: 1, Location: *, Relax Locality: true}
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): needsContainers: app.#re-reserve=636 
reserved=2 nodeFactor=0.20974576 minAllocFactor=0.99986756 starvation=251
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): User limit computation for ai2service in 
queue default userLimit=100 userLimitFactor=1.0 required:  consumed:  limit:  
queueCapacity:  qconsumed:  currentCapacity:  activeUsers: 1 
clusterCapacity: 
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): Headroom calculation for user ai2service:  
userLimit= queueMaxAvailRes= consumed= headroom=
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerAppUtils 
(ResourceManager Event Processor): node 
ip-10-12-40-239.us-west-2.compute.internal with emrlabel:TASK is excluded to 
request with emrLabel:MASTER,CORE
{noformat}

I suspect EMR does not want to schedule AMs on nodes that are more likely to 
go away (TASK nodes). Once it gets the AM running, though, the application 
takes off. 
 
Maybe someone from Amazon can chime in? cc [~danzhi]


> Applications ACCEPTED but not starting
> --
>
> Key: YARN-5039
> URL: https://issues.apache.org/jira/browse/YARN-5039
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Miles Crawford
> Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 
> 2016-05-04 at 2.41.22 PM.png, capacity-scheduler-at-debug.log.gz, 
> queue-config.log, resource-manager-application-starts.log.gz, 
> whole-scheduler-at-debug.log.gz, 
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized 

[jira] [Commented] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-05-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280176#comment-15280176
 ] 

Jason Lowe commented on YARN-4325:
--

Yes, what I'm proposing is to have the log handlers always respond to the 
APPLICATION_FINISHED event.  We can look at this problem in two ways: either 
the bug is in the ApplicationImpl because it doesn't track that log handling 
failed and sometimes needs to clean up the app in other states, or the bug is 
in the log handlers because they failed to respond to the APPLICATION_FINISHED 
event when the application terminated.  If the log handlers always responded to 
the APPLICATION_FINISHED event with an APPLICATION_LOG_HANDLING_FAILED or 
APPLICATION_LOG_HANDLING_FINISHED event, wouldn't that also solve the problem?  
Then ApplicationImpl can simply wait until the terminal finished state to 
receive one of the log handling replies and then clean up the app in _one_ 
place rather than several places depending upon the special case being handled.
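A hedged sketch of that proposal (the switch context and the 
{{stopAggregation()}} helper are assumptions, not the actual log handler code):
{code}
case APPLICATION_FINISHED:
  ApplicationEventType reply;
  try {
    stopAggregation(appId);
    reply = ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED;
  } catch (Exception e) {
    reply = ApplicationEventType.APPLICATION_LOG_HANDLING_FAILED;
  }
  // always answer APPLICATION_FINISHED, so ApplicationImpl can clean up
  // the app in one place regardless of how log handling went
  dispatcher.getEventHandler().handle(new ApplicationEvent(appId, reply));
  break;
{code}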


> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl.PNG, YARN-4325-v1.1.patch, 
> YARN-4325-v1.patch, YARN-4325.patch
>
>
> On a long-running cluster, we found tens of thousands of stale apps still 
> being recovered during NM restart recovery. 
> After investigating, we found three issues that cause app state to leak in the 
> NM state-store:
> 1. APPLICATION_LOG_HANDLING_FAILED is not handled by removing the app from the 
> NMStateStore.
> 2. The APPLICATION_LOG_HANDLING_FAILED event is not sent when the aggregator's 
> doAppLogAggregation() hits an exception.
> 3. Only an application in FINISHED status that receives APPLICATION_LOG_FINISHED 
> has a transition to remove the app from the NM state store. An application in 
> another status - like APPLICATION_RESOURCES_CLEANUP - will ignore the event and 
> the app is never removed from the NM state store, even after it finishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


