[jira] [Commented] (YARN-8013) Support APP-TAG namespace for allocation tags

2018-03-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392546#comment-16392546
 ] 

Weiwei Yang commented on YARN-8013:
---

Based on the work in YARN-8002, supporting this does not require a big change. 
Attached the v1 patch; note that this patch will only build after applying the patch in 
YARN-8002.

> Support APP-TAG namespace for allocation tags
> -
>
> Key: YARN-8013
> URL: https://issues.apache.org/jira/browse/YARN-8013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8013.001.patch
>
>
> YARN-1461 adds the *Application Tag* concept to YARN applications: a user is able 
> to annotate an application with multiple tags to classify apps. We can leverage 
> this to represent a namespace for a certain group of apps. So instead of 
> calling it *app-label*, we propose to call it *app-tag*.
> A typical use case is:
> There are a lot of TF jobs running on YARN, and some of them consume 
> resources heavily. We want to limit the number of PS on each node for such BIG 
> players but ignore the SMALL ones. To achieve this, we can do the following 
> steps:
>  # Add the application tag "big-tf" to these big TF jobs
>  # For each PS request, add the "ps" source tag and map it to the constraint 
> "{color:#d04437}notin, node, tensorflow/ps{color}" or 
> "{color:#d04437}cardinality, node, tensorflow/ps{color}{color:#d04437}, 0, 
> 2{color}" for finer-grained control.






[jira] [Updated] (YARN-8013) Support APP-TAG namespace for allocation tags

2018-03-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8013:
--
Attachment: YARN-8013.001.patch

> Support APP-TAG namespace for allocation tags
> -
>
> Key: YARN-8013
> URL: https://issues.apache.org/jira/browse/YARN-8013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8013.001.patch
>
>
> YARN-1461 adds the *Application Tag* concept to YARN applications: a user is able 
> to annotate an application with multiple tags to classify apps. We can leverage 
> this to represent a namespace for a certain group of apps. So instead of 
> calling it *app-label*, we propose to call it *app-tag*.
> A typical use case is:
> There are a lot of TF jobs running on YARN, and some of them consume 
> resources heavily. We want to limit the number of PS on each node for such BIG 
> players but ignore the SMALL ones. To achieve this, we can do the following 
> steps:
>  # Add the application tag "big-tf" to these big TF jobs
>  # For each PS request, add the "ps" source tag and map it to the constraint 
> "{color:#d04437}notin, node, tensorflow/ps{color}" or 
> "{color:#d04437}cardinality, node, tensorflow/ps{color}{color:#d04437}, 0, 
> 2{color}" for finer-grained control.






[jira] [Updated] (YARN-7905) Parent directory permission incorrect during public localization

2018-03-08 Thread Bilwa S T (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-7905:

Attachment: YARN-7905-004.patch

> Parent directory permission incorrect during public localization 
> -
>
> Key: YARN-7905
> URL: https://issues.apache.org/jira/browse/YARN-7905
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-7905-001.patch, YARN-7905-002.patch, 
> YARN-7905-003.patch, YARN-7905-004.patch
>
>
> Similar to YARN-6708, during public localization we also have to take care of the 
> parent directory when the umask is 027 during NodeManager startup.
> For example, for /filecache/0/200
> the directory permission of /filecache/0 is 750, which causes 
> application failure.
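As a hedged illustration (not the actual NM localization code), the usual way to keep a public cache parent directory world-readable regardless of the process umask is to set the permission explicitly after creating it:

{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PublicCacheDirSketch {
  // Hypothetical helper: ensure a public localization parent directory ends
  // up as 0755 even when the NM was started with a umask of 027 (which would
  // otherwise leave mkdirs-created directories at 750).
  static void ensurePublicDir(FileSystem fs, Path dir) throws Exception {
    FsPermission publicPerm = new FsPermission((short) 0755);
    fs.mkdirs(dir, publicPerm);        // the requested mode is masked by umask
    fs.setPermission(dir, publicPerm); // so reset it explicitly afterwards
  }
}
{code}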






[jira] [Commented] (YARN-8008) Admin command to manage global placement constraints

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392496#comment-16392496
 ] 

genericqa commented on YARN-8008:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
13s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
38s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 14s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 22 new + 209 unchanged - 0 fixed = 231 total (was 209) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
44s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
15s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
13s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 67m 
12s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 28m 
25s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
16s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}186m 

[jira] [Commented] (YARN-7905) Parent directory permission incorrect during public localization

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392456#comment-16392456
 ] 

genericqa commented on YARN-7905:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
25s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
45s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 45s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 4 new + 215 unchanged - 0 fixed = 219 total (was 215) 
{color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
28s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m 
55s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
20s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 45s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7905 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913383/YARN-7905-003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ed4e2e383046 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 113f401 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/19936/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| compile | 

[jira] [Commented] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392452#comment-16392452
 ] 

genericqa commented on YARN-7952:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
10s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  6s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 382 unchanged - 0 fixed = 384 total (was 382) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
13s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
26s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
47s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 67m 
35s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}175m 

[jira] [Commented] (YARN-8007) Support specifying placement constraint for task containers in SLS

2018-03-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392446#comment-16392446
 ] 

Weiwei Yang commented on YARN-8007:
---

Hi [~yangjiandan]

Thanks for the updates; it looks good to me overall. However, I think this one 
might depend on YARN-8015 if we want to get it completely tested, hence I 
suggest holding this one for now. Once YARN-8015 is ready, let's do 
complete testing over this and see how it helps with testing the performance. 
Does that make sense to you?

Thanks a lot for the effort on this; I will be back to visit this soon.

> Support specifying placement constraint for task containers in SLS
> --
>
> Key: YARN-8007
> URL: https://issues.apache.org/jira/browse/YARN-8007
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
> Attachments: YARN-8007.001.patch, YARN-8007.002.patch, 
> YARN-8007.003.patch
>
>
> YARN-6592 introduces placement constraints. Currently SLS does not support 
> specifying placement constraints. 
> In order to enable better performance testing, we should be able to specify 
> placement constraints for containers in the SLS configuration.






[jira] [Created] (YARN-8015) Support allocation tag namespaces in AppPlacementAllocator

2018-03-08 Thread Weiwei Yang (JIRA)
Weiwei Yang created YARN-8015:
-

 Summary: Support allocation tag namespaces in AppPlacementAllocator
 Key: YARN-8015
 URL: https://issues.apache.org/jira/browse/YARN-8015
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Reporter: Weiwei Yang


AppPlacementAllocator currently only supports intra-app anti-affinity placement 
constraints; once YARN-8002 and YARN-8013 are resolved, it needs to support 
inter-app constraints too. This may also require some refactoring of the 
existing code logic. Use this JIRA to track that work.






[jira] [Commented] (YARN-5015) Support sliding window retry capability for container restart

2018-03-08 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392428#comment-16392428
 ] 

Chandni Singh commented on YARN-5015:
-

{{TestContainerSchedulerQueuing.testStartMultipleContainers}} passes on my 
machine.

The findbugs warning is from code that I haven't modified.

> Support sliding window retry capability for container restart 
> --
>
> Key: YARN-5015
> URL: https://issues.apache.org/jira/browse/YARN-5015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Chandni Singh
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-5015.01.patch, YARN-5015.02.patch, 
> YARN-5015.03.patch, YARN-5015.04.patch, YARN-5015.05.patch
>
>
> We support a sliding window retry policy for AM restarts (introduced in 
> YARN-611). A similar sliding window retry policy is needed for container 
> restarts.
> With this change, we can introduce a common class, 
> SlidingWindowRetryPolicy (suggested by [~vvasudev] in the comments), and 
> integrate it into container restart. 
> In a subsequent JIRA, we can modify the AM code to use 
> SlidingWindowRetryPolicy, which will unify the AM and container restart code.
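For illustration only, a minimal, self-contained sketch of the sliding-window idea (class and field names are assumptions, not the actual YARN implementation): failures older than the validity interval are forgotten, and a restart is allowed while the number of recent failures stays within the retry limit.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

class SlidingWindowRetrySketch {
  private final long validityIntervalMs; // window size; older failures are ignored
  private final int maxRetries;          // max failures tolerated inside the window
  private final Deque<Long> failureTimes = new ArrayDeque<>();

  SlidingWindowRetrySketch(long validityIntervalMs, int maxRetries) {
    this.validityIntervalMs = validityIntervalMs;
    this.maxRetries = maxRetries;
  }

  void recordFailure(long nowMs) {
    failureTimes.addLast(nowMs);
  }

  boolean shouldRetry(long nowMs) {
    // Evict failures that have slid out of the window.
    while (!failureTimes.isEmpty()
        && nowMs - failureTimes.peekFirst() > validityIntervalMs) {
      failureTimes.removeFirst();
    }
    return failureTimes.size() < maxRetries;
  }
}
{code}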






[jira] [Commented] (YARN-7871) Node attributes reporting from NM to RM

2018-03-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392401#comment-16392401
 ] 

Weiwei Yang commented on YARN-7871:
---

The remaining checkstyle issue is not introduced by this patch. 
[~Naganarasimha], could you please check whether this now looks good to you?

> Node attributes reporting from NM to RM 
> 
>
> Key: YARN-7871
> URL: https://issues.apache.org/jira/browse/YARN-7871
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-7871-YARN-3409.001.patch, 
> YARN-7871-YARN-3409.002.patch, YARN-7871-YARN-3409.003.patch, 
> YARN-7871-YARN-3409.004.patch
>
>
> Support initializing the proper attribute provider based on the user's configuration.
> The NM collects node attributes from a configured attribute provider and sends 
> them to the RM via heartbeat. 






[jira] [Commented] (YARN-7871) Node attributes reporting from NM to RM

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392387#comment-16392387
 ] 

genericqa commented on YARN-7871:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-3409 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 9s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
18s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
25s{color} | {color:green} YARN-3409 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
7s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
YARN-3409 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
58s{color} | {color:green} YARN-3409 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 55s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 343 unchanged - 3 fixed = 344 total (was 346) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
10s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m  
7s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m  6s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}162m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce 

[jira] [Commented] (YARN-8008) Admin command to manage global placement constraints

2018-03-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392340#comment-16392340
 ] 

Weiwei Yang commented on YARN-8008:
---

Thanks [~sunilg]. I attached the patch and set it to the PA state again.
About your suggestion, it makes sense; we should provide a "list" 
command for normal users as well, but let's implement this for the admin user first.

> Admin command to manage global placement constraints
> 
>
> Key: YARN-8008
> URL: https://issues.apache.org/jira/browse/YARN-8008
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8008.001.patch
>
>
> Add a command for admins to manage global placement constraints, such as add, 
> remove and list. This will be exposed via, for example
> {code}
> yarn rmadmin -placementConstraint [ -add -t  -c  | -remove -t 
>  | -list ]
> {code}
> Propose to use this JIRA to add the API/proto changes.






[jira] [Updated] (YARN-8008) Admin command to manage global placement constraints

2018-03-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8008:
--
Attachment: YARN-8008.001.patch

> Admin command to manage global placement constraints
> 
>
> Key: YARN-8008
> URL: https://issues.apache.org/jira/browse/YARN-8008
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8008.001.patch
>
>
> Add a command for admins to manage global placement constraints, such as add, 
> remove and list. This will be exposed via, for example
> {code}
> yarn rmadmin -placementConstraint [ -add -t  -c  | -remove -t 
>  | -list ]
> {code}
> Propose to use this JIRA to add the API/proto changes.






[jira] [Commented] (YARN-8007) Support specifying placement constraint for task containers in SLS

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392329#comment-16392329
 ] 

genericqa commented on YARN-8007:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 53s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 18s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.sls.TestSLSStreamAMSynthWithConstraint |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8007 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913692/YARN-8007.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 085b4c9ffbdc 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 113f401 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/19933/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19933/testReport/ |
| Max. process+thread count | 467 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 

[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392327#comment-16392327
 ] 

genericqa commented on YARN-5764:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
23s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 29s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 219 unchanged - 0 fixed = 221 total (was 219) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
21s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m  
7s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}103m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-5764 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913469/YARN-5764-v8.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 95efa9aff6ce 3.13.0-139-generic 

[jira] [Commented] (YARN-5015) Support sliding window retry capability for container restart

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392320#comment-16392320
 ] 

genericqa commented on YARN-5015:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
8s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 35s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 440 unchanged - 0 fixed = 441 total (was 440) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 12s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
56s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
41s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
6s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 50s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
27s{color} | {color:green} hadoop-yarn-applications-distributedshell in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}108m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
|  |  Write to static 

[jira] [Commented] (YARN-8008) Admin command to manage global placement constraints

2018-03-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392317#comment-16392317
 ] 

Sunil G commented on YARN-8008:
---

[~cheersyang] I cancelled the patch as the patch was not there.

One quick suggestion: I think -list need not be part of the admin command; rather, it 
could be added to the "yarn cluster" command line.

> Admin command to manage global placement constraints
> 
>
> Key: YARN-8008
> URL: https://issues.apache.org/jira/browse/YARN-8008
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> Add a command for admins to manage global placement constraints, such as add, 
> remove and list. This will be exposed via, for example
> {code}
> yarn rmadmin -placementConstraint [ -add -t  -c  | -remove -t 
>  | -list ]
> {code}
> Propose to use this JIRA to add the API/proto changes.






[jira] [Commented] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

2018-03-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392307#comment-16392307
 ] 

Xuan Gong commented on YARN-7952:
-

[~leftnoteasy]

The unit test failure is not related; it passes for me locally.

> RM should be able to recover log aggregation status after restart/fail-over
> ---
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, 
> YARN-7952.2.patch, YARN-7952.3.patch, YARN-7952.3.patch, YARN-7952.5.patch, 
> YARN-7952.6.patch, YARN-7952.7.patch
>
>
> Right now, the NM sends its own log aggregation status to the RM 
> periodically. The RM aggregates the status for each application, 
> but it does not generate the final status until a client call (from the web UI or 
> CLI) triggers it. However, the RM never persists the log aggregation status, so when 
> the RM restarts/fails over, the log aggregation status becomes “NOT_STARTED”. 
> This is confusing; maybe we should change it to “NOT_AVAILABLE” (will create 
> a separate ticket for this). In any case, we need to persist the log aggregation 
> status for future use.
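Purely as an illustrative sketch of the idea (the store interface and method below are hypothetical, not the actual RMStateStore API): keep the latest per-node status for an app in memory, and also write it to a state store so it can be recovered after an RM restart or fail-over.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LogAggregationStatusSketch {
  enum Status { NOT_STARTED, RUNNING, SUCCEEDED, FAILED, NOT_AVAILABLE } // illustrative values

  // Hypothetical persistence hook; the real change would add an equivalent
  // operation to the RM state store.
  interface StatusStore {
    void storeLogAggregationStatus(String appId, String nodeId, Status status);
  }

  private final Map<String, Status> statusPerNode = new ConcurrentHashMap<>();

  void onNodeReport(String appId, String nodeId, Status reported, StatusStore store) {
    statusPerNode.put(nodeId, reported);                       // aggregate in memory as today
    store.storeLogAggregationStatus(appId, nodeId, reported);  // and persist it for recovery
  }
}
{code}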






[jira] [Updated] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

2018-03-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-7952:

Attachment: YARN-7952.7.patch

> RM should be able to recover log aggregation status after restart/fail-over
> ---
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, 
> YARN-7952.2.patch, YARN-7952.3.patch, YARN-7952.3.patch, YARN-7952.5.patch, 
> YARN-7952.6.patch, YARN-7952.7.patch
>
>
> Right now, the NM sends its own log aggregation status to the RM 
> periodically. The RM aggregates the status for each application, 
> but it does not generate the final status until a client call (from the web UI or 
> CLI) triggers it. However, the RM never persists the log aggregation status, so when 
> the RM restarts/fails over, the log aggregation status becomes “NOT_STARTED”. 
> This is confusing; maybe we should change it to “NOT_AVAILABLE” (will create 
> a separate ticket for this). In any case, we need to persist the log aggregation 
> status for future use.






[jira] [Updated] (YARN-8008) Admin command to manage global placement constraints

2018-03-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8008:
--
Attachment: (was: YARN-8008.001.patch)

> Admin command to manage global placement constraints
> 
>
> Key: YARN-8008
> URL: https://issues.apache.org/jira/browse/YARN-8008
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> Add a command for admins to manage global placement constraints, such as add, 
> remove and list. This will be exposed via, for example
> {code}
> yarn rmadmin -placementConstraint [ -add -t  -c  | -remove -t 
>  | -list ]
> {code}
> Propose to use this JIRA to add the API/proto changes.






[jira] [Commented] (YARN-7707) [GPG] Policy generator framework

2018-03-08 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392283#comment-16392283
 ] 

Giovanni Matteo Fumarola commented on YARN-7707:


Thanks [~youchen] for the patch; a few comments:

*{{DefaultGlobalPolicy}}*
 * It should be called {{NoOpGlobalPolicy}} or something similar.

*PolicyGenerator*
 * {{invokeRMWebService}} can return null. Please add null-pointer checks in 
{{getSchedulerInfo}} and {{getInfos}}.
 * Line 193. Change the conf.get into 
{code:java}
conf.get(YarnConfiguration.GPG_POLICY_GENERATOR_BLACKLIST);
{code}
 and add a null check after it (see the sketch after these comments). That makes it cleaner.

*TestGPGPolicyFacade*
* Add javadoc for the tests.

In general, keep the word {{SubCluster}} capitalized this way throughout the patch.
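A minimal sketch of the suggested null handling, assuming the blacklist value is a comma-separated list (the helper class and parsing are illustrative, not the actual PolicyGenerator code; the configuration key is the one quoted above from the YARN-7402 branch):

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class BlacklistParseSketch {
  static List<String> parseBlacklist(Configuration conf) {
    List<String> blacklist = new ArrayList<>();
    String raw = conf.get(YarnConfiguration.GPG_POLICY_GENERATOR_BLACKLIST);
    if (raw == null || raw.isEmpty()) {
      return blacklist;                  // nothing configured: empty blacklist
    }
    for (String subCluster : raw.split(",")) {
      blacklist.add(subCluster.trim());  // keep sub-cluster ids as plain strings
    }
    return blacklist;
  }
}
{code}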

> [GPG] Policy generator framework
> 
>
> Key: YARN-7707
> URL: https://issues.apache.org/jira/browse/YARN-7707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>Priority: Major
>  Labels: federation, gpg
> Attachments: YARN-7707-YARN-7402.01.patch, 
> YARN-7707-YARN-7402.02.patch, YARN-7707-YARN-7402.03.patch, 
> YARN-7707-YARN-7402.04.patch, YARN-7707-YARN-7402.05.patch, 
> YARN-7707-YARN-7402.06.patch, YARN-7707-YARN-7402.07.patch, 
> YARN-7707-YARN-7402.08.patch, YARN-7707-YARN-7402.09.patch
>
>
> This JIRA tracks the development of a generic framework for querying 
> sub-clusters for metrics, running policies, and updating them in the 
> FederationStateStore.






[jira] [Commented] (YARN-8007) Support specifying placement constraint for task containers in SLS

2018-03-08 Thread Jiandan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392275#comment-16392275
 ] 

Jiandan Yang  commented on YARN-8007:
-

Fixed the findbugs, checkstyle and whitespace errors, and uploaded the v3 patch.

> Support specifying placement constraint for task containers in SLS
> --
>
> Key: YARN-8007
> URL: https://issues.apache.org/jira/browse/YARN-8007
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
> Attachments: YARN-8007.001.patch, YARN-8007.002.patch, 
> YARN-8007.003.patch
>
>
> YARN-6592 introduces placement constraints. Currently SLS does not support 
> specifying placement constraints. 
> In order to enable better performance testing, we should be able to specify 
> placement constraints for containers in the SLS configuration.






[jira] [Updated] (YARN-8007) Support specifying placement constraint for task containers in SLS

2018-03-08 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-8007:

Attachment: YARN-8007.003.patch

> Support specifying placement constraint for task containers in SLS
> --
>
> Key: YARN-8007
> URL: https://issues.apache.org/jira/browse/YARN-8007
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
> Attachments: YARN-8007.001.patch, YARN-8007.002.patch, 
> YARN-8007.003.patch
>
>
> YARN-6592 introduces placement constraints. Currently SLS does not support 
> specifying placement constraints. 
> In order to enable better performance testing, we should be able to specify 
> placement constraints for containers in the SLS configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7871) Node attributes reporting from NM to RM

2018-03-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-7871:
--
Attachment: YARN-7871-YARN-3409.004.patch

> Node attributes reporting from NM to RM 
> 
>
> Key: YARN-7871
> URL: https://issues.apache.org/jira/browse/YARN-7871
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-7871-YARN-3409.001.patch, 
> YARN-7871-YARN-3409.002.patch, YARN-7871-YARN-3409.003.patch, 
> YARN-7871-YARN-3409.004.patch
>
>
> Support initializing the proper attribute provider based on the user's configuration.
> The NM collects node attributes from a configured attribute provider and sends 
> them to the RM via heartbeat. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7871) Node attributes reporting from NM to RM

2018-03-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392220#comment-16392220
 ] 

Weiwei Yang commented on YARN-7871:
---

Hi [~Naganarasimha]

Thanks for the comments, 
{quote}
NMDistributedNodeAttributesHandler, could be pulled into a separate outer class
{quote}
This follows the pattern of the rest of the handlers, which are inner classes, such 
as {{NMDistributedNodeLabelsHandler}} and {{NMCentralizedNodeLabelsHandler}}. I think 
we can continue to use this pattern and do some refactoring later when it becomes 
necessary.

{quote}
replaceAttributes without a prefix is it good to clear all ? 
{quote}
From an API point of view, a null prefix means removing ALL node attributes 
regardless of which prefix they have. So when the prefix is null, in {{Host}} we 
first clear all attributes at line 448 and then add all the new attributes at 
line 459. This gives the admin the ability to replace ALL attributes without 
iterating over all existing prefixes.
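
To make the semantics concrete, a hypothetical sketch (the data structure and names here are assumptions for illustration, not the actual {{NodeAttributesManagerImpl}} code):
{code:java}
// attributesByPrefix is an assumed field: prefix -> (attribute name -> value).
// For a non-null prefix the caller is assumed to pass only entries under that prefix.
void replaceAttributes(String prefix, Map<String, Map<String, String>> newAttributesByPrefix) {
  if (prefix == null) {
    // line-448 behavior described above: clear every attribute, whatever its prefix
    attributesByPrefix.clear();
  } else {
    // otherwise only the attributes under the given prefix are dropped
    attributesByPrefix.remove(prefix);
  }
  // line-459 behavior described above: add all of the new attributes
  attributesByPrefix.putAll(newAttributesByPrefix);
}
{code}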

{quote}
checkstyle has valid warnings
{quote}
Fixed one checkstyle issue, but the other one, about the number of lines in 
{{ResourceTrackerService#nodeHeartbeat}}, is not introduced by this patch.

Thanks

> Node attributes reporting from NM to RM 
> 
>
> Key: YARN-7871
> URL: https://issues.apache.org/jira/browse/YARN-7871
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-7871-YARN-3409.001.patch, 
> YARN-7871-YARN-3409.002.patch, YARN-7871-YARN-3409.003.patch
>
>
> Support initializing the proper attribute provider based on the user's configuration.
> The NM collects node attributes from a configured attribute provider and sends 
> them to the RM via heartbeat. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5015) Support sliding window retry capability for container restart

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392217#comment-16392217
 ] 

genericqa commented on YARN-5015:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
55s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
13s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
19s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m  
4s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  1m  4s{color} | 
{color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m  4s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 10s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 440 unchanged - 0 fixed = 441 total (was 440) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
20s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m 
41s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
20s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
18s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 2 new + 9 unchanged - 0 fixed = 11 total (was 9) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
59s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 19s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
19s{color} | {color:green} hadoop-yarn-applications-distributedshell in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | 

[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2018-03-08 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392213#comment-16392213
 ] 

Miklos Szegedi commented on YARN-5764:
--

Thank you, [~devaraj.k]. The patch looks good to me in general. I still see two 
checkstyle issues. I started a new Jenkins run.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v2.patch, 
> YARN-5764-v3.patch, YARN-5764-v4.patch, YARN-5764-v5.patch, 
> YARN-5764-v6.patch, YARN-5764-v7.patch, YARN-5764-v8.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non SMP systems. Yarn containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5015) Support sliding window retry capability for container restart

2018-03-08 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-5015:

Attachment: YARN-5015.05.patch

> Support sliding window retry capability for container restart 
> --
>
> Key: YARN-5015
> URL: https://issues.apache.org/jira/browse/YARN-5015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Chandni Singh
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-5015.01.patch, YARN-5015.02.patch, 
> YARN-5015.03.patch, YARN-5015.04.patch, YARN-5015.05.patch
>
>
> We support a sliding window retry policy for AM restarts (introduced in 
> YARN-611). A similar sliding window retry policy is needed for container 
> restarts.
> With this change, we can introduce a common class, 
> SlidingWindowRetryPolicy (suggested by [~vvasudev] in the comments), and 
> integrate it into container restart. 
> In a subsequent JIRA, we can modify the AM code to use 
> SlidingWindowRetryPolicy, which will unify the AM and container restart code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5015) Support sliding window retry capability for container restart

2018-03-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392164#comment-16392164
 ] 

Wangda Tan commented on YARN-5015:
--

[~csingh], it looks like some classes are missing, please double check the 
uploaded patch.

> Support sliding window retry capability for container restart 
> --
>
> Key: YARN-5015
> URL: https://issues.apache.org/jira/browse/YARN-5015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Chandni Singh
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-5015.01.patch, YARN-5015.02.patch, 
> YARN-5015.03.patch, YARN-5015.04.patch
>
>
> We support a sliding window retry policy for AM restarts (introduced in 
> YARN-611). A similar sliding window retry policy is needed for container 
> restarts.
> With this change, we can introduce a common class, 
> SlidingWindowRetryPolicy (suggested by [~vvasudev] in the comments), and 
> integrate it into container restart. 
> In a subsequent JIRA, we can modify the AM code to use 
> SlidingWindowRetryPolicy, which will unify the AM and container restart code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

2018-03-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392160#comment-16392160
 ] 

Wangda Tan commented on YARN-7952:
--

Thanks [~rkanter]

> RM should be able to recover log aggregation status after restart/fail-over
> ---
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, 
> YARN-7952.2.patch, YARN-7952.3.patch, YARN-7952.3.patch, YARN-7952.5.patch, 
> YARN-7952.6.patch
>
>
> Right now, the NM sends its own log aggregation status to the RM 
> periodically. The RM aggregates the status for each application, 
> but it will not generate the final status until a client call (from the web UI or 
> CLI) triggers it. However, the RM never persists the log aggregation status. So, when 
> the RM restarts/fails over, the log aggregation status becomes “NOT_STARTED”. 
> This is confusing; maybe we should change it to “NOT_AVAILABLE” (will create 
> a separate ticket for this). In any case, we need to persist the log aggregation 
> status for future use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

2018-03-08 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392159#comment-16392159
 ] 

Robert Kanter commented on YARN-7952:
-

Sure.  I'll take a look tomorrow morning.

> RM should be able to recover log aggregation status after restart/fail-over
> ---
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, 
> YARN-7952.2.patch, YARN-7952.3.patch, YARN-7952.3.patch, YARN-7952.5.patch, 
> YARN-7952.6.patch
>
>
> Right now, the NM sends its own log aggregation status to the RM 
> periodically. The RM aggregates the status for each application, 
> but it will not generate the final status until a client call (from the web UI or 
> CLI) triggers it. However, the RM never persists the log aggregation status. So, when 
> the RM restarts/fails over, the log aggregation status becomes “NOT_STARTED”. 
> This is confusing; maybe we should change it to “NOT_AVAILABLE” (will create 
> a separate ticket for this). In any case, we need to persist the log aggregation 
> status for future use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7871) Node attributes reporting from NM to RM

2018-03-08 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392147#comment-16392147
 ] 

Naganarasimha G R commented on YARN-7871:
-

Thanks [~cheersyang],

IMO the patch is good enough to be committed, but a few minor queries/comments 
are left:
 * NMDistributedNodeAttributesHandler could be pulled into a separate outer 
class, as it will have more logic later on as part of an improvement JIRA.
 * NodeAttributesManagerImpl, line 448: for replaceAttributes without a prefix, 
is it good to clear all? Though it is used only by the CLI and RTS for the 
distributed case.
 * checkstyle has valid warnings.

> Node attributes reporting from NM to RM 
> 
>
> Key: YARN-7871
> URL: https://issues.apache.org/jira/browse/YARN-7871
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-7871-YARN-3409.001.patch, 
> YARN-7871-YARN-3409.002.patch, YARN-7871-YARN-3409.003.patch
>
>
> Support initializing the proper attribute provider based on the user's configuration.
> The NM collects node attributes from a configured attribute provider and sends 
> them to the RM via heartbeat. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5015) Support sliding window retry capability for container restart

2018-03-08 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392140#comment-16392140
 ] 

Chandni Singh commented on YARN-5015:
-

Patch 4 addresses [~leftnoteasy]'s review comments.

Instead of creating a top-level {{NMContainerRetryContext}} class, I have added 
a {{RetryContext}} class to {{SlidingWindowRetryPolicy}}.
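
For readers following along, a minimal sketch of what a sliding-window retry check can look like; the class and method names here are illustrative assumptions, not the patch's actual API:
{code:java}
// Illustrative only: a restart is allowed if the number of failures still inside
// the sliding window (plus the new one) does not exceed maxRetries.
public class SlidingWindowRetrySketch {
  private final long windowMs;
  private final int maxRetries;
  private final java.util.Deque<Long> failureTimes = new java.util.ArrayDeque<>();

  public SlidingWindowRetrySketch(long windowMs, int maxRetries) {
    this.windowMs = windowMs;
    this.maxRetries = maxRetries;
  }

  public synchronized boolean shouldRetry(long nowMs) {
    // drop failures that have fallen out of the window
    while (!failureTimes.isEmpty() && nowMs - failureTimes.peekFirst() > windowMs) {
      failureTimes.removeFirst();
    }
    if (failureTimes.size() >= maxRetries) {
      return false;
    }
    failureTimes.addLast(nowMs);
    return true;
  }
}
{code}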

> Support sliding window retry capability for container restart 
> --
>
> Key: YARN-5015
> URL: https://issues.apache.org/jira/browse/YARN-5015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Chandni Singh
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-5015.01.patch, YARN-5015.02.patch, 
> YARN-5015.03.patch, YARN-5015.04.patch
>
>
> We support a sliding window retry policy for AM restarts (introduced in 
> YARN-611). A similar sliding window retry policy is needed for container 
> restarts.
> With this change, we can introduce a common class, 
> SlidingWindowRetryPolicy (suggested by [~vvasudev] in the comments), and 
> integrate it into container restart. 
> In a subsequent JIRA, we can modify the AM code to use 
> SlidingWindowRetryPolicy, which will unify the AM and container restart code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5015) Support sliding window retry capability for container restart

2018-03-08 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-5015:

Attachment: YARN-5015.04.patch

> Support sliding window retry capability for container restart 
> --
>
> Key: YARN-5015
> URL: https://issues.apache.org/jira/browse/YARN-5015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Chandni Singh
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-5015.01.patch, YARN-5015.02.patch, 
> YARN-5015.03.patch, YARN-5015.04.patch
>
>
> We support a sliding window retry policy for AM restarts (introduced in 
> YARN-611). A similar sliding window retry policy is needed for container 
> restarts.
> With this change, we can introduce a common class, 
> SlidingWindowRetryPolicy (suggested by [~vvasudev] in the comments), and 
> integrate it into container restart. 
> In a subsequent JIRA, we can modify the AM code to use 
> SlidingWindowRetryPolicy, which will unify the AM and container restart code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7523) Introduce description and version field in Service record

2018-03-08 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392084#comment-16392084
 ] 

Gour Saha commented on YARN-7523:
-

We have made version mandatory with this patch, which means a Yarnfile with no 
version will fail at launch itself.

Given that this is the first release with YARN services, I think we should make 
users explicitly aware of the need for and importance of the version field, since 
upgrade is an important aspect of the lifecycle of their services. So I would 
suggest not defaulting the version field, and instead forcing app-owners to 
specify one explicitly.

> Introduce description and version field in Service record
> -
>
> Key: YARN-7523
> URL: https://issues.apache.org/jira/browse/YARN-7523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Chandni Singh
>Priority: Major
> Fix For: yarn-native-services
>
> Attachments: YARN-7523.001.patch, YARN-7523.002.patch, 
> YARN-7523.003.patch, YARN-7523.004.patch
>
>
> YARN-7512 would need a version field in the Service record. It would be good to 
> introduce a description field as well, to allow service owners to capture some 
> details which can be displayed in the Service catalog.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7523) Introduce description and version field in Service record

2018-03-08 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392039#comment-16392039
 ] 

Chandni Singh commented on YARN-7523:
-

Adding a default version is fine with me. Is version mandatory in Slider?

> Introduce description and version field in Service record
> -
>
> Key: YARN-7523
> URL: https://issues.apache.org/jira/browse/YARN-7523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Chandni Singh
>Priority: Major
> Fix For: yarn-native-services
>
> Attachments: YARN-7523.001.patch, YARN-7523.002.patch, 
> YARN-7523.003.patch, YARN-7523.004.patch
>
>
> YARN-7512 would need a version field in the Service record. It would be good to 
> introduce a description field as well, to allow service owners to capture some 
> details which can be displayed in the Service catalog.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6936) [Atsv2] Retrospect storing entities into sub application table from client perspective

2018-03-08 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392026#comment-16392026
 ] 

Vrushali C commented on YARN-6936:
--

Currently, ATSv2 stores information in an additional table (the sub application 
table) if the doAs user and the AM user are different. This, I believe, is one of 
the Tez usage situations: DAGs run as individual users via the same AM (with a 
different AM user). 

Since DAGs or queries are not a primary YARN concept, they are not treated as 
first class citizens by the timeline service. Timeline Service v2 treats all 
entities as being inside an application scope, so reading entities outside of 
application scope may not be a desirable recommendation. 

Let's chat with the Tez community about the querying of entities done by 
Tez, so that we can ascertain that the ATSv2 APIs address it and/or arrive at a 
generic recommendation for all frameworks that use ATSv2. 



> [Atsv2] Retrospect storing entities into sub application table from client 
> perspective
> --
>
> Key: YARN-6936
> URL: https://issues.apache.org/jira/browse/YARN-6936
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Major
>
> Currently YARN-6734 stores entities into the sub application table only if the 
> doAs user and the submitting user are different. This holds good for Tez-style 
> use cases. But frameworks where the AM runs as the submitting user, like MR, 
> also need to store entities in the sub application table so that they can read 
> entities without an application id. 
> This would be a point of concern at later stages, when ATSv2 is deployed into 
> production. This JIRA is to retrospect the decision of storing entities into the 
> sub application table, making it driven by client side configuration rather than 
> by the user. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7523) Introduce description and version field in Service record

2018-03-08 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392020#comment-16392020
 ] 

Billie Rinaldi commented on YARN-7523:
--

What do you think about defaulting to version 1 if the version is unspecified? 
Then people wouldn't need to change their Yarnfiles if they don't need the 
version field.

> Introduce description and version field in Service record
> -
>
> Key: YARN-7523
> URL: https://issues.apache.org/jira/browse/YARN-7523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>Assignee: Chandni Singh
>Priority: Major
> Fix For: yarn-native-services
>
> Attachments: YARN-7523.001.patch, YARN-7523.002.patch, 
> YARN-7523.003.patch, YARN-7523.004.patch
>
>
> YARN-7512 would need a version field in the Service record. It would be good to 
> introduce a description field as well, to allow service owners to capture some 
> details which can be displayed in the Service catalog.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

2018-03-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391991#comment-16391991
 ] 

Wangda Tan commented on YARN-7952:
--

Thanks [~xgong], the latest patch looks good. Could you check whether the failed UT 
is related or not? 

[~rkanter], do you want to take a look at the patch if you have a chance? I plan 
to commit the patch by tomorrow if there are no objections; please let me know if 
you want more time to review.

> RM should be able to recover log aggregation status after restart/fail-over
> ---
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, 
> YARN-7952.2.patch, YARN-7952.3.patch, YARN-7952.3.patch, YARN-7952.5.patch, 
> YARN-7952.6.patch
>
>
> Right now, the NM sends its own log aggregation status to the RM 
> periodically. The RM aggregates the status for each application, 
> but it will not generate the final status until a client call (from the web UI or 
> CLI) triggers it. However, the RM never persists the log aggregation status. So, when 
> the RM restarts/fails over, the log aggregation status becomes “NOT_STARTED”. 
> This is confusing; maybe we should change it to “NOT_AVAILABLE” (will create 
> a separate ticket for this). In any case, we need to persist the log aggregation 
> status for future use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7657) Queue Mapping could provide options to provide 'user' specific auto-created queues under a specified group parent queue

2018-03-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391987#comment-16391987
 ] 

Wangda Tan commented on YARN-7657:
--

[~suma.shivaprasad], 

Thanks for working on the fix, one question: 

The updated logic for {{g:}} is: 

{code}
  if (mapping.type == MappingType.GROUP) {
for (String userGroups : groups.getGroups(user)) {
  if (userGroups.equals(mapping.source)) {
        if (mapping.hasParentQueue()
            && mapping.queue.equals(CURRENT_USER_MAPPING)) {
  return getPlacementContext(mapping, user);
}
return getPlacementContext(mapping);
  }
}
  }
{code}

So my question is: is it possible to specify {{g:marketing-group:%user}} 
(without a parent queue), and is that a supported case today? I think this 
looks valid (only doing queue mapping for users from specific groups), but the 
syntax might be a bit complex for users to understand.
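
For reference, a hedged sketch of how such a mapping entry could resolve to a queue for a given user; this is illustrative only, not the CapacityScheduler's actual parsing code:
{code:java}
// Resolves a mapping like "g:marketing-group:marketing.%user" for user "alice"
// belonging to group "marketing-group" to the queue "marketing.alice".
static String resolveQueue(String mapping, String user, java.util.Set<String> userGroups) {
  String[] parts = mapping.split(":"); // [type, source, queue]
  if ("g".equals(parts[0]) && userGroups.contains(parts[1])) {
    return parts[2].replace("%user", user);
  }
  return null; // mapping does not apply to this user
}
{code}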

And is it necessary to add an end-to-end test (from config to app submission) for 
the new syntax? I think we have some for auto queue creation, but I am not sure 
about normal queue mapping.

> Queue Mapping could provide options to provide 'user' specific auto-created 
> queues under a specified group parent queue
> ---
>
> Key: YARN-7657
> URL: https://issues.apache.org/jira/browse/YARN-7657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7657.1.patch, YARN-7657.2.patch, YARN-7657.3.patch
>
>
> The current Queue-Mapping only provides %user as an option for 'user' specific 
> queues, as u:%user:%user. We can also support %user with a group, as 
> 'g:marketing-group:marketing.%user', and user specific queues can then be 
> automatically created under a group queue.
> cc [~leftnoteasy]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-08 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8000:

Attachment: Screen Shot 2018-03-08 at 1.37.07 PM.png

> Yarn Service: component instance name shows up as component name in container 
> record 
> -
>
> Key: YARN-8000
> URL: https://issues.apache.org/jira/browse/YARN-8000
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: Screen Shot 2018-03-08 at 1.37.07 PM.png, 
> YARN-8000.001.patch, YARN-8000.002.patch, YARN-8000.003.patch, 
> YARN-8000.004.patch
>
>
> Yarn Service: component instance name shows up as component name in container 
> record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8000) Yarn Service: component instance name shows up as component name in container record

2018-03-08 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391945#comment-16391945
 ] 

Chandni Singh commented on YARN-8000:
-

I tested the patch and didn't find any issues with the service component page. 
!Screen Shot 2018-03-08 at 1.37.07 PM.png!

> Yarn Service: component instance name shows up as component name in container 
> record 
> -
>
> Key: YARN-8000
> URL: https://issues.apache.org/jira/browse/YARN-8000
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: Screen Shot 2018-03-08 at 1.37.07 PM.png, 
> YARN-8000.001.patch, YARN-8000.002.patch, YARN-8000.003.patch, 
> YARN-8000.004.patch
>
>
> Yarn Service: component instance name shows up as component name in container 
> record 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8010) add config in FederationRMFailoverProxy to not bypass facade cache when failing over

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391899#comment-16391899
 ] 

genericqa commented on YARN-8010:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 
56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
13s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
14s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 10s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 214 unchanged - 0 fixed = 216 total (was 214) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 47s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
14s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
20s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}104m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8010 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913628/YARN-8010.v1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  

[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-03-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391802#comment-16391802
 ] 

Eric Payne commented on YARN-4606:
--

[~maniraj...@gmail.com], thank you for the patch. The overall approach looks 
fine, but I have a couple of concerns.
 - The behavior of assigning resources to schedulable applications has changed. 
With this patch, in the following use case, resources are not assigned to the 
second app when they should be. I have not analyzed the behavior closely enough 
to debug the issue, but I wish to document the behavior:
 -- Queue1 total resources: 40
 -- Queue1 Max Application Master Resources: 2
 -- Container sizes are all 1 resource
|*User Name*|*Application ID*|*Used AM Resources*|*Total Used Resources*|*Pending Resources*|
|User1|App1|1|39|20|
|User2|App2|0|0|1 (waiting for AM)|

 -- In this scenario, User2 wants to start App2 but User1 is consuming all 
resources in the queue with App1. When App1 releases a resource, however, it is 
not given to App2. The resource is given back to App1, which brings its Pending 
value down to 19. This is incorrect behavior since Queue1 has room for 2 AMs.

 - I think the {{TestRMHA}} unit test needs to be modified to adjust to this 
patch:
{code:java}
TestRMHA
TestRMHA.testFailoverAndTransitions:219->verifyClusterMetrics:754 Incorrect 
value for metric activeApplications expected:<1> but was:<0>
TestRMHA.testFailoverClearsRMContext:550->verifyClusterMetrics:754 Incorrect 
value for metric activeApplications expected:<1> but was:<0>
{code}

 - A couple of minor things:
 -- IIUC, the value stored in {{activeUsersOfPendingApps}} represents the 
number of users that do not have any active applications. Is that correct? If 
so, I think it would be clearer if it were called 
{{usersWithOnlyPendingApps}}.
 -- In {{AbstractUsersManager}} and {{ActiveUsersManager}}, *atleast* should be 
*at least*.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers that user 
> to be an active user. This could lead to starvation of active applications, for 
> example:
> - App1 (belongs to user1)/app2 (belongs to user2) are active, app3 (belongs to 
> user3)/app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources. So the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8009) YARN limit number of simultaneously running containers in the application level

2018-03-08 Thread Sachin Jose (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Jose resolved YARN-8009.
---
  Resolution: Invalid
Release Note: 

https://issues.apache.org/jira/browse/MAPREDUCE-5583
https://issues.apache.org/jira/browse/TEZ-2914

> YARN limit number of simultaneously running containers in the application 
> level
> ---
>
> Key: YARN-8009
> URL: https://issues.apache.org/jira/browse/YARN-8009
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Sachin Jose
>Priority: Minor
>  Labels: features
>
> It would be really useful if the user could specify the maximum number of 
> containers that can be running simultaneously at the application level. Most 
> long running YARN applications could benefit from it. At the moment, the only 
> available option to restrict the resource over-usage of long running 
> applications is at the YARN resource manager queue level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8009) YARN limit number of simultaneously running containers in the application level

2018-03-08 Thread Sachin Jose (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391770#comment-16391770
 ] 

Sachin Jose commented on YARN-8009:
---

I should have done a more detailed check of mapred-default.xml and 
tez-default.xml before raising this request. 

Thanks a lot [~eepayne] and [~miklos.szeg...@cloudera.com] 

> YARN limit number of simultaneously running containers in the application 
> level
> ---
>
> Key: YARN-8009
> URL: https://issues.apache.org/jira/browse/YARN-8009
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Sachin Jose
>Priority: Minor
>  Labels: features
>
> It would be really useful if the user could specify the maximum number of 
> containers that can be running simultaneously at the application level. Most 
> long running YARN applications could benefit from it. At the moment, the only 
> available option to restrict the resource over-usage of long running 
> applications is at the YARN resource manager queue level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7944) [UI2] Remove master node link from headers of application pages

2018-03-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391728#comment-16391728
 ] 

Hudson commented on YARN-7944:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13798 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13798/])
YARN-7944. [UI2] Remove master node link from headers of application (sunilg: 
rev 113f401f41ee575cb303ceb647bc243108d93a04)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-app.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-app-timeline.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/models/yarn-app-timeline.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/serializers/yarn-app.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-app.hbs


> [UI2] Remove master node link from headers of application pages
> ---
>
> Key: YARN-7944
> URL: https://issues.apache.org/jira/browse/YARN-7944
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: YARN-7944.001.patch, YARN-7944.002.patch, 
> YARN-7944.003.patch
>
>
> RM UI2 has links for the master container log and the master node. 
> These links are published on the application and service pages. They are not 
> required on all pages because the AM container node link and container log link 
> are already present in the Application view. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-03-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391688#comment-16391688
 ] 

Sunil G commented on YARN-4781:
---

[~eepayne] Thanks for the reply.

Yes, this makes sense to me. I will take a deeper look at the patch and will 
share any comments I have.

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-4781.001.patch
>
>
> We introduced the fairness queue ordering policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources, and the containers of the large 
> app have a long lifespan, small applications could still wait for resources for a 
> long time and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need to 
> preempt resources of queues with the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4488) CapacityScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue

2018-03-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391687#comment-16391687
 ] 

Wangda Tan commented on YARN-4488:
--

[~ywskycn], apologies that I missed your ping in YARN-7844. I think it will be 
useful to understand the latency between when a resource request is added and 
when a container is allocated. However, as I commented above: 
https://issues.apache.org/jira/browse/YARN-4488?focusedCommentId=16382744=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16382744,
 it could be an expensive call to record a more accurate delay. IMHO, if we 
don't record an accurate delay, the metrics will not be useful. 

So my personal suggestion is: can we check whether it is possible to put the 
container allocation delay code into an isolated module, and enable it only on 
demand (just like the approach of YARN-7844)? [~maniraj...@gmail.com], does that 
make sense to you? Could you investigate this option?
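
A hedged sketch of the "enable only on demand" idea; the names here are illustrative and not an existing YARN API:
{code:java}
// Record the allocation delay only when the metric is switched on,
// so the common path stays cheap.
class AllocationDelayRecorder {
  private final boolean enabled;

  AllocationDelayRecorder(boolean enabled) {
    this.enabled = enabled;
  }

  void record(long requestTimeMs, long allocationTimeMs) {
    if (!enabled) {
      return; // no-op unless explicitly turned on, as suggested above
    }
    long delayMs = allocationTimeMs - requestTimeMs;
    // push delayMs into whatever metrics sink is configured
    System.out.println("container allocation delay: " + delayMs + " ms");
  }
}
{code}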

> CapacityScheduler: Compute per-container allocation latency and roll up to 
> get per-application and per-queue
> 
>
> Key: YARN-4488
> URL: https://issues.apache.org/jira/browse/YARN-4488
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Karthik Kambatla
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-4485.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7944) [UI2] Remove master node link from headers of application pages

2018-03-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391673#comment-16391673
 ] 

Sunil G edited comment on YARN-7944 at 3/8/18 6:23 PM:
---

Thanks [~yeshavora] for the patch. And thanks [~leftnoteasy] for checking this.

Committed to trunk and branch-3.1


was (Author: sunilg):
Thanks [~leftnoteasy].

Committed to trunk and branch-3.1

> [UI2] Remove master node link from headers of application pages
> ---
>
> Key: YARN-7944
> URL: https://issues.apache.org/jira/browse/YARN-7944
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-7944.001.patch, YARN-7944.002.patch, 
> YARN-7944.003.patch
>
>
> RM UI2 has links for the master container log and the master node. 
> These links are published on the application and service pages. They are not 
> required on all pages because the AM container node link and container log link 
> are already present in the Application view. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7944) [UI2] Remove master node link from headers of application pages

2018-03-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391673#comment-16391673
 ] 

Sunil G commented on YARN-7944:
---

Thanks [~leftnoteasy].

Committed to trunk and branch-3.1

> [UI2] Remove master node link from headers of application pages
> ---
>
> Key: YARN-7944
> URL: https://issues.apache.org/jira/browse/YARN-7944
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-7944.001.patch, YARN-7944.002.patch, 
> YARN-7944.003.patch
>
>
> RM UI2 has links for the master container log and the master node. 
> These links are published on the application and service pages. They are not 
> required on all pages because the AM container node link and container log link 
> are already present in the Application view. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7944) [UI2] Remove master node link from headers of application pages

2018-03-08 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-7944:
--
Summary: [UI2] Remove master node link from headers of application pages  
(was: Remove master node link from headers of application pages)

> [UI2] Remove master node link from headers of application pages
> ---
>
> Key: YARN-7944
> URL: https://issues.apache.org/jira/browse/YARN-7944
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-7944.001.patch, YARN-7944.002.patch, 
> YARN-7944.003.patch
>
>
> RM UI2 has links for the master container log and the master node. 
> These links are published on the application and service pages. They are not 
> required on all pages because the AM container node link and container log link 
> are already present in the Application view. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7844) Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX

2018-03-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391672#comment-16391672
 ] 

Wangda Tan commented on YARN-7844:
--

Thanks [~ywskycn], I briefly took a look at the patch, and I like the idea of 
recording scheduler events, but I'm not sure whether we should record frequency in 
addition to (or instead of) per-invocation latency. In many cases lock contention 
inside the scheduler (especially after 3.0, since we improved a lot of scheduler 
related locking and performance issues) has less impact on actual container 
allocation latency. The frequency could be very useful for analyzing scheduler 
performance, for example for the allocate call. I think we already have some of the 
frequency metrics. 

And in addition to scheduler events, some other operations can be recorded, 
such as getQueueInfo/appInfo calls; we have seen many customers' prod deployments 
impacted by a huge number of read calls since they grab the scheduler's locks.
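
For the frequency side, a minimal sketch of per-operation counting in plain Java (not the Hadoop metrics2 API); the class and method names are assumptions for illustration:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Count invocations per scheduler operation so a rate can be derived,
// independently of per-invocation latency.
class SchedulerOpCounter {
  private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

  void recordOp(String opName) {
    counts.computeIfAbsent(opName, k -> new LongAdder()).increment();
  }

  long countOf(String opName) {
    LongAdder c = counts.get(opName);
    return c == null ? 0 : c.sum();
  }
}
{code}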

> Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX
> 
>
> Key: YARN-7844
> URL: https://issues.apache.org/jira/browse/YARN-7844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Major
> Attachments: YARN-7844.000.patch, YARN-7844.001.patch
>
>
> Currently FairScheduler's FSOpDurations records some scheduler operation 
> metrics: nodeUpdateCall, preemptCall, etc. We may need something similar for 
> CapacityScheduler. Also, we need to add more metrics there. This could help 
> monitor RM scheduler performance and give more insight into whether the scheduler 
> is under pressure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7844) Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX

2018-03-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7844:
-
Priority: Major  (was: Minor)

> Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX
> 
>
> Key: YARN-7844
> URL: https://issues.apache.org/jira/browse/YARN-7844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Major
> Attachments: YARN-7844.000.patch, YARN-7844.001.patch
>
>
> Currently FairScheduler's FSOpDurations records some scheduler operation 
> metrics: nodeUpdateCall, preemptCall, etc. We may need something similar for 
> CapacityScheduler. Also, we need to add more metrics there. This could help 
> monitor RM scheduler performance and give more insight into whether the scheduler 
> is under pressure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7844) Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX

2018-03-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7844:
-
Target Version/s: 3.2.0

> Expose metrics for scheduler operation (allocate, schedulerEvent) to JMX
> 
>
> Key: YARN-7844
> URL: https://issues.apache.org/jira/browse/YARN-7844
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Major
> Attachments: YARN-7844.000.patch, YARN-7844.001.patch
>
>
> Currently FairScheduler's FSOpDurations records some scheduler operation 
> metrics: nodeUpdateCall, preemptCall, etc. We may need something similar for 
> CapacityScheduler, and we also need to add more metrics there. This could help 
> monitor RM scheduler performance and give more insight into whether the 
> scheduler is under pressure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4488) CapacityScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue

2018-03-08 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391597#comment-16391597
 ] 

Wei Yan commented on YARN-4488:
---

Thanks for pinging, [~leftnoteasy]. I created YARN-7844 previously, which 
mostly exposes related metrics at the scheduler level, including (perhaps not 
fully covered in YARN-7844.001.patch) various scheduler ops (node_add, 
node_remove, allocate, update...) and the event queue size. This set of metrics 
would help us understand whether the RM scheduler is under pressure, what the 
throughput of the scheduler is, and whether the scheduler itself becomes a 
system bottleneck.

For this JIRA, the scheduling delay for a container or an application can vary 
for different reasons: the scheduler itself, resource availability, queue 
configs... I'm not sure how we could use this info in prod to tune queue 
configs. In our prod env, the top complaint from customers is that their jobs 
take a long time to run, mostly because their queues are short of resources, 
which is already covered by existing metrics (tracking available resources for 
each queue).

> CapacityScheduler: Compute per-container allocation latency and roll up to 
> get per-application and per-queue
> 
>
> Key: YARN-4488
> URL: https://issues.apache.org/jira/browse/YARN-4488
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Karthik Kambatla
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-4485.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8009) YARN limit number of simultaneously running containers in the application level

2018-03-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391504#comment-16391504
 ] 

Eric Payne commented on YARN-8009:
--

[~sachinjose2...@gmail.com], this property is a per-framework setting, as 
[~miklos.szeg...@cloudera.com] pointed out above for the distributed shell.

For MapReduce, the settings are {{mapreduce.job.running.map.limit}} and 
{{mapreduce.job.running.reduce.limit}}.

For Tez, it's {{tez.am.vertex.max-task-concurrency}}.
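
For illustration only, a minimal configuration sketch using the properties 
named above; the limit values are arbitrary examples, not recommendations:

{code}
<!-- MapReduce: per-job configuration (e.g. mapred-site.xml or job submission) -->
<property>
  <name>mapreduce.job.running.map.limit</name>
  <value>20</value>   <!-- example: at most 20 map tasks running at once -->
</property>
<property>
  <name>mapreduce.job.running.reduce.limit</name>
  <value>5</value>    <!-- example value -->
</property>

<!-- Tez: tez-site.xml or per-DAG configuration -->
<property>
  <name>tez.am.vertex.max-task-concurrency</name>
  <value>20</value>   <!-- example: cap concurrently running tasks per vertex -->
</property>
{code}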

> YARN limit number of simultaneously running containers in the application 
> level
> ---
>
> Key: YARN-8009
> URL: https://issues.apache.org/jira/browse/YARN-8009
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Sachin Jose
>Priority: Minor
>  Labels: features
>
> It would be really useful if the user could specify the maximum number of 
> containers that can run simultaneously at the application level. Most 
> long-running YARN applications could benefit from it. At this moment, the only 
> available option to restrict resource over-usage of long-running applications 
> is at the YARN ResourceManager queue level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391499#comment-16391499
 ] 

Jason Lowe commented on YARN-7999:
--

Thanks for trying out the patch!  Can you provide more details on the hang 
(e.g.: pstack of the hung container-executor process) or detailed steps on how 
to reproduce it?  We were able to reproduce the original issue and verified 
this patch fixes it, but we never saw a container-executor hang.

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8010) add config in FederationRMFailoverProxy to not bypass facade cache when failing over

2018-03-08 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8010:
---
Attachment: YARN-8010.v1.patch

> add config in FederationRMFailoverProxy to not bypass facade cache when 
> failing over
> 
>
> Key: YARN-8010
> URL: https://issues.apache.org/jira/browse/YARN-8010
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-8010.v1.patch, YARN-8010.v1.patch
>
>
> Today, when the YarnRM is failing over, the FederationRMFailoverProxy running 
> in AMRMProxy performs failover, tries to get the latest subcluster info from 
> FederationStateStore, and then retries connecting to the latest YarnRM master. 
> When calling getSubCluster() on FederationStateStoreFacade, it bypasses the 
> cache with a flush flag. While YarnRM is failing over, every AM heartbeat 
> thread creates a different thread inside FederationInterceptor, each of which 
> keeps performing failover several times. This leads to a big spike of 
> getSubCluster calls to FederationStateStore. 
> Depending on the cluster setup (e.g. putting a VIP in front of all YarnRMs), a 
> YarnRM master/slave change might not result in an RM address change. In other 
> cases, a small delay in getting the latest subcluster information may be 
> acceptable. This patch thus adds a config option, so that it is possible to 
> ask the FederationRMFailoverProxy not to flush the cache when calling 
> getSubCluster(). 
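
As a rough sketch only: such an option would presumably be a boolean in 
yarn-site.xml. The property name below is a placeholder for illustration and 
not necessarily the key introduced by this patch:

{code}
<!-- Placeholder property name, for illustration only; check the committed
     patch for the actual key. -->
<property>
  <name>yarn.federation.failover.flush-subcluster-cache</name>
  <value>false</value>  <!-- false: reuse cached subcluster info during failover -->
</property>
{code}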



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt resolved YARN-8014.

Resolution: Invalid

Closing as invalid

> YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
> -
>
> Key: YARN-8014
> URL: https://issues.apache.org/jira/browse/YARN-8014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.2
>Reporter: Evan Tepsic
>Priority: Minor
>
> A graceful shutdown & then startup of a NodeManager process using YARN/HDFS 
> v2.8.2 seems to successfully place the Node back into RUNNING state. However, 
> ResourceManager appears to keep the Node in the SHUTDOWN state as well.
>  
> *Steps To Reproduce:*
> 1. SSH to host running NodeManager.
>  2. Switch-to UserID that NodeManager is running as (hadoop).
>  3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
>  4. Wait for NodeManager process to terminate gracefully.
>  5. Confirm Node is in SHUTDOWN state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
>  7. Confirm Node is in RUNNING state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  
> *Investigation:*
>  1. Review contents of ResourceManager + NodeManager log-files:
> +ResourceManager log-file:+
>  2018-03-08 08:15:44,085 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node 
> with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node rb0101.local:43892 as it is now SHUTDOWN
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
>  2018-03-08 08:15:44,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Removed node rb0101.local:43892 cluster capacity: 
>  2018-03-08 08:16:08,915 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
> with capability: , assigned nodeId rb0101.local:42627
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:42627 Node Transitioned from NEW to RUNNING
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added node rb0101.local:42627 cluster capacity: 
>  2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response 
> size 2976014 for call Call#428958 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
> 192.168.1.100:44034
>  
> +NodeManager log-file:+
>  2018-03-08 08:00:14,500 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:10:14,498 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:15:44,048 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
> SIGTERM
>  2018-03-08 08:15:44,101 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully 
> Unregistered the Node rb0101.local:43892 with ResourceManager.
>  2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped 
> HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
>  2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 43892
>  2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 43892
>  2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
>  2018-03-08 08:15:44,239 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.logag
>  gregation.LogAggregationService waiting for pending aggregation during exit
>  2018-03-08 08:15:44,242 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.Cont
>  ainersMonitorImpl is interrupted. Exiting.
>  2018-03-08 08:15:44,284 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 8040
>  2018-03-08 08:15:44,285 INFO 

[jira] [Reopened] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reopened YARN-8014:


> YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
> -
>
> Key: YARN-8014
> URL: https://issues.apache.org/jira/browse/YARN-8014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.2
>Reporter: Evan Tepsic
>Priority: Minor
>
> A graceful shutdown & then startup of a NodeManager process using YARN/HDFS 
> v2.8.2 seems to successfully place the Node back into RUNNING state. However, 
> ResourceManager appears to keep the Node in the SHUTDOWN state as well.
>  
> *Steps To Reproduce:*
> 1. SSH to host running NodeManager.
>  2. Switch-to UserID that NodeManager is running as (hadoop).
>  3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
>  4. Wait for NodeManager process to terminate gracefully.
>  5. Confirm Node is in SHUTDOWN state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
>  7. Confirm Node is in RUNNING state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  
> *Investigation:*
>  1. Review contents of ResourceManager + NodeManager log-files:
> +ResourceManager log-file:+
>  2018-03-08 08:15:44,085 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node 
> with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node rb0101.local:43892 as it is now SHUTDOWN
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
>  2018-03-08 08:15:44,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Removed node rb0101.local:43892 cluster capacity: 
>  2018-03-08 08:16:08,915 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
> with capability: , assigned nodeId rb0101.local:42627
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:42627 Node Transitioned from NEW to RUNNING
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added node rb0101.local:42627 cluster capacity: 
>  2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response 
> size 2976014 for call Call#428958 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
> 192.168.1.100:44034
>  
> +NodeManager log-file:+
>  2018-03-08 08:00:14,500 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:10:14,498 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:15:44,048 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
> SIGTERM
>  2018-03-08 08:15:44,101 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully 
> Unregistered the Node rb0101.local:43892 with ResourceManager.
>  2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped 
> HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
>  2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 43892
>  2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 43892
>  2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
>  2018-03-08 08:15:44,239 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.logag
>  gregation.LogAggregationService waiting for pending aggregation during exit
>  2018-03-08 08:15:44,242 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.Cont
>  ainersMonitorImpl is interrupted. Exiting.
>  2018-03-08 08:15:44,284 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 8040
>  2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server 

[jira] [Comment Edited] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Evan Tepsic (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391438#comment-16391438
 ] 

Evan Tepsic edited comment on YARN-8014 at 3/8/18 3:59 PM:
---

This could be caused by buildNodeId( ), as the Port # it generates appears to 
be random when yarn.nodemanager.address is not defined in a NodeManager's 
yarn-site.xml.


was (Author: tepsic):
This could be caused by buildNodeId( ), as the Port # it generates appears to 
be random.

> YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
> -
>
> Key: YARN-8014
> URL: https://issues.apache.org/jira/browse/YARN-8014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.2
>Reporter: Evan Tepsic
>Priority: Minor
>
> A graceful shutdown & then startup of a NodeManager process using YARN/HDFS 
> v2.8.2 seems to successfully place the Node back into RUNNING state. However, 
> ResourceManager appears to keep the Node in the SHUTDOWN state as well.
>  
> *Steps To Reproduce:*
> 1. SSH to host running NodeManager.
>  2. Switch-to UserID that NodeManager is running as (hadoop).
>  3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
>  4. Wait for NodeManager process to terminate gracefully.
>  5. Confirm Node is in SHUTDOWN state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
>  7. Confirm Node is in RUNNING state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  
> *Investigation:*
>  1. Review contents of ResourceManager + NodeManager log-files:
> +ResourceManager log-file:+
>  2018-03-08 08:15:44,085 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node 
> with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node rb0101.local:43892 as it is now SHUTDOWN
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
>  2018-03-08 08:15:44,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Removed node rb0101.local:43892 cluster capacity: 
>  2018-03-08 08:16:08,915 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
> with capability: , assigned nodeId rb0101.local:42627
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:42627 Node Transitioned from NEW to RUNNING
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added node rb0101.local:42627 cluster capacity: 
>  2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response 
> size 2976014 for call Call#428958 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
> 192.168.1.100:44034
>  
> +NodeManager log-file:+
>  2018-03-08 08:00:14,500 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:10:14,498 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:15:44,048 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
> SIGTERM
>  2018-03-08 08:15:44,101 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully 
> Unregistered the Node rb0101.local:43892 with ResourceManager.
>  2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped 
> HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
>  2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 43892
>  2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 43892
>  2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
>  2018-03-08 08:15:44,239 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.logag
>  gregation.LogAggregationService waiting for pending aggregation during exit
>  2018-03-08 08:15:44,242 WARN 
> 

[jira] [Commented] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Evan Tepsic (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391438#comment-16391438
 ] 

Evan Tepsic commented on YARN-8014:
---

This could be caused by buildNodeId( ), as the Port # it generates appears to 
be random.

> YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
> -
>
> Key: YARN-8014
> URL: https://issues.apache.org/jira/browse/YARN-8014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.2
>Reporter: Evan Tepsic
>Priority: Minor
>
> A graceful shutdown & then startup of a NodeManager process using YARN/HDFS 
> v2.8.2 seems to successfully place the Node back into RUNNING state. However, 
> ResourceManager appears to keep the Node in the SHUTDOWN state as well.
>  
> *Steps To Reproduce:*
> 1. SSH to host running NodeManager.
>  2. Switch-to UserID that NodeManager is running as (hadoop).
>  3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
>  4. Wait for NodeManager process to terminate gracefully.
>  5. Confirm Node is in SHUTDOWN state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
>  7. Confirm Node is in RUNNING state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  
> *Investigation:*
>  1. Review contents of ResourceManager + NodeManager log-files:
> +ResourceManager log-file:+
>  2018-03-08 08:15:44,085 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node 
> with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node rb0101.local:43892 as it is now SHUTDOWN
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
>  2018-03-08 08:15:44,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Removed node rb0101.local:43892 cluster capacity: 
>  2018-03-08 08:16:08,915 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
> with capability: , assigned nodeId rb0101.local:42627
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:42627 Node Transitioned from NEW to RUNNING
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added node rb0101.local:42627 cluster capacity: 
>  2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response 
> size 2976014 for call Call#428958 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
> 192.168.1.100:44034
>  
> +NodeManager log-file:+
>  2018-03-08 08:00:14,500 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:10:14,498 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:15:44,048 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
> SIGTERM
>  2018-03-08 08:15:44,101 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully 
> Unregistered the Node rb0101.local:43892 with ResourceManager.
>  2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped 
> HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
>  2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 43892
>  2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 43892
>  2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
>  2018-03-08 08:15:44,239 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.logag
>  gregation.LogAggregationService waiting for pending aggregation during exit
>  2018-03-08 08:15:44,242 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.Cont
>  ainersMonitorImpl is interrupted. Exiting.
>  2018-03-08 08:15:44,284 INFO 

[jira] [Resolved] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Evan Tepsic (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Evan Tepsic resolved YARN-8014.
---
Resolution: Fixed

> YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
> -
>
> Key: YARN-8014
> URL: https://issues.apache.org/jira/browse/YARN-8014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.2
>Reporter: Evan Tepsic
>Priority: Minor
>
> A graceful shutdown & then startup of a NodeManager process using YARN/HDFS 
> v2.8.2 seems to successfully place the Node back into RUNNING state. However, 
> ResourceManager appears to keep the Node in the SHUTDOWN state as well.
>  
> *Steps To Reproduce:*
> 1. SSH to host running NodeManager.
>  2. Switch-to UserID that NodeManager is running as (hadoop).
>  3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
>  4. Wait for NodeManager process to terminate gracefully.
>  5. Confirm Node is in SHUTDOWN state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
>  7. Confirm Node is in RUNNING state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  
> *Investigation:*
>  1. Review contents of ResourceManager + NodeManager log-files:
> +ResourceManager log-file:+
>  2018-03-08 08:15:44,085 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node 
> with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node rb0101.local:43892 as it is now SHUTDOWN
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
>  2018-03-08 08:15:44,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Removed node rb0101.local:43892 cluster capacity: 
>  2018-03-08 08:16:08,915 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
> with capability: , assigned nodeId rb0101.local:42627
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:42627 Node Transitioned from NEW to RUNNING
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added node rb0101.local:42627 cluster capacity: 
>  2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response 
> size 2976014 for call Call#428958 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
> 192.168.1.100:44034
>  
> +NodeManager log-file:+
>  2018-03-08 08:00:14,500 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:10:14,498 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:15:44,048 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
> SIGTERM
>  2018-03-08 08:15:44,101 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully 
> Unregistered the Node rb0101.local:43892 with ResourceManager.
>  2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped 
> HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
>  2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 43892
>  2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 43892
>  2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
>  2018-03-08 08:15:44,239 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.logag
>  gregation.LogAggregationService waiting for pending aggregation during exit
>  2018-03-08 08:15:44,242 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.Cont
>  ainersMonitorImpl is interrupted. Exiting.
>  2018-03-08 08:15:44,284 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 8040
>  2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> 

[jira] [Commented] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Evan Tepsic (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391423#comment-16391423
 ] 

Evan Tepsic commented on YARN-8014:
---

This behavior seems to be caused by the lack of the property 
*yarn.nodemanager.address* in the yarn-site.xml files of the NodeManagers.

When it is explicitly defined, this behavior does not occur:

<property>
  <name>yarn.nodemanager.address</name>
  <value>rb0101.local:</value>
</property>


> YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
> -
>
> Key: YARN-8014
> URL: https://issues.apache.org/jira/browse/YARN-8014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.2
>Reporter: Evan Tepsic
>Priority: Minor
>
> A graceful shutdown & then startup of a NodeManager process using YARN/HDFS 
> v2.8.2 seems to successfully place the Node back into RUNNING state. However, 
> ResourceManager appears to keep the Node in the SHUTDOWN state as well.
>  
> *Steps To Reproduce:*
> 1. SSH to host running NodeManager.
>  2. Switch-to UserID that NodeManager is running as (hadoop).
>  3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
>  4. Wait for NodeManager process to terminate gracefully.
>  5. Confirm Node is in SHUTDOWN state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
>  7. Confirm Node is in RUNNING state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  
> *Investigation:*
>  1. Review contents of ResourceManager + NodeManager log-files:
> +ResourceManager log-file:+
>  2018-03-08 08:15:44,085 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node 
> with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node rb0101.local:43892 as it is now SHUTDOWN
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
>  2018-03-08 08:15:44,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Removed node rb0101.local:43892 cluster capacity: 
>  2018-03-08 08:16:08,915 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
> with capability: , assigned nodeId rb0101.local:42627
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:42627 Node Transitioned from NEW to RUNNING
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added node rb0101.local:42627 cluster capacity: 
>  2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response 
> size 2976014 for call Call#428958 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
> 192.168.1.100:44034
>  
> +NodeManager log-file:+
>  2018-03-08 08:00:14,500 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:10:14,498 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:15:44,048 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
> SIGTERM
>  2018-03-08 08:15:44,101 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully 
> Unregistered the Node rb0101.local:43892 with ResourceManager.
>  2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped 
> HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
>  2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server 
> on 43892
>  2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 43892
>  2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
>  2018-03-08 08:15:44,239 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  org.apache.hadoop.yarn.server.nodemanager.containermanager.logag
>  gregation.LogAggregationService waiting for pending aggregation during exit
>  2018-03-08 08:15:44,242 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  

[jira] [Commented] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391417#comment-16391417
 ] 

Jason Lowe commented on YARN-8014:
--

I believe this is an artifact of the NM appearing as two separate NodeManager 
instances. Note that the NM port changed between the two instances: it 
originally was rb0101.local:43892 but became rb0101.local:42627 after the 
restart. That explains why the node shows up twice when listing all nodes. The 
RM did not understand that the newly joining NM at port 42627 was supposed to 
be the same one that was at port 43892. The RM does not preclude multiple NMs 
running on the same node, and indeed that's how the mini clusters used for 
unit tests can run multiple NMs with only one host.

It is surprising that the shutdown NM instance does not appear when explicitly 
asking for nodes in the SHUTDOWN state. I suspect that somewhere in the RM's 
bookkeeping it is dropping the port distinction, and the RUNNING instance ends 
up superseding the SHUTDOWN one for that query.

The simplest workaround for this is to use a fixed port for the NM. Then the RM 
will understand that the node joining is the same node that left previously. 
That also has the benefit of precluding an accidental double-startup of an NM 
on a node, which is not going to go well unless it is configured intentionally 
for that scenario: both NMs will think they have control of the node's 
resources and end up using far more resources on the node than intended.
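
For reference, a minimal yarn-site.xml sketch of the fixed-port workaround 
described above; the port number below is only an example, and any stable, 
unused port would do:

{code}
<!-- Example only: pin the NM to a fixed port so a restarted NM re-registers
     with the same NodeId instead of a new random port. -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
{code}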


> YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously
> -
>
> Key: YARN-8014
> URL: https://issues.apache.org/jira/browse/YARN-8014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.2
>Reporter: Evan Tepsic
>Priority: Minor
>
> A graceful shutdown & then startup of a NodeManager process using YARN/HDFS 
> v2.8.2 seems to successfully place the Node back into RUNNING state. However, 
> ResourceManager appears to keep the Node in the SHUTDOWN state as well.
>  
> *Steps To Reproduce:*
> 1. SSH to host running NodeManager.
>  2. Switch-to UserID that NodeManager is running as (hadoop).
>  3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
>  4. Wait for NodeManager process to terminate gracefully.
>  5. Confirm Node is in SHUTDOWN state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
>  7. Confirm Node is in RUNNING state via: 
> [http://rb01rm01.local:8088/cluster/nodes]
>  
> *Investigation:*
>  1. Review contents of ResourceManager + NodeManager log-files:
> +ResourceManager log-file:+
>  2018-03-08 08:15:44,085 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node 
> with node id : rb0101.local:43892 has shutdown, hence unregistering the node.
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node rb0101.local:43892 as it is now SHUTDOWN
>  2018-03-08 08:15:44,092 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
>  2018-03-08 08:15:44,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Removed node rb0101.local:43892 cluster capacity: 
>  2018-03-08 08:16:08,915 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
> with capability: , assigned nodeId rb0101.local:42627
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> rb0101.local:42627 Node Transitioned from NEW to RUNNING
>  2018-03-08 08:16:08,916 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added node rb0101.local:42627 cluster capacity: 
>  2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response 
> size 2976014 for call Call#428958 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
> 192.168.1.100:44034
>  
> +NodeManager log-file:+
>  2018-03-08 08:00:14,500 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:10:14,498 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
>  Deleted: 0, Private Deleted: 0
>  2018-03-08 08:15:44,048 ERROR 
> 

[jira] [Updated] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Evan Tepsic (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Evan Tepsic updated YARN-8014:
--
Description: 
A graceful shutdown & then startup of a NodeManager process using YARN/HDFS 
v2.8.2 seems to successfully place the Node back into RUNNING state. However, 
ResourceManager appears to keep the Node in the SHUTDOWN state as well.

 

*Steps To Reproduce:*

1. SSH to host running NodeManager.
 2. Switch-to UserID that NodeManager is running as (hadoop).
 3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
 4. Wait for NodeManager process to terminate gracefully.
 5. Confirm Node is in SHUTDOWN state via: 
[http://rb01rm01.local:8088/cluster/nodes]
 6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
 7. Confirm Node is in RUNNING state via: 
[http://rb01rm01.local:8088/cluster/nodes]

 

*Investigation:*
 1. Review contents of ResourceManager + NodeManager log-files:

+ResourceManager log-file:+
 2018-03-08 08:15:44,085 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node with 
node id : rb0101.local:43892 has shutdown, hence unregistering the node.
 2018-03-08 08:15:44,092 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
Node rb0101.local:43892 as it is now SHUTDOWN
 2018-03-08 08:15:44,092 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
 2018-03-08 08:15:44,093 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Removed node rb0101.local:43892 cluster capacity: 
 2018-03-08 08:16:08,915 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
with capability: , assigned nodeId rb0101.local:42627
 2018-03-08 08:16:08,916 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
rb0101.local:42627 Node Transitioned from NEW to RUNNING
 2018-03-08 08:16:08,916 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Added node rb0101.local:42627 cluster capacity: 
 2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response size 
2976014 for call Call#428958 Retry#0 
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
192.168.1.100:44034

 

+NodeManager log-file:+
 2018-03-08 08:00:14,500 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
 Deleted: 0, Private Deleted: 0
 2018-03-08 08:10:14,498 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
 Deleted: 0, Private Deleted: 0
 2018-03-08 08:15:44,048 ERROR 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
SIGTERM
 2018-03-08 08:15:44,101 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully 
Unregistered the Node rb0101.local:43892 with ResourceManager.
 2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped 
HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
 2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server on 
43892
 2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
listener on 43892
 2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
 2018-03-08 08:15:44,239 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logag
 gregation.LogAggregationService waiting for pending aggregation during exit
 2018-03-08 08:15:44,242 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.Cont
 ainersMonitorImpl is interrupted. Exiting.
 2018-03-08 08:15:44,284 INFO org.apache.hadoop.ipc.Server: Stopping server on 
8040
 2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
listener on 8040
 2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
 2018-03-08 08:15:44,287 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Public cache exiting
 2018-03-08 08:15:44,289 WARN 
org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl: 
org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is 
interrupted. Exiting.
 2018-03-08 08:15:44,294 INFO 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics 
system...
 2018-03-08 08:15:44,295 INFO 

[jira] [Created] (YARN-8014) YARN ResourceManager Lists A NodeManager As RUNNING & SHUTDOWN Simultaneously

2018-03-08 Thread Evan Tepsic (JIRA)
Evan Tepsic created YARN-8014:
-

 Summary: YARN ResourceManager Lists A NodeManager As RUNNING & 
SHUTDOWN Simultaneously
 Key: YARN-8014
 URL: https://issues.apache.org/jira/browse/YARN-8014
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.8.2
Reporter: Evan Tepsic


A graceful shutdown & then startup of a NodeManager process using YARN/HDFS 
v2.8.2 seems to successfully place the Node back into RUNNING state. However, 
ResourceManager appears to keep the Node in the SHUTDOWN state as well.

 

*Steps To Reproduce:*

1. SSH to host running NodeManager.
2. Switch-to UserID that NodeManager is running as (hadoop).
3. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
4. Wait for NodeManager process to terminate gracefully.
5. Confirm Node is in SHUTDOWN state via: 
http://rb01rm01.local:8088/cluster/nodes
6. Execute cmd: /opt/hadoop/sbin/yarn-daemon.sh start nodemanager
7. Confirm Node is in RUNNING state via: 
http://rb01rm01.local:8088/cluster/nodes


*Investigation:*
1. Review contents of ResourceManager + NodeManager log-files:

+ResourceManager log-file:+
2018-03-08 08:15:44,085 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node with 
node id : rb0101.local:43892 has shutdown, hence unregistering the node.
2018-03-08 08:15:44,092 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
Node rb0101.local:43892 as it is now SHUTDOWN
2018-03-08 08:15:44,092 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
rb0101.local:43892 Node Transitioned from RUNNING to SHUTDOWN
2018-03-08 08:15:44,093 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Removed node rb0101.local:43892 cluster capacity: 
2018-03-08 08:16:08,915 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
NodeManager from node rb0101.local(cmPort: 42627 httpPort: 8042) registered 
with capability: , assigned nodeId rb0101.local:42627
2018-03-08 08:16:08,916 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
rb0101.local:42627 Node Transitioned from NEW to RUNNING
2018-03-08 08:16:08,916 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Added node rb0101.local:42627 cluster capacity: 
2018-03-08 08:16:34,826 WARN org.apache.hadoop.ipc.Server: Large response size 
2976014 for call Call#428958 Retry#0 
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 
192.168.1.100:44034

 

+NodeManager log-file:+
2018-03-08 08:00:14,500 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
Deleted: 0, Private Deleted: 0
2018-03-08 08:10:14,498 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Cache Size Before Clean: 10720046250, Total Deleted: 0, Public
Deleted: 0, Private Deleted: 0
2018-03-08 08:15:44,048 ERROR 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
SIGTERM
2018-03-08 08:15:44,101 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Successfully 
Unregistered the Node rb0101.local:43892 with ResourceManager.
2018-03-08 08:15:44,114 INFO org.mortbay.log: Stopped 
HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
2018-03-08 08:15:44,226 INFO org.apache.hadoop.ipc.Server: Stopping server on 
43892
2018-03-08 08:15:44,232 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
listener on 43892
2018-03-08 08:15:44,237 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
2018-03-08 08:15:44,239 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logag
gregation.LogAggregationService waiting for pending aggregation during exit
2018-03-08 08:15:44,242 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.Cont
ainersMonitorImpl is interrupted. Exiting.
2018-03-08 08:15:44,284 INFO org.apache.hadoop.ipc.Server: Stopping server on 
8040
2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
listener on 8040
2018-03-08 08:15:44,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
2018-03-08 08:15:44,287 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Public cache exiting
2018-03-08 08:15:44,289 WARN 
org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl: 
org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is 
interrupted. Exiting.

[jira] [Commented] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391231#comment-16391231
 ] 

genericqa commented on YARN-7952:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
8s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 3 new + 382 unchanged - 0 fixed = 385 total (was 382) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
42s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
22s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
21s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m  4s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 72m  
3s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 12s{color} | 
{color:black} {color} |

[jira] [Commented] (YARN-8007) Support specifying placement constraint for task containers in SLS

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391172#comment-16391172
 ] 

genericqa commented on YARN-8007:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 11s{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 13 
new + 57 unchanged - 0 fixed = 70 total (was 57) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
40s{color} | {color:red} hadoop-tools/hadoop-sls generated 2 new + 0 unchanged 
- 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m  0s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-tools/hadoop-sls |
|  |  Unread field:SynthTraceJobProducer.java:[line 135] |
|  |  Unread field:SynthTraceJobProducer.java:[line 136] |
| Failed junit tests | hadoop.yarn.sls.TestSLSStreamAMSynthWithConstraint |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8007 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913587/YARN-8007.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 618e88575910 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7ef4d94 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | 

[jira] [Commented] (YARN-2442) ResourceManager JMX UI does not give HA State

2018-03-08 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391166#comment-16391166
 ] 

Bibin A Chundatt commented on YARN-2442:


[~rohithsharma]

As per the current implementation, /jmx requests also get diverted to the active 
RM (YARN-1898).

So the RMInfo HA state will always be *active*. /jmx does return details such as 
Runtime and Operating System information, so the redirect is not correct when a 
user wants to access standby RM details through /jmx.

One solution I had in mind is making the NON_REDIRECT_URL list configurable in 
RMWebAppFilter.

Thoughts?
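
To make the idea concrete, here is a minimal sketch of a configurable non-redirect 
path list. The property name and the default paths below are assumptions for 
illustration only; they are not existing YARN configuration keys, nor the actual 
contents of RMWebAppFilter.

{code}
// Sketch only: the configuration key and default paths are illustrative
// assumptions, not existing YARN keys or the filter's real hard-coded list.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

public class NonRedirectPaths {

  // Hypothetical key for paths that should be served by the local RM
  // instead of being redirected to the active RM.
  public static final String NON_REDIRECT_PATHS_KEY =
      "yarn.resourcemanager.webapp.non-redirect-paths";

  private final Set<String> nonRedirectPaths;

  public NonRedirectPaths(Configuration conf) {
    // Read a comma-separated list from the configuration, falling back to an
    // illustrative default set that includes /jmx.
    String[] paths = conf.getTrimmedStrings(NON_REDIRECT_PATHS_KEY,
        "/conf", "/stacks", "/logLevel", "/logs", "/jmx");
    this.nonRedirectPaths = new HashSet<>(Arrays.asList(paths));
  }

  /** True if the request URI should be served locally (no redirect). */
  public boolean shouldServeLocally(String requestUri) {
    for (String path : nonRedirectPaths) {
      if (requestUri.startsWith(path)) {
        return true;
      }
    }
    return false;
  }
}
{code}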

> ResourceManager JMX UI does not give HA State
> -
>
> Key: YARN-2442
> URL: https://issues.apache.org/jira/browse/YARN-2442
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Nishan Shetty
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-2442.patch, YARN-2442.02.patch
>
>
> ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, 
> STOPPED)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8013) Support APP-TAG namespace for allocation tags

2018-03-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8013:
--
Description: 
YARN-1461 adds *Application Tag* concept to Yarn applications, user is able to 
annotate application with multiple tags to classify apps. We can leverage this 
to represent a namespace for a certain group of apps. So instead of calling 
*app-label*, propose to call it *app-tag*.

A typical use case is,

There are a lot of TF jobs running on Yarn, and some of them are consuming 
resources heavily. So we want to limit number of PS on each node for such BIG 
players but ignore those SMALL ones. To achieve this, we can do following steps:
 # Add application tag "big-tf" to these big TF jobs
 # For each PS request, we add "ps" source tag and map it to constraint 
"{color:#d04437}notin, node, tensorflow/ps{color}" or 
"{color:#d04437}cardinality, node, tensorflow/ps{color}{color:#d04437}, 0, 
2{color}" for finer grained controls.

  was:
YARN-1461 adds *Application Tag* concept to Yarn applications, user is able to 
annotate application with multiple tags to classify apps. We can leverage this 
to represent a namespace for a certain group of apps. So instead of calling 
*app-label*, propose to call it *app-tag*.

A typical use case is,

There are a lot of TF jobs running on Yarn, and some of them are consuming 
resources heavily. So we want to limit number of PS on each node for such BIG 
players but ignore those SMALL ones. To achieve this, we can do following steps:
 # Add application tag "big-tf" to these big TF jobs
 # For each PS request, we add "ps" source tag and map it to constraint 
"{color:#d04437}notin, node, tensorflow/ps{color}" or 
"{color:#d04437}cardinality, node, tensorflow/ps{color}, 0, 2" for finer 
grained controls.


> Support APP-TAG namespace for allocation tags
> -
>
> Key: YARN-8013
> URL: https://issues.apache.org/jira/browse/YARN-8013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> YARN-1461 adds *Application Tag* concept to Yarn applications, user is able 
> to annotate application with multiple tags to classify apps. We can leverage 
> this to represent a namespace for a certain group of apps. So instead of 
> calling *app-label*, propose to call it *app-tag*.
> A typical use case is,
> There are a lot of TF jobs running on Yarn, and some of them are consuming 
> resources heavily. So we want to limit number of PS on each node for such BIG 
> players but ignore those SMALL ones. To achieve this, we can do following 
> steps:
>  # Add application tag "big-tf" to these big TF jobs
>  # For each PS request, we add "ps" source tag and map it to constraint 
> "{color:#d04437}notin, node, tensorflow/ps{color}" or 
> "{color:#d04437}cardinality, node, tensorflow/ps{color}{color:#d04437}, 0, 
> 2{color}" for finer grained controls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8013) Support APP-TAG namespace for allocation tags

2018-03-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391109#comment-16391109
 ] 

Weiwei Yang commented on YARN-8013:
---

Looping in [~leftnoteasy], [~asuresh], [~kkaranasos] for discussion; please let me 
know your thoughts, thanks!
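
To make the discussion concrete, below is a minimal sketch of how a PS request 
could map its "ps" source tag to the two constraints from the description, using 
the PlacementConstraints builder API introduced by YARN-6592. The namespaced tag 
string "tensorflow/ps" is only an assumption based on this proposal; the exact 
namespace syntax is what this JIRA is meant to define.

{code}
// Minimal sketch, not the attached patch: builds the two constraints from the
// description with the YARN-6592 PlacementConstraints API. The namespaced
// allocation tag string "tensorflow/ps" is an assumption from this proposal.
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.build;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetCardinality;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetNotIn;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;

import org.apache.hadoop.yarn.api.resource.PlacementConstraint;

public class PsConstraintExample {
  public static void main(String[] args) {
    // "notin, node, tensorflow/ps": no two PS containers of the tagged apps
    // may land on the same node.
    PlacementConstraint antiAffinity =
        build(targetNotIn(NODE, allocationTag("tensorflow/ps")));

    // "cardinality, node, tensorflow/ps, 0, 2": at most two such PS
    // containers per node.
    PlacementConstraint cardinality =
        build(targetCardinality(NODE, 0, 2, allocationTag("tensorflow/ps")));

    System.out.println(antiAffinity);
    System.out.println(cardinality);
  }
}
{code}

In practice these constraints would be attached to the container request carrying 
the "ps" source allocation tag; that wiring is omitted here.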

> Support APP-TAG namespace for allocation tags
> -
>
> Key: YARN-8013
> URL: https://issues.apache.org/jira/browse/YARN-8013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> YARN-1461 adds *Application Tag* concept to Yarn applications, user is able 
> to annotate application with multiple tags to classify apps. We can leverage 
> this to represent a namespace for a certain group of apps. So instead of 
> calling *app-label*, propose to call it *app-tag*.
> A typical use case is,
> There are a lot of TF jobs running on Yarn, and some of them are consuming 
> resources heavily. So we want to limit number of PS on each node for such BIG 
> players but ignore those SMALL ones. To achieve this, we can do following 
> steps:
>  # Add application tag "big-tf" to these big TF jobs
>  # For each PS request, we add "ps" source tag and map it to constraint 
> "notin, node, tensorflow/ps" or "cardinality, node, tensorflow/ps, 0, 2" for 
> finer grained controls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8013) Support APP-TAG namespace for allocation tags

2018-03-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8013:
--
Description: 
YARN-1461 adds *Application Tag* concept to Yarn applications, user is able to 
annotate application with multiple tags to classify apps. We can leverage this 
to represent a namespace for a certain group of apps. So instead of calling 
*app-label*, propose to call it *app-tag*.

A typical use case is,

There are a lot of TF jobs running on Yarn, and some of them are consuming 
resources heavily. So we want to limit number of PS on each node for such BIG 
players but ignore those SMALL ones. To achieve this, we can do following steps:
 # Add application tag "big-tf" to these big TF jobs
 # For each PS request, we add "ps" source tag and map it to constraint 
"{color:#d04437}notin, node, tensorflow/ps{color}" or 
"{color:#d04437}cardinality, node, tensorflow/ps{color}, 0, 2" for finer 
grained controls.

  was:
YARN-1461 adds *Application Tag* concept to Yarn applications, user is able to 
annotate application with multiple tags to classify apps. We can leverage this 
to represent a namespace for a certain group of apps. So instead of calling 
*app-label*, propose to call it *app-tag*.

A typical use case is,

There are a lot of TF jobs running on Yarn, and some of them are consuming 
resources heavily. So we want to limit number of PS on each node for such BIG 
players but ignore those SMALL ones. To achieve this, we can do following steps:
 # Add application tag "big-tf" to these big TF jobs
 # For each PS request, we add "ps" source tag and map it to constraint 
"+notin, node, tensorflow/ps+" or "+cardinality, node, tensorflow/ps+, 0, 2" 
for finer grained controls.


> Support APP-TAG namespace for allocation tags
> -
>
> Key: YARN-8013
> URL: https://issues.apache.org/jira/browse/YARN-8013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> YARN-1461 adds *Application Tag* concept to Yarn applications, user is able 
> to annotate application with multiple tags to classify apps. We can leverage 
> this to represent a namespace for a certain group of apps. So instead of 
> calling *app-label*, propose to call it *app-tag*.
> A typical use case is,
> There are a lot of TF jobs running on Yarn, and some of them are consuming 
> resources heavily. So we want to limit number of PS on each node for such BIG 
> players but ignore those SMALL ones. To achieve this, we can do following 
> steps:
>  # Add application tag "big-tf" to these big TF jobs
>  # For each PS request, we add "ps" source tag and map it to constraint 
> "{color:#d04437}notin, node, tensorflow/ps{color}" or 
> "{color:#d04437}cardinality, node, tensorflow/ps{color}, 0, 2" for finer 
> grained controls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8013) Support APP-TAG namespace for allocation tags

2018-03-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8013:
--
Description: 
YARN-1461 adds *Application Tag* concept to Yarn applications, user is able to 
annotate application with multiple tags to classify apps. We can leverage this 
to represent a namespace for a certain group of apps. So instead of calling 
*app-label*, propose to call it *app-tag*.

A typical use case is,

There are a lot of TF jobs running on Yarn, and some of them are consuming 
resources heavily. So we want to limit number of PS on each node for such BIG 
players but ignore those SMALL ones. To achieve this, we can do following steps:
 # Add application tag "big-tf" to these big TF jobs
 # For each PS request, we add "ps" source tag and map it to constraint 
"+notin, node, tensorflow/ps+" or "+cardinality, node, tensorflow/ps+, 0, 2" 
for finer grained controls.

  was:
YARN-1461 adds *Application Tag* concept to Yarn applications, user is able to 
annotate application with multiple tags to classify apps. We can leverage this 
to represent a namespace for a certain group of apps. So instead of calling 
*app-label*, propose to call it *app-tag*.

A typical use case is,

There are a lot of TF jobs running on Yarn, and some of them are consuming 
resources heavily. So we want to limit number of PS on each node for such BIG 
players but ignore those SMALL ones. To achieve this, we can do following steps:
 # Add application tag "big-tf" to these big TF jobs
 # For each PS request, we add "ps" source tag and map it to constraint "notin, 
node, tensorflow/ps" or "cardinality, node, tensorflow/ps, 0, 2" for finer 
grained controls.


> Support APP-TAG namespace for allocation tags
> -
>
> Key: YARN-8013
> URL: https://issues.apache.org/jira/browse/YARN-8013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> YARN-1461 adds *Application Tag* concept to Yarn applications, user is able 
> to annotate application with multiple tags to classify apps. We can leverage 
> this to represent a namespace for a certain group of apps. So instead of 
> calling *app-label*, propose to call it *app-tag*.
> A typical use case is,
> There are a lot of TF jobs running on Yarn, and some of them are consuming 
> resources heavily. So we want to limit number of PS on each node for such BIG 
> players but ignore those SMALL ones. To achieve this, we can do following 
> steps:
>  # Add application tag "big-tf" to these big TF jobs
>  # For each PS request, we add "ps" source tag and map it to constraint 
> "+notin, node, tensorflow/ps+" or "+cardinality, node, tensorflow/ps+, 0, 2" 
> for finer grained controls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8013) Support app-tag namespace for allocation tags

2018-03-08 Thread Weiwei Yang (JIRA)
Weiwei Yang created YARN-8013:
-

 Summary: Support app-tag namespace for allocation tags
 Key: YARN-8013
 URL: https://issues.apache.org/jira/browse/YARN-8013
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Weiwei Yang
Assignee: Weiwei Yang


YARN-1461 adds *Application Tag* concept to Yarn applications, user is able to 
annotate application with multiple tags to classify apps. We can leverage this 
to represent a namespace for a certain group of apps. So instead of calling 
*app-label*, propose to call it *app-tag*.

A typical use case is,

There are a lot of TF jobs running on Yarn, and some of them are consuming 
resources heavily. So we want to limit number of PS on each node for such BIG 
players but ignore those SMALL ones. To achieve this, we can do following steps:
 # Add application tag "big-tf" to these big TF jobs
 # For each PS request, we add "ps" source tag and map it to constraint "notin, 
node, tensorflow/ps" or "cardinality, node, tensorflow/ps, 0, 2" for finer 
grained controls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8013) Support APP-TAG namespace for allocation tags

2018-03-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8013:
--
Summary: Support APP-TAG namespace for allocation tags  (was: Support 
app-tag namespace for allocation tags)

> Support APP-TAG namespace for allocation tags
> -
>
> Key: YARN-8013
> URL: https://issues.apache.org/jira/browse/YARN-8013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>
> YARN-1461 adds *Application Tag* concept to Yarn applications, user is able 
> to annotate application with multiple tags to classify apps. We can leverage 
> this to represent a namespace for a certain group of apps. So instead of 
> calling *app-label*, propose to call it *app-tag*.
> A typical use case is,
> There are a lot of TF jobs running on Yarn, and some of them are consuming 
> resources heavily. So we want to limit number of PS on each node for such BIG 
> players but ignore those SMALL ones. To achieve this, we can do following 
> steps:
>  # Add application tag "big-tf" to these big TF jobs
>  # For each PS request, we add "ps" source tag and map it to constraint 
> "notin, node, tensorflow/ps" or "cardinality, node, tensorflow/ps, 0, 2" for 
> finer grained controls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8007) Support specifying placement constraint for task containers in SLS

2018-03-08 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-8007:

Attachment: YARN-8007.002.patch

> Support specifying placement constraint for task containers in SLS
> --
>
> Key: YARN-8007
> URL: https://issues.apache.org/jira/browse/YARN-8007
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
> Attachments: YARN-8007.001.patch, YARN-8007.002.patch
>
>
> YARN-6592 introduces placement constraints. Currently SLS does not support 
> specifying placement constraints. 
> To enable better performance testing, we should be able to specify placement 
> constraints for containers in the SLS configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8007) Support specifying placement constraint for task containers in SLS

2018-03-08 Thread Jiandan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391097#comment-16391097
 ] 

Jiandan Yang  commented on YARN-8007:
-

Thanks [~cheersyang] for the timely review and suggestions.
I will upload a v2 patch according to your comments.

> Support specifying placement constraint for task containers in SLS
> --
>
> Key: YARN-8007
> URL: https://issues.apache.org/jira/browse/YARN-8007
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
> Attachments: YARN-8007.001.patch, YARN-8007.002.patch
>
>
> YARN-6592 introduces placement constraints. Currently SLS does not support 
> specifying placement constraints. 
> To enable better performance testing, we should be able to specify placement 
> constraints for containers in the SLS configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8011) TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart fails sometimes in trunk

2018-03-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391046#comment-16391046
 ] 

Hudson commented on YARN-8011:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13795 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13795/])
YARN-8011. (wwei: rev b451889e8e83f7977f2b76789c61e823e2d40487)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestOpportunisticContainerAllocatorAMService.java


> TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart
>  fails sometimes in trunk
> ---
>
> Key: YARN-8011
> URL: https://issues.apache.org/jira/browse/YARN-8011
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8011.001.patch, YARN-8011.002.patch
>
>
> TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart
>  usually passes, but the following error sometimes occurs:
> {noformat}
> java.lang.AssertionError: 
> Expected :15360
> Actual :14336
> 
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService.verifyMetrics(TestOpportunisticContainerAllocatorAMService.java:732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService.testContainerPromoteAndDemoteBeforeContainerStart(TestOpportunisticContainerAllocatorAMService.java:330)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
>  
> This problem is caused by the resource deduction happening slightly after the 
> assertion. To solve it, the test can sleep for a short while before this 
> assertion, as below.
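
For reference, a common alternative to a fixed sleep is to poll until the metric 
converges. The sketch below only illustrates that pattern with 
GenericTestUtils.waitFor from the hadoop-common test utilities; it is not the 
attached patch, and getAllocatedMB() is a hypothetical placeholder for however the 
test reads the metric.

{code}
// Sketch only (not the attached patch): poll until the queue metric reaches
// the expected value instead of asserting immediately or sleeping a fixed time.
import com.google.common.base.Supplier;

import org.apache.hadoop.test.GenericTestUtils;

public class MetricsWaitExample {

  // Hypothetical placeholder for however the test reads the allocated MB
  // metric; not a real method of the test class.
  static long getAllocatedMB() {
    return 15360; // placeholder value for the sketch
  }

  public static void waitForAllocatedMB(final long expectedMB) throws Exception {
    // Check every 100 ms, give up after 5 s.
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return getAllocatedMB() == expectedMB;
      }
    }, 100, 5000);
  }

  public static void main(String[] args) throws Exception {
    waitForAllocatedMB(15360);
  }
}
{code}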



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8011) TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart fails sometimes in trunk

2018-03-08 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391027#comment-16391027
 ] 

Tao Yang commented on YARN-8011:


Thanks [~cheersyang] for reviewing and committing!

> TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart
>  fails sometimes in trunk
> ---
>
> Key: YARN-8011
> URL: https://issues.apache.org/jira/browse/YARN-8011
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8011.001.patch, YARN-8011.002.patch
>
>
> TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart
>  usually passes, but the following error sometimes occurs:
> {noformat}
> java.lang.AssertionError: 
> Expected :15360
> Actual :14336
> 
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService.verifyMetrics(TestOpportunisticContainerAllocatorAMService.java:732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService.testContainerPromoteAndDemoteBeforeContainerStart(TestOpportunisticContainerAllocatorAMService.java:330)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
>  
> This problem is caused by the resource deduction happening slightly after the 
> assertion. To solve it, the test can sleep for a short while before this 
> assertion, as below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8011) TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart fails sometimes in trunk

2018-03-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391024#comment-16391024
 ] 

Weiwei Yang commented on YARN-8011:
---

Committed to trunk, thanks [~Tao Yang] for the contribution!

> TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart
>  fails sometimes in trunk
> ---
>
> Key: YARN-8011
> URL: https://issues.apache.org/jira/browse/YARN-8011
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8011.001.patch, YARN-8011.002.patch
>
>
> TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart
>  usually passes, but the following error sometimes occurs:
> {noformat}
> java.lang.AssertionError: 
> Expected :15360
> Actual :14336
> 
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService.verifyMetrics(TestOpportunisticContainerAllocatorAMService.java:732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService.testContainerPromoteAndDemoteBeforeContainerStart(TestOpportunisticContainerAllocatorAMService.java:330)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
>  
> This problem is caused by the resource deduction happening slightly after the 
> assertion. To solve it, the test can sleep for a short while before this 
> assertion, as below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

2018-03-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391008#comment-16391008
 ] 

Xuan Gong commented on YARN-7952:
-

Thanks, [~leftnoteasy]

Addressed all of your comments and also fixed the test case failures.

> RM should be able to recover log aggregation status after restart/fail-over
> ---
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, 
> YARN-7952.2.patch, YARN-7952.3.patch, YARN-7952.3.patch, YARN-7952.5.patch, 
> YARN-7952.6.patch
>
>
> Right now, the NM periodically sends its own log aggregation status to the RM. 
> The RM aggregates the status for each application, but it does not generate the 
> final status until a client call (from the web UI or CLI) triggers it. However, 
> the RM never persists the log aggregation status, so when the RM restarts or 
> fails over, the log aggregation status becomes “NOT_STARTED”. This is confusing; 
> maybe we should change it to “NOT_AVAILABLE” (a separate ticket will be created 
> for this). In any case, we need to persist the log aggregation status for 
> future use.
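
As a purely illustrative sketch of the recovery behaviour described above (none of 
the types or methods below are the attached patch or existing YARN APIs):

{code}
// Purely hypothetical sketch of the recovery behaviour discussed above; these
// types and methods are illustrative only, not YARN APIs or the patch.
import java.util.HashMap;
import java.util.Map;

public class LogAggregationStatusRecovery {

  enum Status { NOT_STARTED, NOT_AVAILABLE, RUNNING, SUCCEEDED, FAILED }

  // Stand-in for whatever store the RM would persist the reports into
  // (e.g. the RM state store); keyed by application id string.
  private final Map<String, Status> persistedStatus = new HashMap<>();

  void storeStatus(String appId, Status status) {
    // In the real feature this write would go to a persistent store so that
    // it survives an RM restart or fail-over.
    persistedStatus.put(appId, status);
  }

  Status recoverStatus(String appId) {
    // After restart: use the persisted value if one exists; otherwise report
    // NOT_AVAILABLE instead of the misleading NOT_STARTED.
    return persistedStatus.getOrDefault(appId, Status.NOT_AVAILABLE);
  }
}
{code}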



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7952) RM should be able to recover log aggregation status after restart/fail-over

2018-03-08 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-7952:

Attachment: YARN-7952.6.patch

> RM should be able to recover log aggregation status after restart/fail-over
> ---
>
> Key: YARN-7952
> URL: https://issues.apache.org/jira/browse/YARN-7952
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, 
> YARN-7952.2.patch, YARN-7952.3.patch, YARN-7952.3.patch, YARN-7952.5.patch, 
> YARN-7952.6.patch
>
>
> Right now, the NM periodically sends its own log aggregation status to the RM. 
> The RM aggregates the status for each application, but it does not generate the 
> final status until a client call (from the web UI or CLI) triggers it. However, 
> the RM never persists the log aggregation status, so when the RM restarts or 
> fails over, the log aggregation status becomes “NOT_STARTED”. This is confusing; 
> maybe we should change it to “NOT_AVAILABLE” (a separate ticket will be created 
> for this). In any case, we need to persist the log aggregation status for 
> future use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8011) TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart fails sometimes in trunk

2018-03-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390983#comment-16390983
 ] 

Weiwei Yang commented on YARN-8011:
---

+1 to v2 patch, I will commit this shortly. Thanks [~Tao Yang].

> TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart
>  fails sometimes in trunk
> ---
>
> Key: YARN-8011
> URL: https://issues.apache.org/jira/browse/YARN-8011
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8011.001.patch, YARN-8011.002.patch
>
>
> TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart
>  usually passes, but the following error sometimes occurs:
> {noformat}
> java.lang.AssertionError: 
> Expected :15360
> Actual :14336
> 
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService.verifyMetrics(TestOpportunisticContainerAllocatorAMService.java:732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService.testContainerPromoteAndDemoteBeforeContainerStart(TestOpportunisticContainerAllocatorAMService.java:330)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
>  
> This problem is caused by the resource deduction happening slightly after the 
> assertion. To solve it, the test can sleep for a short while before this 
> assertion, as below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8011) TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart fails sometimes in trunk

2018-03-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390971#comment-16390971
 ] 

genericqa commented on YARN-8011:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 73m 
27s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8011 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913550/YARN-8011.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0358736bdcd0 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 583f459 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19922/testReport/ |
| Max. process+thread count | 854 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19922/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Updated] (YARN-8008) Admin command to manage global placement constraints

2018-03-08 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8008:
--
Attachment: YARN-8008.001.patch

> Admin command to manage global placement constraints
> 
>
> Key: YARN-8008
> URL: https://issues.apache.org/jira/browse/YARN-8008
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8008.001.patch
>
>
> Add a command for admins to manage global placement constraints, such as add, 
> remove and list. This will be exposed via, for example
> {code}
> yarn rmadmin -placementConstraint [ -add -t  -c  | -remove -t 
>  | -list ]
> {code}
> Propose to use this JIRA to add the API/proto changes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org