[jira] [Comment Edited] (YARN-5597) YARN Federation improvements

2018-09-06 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606719#comment-16606719
 ] 

Bibin A Chundatt edited comment on YARN-5597 at 9/7/18 5:43 AM:


[~subru]/[~elgoiri]

{quote}
We use the same ZK ensemble and connection string so we are not having issues 
here.
{quote}
In an HA subcluster that uses the same ZK ensemble for leader election, the RM
state store and the Federation Store, adding Kerberos shouldn't cause any
issue, as per my understanding.
But the above topology could put load on ZK, since all subcluster RMs will
write their stores to a single ZK ensemble.
Why not have a separate configuration for the FederationStore connection string?

MySQL seems like the best fit for now if ZK security is required.

What is the cleanup strategy for metadata?

In the Federation store, is it not the case that the apps list (router-to-app
mapping) no longer needs to be kept once the apps are flushed out of RM
memory/store?
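For context, a minimal sketch of the configuration topology being discussed,
assuming the ZK-backed stores; the dedicated federation connection-string
property at the end is hypothetical and only illustrates the suggestion above:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class FederationZkConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Today: leader election, the RM state store and the Federation state
    // store all resolve the same ensemble through hadoop.zk.address.
    conf.set("hadoop.zk.address", "zk1:2181,zk2:2181,zk3:2181");
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    conf.set("yarn.federation.state-store.class",
        "org.apache.hadoop.yarn.server.federation.store.impl"
            + ".ZookeeperFederationStateStore");

    // Hypothetical (illustrative only): a dedicated connection string for the
    // FederationStateStore, so each subcluster's RM store can stay on a local
    // ensemble while the federation store lives on a separate one.
    conf.set("yarn.federation.state-store.zk-address",
        "fed-zk1:2181,fed-zk2:2181,fed-zk3:2181");
  }
}
{code}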




was (Author: bibinchundatt):
[~subru]/[~elgoiri]

{quote}
We use the same ZK ensemble and connection string so we are not having issues 
here.
{quote}
In an HA subcluster with the same ZK for leader election, the RM store and the
Federation Store, adding Kerberos shouldn't cause any issue, as per my
understanding.
The above topology could put load on ZK, since all subclusters will write
their stores to a single ZK ensemble.
Why not have a separate configuration for the FederationStore connection string?

MySQL seems like the best fit for now if ZK security is required.

What is the cleanup strategy for metadata?

In the Federation store, is it not the case that the apps list (router-to-app
mapping) no longer needs to be kept once the apps are flushed out of RM
memory/store?



> YARN Federation improvements
> 
>
> Key: YARN-5597
> URL: https://issues.apache.org/jira/browse/YARN-5597
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Major
>
> This umbrella JIRA tracks set of improvements over the YARN Federation MVP 
> (YARN-2915)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5597) YARN Federation improvements

2018-09-06 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606719#comment-16606719
 ] 

Bibin A Chundatt commented on YARN-5597:


[~subru]/[~elgoiri]

{quote}
We use the same ZK ensemble and connection string so we are not having issues 
here.
{quote}
In an HA subcluster with the same ZK for leader election, the RM store and the
Federation Store, adding Kerberos shouldn't cause any issue, as per my
understanding.
The above topology could put load on ZK, since all subclusters will write
their stores to a single ZK ensemble.
Why not have a separate configuration for the FederationStore connection string?

MySQL seems like the best fit for now if ZK security is required.

What is the cleanup strategy for metadata?

In the Federation store, is it not the case that the apps list (router-to-app
mapping) no longer needs to be kept once the apps are flushed out of RM
memory/store?



> YARN Federation improvements
> 
>
> Key: YARN-5597
> URL: https://issues.apache.org/jira/browse/YARN-5597
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Major
>
> This umbrella JIRA tracks set of improvements over the YARN Federation MVP 
> (YARN-2915)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606694#comment-16606694
 ] 

Hadoop QA commented on YARN-8699:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
2s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
54s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
6s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
57s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
57s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}135m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8699 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938748/YARN-8699.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 36dbbdeb2b15 3.13.0-153-generic 

[jira] [Commented] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606686#comment-16606686
 ] 

Hadoop QA commented on YARN-8752:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
30m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8752 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938754/YARN-8752-1.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 967f155e9989 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 396ce7b |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 302 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21783/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> yarn-registry.md has wrong word ong-lived,it should be long-lived
> -
>
> Key: YARN-8752
> URL: https://issues.apache.org/jira/browse/YARN-8752
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.0
>Reporter: leiqiang
>Priority: Major
>  Labels: documentation
> Attachments: YARN-8752-1.patch
>
>
> In yarn-registry.md line 88,
> deploy {color:#FF}ong-lived{color} services instances, this word should 
> be {color:#FF}long-lived{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8718) Merge related work for YARN-3409

2018-09-06 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606673#comment-16606673
 ] 

Sunil Govindan commented on YARN-8718:
--

# ASF license issues are not due to the patch. They are in MR code which this
branch did not change.
 # Test case failures are not related to the branch code.
TestTimelineClientV2Impl#testSyncCall will be tracked in a separate Jira
against trunk.
 # The whitespace issue is in /bin/yarn. We added the nodeattributes command
line, and it follows the same pattern as the other commands. Hence skipping
this.
 # checkstyle errors are fixed where possible. The remaining issues are mostly
about keeping existing method lengths under 150 lines, access modifiers for
private/protected members, etc.

 

> Merge related work for YARN-3409
> 
>
> Key: YARN-8718
> URL: https://issues.apache.org/jira/browse/YARN-8718
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Priority: Major
> Attachments: YARN-3409.001.patch, YARN-3409.002.patch, 
> YARN-8718.003.patch, YARN-8718.004.patch, YARN-8718.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8717) set memory.limit_in_bytes when NodeManager starting

2018-09-06 Thread Jiandan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594858#comment-16594858
 ] 

Jiandan Yang  edited comment on YARN-8717 at 9/7/18 3:05 AM:
-

Hi [~cheersyang]
Thanks for watching.
We found the NM was killed by the OOM killer.
The conditions are as follows:
{noformat}
yarn.nodemanager.resource.memory.enforced=false
yarn.nodemanager.resource.memory-mb = 100G
Physical memory of the NM machine is 120G
The NM has two containers, each requesting 40G of memory but actually using 50G+
{noformat}
So we thought of setting the limit on the hadoop-yarn cgroup hierarchy.
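For illustration, a minimal standalone sketch of what writing that limit looks
like at the cgroup level, assuming a cgroup v1 memory controller mounted at
/sys/fs/cgroup/memory and a hierarchy named hadoop-yarn; this is not the
actual CGroupsMemoryResourceHandlerImpl code, just the idea:
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class YarnCgroupMemoryLimitSketch {
  public static void main(String[] args) throws IOException {
    // Assumed cgroup v1 mount point and YARN hierarchy name; adjust per cluster.
    Path hierarchy = Paths.get("/sys/fs/cgroup/memory/hadoop-yarn");

    // yarn.nodemanager.resource.memory-mb for this NM (100G in the scenario above).
    long nmMemoryMb = 100L * 1024;
    long limitBytes = nmMemoryMb * 1024L * 1024L;

    // Writing the limit caps the whole hadoop-yarn hierarchy, so the sum of
    // all container usage can no longer push the NM host into the OOM killer.
    Files.write(hierarchy.resolve("memory.limit_in_bytes"),
        Long.toString(limitBytes).getBytes(StandardCharsets.UTF_8));
  }
}
{code}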


was (Author: yangjiandan):
Hi [~cheersyang]
Thanks for watching.
We found the NM was killed by the OOM killer.
The conditions are as follows:
{noformat}
yarn.nodemanager.resource.memory.enabled=false
yarn.nodemanager.resource.memory-mb = 100G
Physical memory of the NM machine is 120G
The NM has two containers, each requesting 40G of memory but actually using 50G+
{noformat}
So we thought of setting the limit on the hadoop-yarn cgroup hierarchy.

> set memory.limit_in_bytes when NodeManager starting
> ---
>
> Key: YARN-8717
> URL: https://issues.apache.org/jira/browse/YARN-8717
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
>  Labels: cgroups
> Attachments: YARN-8717.001.patch
>
>
> CGroupsCpuResourceHandlerImpl sets the CPU quota on the hadoop-yarn hierarchy 
> to restrict the total CPU resource of the NM when the NM starts; 
> CGroupsMemoryResourceHandlerImpl should likewise set memory.limit_in_bytes on 
> the hadoop-yarn hierarchy to control the memory resource of the NM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived

2018-09-06 Thread leiqiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leiqiang updated YARN-8752:
---
Attachment: (was: YARN-8752-1.patch)

> yarn-registry.md has wrong word ong-lived,it should be long-lived
> -
>
> Key: YARN-8752
> URL: https://issues.apache.org/jira/browse/YARN-8752
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.0
>Reporter: leiqiang
>Priority: Major
>  Labels: documentation
>
> In yarn-registry.md line 88,
> deploy {color:#FF}ong-lived{color} services instances, this word should 
> be {color:#FF}long-lived{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived

2018-09-06 Thread leiqiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leiqiang updated YARN-8752:
---
Attachment: YARN-8752-1.patch

> yarn-registry.md has wrong word ong-lived,it should be long-lived
> -
>
> Key: YARN-8752
> URL: https://issues.apache.org/jira/browse/YARN-8752
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.0
>Reporter: leiqiang
>Priority: Major
>  Labels: documentation
> Attachments: YARN-8752-1.patch
>
>
> In yarn-registry.md line 88,
> deploy {color:#FF}ong-lived{color} services instances, this word should 
> be {color:#FF}long-lived{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8752) yarn-registry.md has wrong word ong-lived,it should be long-lived

2018-09-06 Thread leiqiang (JIRA)
leiqiang created YARN-8752:
--

 Summary: yarn-registry.md has wrong word ong-lived,it should be 
long-lived
 Key: YARN-8752
 URL: https://issues.apache.org/jira/browse/YARN-8752
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.1.0
Reporter: leiqiang


In yarn-registry.md line 88,

deploy {color:#FF}ong-lived{color} services instances, this word should be 
{color:#FF}long-lived{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-09-06 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606627#comment-16606627
 ] 

Bibin A Chundatt edited comment on YARN-8699 at 9/7/18 1:59 AM:


Thank you [~giovanni.fumarola] for the review.

Attached a patch that handles the typo fix too.


was (Author: bibinchundatt):
[~giovanni.fumarola]

Attached a patch that handles the typo fix too.

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --
>
> Key: YARN-8699
> URL: https://issues.apache.org/jira/browse/YARN-8699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8699.001.patch, YARN-8699.002.patch, 
> YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch
>
>
> Implement YarnclusterMetrics API in FederationClientInterceptor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-09-06 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8699:
---
Attachment: YARN-8699.005.patch

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --
>
> Key: YARN-8699
> URL: https://issues.apache.org/jira/browse/YARN-8699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8699.001.patch, YARN-8699.002.patch, 
> YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch
>
>
> Implement YarnclusterMetrics API in FederationClientInterceptor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-09-06 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606627#comment-16606627
 ] 

Bibin A Chundatt commented on YARN-8699:


[~giovanni.fumarola]

Attached a patch that handles the typo fix too.

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --
>
> Key: YARN-8699
> URL: https://issues.apache.org/jira/browse/YARN-8699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8699.001.patch, YARN-8699.002.patch, 
> YARN-8699.003.patch, YARN-8699.004.patch, YARN-8699.005.patch
>
>
> Implement YarnclusterMetrics API in FederationClientInterceptor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606590#comment-16606590
 ] 

Hadoop QA commented on YARN-8658:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
52s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
30s{color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} 
|
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
21s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 48s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 1 new + 
1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
23s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
26s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 24s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8658 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938734/YARN-8658.04.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1b8ad07e56a9 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-5597) YARN Federation improvements

2018-09-06 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606535#comment-16606535
 ] 

Íñigo Goiri commented on YARN-5597:
---

We are currently testing federation using ZK for both the FederationStateStore 
and the RMStateStore.
We use the same ZK ensemble and connection string so we are not having issues 
here.
However, we haven't tested it with Kerberos yet.

> YARN Federation improvements
> 
>
> Key: YARN-5597
> URL: https://issues.apache.org/jira/browse/YARN-5597
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Major
>
> This umbrella JIRA tracks set of improvements over the YARN Federation MVP 
> (YARN-2915)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-06 Thread Young Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young Chen updated YARN-8658:
-
Attachment: YARN-8658.04.patch

> Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch
>
>
> AMRMClientRelayer (YARN-7900) is introduced for stateful 
> FederationInterceptor (YARN-7899), to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 
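As a rough illustration of the kind of metrics this could expose, here is a
minimal sketch using the standard Hadoop metrics2 library; the class, metric
and tag names are illustrative only and are not the names used in the attached
patches:
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// One source per relayer, tagged with the subcluster it talks to.
@Metrics(about = "AMRMClientRelayer metrics (sketch)", context = "yarn")
public class AMRMClientRelayerMetricsSketch {
  private final MetricsRegistry registry;

  @Metric("Pending resource requests for this subcluster")
  private MutableGaugeLong pendingRequests;

  public AMRMClientRelayerMetricsSketch(String subClusterId) {
    registry = new MetricsRegistry("AMRMClientRelayerSketch")
        .tag("SubCluster", "Subcluster this relayer talks to", subClusterId);
    // Registering lets the metrics system instantiate the @Metric fields and
    // publish them alongside the other YARN metrics.
    DefaultMetricsSystem.instance().register(
        "AMRMClientRelayerSketch-" + subClusterId, "Relayer metrics sketch", this);
  }

  // Called by the relayer whenever its pending-request bookkeeping changes.
  public void setPendingRequests(long pending) {
    pendingRequests.set(pending);
  }
}
{code}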



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606524#comment-16606524
 ] 

Hadoop QA commented on YARN-8751:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
54s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 98m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8751 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938719/YARN-8751.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f1b1de2e20ea 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / eca1a4b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21780/testReport/ |
| Max. process+thread count | 407 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21780/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Commented] (YARN-7794) SLSRunner is not loading timeline service jars causing failure

2018-09-06 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606480#comment-16606480
 ] 

Yufei Gu commented on YARN-7794:


[~jhung], the patch looks good to me.

> SLSRunner is not loading timeline service jars causing failure
> --
>
> Key: YARN-7794
> URL: https://issues.apache.org/jira/browse/YARN-7794
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 3.1.0
>Reporter: Sunil Govindan
>Assignee: Yufei Gu
>Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: YARN-7794-branch-2.001.patch, YARN-7794.001.patch
>
>
> {code:java}
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 13 more
> Exception in thread "pool-2-thread-390" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:443)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641){code}
> We are getting this error while running SLS. The new timelineservice jars 
> under share/hadoop/yarn are not loaded in the SLS JVM (verified from the 
> SLSRunner classpath).
> cc/ [~rohithsharma]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-09-06 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606463#comment-16606463
 ] 

Shane Kumpf commented on YARN-8045:
---

Good call, that sounds good to me.

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Priority: Major
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes it harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-09-06 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606410#comment-16606410
 ] 

Craig Condit commented on YARN-8045:


[~shaneku...@gmail.com], as a compromise, I think we can maintain compatibility 
by adding a bit of logic to the logging: still log the message at INFO, but 
replace the diagnostic content with '...' if DEBUG logging is not enabled. This 
shouldn't trip up parsers and would still give administrators the ability to 
turn it on if necessary.
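A minimal sketch of that logic, assuming an SLF4J-style logger like the one 
ContainerManagerImpl uses; the class and method names here are illustrative, 
not the eventual patch:
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DiagnosticsLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(DiagnosticsLoggingSketch.class);

  // Keep the status line at INFO, but only include the (potentially
  // multi-line) diagnostics text when DEBUG is enabled; otherwise use "...".
  static String summarize(String containerId, String state, String diagnostics) {
    String diag = LOG.isDebugEnabled() ? diagnostics : "...";
    return "Returning ContainerStatus: [ContainerId: " + containerId
        + ", State: " + state + ", Diagnostics: " + diag + "]";
  }

  public static void main(String[] args) {
    LOG.info(summarize("container_e01_1521323860653_0001_01_05", "RUNNING",
        "Exception from container-launch.\nExit code: -1"));
  }
}
{code}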

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Priority: Major
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes it harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-09-06 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606394#comment-16606394
 ] 

Shane Kumpf commented on YARN-8045:
---

Thanks for the proposal [~ccondit-target]. Moving the meat of the diagnostics 
field to DEBUG makes sense to me and would meet the requirement with minimal 
change.

My one concern is how that might impact compatibility. HADOOP-13714 recently 
updated the [compatibility 
guide|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md#log-output],
 which includes logs. Given that logs are considered Unstable, I think we are 
safe, but there is a note about ensuring existing parsers don't break. Can we 
consider the parser requirement in moving this entry to DEBUG?

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Priority: Major
>
> Each time a container's status is returned a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which puts pressure on the logs 
> and makes it harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8666) [UI2] Remove application tab from Yarn Queue Page

2018-09-06 Thread Yesha Vora (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606329#comment-16606329
 ] 

Yesha Vora commented on YARN-8666:
--

Patch updated to remove "Applications" from Queue page. The screenshot after 
removing "Applications" is attached. 

> [UI2] Remove application tab from Yarn Queue Page
> -
>
> Key: YARN-8666
> URL: https://issues.apache.org/jira/browse/YARN-8666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-08-14 at 3.43.18 PM.png, Screen Shot 
> 2018-09-06 at 12.50.14 PM.png, YARN-8666.001.patch
>
>
> Yarn UI2 Queue page puts Application button. This button does not redirect to 
> any other page. In addition to that running application table is also 
> available on same page. 
> Thus, there is no need to have a button for application in Queue page. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8666) [UI2] Remove application tab from Yarn Queue Page

2018-09-06 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8666:
-
Attachment: YARN-8666.001.patch

> [UI2] Remove application tab from Yarn Queue Page
> -
>
> Key: YARN-8666
> URL: https://issues.apache.org/jira/browse/YARN-8666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-08-14 at 3.43.18 PM.png, Screen Shot 
> 2018-09-06 at 12.50.14 PM.png, YARN-8666.001.patch
>
>
> Yarn UI2 Queue page puts Application button. This button does not redirect to 
> any other page. In addition to that running application table is also 
> available on same page. 
> Thus, there is no need to have a button for application in Queue page. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8666) [UI2] Remove application tab from Yarn Queue Page

2018-09-06 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8666:
-
Attachment: Screen Shot 2018-09-06 at 12.50.14 PM.png

> [UI2] Remove application tab from Yarn Queue Page
> -
>
> Key: YARN-8666
> URL: https://issues.apache.org/jira/browse/YARN-8666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-08-14 at 3.43.18 PM.png, Screen Shot 
> 2018-09-06 at 12.50.14 PM.png, YARN-8666.001.patch
>
>
> Yarn UI2 Queue page puts Application button. This button does not redirect to 
> any other page. In addition to that running application table is also 
> available on same page. 
> Thus, there is no need to have a button for application in Queue page. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-3879) [Storage implementation] Create HDFS backing storage implementation for ATS reads

2018-09-06 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606294#comment-16606294
 ] 

Vrushali C edited comment on YARN-3879 at 9/6/18 7:27 PM:
--

Thanks [~abmodi] ! Patch looks good overall. If you are going to update it, 
then I would suggest using File.separator instead of an actual "/" . If the 
patch does not need any updating, then let's leave it. 
Overall +1 on the patch 




was (Author: vrushalic):
Patch looks good overall. If you are going to update it, then I would suggest 
using File.separator instead of an actual "/" . If the patch does not need any 
updating, then let's leave it. 
Overall +1 on the patch 

> [Storage implementation] Create HDFS backing storage implementation for ATS 
> reads
> -
>
> Key: YARN-3879
> URL: https://issues.apache.org/jira/browse/YARN-3879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3879-YARN-7055.001.patch, YARN-3879.001.patch, 
> YARN-3879.002.patch, YARN-3879.003.patch, YARN-3879.004.patch
>
>
> Reader version of YARN-3841



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3879) [Storage implementation] Create HDFS backing storage implementation for ATS reads

2018-09-06 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606294#comment-16606294
 ] 

Vrushali C commented on YARN-3879:
--

Patch looks good overall. If you are going to update it, then I would suggest 
using File.separator instead of an actual "/" . If the patch does not need any 
updating, then let's leave it. 
Overall +1 on the patch 

> [Storage implementation] Create HDFS backing storage implementation for ATS 
> reads
> -
>
> Key: YARN-3879
> URL: https://issues.apache.org/jira/browse/YARN-3879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3879-YARN-7055.001.patch, YARN-3879.001.patch, 
> YARN-3879.002.patch, YARN-3879.003.patch, YARN-3879.004.patch
>
>
> Reader version of YARN-3841



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-3841) [Storage implementation] Adding retry semantics to HDFS backing storage

2018-09-06 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606291#comment-16606291
 ] 

Vrushali C edited comment on YARN-3841 at 9/6/18 7:25 PM:
--

Thanks [~abmodi] ! Patch looks good overall. A couple of minor comments:

- let's use File.separator instead of an actual "/" 

- For FileSystemTimelineWriterImpl.java, I think we may not want to do a   
fs.close(); at line 261. This will close the FileSystem handle for all threads 
in that process since this is a static instance. 

-  For line 281, instead of 
 {{ LOG.info("Retrying operation on FS. Retry no. " + retry); }}
we could perhaps update it to 
 {{ "Will retry operation on  FS. Retry no. " + retry + " after sleeping for " 
+ fsRetryInterval + " seconds" ); }} 

That would be a better indication of the sleep & retry. What do you think? 
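To make the suggestion concrete, a minimal self-contained sketch of a retry
loop using that log line, with the retry interval expressed in seconds; the
method and variable names are illustrative and do not reflect the actual
FileSystemTimelineWriterImpl code:
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FsRetrySketch {
  private static final Logger LOG = LoggerFactory.getLogger(FsRetrySketch.class);

  // Illustrative retry wrapper: up to numRetries extra attempts, sleeping
  // fsRetryInterval seconds between attempts, logging the suggested message.
  static void runWithRetries(Runnable fsOperation, int numRetries,
      long fsRetryInterval) throws InterruptedException {
    for (int retry = 0; ; retry++) {
      try {
        fsOperation.run();
        return;
      } catch (RuntimeException e) {
        if (retry >= numRetries) {
          throw e;
        }
        LOG.info("Will retry operation on FS. Retry no. " + retry
            + " after sleeping for " + fsRetryInterval + " seconds");
        Thread.sleep(fsRetryInterval * 1000L);
      }
    }
  }
}
{code}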




was (Author: vrushalic):
Thanks [~abmodi] ! Patch looks good overall. A couple of minor comments:

- let's use File.separator instead of an actual "/" 

- For FileSystemTimelineWriterImpl.java, I think we may not want to do a   
fs.close(); at line 261. This will close the FileSystem handle for all threads 
in that process since this is a static instance. 

-  For line281, instead of 
 {{monospaced}} LOG.info("Retrying operation on FS. Retry no. " + retry); 
{{monospaced}}
we could perhaps update it to 
 {{monospaced}}  "Will retry operation on  FS. Retry no. " + retry + " after 
sleeping for " + fsRetryInterval + " seconds" );  {{monospaced}} 

Will be a better indication of the sleep & retry. What do you think? 



> [Storage implementation] Adding retry semantics to HDFS backing storage
> ---
>
> Key: YARN-3841
> URL: https://issues.apache.org/jira/browse/YARN-3841
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3841-YARN-7055.002.patch, YARN-3841.001.patch, 
> YARN-3841.002.patch, YARN-3841.003.patch, YARN-3841.004.patch
>
>
> HDFS backing storage is useful for following scenarios.
> 1. For Hadoop clusters which don't run HBase.
> 2. For fallback from HBase when HBase cluster is temporary unavailable. 
> Quoting ATS design document of YARN-2928:
> {quote}
> In the case the HBase
> storage is not available, the plugin should buffer the writes temporarily 
> (e.g. HDFS), and flush
> them once the storage comes back online. Reading and writing to hdfs as the 
> the backup storage
> could potentially use the HDFS writer plugin unless the complexity of 
> generalizing the HDFS
> writer plugin for this purpose exceeds the benefits of reusing it here.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-3841) [Storage implementation] Adding retry semantics to HDFS backing storage

2018-09-06 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606291#comment-16606291
 ] 

Vrushali C edited comment on YARN-3841 at 9/6/18 7:25 PM:
--

Thanks [~abmodi] ! Patch looks good overall. A couple of minor comments:

- let's use File.separator instead of an actual "/" 

- For FileSystemTimelineWriterImpl.java, I think we may not want to do a   
fs.close(); at line 261. This will close the FileSystem handle for all threads 
in that process since this is a static instance. 

-  For line281, instead of 
 {{LOG.info("Retrying operation on FS. Retry no. " + retry);}}
we could perhaps update it to 
 {{"Will retry operation on  FS. Retry no. " + retry + " after sleeping for " + 
fsRetryInterval + " seconds" );}} 

Will be a better indication of the sleep & retry. What do you think? 




was (Author: vrushalic):
Thanks [~abmodi] ! Patch looks good overall. A couple of minor comments:

- let's use File.separator instead of an actual "/" 

- For FileSystemTimelineWriterImpl.java, I think we may not want to do a   
fs.close(); at line 261. This will close the FileSystem handle for all threads 
in that process since this is a static instance. 

-  For line281, instead of 
 {{ LOG.info("Retrying operation on FS. Retry no. " + retry); }}
we could perhaps update it to 
 {{ "Will retry operation on  FS. Retry no. " + retry + " after sleeping for " 
+ fsRetryInterval + " seconds" ); }} 

Will be a better indication of the sleep & retry. What do you think? 



> [Storage implementation] Adding retry semantics to HDFS backing storage
> ---
>
> Key: YARN-3841
> URL: https://issues.apache.org/jira/browse/YARN-3841
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3841-YARN-7055.002.patch, YARN-3841.001.patch, 
> YARN-3841.002.patch, YARN-3841.003.patch, YARN-3841.004.patch
>
>
> HDFS backing storage is useful for following scenarios.
> 1. For Hadoop clusters which don't run HBase.
> 2. For fallback from HBase when HBase cluster is temporary unavailable. 
> Quoting ATS design document of YARN-2928:
> {quote}
> In the case the HBase
> storage is not available, the plugin should buffer the writes temporarily 
> (e.g. HDFS), and flush
> them once the storage comes back online. Reading and writing to hdfs as the 
> the backup storage
> could potentially use the HDFS writer plugin unless the complexity of 
> generalizing the HDFS
> writer plugin for this purpose exceeds the benefits of reusing it here.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3841) [Storage implementation] Adding retry semantics to HDFS backing storage

2018-09-06 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606291#comment-16606291
 ] 

Vrushali C commented on YARN-3841:
--

Thanks [~abmodi]! The patch looks good overall. A couple of minor comments:

- Let's use File.separator instead of a literal "/".

- In FileSystemTimelineWriterImpl.java, I think we may not want to call 
fs.close(); at line 261. Since the FileSystem handle is a static instance, 
closing it would affect all threads in that process. 

- For line 281, instead of 
 {{LOG.info("Retrying operation on FS. Retry no. " + retry);}}
we could perhaps update it to 
 {{"Will retry operation on FS. Retry no. " + retry + " after sleeping for " + 
fsRetryInterval + " seconds");}} 

That would be a better indication of the sleep & retry. What do you think? 



> [Storage implementation] Adding retry semantics to HDFS backing storage
> ---
>
> Key: YARN-3841
> URL: https://issues.apache.org/jira/browse/YARN-3841
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Tsuyoshi Ozawa
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: YARN-5355
> Attachments: YARN-3841-YARN-7055.002.patch, YARN-3841.001.patch, 
> YARN-3841.002.patch, YARN-3841.003.patch, YARN-3841.004.patch
>
>
> HDFS backing storage is useful for the following scenarios.
> 1. For Hadoop clusters which don't run HBase.
> 2. For fallback from HBase when the HBase cluster is temporarily unavailable. 
> Quoting ATS design document of YARN-2928:
> {quote}
> In the case the HBase
> storage is not available, the plugin should buffer the writes temporarily 
> (e.g. HDFS), and flush
> them once the storage comes back online. Reading and writing to hdfs as the 
> the backup storage
> could potentially use the HDFS writer plugin unless the complexity of 
> generalizing the HDFS
> writer plugin for this purpose exceeds the benefits of reusing it here.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606270#comment-16606270
 ] 

Craig Condit edited comment on YARN-8751 at 9/6/18 7:02 PM:


[~shaneku...@gmail.com], looks like we have consensus on the approach. I can 
take this one.


was (Author: ccondit-target):
[~shaneku...@gmail.com], looks like have consensus on the approach. I can take 
this one.

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs per
> mission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606270#comment-16606270
 ] 

Craig Condit commented on YARN-8751:


[~shaneku...@gmail.com], looks like we have consensus on the approach. I can 
take this one.

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs per
> mission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
> 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch 
> (ContainerRelaunch.java:call(129)) - 

[jira] [Assigned] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Craig Condit (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit reassigned YARN-8751:
--

Assignee: Craig Condit

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Craig Condit
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs per
> mission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
> 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch 
> (ContainerRelaunch.java:call(129)) - Failed to launch container due to 
> configuration error.
> 

[jira] [Commented] (YARN-8718) Merge related work for YARN-3409

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606260#comment-16606260
 ] 

Hadoop QA commented on YARN-8718:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 1s{color} | {color:green} The patch appears to include 31 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  
0s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  4m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 16m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 10m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
26s{color} | {color:green} root generated 0 new + 1453 unchanged - 1 fixed = 
1453 total (was 1454) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 53s{color} | {color:orange} root: The patch generated 14 new + 1590 
unchanged - 54 fixed = 1604 total (was 1644) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
28s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
14s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 2 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 12m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
18s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
48s{color} | {color:green} hadoop-common 

[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606189#comment-16606189
 ] 

Shane Kumpf commented on YARN-8751:
---

Thanks for the feedback and suggestions everyone. I think the issue is most 
likely to happen under relaunch conditions with a poorly behaving container (as 
noted by [~eyang]). Relaunch (afaik) is only used by YARN Services today, so 
the impact may be isolated. Having said that, based on the conversation here, 
it does appear there are other non-fatal cases that could trigger these errors, 
so I'm +1 on the proposal from [~jlowe] affecting both launch and relaunch.

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> 

[jira] [Updated] (YARN-8200) Backport resource types/GPU features to branch-2

2018-09-06 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8200:

Attachment: YARN-8200-branch-2.001.patch

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-8200-branch-2.001.patch, 
> counter.scheduler.operation.allocate.csv.defaultResources, 
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-09-06 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606091#comment-16606091
 ] 

Jonathan Hung commented on YARN-8200:
-

Rebased YARN-8200 on branch-2. Attached the full diff between branch-2 and 
YARN-8200 (001)

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-8200-branch-2.001.patch, 
> counter.scheduler.operation.allocate.csv.defaultResources, 
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606089#comment-16606089
 ] 

Eric Yang commented on YARN-8751:
-

+1 on [~jlowe]'s proposal that only INVALID_CONTAINER_EXEC_PERMISSIONS and 
INVALID_CONFIG_FILE throw ConfigurationException. The other exit codes are 
non-fatal and should keep being retried on a best-effort basis, even when the 
system is running under unfavorable conditions.
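
For clarity, a minimal sketch of the narrowed check (reusing the ExitCode enum 
and ConfigurationException already shown in the description below; the dropped 
codes would fall through to normal per-container failure handling):

{code:java}
// Sketch: only genuine configuration problems remain fatal to the NM.
if (exitCode ==
    ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
    exitCode ==
    ExitCode.INVALID_CONFIG_FILE.getExitCode()) {
  throw new ConfigurationException(
      "Linux Container Executor reached unrecoverable exception", e);
}
// The COULD_NOT_CREATE_* codes are no longer escalated here; they surface as a
// normal container launch failure instead of marking the whole node unhealthy.
{code}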

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs per
> mission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606007#comment-16606007
 ] 

Craig Condit commented on YARN-8751:


Each of these error codes could have any number of root causes ranging from 
transient to task-specific, disk-specific, node-specific, or cluster-level. 
Trying to do root cause analysis of OS-level failures in code isn't really 
practical. No two environments are alike and it's going to be very difficult to 
set a policy which makes sense for all clusters. This is where things like 
admin-provided health check scripts come into play. These can check things like 
disks available, disks non-full, permissions (at top level dirs) set correctly, 
etc. That said, I think we should have defaults which cause the least amount of 
pain in the majority of cases. It seems to me that in most cases, it's far more 
likely to be a transient or per-disk issue causing these failures than a global 
misconfiguration, so not failing the NM makes sense.

As a way to detect the specific issue mentioned in this JIRA, top-level 
permissions on NM-controlled dirs could be validated on startup (if they aren't 
already) and cause an NM failure at that point (or at least mark the specific 
disk bad). That would give fail-fast behavior for something that is clearly 
misconfigured globally. It would also make issues occurring at the container 
level far more likely to be transient or task/app-specific.
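
A rough sketch of that kind of startup validation (hypothetical class and helper 
names, not an existing NM hook; the group/other-write check loosely follows the 
permission-750 expectation from the log above):

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.attribute.PosixFilePermission;
import java.util.List;
import java.util.Set;

// Hypothetical startup check, illustrative only.
public class LocalDirPermissionCheck {
  public static void validate(List<String> localDirs) {
    for (String dir : localDirs) {
      try {
        Set<PosixFilePermission> perms =
            Files.getPosixFilePermissions(new File(dir).toPath());
        // Expect no group/other write on the top-level NM local dir.
        if (perms.contains(PosixFilePermission.GROUP_WRITE)
            || perms.contains(PosixFilePermission.OTHERS_WRITE)) {
          markDirBad(dir, "top-level dir is group/other writable");
        }
      } catch (IOException e) {
        markDirBad(dir, "could not read permissions: " + e.getMessage());
      }
    }
  }

  private static void markDirBad(String dir, String reason) {
    // Placeholder: a real implementation would feed this into the NM's
    // disk-health handling rather than just printing.
    System.err.println("Marking " + dir + " as bad: " + reason);
  }
}
{code}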

 

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606003#comment-16606003
 ] 

Eric Yang commented on YARN-8751:
-

[~shaneku...@gmail.com] I believe the COULD_NOT_CREATE_WORK_DIRECTORIES exit 
code should only be raised once all disks have been tried and exhausted. The 
relaunch path may single out one working directory and report a false positive, 
even though the system could still fall back to creating a new working 
directory on another disk and move forward. I am not sure whether the test 
system has more than one local disk; with only one disk, it may well look as if 
this single container crashes the NodeManager. If relaunch doesn't retry other 
disks, then that is a bug, and the container-executor logic should be changed 
to detect this case and create the working directory on another disk. This is 
similar to the fault tolerance design in HDFS: relaunch makes a best effort to 
reuse the same working directory, but falls back to another data directory if 
the current one has gone bad.

Let's look at the problem from a different angle: a container that performs 
destructive operations on its working directory could knock out all disks by 
abusing relaunch. That looks more like a deliberate attempt to sabotage the 
system. In that case it is really the system administrator's responsibility not 
to grant privileged containers to such badly behaved users/images. It is the 
same as saying: don't hand someone a chainsaw if you know they are 
irresponsible. There is little that can be done to protect irresponsible 
individuals from themselves; you can only protect them by not giving them too 
much power. Disabling writable mounts for privileged containers is the wrong 
option, because there are real programs that run multi-user containers and 
depend on the privileged container feature. If the badly behaved program is a 
QA test, then we may just have to hand-wave: we handed you a chainsaw, read the 
instructions and be careful with it.
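
To illustrate the fallback being described (purely a sketch, not 
container-executor or NM code; the class, method and directory name are 
hypothetical):

{code:java}
import java.io.File;
import java.util.List;

// Illustrative only: prefer the previous workdir, otherwise fall back to
// another local dir that is still usable instead of failing the node.
public class WorkDirFallback {
  public static File chooseWorkDir(File previousWorkDir, List<File> localDirs) {
    // Best effort: reuse the old working directory if it is still usable.
    if (previousWorkDir != null && previousWorkDir.isDirectory()
        && previousWorkDir.canWrite()) {
      return previousWorkDir;
    }
    // Otherwise pick any other local dir that can host a new working directory.
    for (File dir : localDirs) {
      if (dir.isDirectory() && dir.canWrite()) {
        return new File(dir, "relaunch-workdir");   // hypothetical name
      }
    }
    return null;   // all disks exhausted -- the genuinely fatal case
  }
}
{code}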

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate

2018-09-06 Thread Pradeep Ambati (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605998#comment-16605998
 ] 

Pradeep Ambati commented on YARN-8680:
--

Thanks for the review! [~jlowe]

I have addressed all the issues you raised in the latest patch review.

> YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate
> -
>
> Key: YARN-8680
> URL: https://issues.apache.org/jira/browse/YARN-8680
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Pradeep Ambati
>Assignee: Pradeep Ambati
>Priority: Critical
> Attachments: YARN-8680.00.patch, YARN-8680.01.patch, 
> YARN-8680.02.patch, YARN-8680.03.patch
>
>
> Similar to YARN-8242, implement an iterable abstraction for 
> LocalResourceTrackerState so that completed and in-progress resources are 
> loaded on demand, rather than loading them all at once for a given state.
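
As a purely illustrative sketch of the iterable-loading idea above (not the 
actual LocalResourceTrackerState or NM state-store API; StateScanner is a 
stand-in for a leveldb-style cursor):

{code:java}
import java.util.Iterator;
import java.util.NoSuchElementException;

// Stand-in for a cursor over serialized state records.
interface StateScanner<T> {
  boolean hasNextRecord();
  T nextRecord();
}

// Single-use Iterable: records are decoded one at a time as the caller
// iterates, instead of being loaded into a full list up front.
class LazyResourceIterable<T> implements Iterable<T> {
  private final StateScanner<T> scanner;

  LazyResourceIterable(StateScanner<T> scanner) {
    this.scanner = scanner;
  }

  @Override
  public Iterator<T> iterator() {
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return scanner.hasNextRecord();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return scanner.nextRecord();
      }
    };
  }
}
{code}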



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605993#comment-16605993
 ] 

Hadoop QA commented on YARN-8680:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 237 unchanged - 5 fixed = 237 total (was 242) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8680 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938657/YARN-8680.03.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 15efbab9fa02 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b6c543f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21778/testReport/ |
| Max. process+thread count | 468 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21778/console |
| Powered by | Apache Yetus 

[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605969#comment-16605969
 ] 

Eric Badger commented on YARN-8751:
---

I agree that we shouldn't kill the NM because of something like bad permissions 
that only affects a single job. If that is possible, then a user could pretty 
easily bring down the entire cluster, which is double plus ungood. However, it 
would also be nice to still be able to mark the node bad in cases where things 
are really wrong and will affect all jobs. Just thinking out loud here, but if 
all of the disks are 100% full, the NM is going to fail every container that 
runs on it. Yes, NM blacklisting will help, but that has to be re-learned for 
each application (afaik). It would be nice to detect if the error is actually 
fatal to all jobs or not. And I'm not sure that's an easy thing to do when it 
comes to creating directories. Maybe someone else has an idea? 
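
One way to express the "fatal to all jobs" check, as a sketch under the 
assumption that the condition of interest is every local dir being out of space 
(the class, method and threshold are hypothetical, not an existing NM check):

{code:java}
import java.io.File;
import java.util.List;

// Illustrative only: true means every local dir is effectively out of space,
// i.e. the failure would hit every container on the node, not just one job.
public class AllDisksFullCheck {
  public static boolean allDisksFull(List<File> localDirs, long minFreeBytes) {
    for (File dir : localDirs) {
      if (dir.getUsableSpace() >= minFreeBytes) {
        return false;   // at least one disk can still host containers
      }
    }
    return !localDirs.isEmpty();
  }
}
{code}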

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) 

[jira] [Updated] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8751:
--
Labels: Docker  (was: )

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>  Labels: Docker
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as Privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs permission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
> 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch 
> (ContainerRelaunch.java:call(129)) - Failed to launch container due to 
> configuration error.
> 

[jira] [Commented] (YARN-8045) Reduce log output from container status calls

2018-09-06 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605901#comment-16605901
 ] 

Craig Condit commented on YARN-8045:


It might make the code (slightly) more complex, but we could output the 
diagnostics only at DEBUG level and the rest of the message at INFO.
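
A minimal sketch of that split, assuming an SLF4J-style {{LOG}} and a {{status}} 
variable holding the {{ContainerStatus}} being returned (illustrative only, not 
the actual {{ContainerManagerImpl}} code):

{code:java}
// Illustrative sketch: keep the short summary at INFO and move the potentially
// multi-line diagnostics to DEBUG so exceptions are not repeated every second.
LOG.info("Returning ContainerStatus: [ContainerId: " + status.getContainerId()
    + ", State: " + status.getState()
    + ", ExitStatus: " + status.getExitStatus() + "]");
if (LOG.isDebugEnabled()) {
  LOG.debug("Diagnostics for " + status.getContainerId() + ": "
      + status.getDiagnostics());
}
{code}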

> Reduce log output from container status calls
> -
>
> Key: YARN-8045
> URL: https://issues.apache.org/jira/browse/YARN-8045
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Shane Kumpf
>Priority: Major
>
> Each time a container's status is returned, a log entry is produced in the NM 
> from {{ContainerManagerImpl}}. The container status includes the diagnostics 
> field for the container. If the diagnostics field contains an exception, it 
> can appear as if the exception is logged repeatedly every second. The 
> diagnostics message can also span many lines, which bloats the logs and makes 
> them harder to read.
> For example:
> {code}
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_e01_1521323860653_0001_01_05
> 2018-03-17 22:01:11,632 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_e01_1521323860653_0001_01_05, ExecutionType: GUARANTEED, State: 
> RUNNING, Capability: , Diagnostics: [2018-03-17 
> 22:01:00.675]Exception from container-launch.
> Container id: container_e01_1521323860653_0001_01_05
> Exit code: -1
> Exception message: 
> Shell ouput: 
> [2018-03-17 22:01:00.750]Diagnostic message from attempt :
> [2018-03-17 22:01:00.750]Container exited with a non-zero exit code -1.
> , ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605895#comment-16605895
 ] 

Craig Condit commented on YARN-8751:


{quote}[~jlowe] : So my vote is keep INVALID_CONTAINER_EXEC_PERMISSIONS and 
INVALID_CONFIG_FILE fatal but the others should only fail the single container 
launch rather than the whole NM process.
{quote}
Agreed. The remainder of the exit codes could be caused by any number of 
things, such as disk failure, which you point out. Even if the problem were to 
be caused by something more systemic, NM blacklisting should kick in pretty 
quickly as tasks fail. +1 on making this non-fatal. Additionally, we may want 
to consider updating the diagnostic message returned in the following {{else}} 
clause to contain the exit code enum name as well as the number; this would 
make diagnosing problems much easier for both users and administrators.
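
A rough sketch of that last idea, assuming {{ExitCode}} is the enum referenced 
in the check quoted below and that a small helper along these lines (name 
hypothetical) is added where the diagnostic message is built:

{code:java}
// Hypothetical helper: resolve the numeric container-executor exit code to the
// matching ExitCode constant so the diagnostic reads, for example,
// "exit code 35 (COULD_NOT_CREATE_WORK_DIRECTORIES)" rather than just "35".
private static String describeExitCode(int exitCode) {
  for (ExitCode ec : ExitCode.values()) {
    if (ec.getExitCode() == exitCode) {
      return exitCode + " (" + ec.name() + ")";
    }
  }
  return String.valueOf(exitCode);
}
{code}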

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 

[jira] [Updated] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate

2018-09-06 Thread Pradeep Ambati (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Ambati updated YARN-8680:
-
Attachment: YARN-8680.03.patch

> YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate
> -
>
> Key: YARN-8680
> URL: https://issues.apache.org/jira/browse/YARN-8680
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Pradeep Ambati
>Assignee: Pradeep Ambati
>Priority: Critical
> Attachments: YARN-8680.00.patch, YARN-8680.01.patch, 
> YARN-8680.02.patch, YARN-8680.03.patch
>
>
> Similar to YARN-8242, implement an iterable abstraction for 
> LocalResourceTrackerState so that completed and in-progress resources are 
> loaded when needed rather than all at once for a given state.
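
A rough sketch of the shape such an abstraction could take (interface and 
method names here are hypothetical, not the attached patch): expose the 
recovered resources as iterators so callers pull entries from the NM state 
store on demand instead of materializing full lists per tracker.

{code:java}
import java.io.IOException;
import java.util.Iterator;

// Hypothetical sketch: the state store returns iterators that stream records
// as they are consumed, instead of pre-built lists for each tracker state.
public interface IterableTrackerState<C, I> {
  /** Completed (fully localized) resources, streamed from the store. */
  Iterator<C> completedResources() throws IOException;

  /** In-progress localizations, streamed from the store. */
  Iterator<I> inProgressResources() throws IOException;
}
{code}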



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8751:
-
Priority: Critical  (was: Major)

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Critical
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script paths...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Path 
> /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
>  has permission 774 but needs permission 750.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
> 2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch 
> (ContainerRelaunch.java:call(129)) - Failed to launch container due to 
> configuration error.
> org.apache.hadoop.yarn.exceptions.ConfigurationException: 

[jira] [Commented] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605803#comment-16605803
 ] 

Jason Lowe commented on YARN-8751:
--

A bad container executor or config file is pretty catastrophic, since the NM 
can't control anything at that point and can't even clean up containers when it 
shuts down.  However, the other errors are specific to 
setting up an individual container and should not bring down the NM.  If a disk 
goes bad and the container executor can't create one of the directories then 
this should not be a fatal error to the NM, just a fatal error to that 
container launch.  Otherwise a single disk failure can bring down the NM if the 
container executor discovers it before the NM disk checker does.

So my vote is keep INVALID_CONTAINER_EXEC_PERMISSIONS and INVALID_CONFIG_FILE 
fatal but the others should only fail the single container launch rather than 
the whole NM process.
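
A minimal sketch of that narrowing against the exit-code check quoted below 
(illustrative only, not a reviewed patch):

{code:java}
// Illustrative only: treat just a broken container-executor binary or config
// file as NM-fatal; everything else falls through to the normal per-container
// launch failure handling, so the NM is not marked UNHEALTHY.
boolean nmFatal =
    exitCode == ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode()
        || exitCode == ExitCode.INVALID_CONFIG_FILE.getExitCode();
if (nmFatal) {
  throw new ConfigurationException(
      "Linux Container Executor reached unrecoverable exception", e);
}
// Otherwise: fail only this container launch.
{code}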

> Container-executor permission check errors cause the NM to be marked unhealthy
> --
>
> Key: YARN-8751
> URL: https://issues.apache.org/jira/browse/YARN-8751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Priority: Major
>
> {{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
> NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
> {{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
> occurs based on the exit code returned by container-executor, and 7 different 
> exit codes cause the NM to be marked UNHEALTHY.
> {code:java}
> if (exitCode ==
> ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
> exitCode ==
> ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
> exitCode ==
> ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
>   throw new ConfigurationException(
>   "Linux Container Executor reached unrecoverable exception", e);{code}
> I can understand why these are treated as fatal with the existing process 
> container model. However, with privileged Docker containers this may be too 
> harsh, as privileged Docker containers don't guarantee the user's identity 
> will be propagated into the container, so these mismatches can occur. Outside 
> of privileged containers, an application may inadvertently change the 
> permissions on one of these directories, triggering this condition.
> In our case, a container changed the "appcache//" 
> directory permissions to 774. Some time later, the process in the container 
> died and the Retry Policy kicked in to RELAUNCH the container. When the 
> RELAUNCH occurred, container-executor checked the permissions of the 
> "appcache//" directory (the existing workdir is retained 
> for RELAUNCH) and returned exit code 35. Exit code 35 is 
> COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
> containers running on that node, when really only this container would have 
> been impacted.
> {code:java}
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e15_1535130383425_0085_01_05
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 35
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch 
> container failed
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: Could not 
> create container dirsCould not create local files and directories 5 6
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is user
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
> 2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Creating script 

[jira] [Updated] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader

2018-09-06 Thread Sushil Ks (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushil Ks updated YARN-8270:

Description: This Jira is for emitting JMX metrics for the ATSv2 Timeline 
Collector and Timeline Reader. For the Timeline Collector it captures success 
and failure latencies for *putEntities* and *putEntitiesAsync* in 
*TimelineCollectorWebService*; similarly, it captures success and failure 
latencies for all the Timeline Reader APIs that fetch TimelineEntities from 
*TimelineReaderWebServices*. This would help in monitoring and measuring 
performance for ATSv2 at scale.  (was: This Jira is for emitting JMX 
Metrics for ATS v2 Timeline Collector and Timeline Reader, basically for 
Timeline Collector it tries to capture success, failure and latencies for 
*putEntities* and *putEntitiesAsync*  from *TimelineCollectorWebService* and 
all the API's success, failure and latencies for fetching TimelineEntities from 
*TimelineReaderWebServices*. This would actually help in monitoring and 
measuring performance for ATSv2 at scale.)
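
A rough sketch of how such latencies could be captured with the Hadoop Metrics2 
framework; the class name, metric names and registration point below are 
assumptions for illustration, not the attached patch:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Hypothetical metrics source: each MutableRate tracks the sample count and
// average latency and is exposed through JMX once registered.
@Metrics(about = "Timeline collector metrics", context = "yarn")
public class TimelineCollectorMetrics {
  @Metric("putEntities success latency") MutableRate putEntitiesSuccess;
  @Metric("putEntities failure latency") MutableRate putEntitiesFailure;

  public static TimelineCollectorMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "TimelineCollectorMetrics", null, new TimelineCollectorMetrics());
  }

  // Called from the web service with the measured elapsed time in milliseconds.
  public void addPutEntitiesLatency(long elapsedMs, boolean succeeded) {
    if (succeeded) {
      putEntitiesSuccess.add(elapsedMs);
    } else {
      putEntitiesFailure.add(elapsedMs);
    }
  }
}
{code}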

> Adding JMX Metrics for Timeline Collector and Reader
> 
>
> Key: YARN-8270
> URL: https://issues.apache.org/jira/browse/YARN-8270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineserver
>Reporter: Sushil Ks
>Assignee: Sushil Ks
>Priority: Major
> Attachments: YARN-8270.001.patch, YARN-8270.002.patch
>
>
> This Jira is for emitting JMX metrics for the ATSv2 Timeline Collector and 
> Timeline Reader. For the Timeline Collector it captures success and failure 
> latencies for *putEntities* and *putEntitiesAsync* in 
> *TimelineCollectorWebService*; similarly, it captures success and failure 
> latencies for all the Timeline Reader APIs that fetch TimelineEntities from 
> *TimelineReaderWebServices*. This would help in monitoring and measuring 
> performance for ATSv2 at scale.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8751:
--
Description: 
{{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
{{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
occurs based on the exit code returned by container-executor, and 7 different 
exit codes cause the NM to be marked UNHEALTHY.
{code:java}
if (exitCode ==
ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
exitCode ==
ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
  throw new ConfigurationException(
  "Linux Container Executor reached unrecoverable exception", e);{code}
I can understand why these are treated as fatal with the existing process 
container model. However, with privileged Docker containers this may be too 
harsh, as privileged Docker containers don't guarantee the user's identity will 
be propagated into the container, so these mismatches can occur. Outside of 
privileged containers, an application may inadvertently change the permissions 
on one of these directories, triggering this condition.

In our case, a container changed the "appcache//" directory 
permissions to 774. Some time later, the process in the container died and the 
Retry Policy kicked in to RELAUNCH the container. When the RELAUNCH occurred, 
container-executor checked the permissions of the 
"appcache//" directory (the existing workdir is retained 
for RELAUNCH) and returned exit code 35. Exit code 35 is 
COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
containers running on that node, when really only this container would have 
been impacted.
{code:java}
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Container id: 
container_e15_1535130383425_0085_01_05
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exit code: 35
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch container 
failed
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell error output: Could not create 
container dirsCould not create local files and directories 5 6
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) -
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 
4
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : run as user is user
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating script paths...
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating local dirs...
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Path 
/grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
 has permission 774 but needs permission 750.
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch 
(ContainerRelaunch.java:call(129)) - Failed to launch container due to 
configuration error.
org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container 
Executor reached unrecoverable exception
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleExitCode(LinuxContainerExecutor.java:633)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:573)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
at 

[jira] [Created] (YARN-8751) Container-executor permission check errors cause the NM to be marked unhealthy

2018-09-06 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8751:
-

 Summary: Container-executor permission check errors cause the NM 
to be marked unhealthy
 Key: YARN-8751
 URL: https://issues.apache.org/jira/browse/YARN-8751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Shane Kumpf


{{ContainerLaunch}} (and {{ContainerRelaunch}}) contains logic to mark a 
NodeManager as UNHEALTHY if a {{ConfigurationException}} is thrown by 
{{ContainerLaunch#launchContainer}} (or relaunchContainer). The exception 
occurs based on the exit code returned by container-executor, and 7 different 
exit codes cause the NM to be marked UNHEALTHY.
{code:java}
if (exitCode ==
ExitCode.INVALID_CONTAINER_EXEC_PERMISSIONS.getExitCode() ||
exitCode ==
ExitCode.INVALID_CONFIG_FILE.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_SCRIPT_COPY.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_CREDENTIALS_FILE.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_WORK_DIRECTORIES.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_APP_LOG_DIRECTORIES.getExitCode() ||
exitCode ==
ExitCode.COULD_NOT_CREATE_TMP_DIRECTORIES.getExitCode()) {
  throw new ConfigurationException(
  "Linux Container Executor reached unrecoverable exception", e);{code}
I can understand why these are treated as fatal with the existing process 
container model. However, with privileged Docker containers this may be too 
harsh, as privileged Docker containers don't guarantee the user's identity will 
be propagated into the container.

In our case, a privileged container changed the 
"appcache//" directory permissions to 774. Some time later, 
the process in the container died and the Retry Policy kicked in to RELAUNCH 
the container. When the RELAUNCH occurred, container-executor checked the 
permissions of the "appcache//" directory (the existing 
workdir is retained for RELAUNCH) and returned exit code 35. Exit code 35 is 
COULD_NOT_CREATE_WORK_DIRECTORIES, which is a fatal error. This killed all 
containers running on that node, when really only this container would have 
been impacted.
{code:java}
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Container id: 
container_e15_1535130383425_0085_01_05
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exit code: 35
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Exception message: Relaunch container 
failed
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell error output: Could not create 
container dirsCould not create local files and directories 5 6
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) -
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 
4
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : run as user is user
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - main : requested yarn user is yarn
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating script paths...
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Creating local dirs...
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Path 
/grid/0/hadoop/yarn/local/usercache/user/appcache/application_1535130383425_0085/container_e15_1535130383425_0085_01_05
 has permission 774 but needs permission 750.
2018-08-31 21:07:22,365 INFO  nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Wrote the exit code 35 to (null)
2018-08-31 21:07:22,386 ERROR launcher.ContainerRelaunch 
(ContainerRelaunch.java:call(129)) - Failed to launch container due to 
configuration error.
org.apache.hadoop.yarn.exceptions.ConfigurationException: Linux Container 
Executor reached unrecoverable exception
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleExitCode(LinuxContainerExecutor.java:633)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:573)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)

[jira] [Created] (YARN-8750) Refactor TestQueueMetrics

2018-09-06 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8750:


 Summary: Refactor TestQueueMetrics
 Key: YARN-8750
 URL: https://issues.apache.org/jira/browse/YARN-8750
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


{{TestQueueMetrics#checkApps}} and {{TestQueueMetrics#checkResources}} have 8 
and 14 parameters, respectively.
It is very hard to read the test cases that use these methods.
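
One possible direction, sketched with hypothetical names (not the actual 
refactoring): wrap the positional arguments in a small expectation object with 
a fluent builder so every value is named at the call site.

{code:java}
// Hypothetical parameter object for checkApps(): call sites set only the
// fields they care about, and each value is named where it is used.
final class AppMetricsExpectation {
  int submitted, pending, running, completed, failed, killed;

  AppMetricsExpectation submitted(int n) { this.submitted = n; return this; }
  AppMetricsExpectation pending(int n)   { this.pending = n;   return this; }
  AppMetricsExpectation running(int n)   { this.running = n;   return this; }
  AppMetricsExpectation completed(int n) { this.completed = n; return this; }
  AppMetricsExpectation failed(int n)    { this.failed = n;    return this; }
  AppMetricsExpectation killed(int n)    { this.killed = n;    return this; }
}

// A call site could then read, for example:
// checkApps(queueSource, new AppMetricsExpectation().submitted(1).pending(1));
{code}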



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605721#comment-16605721
 ] 

Hadoop QA commented on YARN-8258:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-8258 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8258 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21777/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> YARN webappcontext for UI2 should inherit all filters from default context
> --
>
> Key: YARN-8258
> URL: https://issues.apache.org/jira/browse/YARN-8258
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sumana Sathish
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, 
> YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, 
> YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, 
> YARN-8258.007.patch, YARN-8258.008.patch
>
>
> Thanks [~ssath...@hortonworks.com] for finding this.
> Ideally, all filters from the default context should be inherited by the UI2 
> context as well.
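
A rough illustration of the intent, assuming the two webapp contexts are Jetty 
{{ServletContextHandler}} instances; the class and method names are assumptions 
for illustration, not the attached patch:

{code:java}
import java.util.EnumSet;
import javax.servlet.DispatcherType;
import org.eclipse.jetty.servlet.FilterHolder;
import org.eclipse.jetty.servlet.ServletContextHandler;

// Hypothetical sketch: register every filter class from the default webapp
// context on the UI2 context as well, so both pass through the same filter
// chain (for example, authentication filters).
public final class FilterInheritance {
  static void inheritFilters(ServletContextHandler defaultContext,
      ServletContextHandler ui2Context) {
    FilterHolder[] filters = defaultContext.getServletHandler().getFilters();
    if (filters == null) {
      return;
    }
    for (FilterHolder holder : filters) {
      if (holder.getHeldClass() != null) {
        ui2Context.addFilter(holder.getHeldClass(), "/*",
            EnumSet.of(DispatcherType.REQUEST));
      }
    }
  }
}
{code}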



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8745) Misplaced the TestRMWebServicesFairScheduler.java file.

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605702#comment-16605702
 ] 

Hadoop QA commented on YARN-8745:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8745 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938618/YARN-8745.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8e3f375236df 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 962089a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21774/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21774/testReport/ |
| Max. process+thread count | 936 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605683#comment-16605683
 ] 

ASF GitHub Bot commented on YARN-8747:
--

Github user collinmazb closed the pull request at:

https://github.com/apache/hadoop/pull/411


> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: collinma
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose time zone is configured 
> as GMT+8, and the web browser time zone is GMT+8 too. The YARN UI page failed 
> to load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605673#comment-16605673
 ] 

Hadoop QA commented on YARN-8747:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
35m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8747 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12938635/YARN-8747.001.patch |
| Optional Tests |  dupname  asflicense  shadedclient  |
| uname | Linux 31b3eda5335f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 962089a |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21775/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: collinma
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose time zone is configured 
> as GMT+8, and the web browser time zone is GMT+8 too. The YARN UI page failed 
> to load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8746) ui2 overview doesn't display GPU usage info when using Fairscheduler

2018-09-06 Thread collinma (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

collinma reassigned YARN-8746:
--

Assignee: collinma

> ui2 overview doesn't display GPU usage info when using Fairscheduler 
> -
>
> Key: YARN-8746
> URL: https://issues.apache.org/jira/browse/YARN-8746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: collinma
>Priority: Blocker
>  Labels: GPU, fairscheduler, yarn
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When using the fair scheduler, GPU-related information isn't displayed 
> because the "metrics" API doesn't return any GPU-related usage information 
> (we have run YARN on GPUs per 
> [this guide|https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/UsingGpus.html]). 
> The Hadoop version is 3.1.1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread collinma (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605631#comment-16605631
 ] 

collinma commented on YARN-8747:


Thanks [~sunilg]. I've re-sent a PR 
([https://github.com/apache/hadoop/pull/412]). Thanks for your work, I really 
appreciate it!

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: collinma
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose time zone is configured 
> as GMT+8, and the web browser time zone is GMT+8 too. The YARN UI page failed 
> to load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605630#comment-16605630
 ] 

ASF GitHub Bot commented on YARN-8747:
--

GitHub user collinmazb opened a pull request:

https://github.com/apache/hadoop/pull/412

YARN-8747: update moment-timezone version to 0.5.1

re-sent a PR  per 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/collinmazb/hadoop trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #412


commit b547969d0446ad3a9fb1aa9038baaa091f4fc225
Author: collinma 
Date:   2018-09-06T10:54:19Z

YARN-8747: update moment-timezone version to 0.5.1




> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: collinma
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose time zone is configured 
> as GMT+8, and the web browser time zone is GMT+8 too. The YARN UI page failed 
> to load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8726) [UI2] YARN UI2 is not accessible when config.env file failed to load

2018-09-06 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605601#comment-16605601
 ] 

Sunil Govindan commented on YARN-8726:
--

This looks good to me. Will commit shortly if no objections

> [UI2] YARN UI2 is not accessible when config.env file failed to load
> 
>
> Key: YARN-8726
> URL: https://issues.apache.org/jira/browse/YARN-8726
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Critical
> Attachments: YARN-8726.001.patch
>
>
> It is observed that YARN UI2 is not accessible. When UI2 is inspected, it 
> gives the error below:
> {code:java}
> index.html:1 Refused to execute script from 
> 'http://ctr-e138-1518143905142-456429-01-05.hwx.site:8088/ui2/config/configs.env'
>  because its MIME type ('text/plain') is not executable, and strict MIME type 
> checking is enabled.
> yarn-ui.js:219 base url:
> vendor.js:1978 ReferenceError: ENV is not defined
>  at updateConfigs (yarn-ui.js:212)
>  at Object.initialize (yarn-ui.js:218)
>  at vendor.js:824
>  at vendor.js:825
>  at visit (vendor.js:3025)
>  at Object.visit [as default] (vendor.js:3024)
>  at DAG.topsort (vendor.js:750)
>  at Class._runInitializer (vendor.js:825)
>  at Class.runInitializers (vendor.js:824)
>  at Class._bootSync (vendor.js:823)
> onerrorDefault @ vendor.js:1978
> trigger @ vendor.js:2967
> (anonymous) @ vendor.js:3006
> invoke @ vendor.js:626
> flush @ vendor.js:629
> flush @ vendor.js:619
> end @ vendor.js:642
> run @ vendor.js:648
> join @ vendor.js:648
> run.join @ vendor.js:1510
> (anonymous) @ vendor.js:1512
> fire @ vendor.js:230
> fireWith @ vendor.js:235
> ready @ vendor.js:242
> completed @ vendor.js:242
> vendor.js:823 Uncaught ReferenceError: ENV is not defined
>  at updateConfigs (yarn-ui.js:212)
>  at Object.initialize (yarn-ui.js:218)
>  at vendor.js:824
>  at vendor.js:825
>  at visit (vendor.js:3025)
>  at Object.visit [as default] (vendor.js:3024)
>  at DAG.topsort (vendor.js:750)
>  at Class._runInitializer (vendor.js:825)
>  at Class.runInitializers (vendor.js:824)
>  at Class._bootSync (vendor.js:823)
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan reassigned YARN-8747:


Assignee: collinma

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: collinma
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose time zone is configured 
> as GMT+8, and the web browser time zone is GMT+8 too. The YARN UI page failed 
> to load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605599#comment-16605599
 ] 

Sunil Govindan commented on YARN-8747:
--

Thanks [~collinma] for the patch. I think your pull request had a few more 
commits; the current patch attached here looks good to me, so I could commit 
this patch instead of the pull request. Or I can merge via the pull request if 
you can share a correct pull request against the "trunk" branch with this 
change alone.

Also adding [~collinma] as a contributor so you can assign jiras later.

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose time zone is configured 
> as GMT+8, and the web browser time zone is GMT+8 too. The YARN UI page failed 
> to load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB reassigned YARN-8747:
--

Assignee: (was: Akhil PB)

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose timezone is configured 
> as GMT+8; the web browser time zone is GMT+8 too. The YARN UI page failed to 
> load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605573#comment-16605573
 ] 

Akhil PB commented on YARN-8747:


Hi, the PR includes many other changes along with the moment-timezone update. I 
have submitted a patch that contains only the moment-timezone update, since the 
bug is related to UI2.

[~sunilg], could you please help review?

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: Akhil PB
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose timezone is configured 
> as GMT+8; the web browser time zone is GMT+8 too. The YARN UI page failed to 
> load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605564#comment-16605564
 ] 

Akhil PB edited comment on YARN-8747 at 9/6/18 9:55 AM:


Attaching patch for the moment-timezone update.

cc [~sunilg]


was (Author: akhilpb):
Attaching patch for the moment-timezone update.

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: Akhil PB
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose timezone is configured 
> as GMT+8; the web browser time zone is GMT+8 too. The YARN UI page failed to 
> load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread collinma (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605565#comment-16605565
 ] 

collinma edited comment on YARN-8747 at 9/6/18 9:55 AM:


Hi there, I've sent a PR ([https://github.com/apache/hadoop/pull/411]) which 
updates the moment-timezone version to 0.5.1. Could someone here help review it?


was (Author: collinma):
Hi there, I've sent a PR ([https://github.com/apache/hadoop/pull/411]) which 
updates the moment-timezone version to 0.5.1. Could someone here help review it?

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: Akhil PB
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose timezone is configured 
> as GMT+8; the web browser time zone is GMT+8 too. The YARN UI page failed to 
> load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread collinma (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605565#comment-16605565
 ] 

collinma commented on YARN-8747:


Hi there, I've sent a PR ([https://github.com/apache/hadoop/pull/411]) which 
updates the moment-timezone version to 0.5.1. Could someone here help review it?

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: Akhil PB
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose timezone is configured 
> as GMT+8; the web browser time zone is GMT+8 too. The YARN UI page failed to 
> load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-8747:
---
Summary: [UI2] YARN UI2 page loading failed due to js error under some time 
zone configuration  (was: ui2 page loading failed due to js error under some 
time zone configuration)

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: Akhil PB
>Priority: Blocker
> Attachments: image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose timezone is configured 
> as GMT+8; the web browser time zone is GMT+8 too. The YARN UI page failed to 
> load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8747) ui2 page loading failed due to js error under some time zone configuration

2018-09-06 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB reassigned YARN-8747:
--

Assignee: Akhil PB

> ui2 page loading failed due to js error under some time zone configuration
> --
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: Akhil PB
>Priority: Blocker
> Attachments: image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed Hadoop 3.1.1 on CentOS 7.2 servers whose timezone is configured 
> as GMT+8; the web browser time zone is GMT+8 too. The YARN UI page failed to 
> load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need to 
> update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2018-09-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605557#comment-16605557
 ] 

Hadoop QA commented on YARN-8258:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-8258 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8258 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21773/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> YARN webappcontext for UI2 should inherit all filters from default context
> --
>
> Key: YARN-8258
> URL: https://issues.apache.org/jira/browse/YARN-8258
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sumana Sathish
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, 
> YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, 
> YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, 
> YARN-8258.007.patch, YARN-8258.008.patch
>
>
> Thanks [~ssath...@hortonworks.com] for finding this.
> Ideally, all filters from the default context have to be inherited by the UI2 
> context as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8745) Misplaced the TestRMWebServicesFairScheduler.java file.

2018-09-06 Thread Y. SREENIVASULU REDDY (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605512#comment-16605512
 ] 

Y. SREENIVASULU REDDY commented on YARN-8745:
-

[~bibinchundatt]
I have attached a patch that addresses your comments.

> Misplaced the TestRMWebServicesFairScheduler.java file.
> ---
>
> Key: YARN-8745
> URL: https://issues.apache.org/jira/browse/YARN-8745
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, test
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Y. SREENIVASULU REDDY
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8745.001.patch, YARN-8745.002.patch
>
>
> The TestRMWebServicesFairScheduler.java file exists in
> {noformat}
> hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java
> {noformat}
> but its package declaration is
> {noformat}
> package org.apache.hadoop.yarn.server.resourcemanager.webapp.fairscheduler;
> {noformat}
> so the file should be moved to the proper package.
> This issue was triggered from YARN-7451.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8745) Misplaced the TestRMWebServicesFairScheduler.java file.

2018-09-06 Thread Y. SREENIVASULU REDDY (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y. SREENIVASULU REDDY updated YARN-8745:

Attachment: YARN-8745.002.patch

> Misplaced the TestRMWebServicesFairScheduler.java file.
> ---
>
> Key: YARN-8745
> URL: https://issues.apache.org/jira/browse/YARN-8745
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, test
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Y. SREENIVASULU REDDY
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8745.001.patch, YARN-8745.002.patch
>
>
> The TestRMWebServicesFairScheduler.java file exists in
> {noformat}
> hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java
> {noformat}
> but its package declaration is
> {noformat}
> package org.apache.hadoop.yarn.server.resourcemanager.webapp.fairscheduler;
> {noformat}
> so the file should be moved to the proper package.
> This issue was triggered from YARN-7451.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2018-09-06 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605493#comment-16605493
 ] 

Sunil Govindan commented on YARN-8258:
--

Kicking Jenkins.

> YARN webappcontext for UI2 should inherit all filters from default context
> --
>
> Key: YARN-8258
> URL: https://issues.apache.org/jira/browse/YARN-8258
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sumana Sathish
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, 
> YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, 
> YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, 
> YARN-8258.007.patch, YARN-8258.008.patch
>
>
> Thanks [~ssath...@hortonworks.com] for finding this.
> Ideally, all filters from the default context have to be inherited by the UI2 
> context as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8258) YARN webappcontext for UI2 should inherit all filters from default context

2018-09-06 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605492#comment-16605492
 ] 

Sunil Govindan commented on YARN-8258:
--

If there are no objections, I will get this patch into the 3.2 release.

> YARN webappcontext for UI2 should inherit all filters from default context
> --
>
> Key: YARN-8258
> URL: https://issues.apache.org/jira/browse/YARN-8258
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sumana Sathish
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: Screen Shot 2018-06-26 at 5.54.35 PM.png, 
> YARN-8258.001.patch, YARN-8258.002.patch, YARN-8258.003.patch, 
> YARN-8258.004.patch, YARN-8258.005.patch, YARN-8258.006.patch, 
> YARN-8258.007.patch, YARN-8258.008.patch
>
>
> Thanks [~ssath...@hortonworks.com] for finding this.
> Ideally, all filters from the default context have to be inherited by the UI2 
> context as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5592) Add support for dynamic resource updates with multiple resource types

2018-09-06 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605491#comment-16605491
 ] 

Sunil Govindan commented on YARN-5592:
--

A few comments on the attached design doc:

1. {{We can introduce a file named dymanic-resource-types.xml}}. This doesn't 
look like a clean approach to me. We should use the existing 
resource-types.xml and add new types there as desired. Once it is reloaded, 
YARN should auto-detect the newly added types and update itself internally 
(a sketch of such an addition follows this list).

2. {{We can introduce an option something like “-refreshResourceTypes”}}. I am 
fine with such a CLI option to force YARN to fetch updated resource types. 
The naming seems a bit confusing and can be improved.

3. *Approach A* is not good as it kills containers. In my view, *Update 
existing resource types in RM (resource1)* is not an immediate use case, 
so let's skip it in the first round.

4. My 2 cents on removal of resource types: this is one of the more complex 
operations. Some nodes may have the resource type, some containers may be 
running with it, and some may be waiting for it, etc. Hence removal of a 
resource type cannot be done seamlessly; ideally we should restart YARN in 
such cases. [~leftnoteasy] [~cheersyang], what are your thoughts on supporting 
removal of resource types at runtime in YARN?
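
For point 1, a minimal sketch of what adding a new type to the existing 
resource-types.xml could look like (the resource names below are placeholders, 
not taken from the design doc):
{code}
<!-- resource-types.xml: illustrative only; "resource1"/"resource2" are placeholder names -->
<configuration>
  <!-- Comma-separated list of additional resource types known to YARN -->
  <property>
    <name>yarn.resource-types</name>
    <value>resource1,resource2</value>
  </property>
  <!-- Optional default unit for a type (here, giga) -->
  <property>
    <name>yarn.resource-types.resource2.units</name>
    <value>G</value>
  </property>
</configuration>
{code}
Keeping additions in the existing file avoids a second source of truth, which 
is what point 1 argues for.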

> Add support for dynamic resource updates with multiple resource types
> -
>
> Key: YARN-5592
> URL: https://issues.apache.org/jira/browse/YARN-5592
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-5592-design-2.docx
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8749) Restrict job submission to queue based on apptype

2018-09-06 Thread Oleksandr Shevchenko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605489#comment-16605489
 ] 

Oleksandr Shevchenko edited comment on YARN-8749 at 9/6/18 8:59 AM:


YARN has the information about the type of each application in 
ApplicationSubmissionContext. We can implement this check in 
RMAppManager#createAndPopulateNewRMApp(), the same way as the ACL check for a 
queue, and add an additional property for the Fair and Capacity schedulers.
For example (element names are illustrative):
{code}
<!-- illustrative Fair Scheduler queue configuration; element names are not final -->
<queue name="q1">
  <accessibleApplicationTypes>SPARK</accessibleApplicationTypes>
</queue>
{code}
In this case "q1" will have a list of accessible application types which 
contains only the "SPARK" type. As a result, only Spark applications can be 
submitted to it; applications with other types (YARN, TEZ, etc.) will be 
rejected.

Could someone evaluate this feature and approach? If no one objects, I would 
like to start working on it.
Thanks a lot for any comments.


was (Author: oshevchenko):
YARN has the information about the type of each application in 
ApplicationSubmissionContext. We can implement this check in 
RMAppManager#createAndPopulateNewRMApp(), the same way as the ACL check for a 
queue, and add an additional property for the Fair and Capacity schedulers.
For example (element names are illustrative):

<queue name="q1">
  <accessibleApplicationTypes>SPARK</accessibleApplicationTypes>
</queue>

In this case "q1" will have a list of accessible application types which 
contains only the "SPARK" type. As a result, only Spark applications can be 
submitted to it; applications with other types (YARN, TEZ, etc.) will be 
rejected.

Could someone evaluate this feature and approach? If no one objects, I would 
like to start working on it.
Thanks a lot for any comments.

> Restrict job submission to queue based on apptype
> -
>
> Key: YARN-8749
> URL: https://issues.apache.org/jira/browse/YARN-8749
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: RM, scheduler
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
>
> The proposal here is to add a new queue property that restricts submission to 
> the queue to a set of allowed application types. If an application's type is 
> not among the queue's allowed types, the application should be rejected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8749) Restrict job submission to queue based on apptype

2018-09-06 Thread Oleksandr Shevchenko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605489#comment-16605489
 ] 

Oleksandr Shevchenko commented on YARN-8749:


YARN has the information about the type of each application in 
ApplicationSubmissionContext. We can implement this check in 
RMAppManager#createAndPopulateNewRMApp(), the same way as the ACL check for a 
queue, and add an additional property for the Fair and Capacity schedulers.
For example (element names are illustrative):

<queue name="q1">
  <accessibleApplicationTypes>SPARK</accessibleApplicationTypes>
</queue>

In this case "q1" will have a list of accessible application types which 
contains only the "SPARK" type. As a result, only Spark applications can be 
submitted to it; applications with other types (YARN, TEZ, etc.) will be 
rejected.

Could someone evaluate this feature and approach? If no one objects, I would 
like to start working on it.
Thanks a lot for any comments.
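
If this goes ahead, a Capacity Scheduler analog could look like the sketch 
below. The property name is purely illustrative of the proposal; no such 
setting exists today:
{code}
<!-- capacity-scheduler.xml: hypothetical property, shown only to illustrate the idea -->
<property>
  <name>yarn.scheduler.capacity.root.q1.accessible-application-types</name>
  <value>SPARK</value>
</property>
{code}
As with the queue example above, an application whose type is not in the list 
would be rejected at submission time.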

> Restrict job submission to queue based on apptype
> -
>
> Key: YARN-8749
> URL: https://issues.apache.org/jira/browse/YARN-8749
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: RM, scheduler
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
>
> The proposal here is to add a new queue property that restricts submission to 
> the queue to a set of allowed application types. If an application's type is 
> not among the queue's allowed types, the application should be rejected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8749) Restrict job submission to queue based on apptype

2018-09-06 Thread Oleksandr Shevchenko (JIRA)
Oleksandr Shevchenko created YARN-8749:
--

 Summary: Restrict job submission to queue based on apptype
 Key: YARN-8749
 URL: https://issues.apache.org/jira/browse/YARN-8749
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: RM, scheduler
Reporter: Oleksandr Shevchenko
Assignee: Oleksandr Shevchenko


The proposal here is to add a new queue property that restricts submission to 
the queue to a set of allowed application types. If an application's type is 
not among the queue's allowed types, the application should be rejected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org