date:20171108


 [ 
https://issues.apache.org/jira/browse/YARN-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasudevan Skm updated YARN-7464:

Attachment: YARN-7464.001.patch
Screen Shot 2017-11-08 at 4.56.04 PM.png
Screen Shot 2017-11-08 at 4.56.12 PM.png

[~sunil.gov...@gmail.com]

> Allow fiters on Nodes page
> --
>
> Key: YARN-7464
> URL: https://issues.apache.org/jira/browse/YARN-7464
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Vasudevan Skm
>Assignee: Vasudevan Skm
> Attachments: Screen Shot 2017-11-08 at 4.56.04 PM.png, Screen Shot 
> 2017-11-08 at 4.56.12 PM.png, YARN-7464.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Created] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value

2017-11-08 Thread Tao Yang (JIRA)

Tao Yang created YARN-7461:
--

 Summary: DominantResourceCalculator#ratio calculation problem when 
right resource contains zero value
 Key: YARN-7461
 URL: https://issues.apache.org/jira/browse/YARN-7461
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha4
Reporter: Tao Yang
Priority: Minor


Currently DominantResourceCalculator#ratio may return wrong result when right 
resource contains zero value. For example, there are three resource types such 
as , leftResource=<5, 5, 0> and rightResource=<10, 10, 
0>, we expect the result of DominantResourceCalculator#ratio(leftResource, 
rightResource) is 0.5 but currently is NaN.
There should be a verification before divide calculation to ensure that 
dividend is not zero.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7453) RM fail to switch to active after first successful start

2017-11-08 Thread Sunil G (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-7453:
--
Attachment: YARN-7453.001.patch

Reverted YARN-6840's ResourceManager and ZKRMStateStore changes. This solves 
the issue for now. Detailed analysis will be shared a bit later

> RM fail to switch to active after first successful start
> 
>
> Key: YARN-7453
> URL: https://issues.apache.org/jira/browse/YARN-7453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-7453.001.patch
>
>
> It is observed that RM fail to switch to ACTIVE after first successful start! 
> The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. 
> This continues in loop!
> {noformat}
> 2017-11-07 15:08:11,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> to active state
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery 
> started
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded 
> RM state version info 1.5
> 2017-11-07 15:08:11,670 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7454) RMAppAttemptMetrics#getAggregateResourceUsage can NPE due to double lookup


[ 
https://issues.apache.org/jira/browse/YARN-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243820#comment-16243820
 ] 

Hadoop QA commented on YARN-7454:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 4 unchanged - 1 fixed = 4 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 53m  
7s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 98m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7454 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896453/YARN-7454.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5f6cae70a458 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 
15:49:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e4c220e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18398/testReport/ |
| Max. process+thread count | 810 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U:

[jira] [Commented] (YARN-7406) Moving logging APIs over to slf4j in hadoop-yarn-api

2017-11-08 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243700#comment-16243700
 ] 

Akira Ajisaka commented on YARN-7406:
-

LGTM, +1

> Moving logging APIs over to slf4j in hadoop-yarn-api
> 
>
> Key: YARN-7406
> URL: https://issues.apache.org/jira/browse/YARN-7406
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Yeliang Cang
>Assignee: Yeliang Cang
> Attachments: YARN-7406.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7462) Render outstanding resource requests on application details page


 [ 
https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasudevan Skm updated YARN-7462:

Attachment: YARN-7462.002.patch

Fixes the typo in the previous patch [~sunil.gov...@gmail.com]

> Render outstanding resource requests on application details page
> 
>
> Key: YARN-7462
> URL: https://issues.apache.org/jira/browse/YARN-7462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Vasudevan Skm
>Assignee: Vasudevan Skm
> Attachments: Screen Shot 2017-11-08 at 3.24.30 PM.png, 
> YARN-7462.001.patch, YARN-7462.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7462) Render outstanding resource requests on application details page


[ 
https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243638#comment-16243638
 ] 

Vasudevan Skm commented on YARN-7462:
-

[~sunil.gov...@gmail.com][~wangda]

> Render outstanding resource requests on application details page
> 
>
> Key: YARN-7462
> URL: https://issues.apache.org/jira/browse/YARN-7462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Vasudevan Skm
>Assignee: Vasudevan Skm
> Attachments: YARN-7462.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7462) Render outstanding resource requests on application details page


 [ 
https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasudevan Skm updated YARN-7462:

Attachment: YARN-7462.001.patch

> Render outstanding resource requests on application details page
> 
>
> Key: YARN-7462
> URL: https://issues.apache.org/jira/browse/YARN-7462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Vasudevan Skm
>Assignee: Vasudevan Skm
> Attachments: YARN-7462.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7462) Render outstanding resource requests on application details page


 [ 
https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasudevan Skm updated YARN-7462:

Attachment: Screen Shot 2017-11-08 at 3.24.30 PM.png

> Render outstanding resource requests on application details page
> 
>
> Key: YARN-7462
> URL: https://issues.apache.org/jira/browse/YARN-7462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Vasudevan Skm
>Assignee: Vasudevan Skm
> Attachments: Screen Shot 2017-11-08 at 3.24.30 PM.png, 
> YARN-7462.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value

2017-11-08 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-7461:
---
Attachment: YARN-7461.001.patch

> DominantResourceCalculator#ratio calculation problem when right resource 
> contains zero value
> 
>
> Key: YARN-7461
> URL: https://issues.apache.org/jira/browse/YARN-7461
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Tao Yang
>Priority: Minor
> Attachments: YARN-7461.001.patch
>
>
> Currently DominantResourceCalculator#ratio may return wrong result when right 
> resource contains zero value. For example, there are three resource types 
> such as , leftResource=<5, 5, 0> and 
> rightResource=<10, 10, 0>, we expect the result of 
> DominantResourceCalculator#ratio(leftResource, rightResource) is 0.5 but 
> currently is NaN.
> There should be a verification before divide calculation to ensure that 
> dividend is not zero.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Created] (YARN-7462) Render outstanding resource requests on application details page

Vasudevan Skm created YARN-7462:
---

 Summary: Render outstanding resource requests on application 
details page
 Key: YARN-7462
 URL: https://issues.apache.org/jira/browse/YARN-7462
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Vasudevan Skm
Assignee: Vasudevan Skm






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7462) Render outstanding resource requests on application details page


 [ 
https://issues.apache.org/jira/browse/YARN-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasudevan Skm updated YARN-7462:

Attachment: Screen Shot 2017-11-08 at 3.38.48 PM.png

> Render outstanding resource requests on application details page
> 
>
> Key: YARN-7462
> URL: https://issues.apache.org/jira/browse/YARN-7462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Vasudevan Skm
>Assignee: Vasudevan Skm
> Attachments: Screen Shot 2017-11-08 at 3.24.30 PM.png, Screen Shot 
> 2017-11-08 at 3.38.48 PM.png, YARN-7462.001.patch, YARN-7462.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Created] (YARN-7464) Allow fiters on Nodes page

Vasudevan Skm created YARN-7464:
---

 Summary: Allow fiters on Nodes page
 Key: YARN-7464
 URL: https://issues.apache.org/jira/browse/YARN-7464
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-ui-v2
Reporter: Vasudevan Skm
Assignee: Vasudevan Skm






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7464) Allow fiters on Nodes page


[ 
https://issues.apache.org/jira/browse/YARN-7464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243798#comment-16243798
 ] 

Hadoop QA commented on YARN-7464:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
25m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7464 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896635/YARN-7464.001.patch |
| Optional Tests |  asflicense  shadedclient  |
| uname | Linux eaa38cb8eb1c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e4c220e |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 402 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18400/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Allow fiters on Nodes page
> --
>
> Key: YARN-7464
> URL: https://issues.apache.org/jira/browse/YARN-7464
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Vasudevan Skm
>Assignee: Vasudevan Skm
> Attachments: Screen Shot 2017-11-08 at 4.56.04 PM.png, Screen Shot 
> 2017-11-08 at 4.56.12 PM.png, YARN-7464.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243982#comment-16243982
 ] 

Shane Kumpf commented on YARN-7430:
---

{quote}
This is not true, see the following examples:
{quote}

I guess I don't understand what those examples are trying to convey. I become 
root with or without privileged if I don't supply the --user/uid flags. The 
centos image has no USER entry, so this is what I would expect.

{code}
[foo@localhost ~]$ docker run -it centos:latest bash
[root@00f0c3ac84cf /]# id
uid=0(root) gid=0(root) groups=0(root)

[foo@localhost ~]$ docker run -it --privileged centos:latest bash
[root@955eb326cb66 /]# id
uid=0(root) gid=0(root) groups=0(root)
{code}

With user remapping disabled, which is the default, the {{docker run}} form is 
different than what you are testing. It is {{docker run --detach=true 
--user= ...}} (not --user=) - that doesn't seem to suffer from 
the issue you call out where the primary group is missing, since the container 
fails to start if the user doesn't exist in the container.

{code:java}
[foo@localhost ~]$ docker run -it --user=foo centos:latest bash
docker: Error response from daemon: linux spec user: unable to find user foo: 
no matching entries in passwd file
{code}

At this point, I'm confused on exactly what conditions result in this exploit. 
Can you clarify? I've yet to see the form you tested occur anywhere. I see the 
following: 

* Without user remapping: docker run --user='skumpf' ... 
* With user remapping: docker run --user='501:502' --group-add='502' ... 

{quote}
When --privileged=true and --user are set, the container is started with root 
privilegs and drop to the user privileges. If there is sticky bits binary in 
the container file system, it is possible for process to resume root 
privileges. If the container filesystem can be tainted by pushing custom image 
with sticky bits, then jailbreak is possible.
{quote}

I don't understand how that is exploitable. The ENTRYPOINT/CMD will be run as 
the user supplied by YARN. If the ENTRYPOINT/CMD is a setuid binary that gives 
that user root access in the container, this becomes true, but I can do that 
without a privileged container.

{quote}
Docker does not make any change to the file permission.
{quote}

That's my point. Consider the following Dockerfile:
{code}
FROM centos

USERADD foo

USER foo

COPY run.sh /

CMD /run.sh
{code}

I then submit an application as user "skumpf" that uses the image above. The 
localized resources and container launch script is owned by "skumpf" on the 
host and will be bind mounted into the container. With the current behavior 
using {{docker run}} and {{--user}} the the launch script will be ran as 
"skumpf" (per our docs skumpf must exist in the container and have the same UID 
as the host), even in the privileged case. If we remove {{--user}} from 
{{docker run}} in the the privileged case, then now the the launch script will 
be executed by user "foo" in my container, using whatever UID "foo" has in the 
container. User "foo" in the container does not have permission to execute the 
launch script owned by "skumpf" and thus the container will fail to launch with 
a permission denied error. We need the {{--user/uid}} option even if privileged 
is requested, because without it, we have no idea what user the container will 
run as.

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Assigned] (YARN-7453) Fix issue where RM fails to switch to active after first successful start


 [ 
https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-7453:
-

Assignee: Rohith Sharma K S  (was: Arun Suresh)

> Fix issue where RM fails to switch to active after first successful start
> -
>
> Key: YARN-7453
> URL: https://issues.apache.org/jira/browse/YARN-7453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-7453.001.patch, YARN-7453.001.patch
>
>
> It is observed that RM fail to switch to ACTIVE after first successful start! 
> The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. 
> This continues in loop!
> {noformat}
> 2017-11-07 15:08:11,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> to active state
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery 
> started
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded 
> RM state version info 1.5
> 2017-11-07 15:08:11,670 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7453) Fix issue where RM fails to switch to active after first successful start

2017-11-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244251#comment-16244251
 ] 

Hudson commented on YARN-7453:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13203 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13203/])
YARN-7453. Fix issue where RM fails to switch to active after first (arun 
suresh: rev a9c70b0e84dab0c41e480a0dc0cb1a22efdc64ee)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/ZKConfigurationStore.java


> Fix issue where RM fails to switch to active after first successful start
> -
>
> Key: YARN-7453
> URL: https://issues.apache.org/jira/browse/YARN-7453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7453.001.patch, YARN-7453.001.patch
>
>
> It is observed that RM fail to switch to ACTIVE after first successful start! 
> The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. 
> This continues in loop!
> {noformat}
> 2017-11-07 15:08:11,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> to active state
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery 
> started
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded 
> RM state version info 1.5
> 2017-11-07 15:08:11,670 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>   at 
>

[jira] [Commented] (YARN-7458) TestContainerManagerSecurity is still flakey

2017-11-08 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244226#comment-16244226
 ] 

Jason Lowe commented on YARN-7458:
--

Thanks for the patch!

If the container never completes then the method just moves on as if it did.  
Shouldn't it throw?  Assuming it should, GenericTestUtils.waitFor seems 
appropriate here.

Nit: I'm never a fan of 1 second sleeps in tests (or sleeps at all if we can 
avoid it).  It's almost always overkill and makes the test slower than it needs 
to be.  If a test had to wait for 10 containers to complete serially that's 10 
seconds of wasted test time.  I'd change this to at most 100msec, probably just 
10msec.


> TestContainerManagerSecurity is still flakey
> 
>
> Key: YARN-7458
> URL: https://issues.apache.org/jira/browse/YARN-7458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-7458.001.patch
>
>
> YARN-6150 made this less flakey, but we're still seeing an occasional issue 
> here:
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7343) Add a junit test for ContainerScheduler recovery


 [ 
https://issues.apache.org/jira/browse/YARN-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7343:
--
Fix Version/s: (was: 2.9.0)

> Add a junit test for ContainerScheduler recovery
> 
>
> Key: YARN-7343
> URL: https://issues.apache.org/jira/browse/YARN-7343
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: kartheek muthyala
>Assignee: Sampada Dehankar
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: YARN-7343.001.patch, YARN-7343.002.patch, 
> YARN-7343.003.patch
>
>
> With queuing at NM, Container recovery becomes interesting. Add a junit test 
> for recovering containers in different states. This should test the recovery 
> with the ContainerScheduler class that was introduced for enabling container 
> queuing on contention of resources. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7453) RM fail to switch to active after first successful start


[ 
https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243986#comment-16243986
 ] 

Hadoop QA commented on YARN-7453:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 37 unchanged - 0 fixed = 38 total (was 37) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 |
|   | 
hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService 
|
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing |
| Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands |
|   | org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7453 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896637/YARN-7453.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux aa45fb2f4809 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |

[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 alpha release


[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244182#comment-16244182
 ] 

Haibo Chen commented on YARN-7346:
--

Please help me understand this. The mapreduce.tar.gz is shipped for every hbase 
mapreduce job as a resource that will be localized by YARN for every container, 
right? If so, mapreduce.tar.gz should ideally contain just mapreduce client 
modules and their dependency modules, and yarn-node-manager is not one of them.
Is the dependency of hbase mapreduce job on node-manager jars necessary?

> Fix compilation errors against hbase2 alpha release
> ---
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Vrushali C
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7453) Fix issue where RM fails to switch to active after first successful start


 [ 
https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7453:
--
Summary: Fix issue where RM fails to switch to active after first 
successful start  (was: RM fail to switch to active after first successful 
start)

> Fix issue where RM fails to switch to active after first successful start
> -
>
> Key: YARN-7453
> URL: https://issues.apache.org/jira/browse/YARN-7453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-7453.001.patch, YARN-7453.001.patch
>
>
> It is observed that RM fail to switch to ACTIVE after first successful start! 
> The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. 
> This continues in loop!
> {noformat}
> 2017-11-07 15:08:11,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> to active state
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery 
> started
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded 
> RM state version info 1.5
> 2017-11-07 15:08:11,670 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Assigned] (YARN-7453) Fix issue where RM fails to switch to active after first successful start


 [ 
https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-7453:
-

Assignee: Arun Suresh

> Fix issue where RM fails to switch to active after first successful start
> -
>
> Key: YARN-7453
> URL: https://issues.apache.org/jira/browse/YARN-7453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Rohith Sharma K S
>Assignee: Arun Suresh
>Priority: Blocker
> Attachments: YARN-7453.001.patch, YARN-7453.001.patch
>
>
> It is observed that RM fail to switch to ACTIVE after first successful start! 
> The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. 
> This continues in loop!
> {noformat}
> 2017-11-07 15:08:11,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> to active state
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery 
> started
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded 
> RM state version info 1.5
> 2017-11-07 15:08:11,670 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler

2017-11-08 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244191#comment-16244191
 ] 

Jason Lowe commented on YARN-3091:
--

bq. I'm looking at reverting the read/write lock changes within the fair 
scheduler at least. Thoughts?

+1, we've also seen a number of problems around the scheduler's read/write 
locks and have done some short-term fixes to work around them like YARN-6680.  
They are significantly more expensive to acquire than a standard mutex if 
nobody is holding the lock, and there are lots of places where the scheduler 
needs to acquire them during a scheduling pass.


> [Umbrella] Improve and fix locks of RM scheduler
> 
>
> Key: YARN-3091
> URL: https://issues.apache.org/jira/browse/YARN-3091
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacityscheduler, fairscheduler, resourcemanager, 
> scheduler
>Reporter: Wangda Tan
>
> In existing YARN RM scheduler, there're some issues of using locks. For 
> example:
> - Many unnecessary synchronized locks, we have seen several cases recently 
> that too frequent access of scheduler makes scheduler hang. Which could be 
> addressed by using read/write lock. Components include scheduler, CS queues, 
> apps
> - Some fields not properly locked (Like clusterResource)
> We can address them together in this ticket.
> (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7453) RM fail to switch to active after first successful start


[ 
https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244205#comment-16244205
 ] 

Arun Suresh commented on YARN-7453:
---

+1 for the patch.
I ran the failed and timeout tests locally - it works for me. They just seem to 
be flaky

Committing this shortly (will take care of checkstyle when I commit)

> RM fail to switch to active after first successful start
> 
>
> Key: YARN-7453
> URL: https://issues.apache.org/jira/browse/YARN-7453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-7453.001.patch, YARN-7453.001.patch
>
>
> It is observed that RM fail to switch to ACTIVE after first successful start! 
> The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. 
> This continues in loop!
> {noformat}
> 2017-11-07 15:08:11,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> to active state
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery 
> started
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded 
> RM state version info 1.5
> 2017-11-07 15:08:11,670 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7343) Add a junit test for ContainerScheduler recovery


 [ 
https://issues.apache.org/jira/browse/YARN-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-7343:
--
Fix Version/s: 3.1.0
   2.9.0

> Add a junit test for ContainerScheduler recovery
> 
>
> Key: YARN-7343
> URL: https://issues.apache.org/jira/browse/YARN-7343
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: kartheek muthyala
>Assignee: Sampada Dehankar
>Priority: Minor
> Fix For: 2.9.0, 3.1.0
>
> Attachments: YARN-7343.001.patch, YARN-7343.002.patch, 
> YARN-7343.003.patch
>
>
> With queuing at NM, Container recovery becomes interesting. Add a junit test 
> for recovering containers in different states. This should test the recovery 
> with the ContainerScheduler class that was introduced for enabling container 
> queuing on contention of resources. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-6128) Add support for AMRMProxy HA

2017-11-08 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-6128:
---
Attachment: YARN-6128.v7.patch

> Add support for AMRMProxy HA
> 
>
> Key: YARN-6128
> URL: https://issues.apache.org/jira/browse/YARN-6128
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6128.v0.patch, YARN-6128.v1.patch, 
> YARN-6128.v1.patch, YARN-6128.v2.patch, YARN-6128.v3.patch, 
> YARN-6128.v3.patch, YARN-6128.v4.patch, YARN-6128.v5.patch, 
> YARN-6128.v6.patch, YARN-6128.v7.patch
>
>
> YARN-556 added the ability for RM failover without loosing any running 
> applications. In a Federated YARN environment, there's additional state in 
> the {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we 
> need to enhance {{AMRMProxy}} to support HA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244498#comment-16244498
 ] 

Shane Kumpf commented on YARN-7430:
---

I still believe there will be an issue if we do not specify --user. This causes 
problems for launching the container. Please try running distributed shell or 
similar using the Dockerfile I provided with --user removed, and you will see 
the behavior, the container will fail to launch.

IIUC, {{\-\-privileged}} == {{\-\-user=root}} (or {{--user=0:0}}) in your view, 
correct? If so, doing that would satisfy the condition here if we set the user 
to root for privileged containers. I see some cases where that isn't necessary 
and I'm unsure how it might impact log aggregation, but I think it could work.

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7457) Delay scheduling should be an individual policy instead of part of scheduler implementation


[ 
https://issues.apache.org/jira/browse/YARN-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244426#comment-16244426
 ] 

Daniel Templeton commented on YARN-7457:


I have a long to-do list. :)  We'll see who gets there first.

> Delay scheduling should be an individual policy instead of part of scheduler 
> implementation
> ---
>
> Key: YARN-7457
> URL: https://issues.apache.org/jira/browse/YARN-7457
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> Currently, different schedulers have slightly different delay scheduling 
> implementations. Ideally we should make delay scheduling independent from 
> scheduler implementation. Benefits of doing this:
> 1) Applications can choose which delay scheduling policy to use, it could be 
> time-based / missed-opportunistic-based or whatever new delay scheduling 
> policy supported by the cluster. Now it is global config of scheduler.
> 2) Make scheduler implementations simpler and reusable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (YARN-7430) User and Group mapping are incorrect in docker container

[
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1623#comment-1623
]

Eric Yang edited comment on YARN-7430 at 11/8/17 6:07 PM:
--

[~shaneku...@gmail.com] . Thank you for explaining your point of view. I
understand how you arrived at these conclusions, but some use cases can not be
satisfied by the current implementation.

{quote}
User "foo" in the container does not have permission to execute the launch
script owned by "skumpf" and thus the container will fail to launch with a
permission denied error. We need the -user/uid option even if privileged is
requested, because without it, we have no idea what user the container will run
as.
{quote}

What is the point of using privileged flag, if the process can only run as
"skump" to run properly for privileged container? When container is granted
with root power, root user should have ability to do anything, why drop that
privilege away then reacquire it later using sticky bit? It is counter
intuitive.

Let review the ground rules that docker recommends, and what we are
recommending to Hadoop users.

# Docker security document clearly stated that docker must be run by trusted
user only. This means user either have sudo privileges or they are part of
docker group.
# Privileged container allows ENTRYPOINT to spawn multi-user environment such
as systemd or init like environment for multi-user support.
# Hadoop YARN user can be a trusted user to spawn docker containers on behave
of the end user.
# Hadoop simulates doAs call through container-executor, therefore docker
security recommendation stay intact. If container must run for end user who
isn't part of privileged user nor docker group, then precaution must be taken
to secure point of entry by yarn user or container-executor.
# Docker does not know about external users and group on LDAP, hence use of
{{\-\-user username}} is essentially limited to container's {{/etc/passwd}} and
{{/etc/group}} to lookup group membership. Users/Group can be programmed into
docker container build, however this solution can not be generalized for LDAP
users in Hadoop eco-system. We don't want to end up rebuilding images, each
time a new LDAP user is added.
# Docker added {{\-\-user uid:gid}} and {{\-\-group-add}} to assign user
credential and group membership without the user depends on /etc/passwd and
/etc/group for lookup for dynamic users.

In order to resolve the conflicting user management between docker and Hadoop.
We must streamline the implementation to have capacity of supporting
multi-users docker container (privileged container) as well as single LDAP user
container (non-privileged container). Privileged container can only be spawned
by trusted user for trusted user. Hence, the privileged container image can
contain multiple users that is already pre-approved by system administrator.
Privileged container can acquire additional resources using mount points, and
consistent file system ACL inside and outside of container governs the overall
security.

There should never be a case where we allow localized resource for {{skump}} to
work as {{foo}} user without properly secure file system ACL. At least we
don't want to make this case work to ensure file system ACL rules are not
broken. {{skump}} must do more work to secure localize resource with proper
permission, if he has the power. Ultimately, file system permission is the
last line of security defense that we have for storing files in HDFS via NFS
mount point.

>From this point of view, does it make more sense to run {{\-\-privileged}}
>without {{\-\-user username}}?

was (Author: eyang):
[~shaneku...@gmail.com] . Thank you for explaining your point of view. I
understand how you arrived at these conclusions, but some use cases can not be
satisfied by the current implementation.

Let review the ground rules that docker recommends, and what we are
recommending to Hadoop users.

[jira] [Commented] (YARN-7330) Add support to show GPU on UI/metrics

2017-11-08 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244289#comment-16244289
 ] 

Sunil G commented on YARN-7330:
---

cc/ [~skmvasu]

> Add support to show GPU on UI/metrics
> -
>
> Key: YARN-7330
> URL: https://issues.apache.org/jira/browse/YARN-7330
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-7330.0-wip.patch, YARN-7330.003.patch, 
> YARN-7330.004.patch, YARN-7330.1-wip.patch, YARN-7330.2-wip.patch, 
> screencapture-0-wip.png
>
>
> We should be able to view GPU metrics from UI/REST API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value


[ 
https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244338#comment-16244338
 ] 

Daniel Templeton commented on YARN-7461:


Thanks for the patch.  Couple of comments:

# Missing a space before the '{' on DominantResourceCalculator:L393
# Instead of setting up the resource by hand in 
{{testRatioWithResourceValuesContainZero()}}, why not call 
{{setupExtraResource()}}?

> DominantResourceCalculator#ratio calculation problem when right resource 
> contains zero value
> 
>
> Key: YARN-7461
> URL: https://issues.apache.org/jira/browse/YARN-7461
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-7461.001.patch
>
>
> Currently DominantResourceCalculator#ratio may return wrong result when right 
> resource contains zero value. For example, there are three resource types 
> such as , leftResource=<5, 5, 0> and 
> rightResource=<10, 10, 0>, we expect the result of 
> DominantResourceCalculator#ratio(leftResource, rightResource) is 0.5 but 
> currently is NaN.
> There should be a verification before divide calculation to ensure that 
> dividend is not zero.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic


[ 
https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244381#comment-16244381
 ] 

Haibo Chen commented on YARN-7388:
--

Thanks [~rkanter] for the review! killContainer() is solely called in 
TestAMRestart to simulate AM container failures. In that sense, none of the 
available ContainerExitStatus matches that intension nicely. The closest to the 
method name is probably KILLED_BY_RESOURCEMANAGER if we ignore the real 
intension. Will change the status to that in FairScheduler and address the 
other comments.

> TestAMRestart should be scheduler agnostic
> --
>
> Key: YARN-7388
> URL: https://issues.apache.org/jira/browse/YARN-7388
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7388.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1623#comment-1623
 ] 

Eric Yang commented on YARN-7430:
-

[~shaneku...@gmail.com] . Thank you for explaining your point of view.  I 
understand how you arrived at these conclusions, but some use cases can not be 
satisfied by the current implementation.  

{quote}
User "foo" in the container does not have permission to execute the launch 
script owned by "skumpf" and thus the container will fail to launch with a 
permission denied error. We need the -user/uid option even if privileged is 
requested, because without it, we have no idea what user the container will run 
as.
{quote}

What is the point of using privileged flag, if the process can only run as 
"skump" to run properly for privileged container?  When container is granted 
with root power, root user should have ability to do anything, why drop that 
privilege away then reacquire it later using sticky bit?  It is counter 
intuitive.

Let review the ground rules that docker recommends, and what we are 
recommending to Hadoop users.

# Docker security document clearly stated that docker must be run by trusted 
user only.  This means user either have sudo privileges or they are part of 
docker group.
# Privileged container allows ENTRYPOINT to spawn multi-user environment such 
as systemd or init like environment for multi-user support.
# Hadoop YARN user can be a trusted user to spawn docker containers on behave 
of the end user.
# Hadoop simulates doAs call through container-executor, therefore docker 
security recommendation stay intact.  If container must run for end user who 
isn't part of privileged user nor docker group, then precaution must be taken 
to secure point of entry by yarn user or container-executor.
# Docker does not know about external users and group on LDAP, hence use of 
{{--user [username]}} is essentially limited to container's {{/etc/passwd}} and 
{{/etc/group}} to lookup group membership.  Users/Group can be programmed into 
docker container build, however this solution can not be generalized for LDAP 
users in Hadoop eco-system.  We don't want to end up rebuilding images, each 
time a new LDAP user is added.
# Docker added {{--user uid:gid}} and {{--group-add}} to assign user credential 
and group membership without the user depends on /etc/passwd and /etc/group for 
lookup for dynamic users.

In order to resolve the conflicting user management between docker and Hadoop.  
We must streamline the implementation to have capacity of supporting 
multi-users docker container (privileged container) as well as single LDAP user 
container (non-privileged container).  Privileged container can only be spawned 
by trusted user for trusted user.  Hence, the privileged container image can 
contain multiple users that is already pre-approved by system administrator.  
Privileged container can acquire additional resources using mount points, and 
consistent file system ACL inside and outside of container governs the overall 
security.  

There should never be a case where we allow localized resource for {{skump}} to 
work as {{foo}} user without properly secure file system ACL.  At least we 
don't want to make this case work to ensure file system ACL rules are not 
broken.  Ultimately, file system permission is the last line of security 
defense that we have for storing files in HDFS via NFS mount point.

>From this point of view, does it make more sense to run {{--privileged}} 
>without {{--user username}}?



> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7457) Delay scheduling should be an individual policy instead of part of scheduler implementation


[ 
https://issues.apache.org/jira/browse/YARN-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244360#comment-16244360
 ] 

Daniel Templeton commented on YARN-7457:


I think it makes good sense to abstract that out as a service.  It was actually 
on my todo list.

> Delay scheduling should be an individual policy instead of part of scheduler 
> implementation
> ---
>
> Key: YARN-7457
> URL: https://issues.apache.org/jira/browse/YARN-7457
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> Currently, different schedulers have slightly different delay scheduling 
> implementations. Ideally we should make delay scheduling independent from 
> scheduler implementation. Benefits of doing this:
> 1) Applications can choose which delay scheduling policy to use, it could be 
> time-based / missed-opportunistic-based or whatever new delay scheduling 
> policy supported by the cluster. Now it is global config of scheduler.
> 2) Make scheduler implementations simpler and reusable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7457) Delay scheduling should be an individual policy instead of part of scheduler implementation

2017-11-08 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244474#comment-16244474
 ] 

Wangda Tan commented on YARN-7457:
--

Sounds good :)

> Delay scheduling should be an individual policy instead of part of scheduler 
> implementation
> ---
>
> Key: YARN-7457
> URL: https://issues.apache.org/jira/browse/YARN-7457
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> Currently, different schedulers have slightly different delay scheduling 
> implementations. Ideally we should make delay scheduling independent from 
> scheduler implementation. Benefits of doing this:
> 1) Applications can choose which delay scheduling policy to use, it could be 
> time-based / missed-opportunistic-based or whatever new delay scheduling 
> policy supported by the cluster. Now it is global config of scheduler.
> 2) Make scheduler implementations simpler and reusable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler

2017-11-08 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244511#comment-16244511
 ] 

Wangda Tan commented on YARN-3091:
--

[~templedf]/[~jlowe],

The RW locks are introduced to solve multiple threads to look at container 
allocation problem. From my tests report: 
https://issues.apache.org/jira/secure/attachment/12831662/YARN-5139-Concurrent-scheduling-performance-report.pdf,
 it can get about 2.5X throughput improvement when we have 3 threads looking at 
scheduler at the same time comparing to single thread. 

I agree that some previous locking changes (such as 
YARN-3139/YARN-3140/YARN-3141) can definitely be improved. But I think if we 
change everything to simple reentrant lock may affect throughput when we have 
multiple threads to do allocation.



> [Umbrella] Improve and fix locks of RM scheduler
> 
>
> Key: YARN-3091
> URL: https://issues.apache.org/jira/browse/YARN-3091
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacityscheduler, fairscheduler, resourcemanager, 
> scheduler
>Reporter: Wangda Tan
>
> In existing YARN RM scheduler, there're some issues of using locks. For 
> example:
> - Many unnecessary synchronized locks, we have seen several cases recently 
> that too frequent access of scheduler makes scheduler hang. Which could be 
> addressed by using read/write lock. Components include scheduler, CS queues, 
> apps
> - Some fields not properly locked (Like clusterResource)
> We can address them together in this ticket.
> (More details see comments below)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Assigned] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value


 [ 
https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned YARN-7461:
--

Assignee: Tao Yang  (was: Daniel Templeton)

> DominantResourceCalculator#ratio calculation problem when right resource 
> contains zero value
> 
>
> Key: YARN-7461
> URL: https://issues.apache.org/jira/browse/YARN-7461
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-7461.001.patch
>
>
> Currently DominantResourceCalculator#ratio may return wrong result when right 
> resource contains zero value. For example, there are three resource types 
> such as , leftResource=<5, 5, 0> and 
> rightResource=<10, 10, 0>, we expect the result of 
> DominantResourceCalculator#ratio(leftResource, rightResource) is 0.5 but 
> currently is NaN.
> There should be a verification before divide calculation to ensure that 
> dividend is not zero.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Assigned] (YARN-7461) DominantResourceCalculator#ratio calculation problem when right resource contains zero value


 [ 
https://issues.apache.org/jira/browse/YARN-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned YARN-7461:
--

Assignee: Daniel Templeton

> DominantResourceCalculator#ratio calculation problem when right resource 
> contains zero value
> 
>
> Key: YARN-7461
> URL: https://issues.apache.org/jira/browse/YARN-7461
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Tao Yang
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-7461.001.patch
>
>
> Currently DominantResourceCalculator#ratio may return wrong result when right 
> resource contains zero value. For example, there are three resource types 
> such as , leftResource=<5, 5, 0> and 
> rightResource=<10, 10, 0>, we expect the result of 
> DominantResourceCalculator#ratio(leftResource, rightResource) is 0.5 but 
> currently is NaN.
> There should be a verification before divide calculation to ensure that 
> dividend is not zero.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7388) TestAMRestart should be scheduler agnostic


 [ 
https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-7388:
-
Attachment: YARN-7388.01.patch

> TestAMRestart should be scheduler agnostic
> --
>
> Key: YARN-7388
> URL: https://issues.apache.org/jira/browse/YARN-7388
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7388.00.patch, YARN-7388.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7453) RM fail to switch to active after first successful start

2017-11-08 Thread Rohith Sharma K S (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-7453:

Attachment: YARN-7453.001.patch

Previous patch contains a little bit additional modifications, attached 
required only changes patch!

> RM fail to switch to active after first successful start
> 
>
> Key: YARN-7453
> URL: https://issues.apache.org/jira/browse/YARN-7453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-7453.001.patch, YARN-7453.001.patch
>
>
> It is observed that RM fail to switch to ACTIVE after first successful start! 
> The below exception throws when RM is switching from ACTIVE->STANDBY->ACTIVE. 
> This continues in loop!
> {noformat}
> 2017-11-07 15:08:11,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning 
> to active state
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery 
> started
> 2017-11-07 15:08:11,669 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded 
> RM state version info 1.5
> 2017-11-07 15:08:11,670 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>   at 
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>   at 
> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:771)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1202)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1198)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1198)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7453) RM fail to switch to active after first successful start


[ 
https://issues.apache.org/jira/browse/YARN-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243862#comment-16243862
 ] 

Hadoop QA commented on YARN-7453:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 21 new + 37 unchanged - 0 fixed = 58 total (was 37) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 56m 
28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}102m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7453 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896632/YARN-7453.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f015a5df495e 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e4c220e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/18399/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18399/testReport/ |
| Max. process+thread count | 866 (vs.

[jira] [Commented] (YARN-7440) Optimization to AM recovery when the service record doesn't exist for a container

2017-11-08 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244810#comment-16244810
 ] 

Chandni Singh commented on YARN-7440:
-

When a ServiceRecord doesn't exist for multiple containers belonging to the 
same component, then it is possible that a container is assigned to a different 
component instance when the AM recovers. 
Discussed this issue with [~jianhe] and [~billie.rinaldi] offline. Swapping 
containers to different component instances will cause naming conflicts inside 
the container process.  Currently we don't get the component name from the 
_Container_, so in order to implement this correctly we need to wait for 
https://issues.apache.org/jira/browse/YARN-6594.



> Optimization to AM recovery when the service record doesn't exist for a 
> container
> -
>
> Key: YARN-7440
> URL: https://issues.apache.org/jira/browse/YARN-7440
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
> Fix For: yarn-native-services
>
> Attachments: YARN-7440.001.patch, YARN-7440.002.patch
>
>
> When AM recovers, if the service record doesn’t exist for a container sent 
> from RM, it can re-query the container status from NM, today it will release 
> the container



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244825#comment-16244825
 ] 

Shane Kumpf edited comment on YARN-7430 at 11/8/17 10:14 PM:
-

{quote}
User foo should not allow to execute script owned by skumpf, unless skumpf 
granted permission to run the script
{quote}
User foo doesn't execute the script owned by skumpf if we pass the user skumpf. 
This is exactly how every container works today. We pass the user name and run 
the entrypoint in the container as this user, overriding what the image has 
set. This allows localization and logging to work. With the change to turn this 
off, we let the image decide, but only for privileged containers. The result is 
that any image that has "USER " in it, must be modified.

{quote}
--user=0:0 does not mean privileged. It means the entry point is granted with 
pseudo root privileges inside the container.
{quote}
Sorry, poorly worded. Do you think that the entry point process in a privileged 
container should always run as root? if so, we should enforce that by setting 
{{\-\-user=0:0}}.

I think there is a place for containers where we don't set the user, but for 
those types to work, we'd need to get rid of all mounts and avoid overriding 
the entrypoint ("vanilla containers").


was (Author: shaneku...@gmail.com):
{quote}
User foo should not allow to execute script owned by skumpf, unless skumpf 
granted permission to run the script
{quote}
User foo doesn't execute the script owned by skumpf if we pass the user skumpf. 
This is exactly how every container works today. We pass the user name and run 
the entrypoint in the container as this user, overriding what the image has 
set. This allows localization and logging to work. With the change to turn this 
off, we let the image decide, but only for privileged containers. The result is 
that any image that has "USER " in it, must be modified.

{quote}
--user=0:0 does not mean privileged. It means the entry point is granted with 
pseudo root privileges inside the container.
{quote}
Sorry, poorly worded. Do you think that the entry point process in a privileged 
container should always run as root? if so, we should enforce that by setting 
{{\-\-user=0:0}}.

I think there is a place for applications where we don't set the user, but for 
those types to work, we'd need to get rid of all mounts and avoid overriding 
the entrypoint ("vanilla containers").

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7399) Yarn services metadata storage improvement


 [ 
https://issues.apache.org/jira/browse/YARN-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7399:

Attachment: YARN-7399.png

See the attached diagram for the current implementation and proposed 
refinement.  This will reduce duplicated code for storing metadata, and support 
multiple storage type.

> Yarn services metadata storage improvement
> --
>
> Key: YARN-7399
> URL: https://issues.apache.org/jira/browse/YARN-7399
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7399.png
>
>
> In Slider, metadata is stored in user's home directory. Slider command line 
> interface interacts with HDFS directly to list deployed applications and 
> invoke YARN API or HDFS API to provide information to user. This design works 
> for a single user manage his/her own applications. When this design has been 
> ported to Yarn services, it becomes apparent that this design is difficult to 
> list all deployed applications on Hadoop cluster for administrator to manage 
> applications. Resource Manager needs to crawl through every user's home 
> directory to compile metadata about deployed applications. This can trigger 
> high load on namenode to list hundreds or thousands of list directory calls 
> owned by different users. Hence, it might be best to centralize the metadata 
> storage to Solr or HBase to reduce number of IO calls to namenode for manage 
> applications.
> In Slider, one application is composed of metainfo, specifications in json, 
> and payload of zip file that contains application code and deployment code. 
> Both meta information, and zip file payload are stored in the same 
> application directory in HDFS. This works well for distributed applications 
> without central application manager that oversee all application.
> In the next generation of application management, we like to centralize 
> metainfo and specifications in json to a centralized storage managed by YARN 
> user, and keep the payload zip file in user's home directory or in docker 
> registry. This arrangement can provide a faster lookup for metainfo when we 
> list all deployed applications and services on YARN dashboard.
> When we centralize metainfo to YARN user, we also need to build ACL to 
> enforce who can manage applications, and make update. The current proposal is:
> yarn.admin.acl - list of groups that can submit/reconfigure/pause/kill all 
> applications
> normal users - submit/reconfigure/pause/kill his/her own applications



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (YARN-7166) Container REST endpoints should report resource types


[ 
https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244860#comment-16244860
 ] 

Yufei Gu edited comment on YARN-7166 at 11/8/17 10:32 PM:
--

Look good to me generally. How about marking {{allocatedMB}} and 
{{allocatedVCores}} deprecated? 
Need a  space before {{Long}} in {{protected Map 
allocatedResources;}}


was (Author: yufeigu):
Look good to me generally. How about marking {{allocatedMB}} and 
{{allocatedVCores}} deprecated?

> Container REST endpoints should report resource types
> -
>
> Key: YARN-7166
> URL: https://issues.apache.org/jira/browse/YARN-7166
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7166.YARN-3926.001.patch, 
> YARN-7166.YARN-3926.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244787#comment-16244787
 ] 

Eric Yang commented on YARN-7430:
-

[~shaneku...@gmail.com] . {quote}
I still believe there will be an issue if we do not specify --user. This causes 
problems for launching the container. Please try running distributed shell or 
similar using the Dockerfile I provided with --user removed, and you will see 
the behavior, the container will fail to launch.
{quote}

Container fails for the right reason.  User foo should not allow to execute 
script owned by skumpf, unless skumpf granted permission to run the script.

{quote}
IIUC, --privileged == --user=root (or --user=0:0) in your view, correct? If so, 
doing that would satisfy the condition here if we set the user to root for 
privileged containers. I see some cases where that isn't necessary and I'm 
unsure how it might impact log aggregation, but I think it could work.
{quote}

{{\-\-user=0:0}} does not mean privileged.  It means the entry point is granted 
with pseudo root privileges inside the container.  There is no guarantee that 
capability at host layer is granted.  The {{\-\-privileged}} flag gives all 
capabilities to the container, and it also lifts all the limitations enforced 
by the device cgroup controller. In other words, the container can then do 
almost everything that the host can do. This flag exists to allow special 
use-cases, like running Docker within Docker.  {{\-\-Privileged}} is more 
destructive than pseudo root that should be handled carefully.  System admin 
usually does not allow a user with sudo privileges to change resource 
utilization, hence I haven't seen a valid point to apply {{\-\-user}} flag on 
{{\-\-privileged}} containers.

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container

2017-11-08 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244811#comment-16244811
 ] 

Eric Badger commented on YARN-7430:
---

I don't see how running the container as root will work with log aggregation. 
Everything written inside of the container will be written to bind-mounted 
volumes as root, not as the user that submitted the job. This means that root 
will own all of these things once the container finishes. So I'm not sure how 
we can write logs correctly while also allowing escalated privilege inside the 
container. 

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7440) Optimization to AM recovery when the service record doesn't exist for a container

2017-11-08 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244822#comment-16244822
 ] 

Billie Rinaldi commented on YARN-7440:
--

Sounds good. Thanks, [~csingh]!

> Optimization to AM recovery when the service record doesn't exist for a 
> container
> -
>
> Key: YARN-7440
> URL: https://issues.apache.org/jira/browse/YARN-7440
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
> Fix For: yarn-native-services
>
> Attachments: YARN-7440.001.patch, YARN-7440.002.patch
>
>
> When AM recovers, if the service record doesn’t exist for a container sent 
> from RM, it can re-query the container status from NM, today it will release 
> the container



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244825#comment-16244825
 ] 

Shane Kumpf commented on YARN-7430:
---

{quote}
User foo should not allow to execute script owned by skumpf, unless skumpf 
granted permission to run the script
{quote}
User foo doesn't execute the script owned by skumpf if we pass the user skumpf. 
This is exactly how every container works today. We pass the user name and run 
the entrypoint in the container as this user, overriding what the image has 
set. This allows localization and logging to work. With the change to turn this 
off, we let the image decide, but only for privileged containers. The result is 
that any image that has "USER " in it, must be modified.

{quote}
--user=0:0 does not mean privileged. It means the entry point is granted with 
pseudo root privileges inside the container.
{quote}
Sorry, poorly worded. Do you think that the entry point process in a privileged 
container should always run as root? if so, we should enforce that by setting 
{{\-\-user=0:0}}.

I think there is a place for applications where we don't set the user, but for 
those types to work, we'd need to get rid of all mounts and avoid overriding 
the entrypoint ("vanilla containers").

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic


[ 
https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244832#comment-16244832
 ] 

Hadoop QA commented on YARN-7388:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 254 unchanged - 4 fixed = 254 total (was 258) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m  1s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}112m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 |
| Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA |
|   | org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7388 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896690/YARN-7388.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 59da72dbc653 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cb35a59 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |
|

[jira] [Assigned] (YARN-7399) Yarn services metadata storage improvement


 [ 
https://issues.apache.org/jira/browse/YARN-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-7399:
---

Assignee: Eric Yang

> Yarn services metadata storage improvement
> --
>
> Key: YARN-7399
> URL: https://issues.apache.org/jira/browse/YARN-7399
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>
> In Slider, metadata is stored in user's home directory. Slider command line 
> interface interacts with HDFS directly to list deployed applications and 
> invoke YARN API or HDFS API to provide information to user. This design works 
> for a single user manage his/her own applications. When this design has been 
> ported to Yarn services, it becomes apparent that this design is difficult to 
> list all deployed applications on Hadoop cluster for administrator to manage 
> applications. Resource Manager needs to crawl through every user's home 
> directory to compile metadata about deployed applications. This can trigger 
> high load on namenode to list hundreds or thousands of list directory calls 
> owned by different users. Hence, it might be best to centralize the metadata 
> storage to Solr or HBase to reduce number of IO calls to namenode for manage 
> applications.
> In Slider, one application is composed of metainfo, specifications in json, 
> and payload of zip file that contains application code and deployment code. 
> Both meta information, and zip file payload are stored in the same 
> application directory in HDFS. This works well for distributed applications 
> without central application manager that oversee all application.
> In the next generation of application management, we like to centralize 
> metainfo and specifications in json to a centralized storage managed by YARN 
> user, and keep the payload zip file in user's home directory or in docker 
> registry. This arrangement can provide a faster lookup for metainfo when we 
> list all deployed applications and services on YARN dashboard.
> When we centralize metainfo to YARN user, we also need to build ACL to 
> enforce who can manage applications, and make update. The current proposal is:
> yarn.admin.acl - list of groups that can submit/reconfigure/pause/kill all 
> applications
> normal users - submit/reconfigure/pause/kill his/her own applications



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7166) Container REST endpoints should report resource types


[ 
https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244860#comment-16244860
 ] 

Yufei Gu commented on YARN-7166:


Look good to me generally. How about marking {{allocatedMB}} and 
{{allocatedVCores}} deprecated?

> Container REST endpoints should report resource types
> -
>
> Key: YARN-7166
> URL: https://issues.apache.org/jira/browse/YARN-7166
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7166.YARN-3926.001.patch, 
> YARN-7166.YARN-3926.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-6128) Add support for AMRMProxy HA


[ 
https://issues.apache.org/jira/browse/YARN-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244668#comment-16244668
 ] 

Hadoop QA commented on YARN-6128:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 15s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
31s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 313 unchanged - 0 fixed = 315 total (was 313) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
48s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
10s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
27s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 
48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}

[jira] [Commented] (YARN-7440) Optimization to AM recovery when the service record doesn't exist for a container

2017-11-08 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244669#comment-16244669
 ] 

Chandni Singh commented on YARN-7440:
-

Seems like the test is failing because the previous container for the service 
master during _recovery_ has same {{allocationRequestId}} as one of the 
component containers. Either the {{allocationRequestId}} for the service master 
container should be different or we can check during recovery that if the 
container number is 1, then we just release it.

> Optimization to AM recovery when the service record doesn't exist for a 
> container
> -
>
> Key: YARN-7440
> URL: https://issues.apache.org/jira/browse/YARN-7440
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
> Fix For: yarn-native-services
>
> Attachments: YARN-7440.001.patch, YARN-7440.002.patch
>
>
> When AM recovers, if the service record doesn’t exist for a container sent 
> from RM, it can re-query the container status from NM, today it will release 
> the container



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Created] (YARN-7465) start-yarn.sh fails to start ResourceManager unless running as root

2017-11-08 Thread Sean Mackrory (JIRA)

Sean Mackrory created YARN-7465:
---

 Summary: start-yarn.sh fails to start ResourceManager unless 
running as root
 Key: YARN-7465
 URL: https://issues.apache.org/jira/browse/YARN-7465
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 3.1.0
Reporter: Sean Mackrory
Priority: Blocker


This was found when testing rolling upgrades in HDFS-11096. It manifests as the 
following:

{quote}Starting resourcemanagers on [ container-8.docker container-9.docker]
/home/hadoop/hadoop-3.0.0-SNAPSHOT/sbin/../libexec/hadoop-functions.sh: line 
298: --config: command not found{quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7465) start-yarn.sh fails to start ResourceManager unless running as root

2017-11-08 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated YARN-7465:

Attachment: YARN-7465.001.patch

> start-yarn.sh fails to start ResourceManager unless running as root
> ---
>
> Key: YARN-7465
> URL: https://issues.apache.org/jira/browse/YARN-7465
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Sean Mackrory
>Priority: Blocker
> Attachments: YARN-7465.001.patch
>
>
> This was found when testing rolling upgrades in HDFS-11096. It manifests as 
> the following:
> {quote}Starting resourcemanagers on [ container-8.docker container-9.docker]
> /home/hadoop/hadoop-3.0.0-SNAPSHOT/sbin/../libexec/hadoop-functions.sh: line 
> 298: --config: command not found{quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7166) Container REST endpoints should report resource types


[ 
https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244911#comment-16244911
 ] 

Yufei Gu commented on YARN-7166:


+1. Pending for Jenkins.

> Container REST endpoints should report resource types
> -
>
> Key: YARN-7166
> URL: https://issues.apache.org/jira/browse/YARN-7166
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7166.003.patch, YARN-7166.YARN-3926.001.patch, 
> YARN-7166.YARN-3926.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container

2017-11-08 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244910#comment-16244910
 ] 

Eric Badger commented on YARN-7430:
---

{quote}
For users on LDAP, there is no good way to populate container with user and 
group information.
{quote}
Additionally, bind-mounting the /var/run/nscd will allow the container to use 
the host's ldap configuration to lookup users. That way, there won't be a cache 
miss everytime a new container is started up. We could setup each container to 
correctly use ldap, but that sounds like a waste because of all of the hits on 
the ldap server. That's why entering the container as a uid:gid pair will give 
you the username even if they don't exist in the image. Otherwise, the uid:gid 
pair won't have an associated username and the MRAppMaster will fail. This was 
discussed shortly in 
[YARN-4266|https://issues.apache.org/jira/browse/YARN-4266?focusedCommentId=16076756=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16076756]

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent


[ 
https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245019#comment-16245019
 ] 

Hadoop QA commented on YARN-7143:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-7143 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7143 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896772/YARN-7143.003.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18408/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> FileNotFound handling in ResourceUtils is inconsistent
> --
>
> Key: YARN-7143
> URL: https://issues.apache.org/jira/browse/YARN-7143
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7143.002.patch, YARN-7143.003.patch, 
> YARN-7143.YARN-3926.001.patch
>
>
> When loading the resource-types.xml file, we warn and move on if it's not 
> found.  When loading the node-resource.xml file, we abort loading resource 
> types if the file isn't found.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7166) Container REST endpoints should report resource types


[ 
https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244892#comment-16244892
 ] 

Daniel Templeton commented on YARN-7166:


I don't think that's needed.  CPU and memory are accessed frequently enough 
that they deserve dedicated variables and methods.  Maybe later after resource 
types has settled in a bit more...

> Container REST endpoints should report resource types
> -
>
> Key: YARN-7166
> URL: https://issues.apache.org/jira/browse/YARN-7166
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7166.YARN-3926.001.patch, 
> YARN-7166.YARN-3926.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7166) Container REST endpoints should report resource types


 [ 
https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-7166:
---
Attachment: YARN-7166.003.patch

> Container REST endpoints should report resource types
> -
>
> Key: YARN-7166
> URL: https://issues.apache.org/jira/browse/YARN-7166
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7166.003.patch, YARN-7166.YARN-3926.001.patch, 
> YARN-7166.YARN-3926.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7166) Container REST endpoints should report resource types


[ 
https://issues.apache.org/jira/browse/YARN-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244955#comment-16244955
 ] 

Hadoop QA commented on YARN-7166:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: 
The patch generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
2s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 44m 55s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7166 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896757/YARN-7166.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 38f42c154e16 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cb35a59 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/18405/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18405/testReport/ |
| Max. process+thread count | 432 (vs. ulimit of 5000) |
| modules | C:

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244970#comment-16244970
 ] 

Eric Yang commented on YARN-7430:
-

[~ebadger] They are two separate problems.  A lot of conversation here belongs 
to YARN-7446.  This issue is to tackle the problem that we have a implicit 
privilege escalation security hole in the default shipped configuration when 
the following condition is met:

# Privileged container is enabled.
# Deploy docker container with user mapping to a different uid:gid than host 
OS, or using a numeric username to launch app.
# Data output from container is written with as someone else or root group.

In summary, to prevent privileges escalation, we should always pass in primary 
group to improve security.


> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Created] (YARN-7466) ResourceRequest has a different default for allocationRequestId than Container

2017-11-08 Thread Chandni Singh (JIRA)

Chandni Singh created YARN-7466:
---

 Summary: ResourceRequest has a different default for 
allocationRequestId than Container
 Key: YARN-7466
 URL: https://issues.apache.org/jira/browse/YARN-7466
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chandni Singh
Assignee: Chandni Singh


The default value of allocationRequestId is inconsistent.
It is  -1 in {{ContainerProto}} but 0 in {{ResourceRequestProto}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7437) Give SchedulingPlacementSet to a better name.

2017-11-08 Thread Konstantinos Karanasos (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245000#comment-16245000
 ] 

Konstantinos Karanasos commented on YARN-7437:
--

Thanks, [~leftnoteasy]! Looks good, will commit it to trunk shortly.

> Give SchedulingPlacementSet to a better name.
> -
>
> Key: YARN-7437
> URL: https://issues.apache.org/jira/browse/YARN-7437
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-7437.001.patch, YARN-7437.002.patch, 
> YARN-7437.003.patch, YARN-7437.004.patch
>
>
> Currently, the SchedulingPlacementSet is very confusing. Here're its 
> responsibilities:
> 1) Store ResourceRequests. (Or SchedulingRequest after YARN-6592).
> 2) Decide order of nodes to allocate when there're multiple node candidates.
> 3) Decide if we should reject node for given requests.
> 4) Store any states/cache can help make decision for #2/#3



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent


 [ 
https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-7143:
---
Attachment: YARN-7143.003.patch

Good point.

> FileNotFound handling in ResourceUtils is inconsistent
> --
>
> Key: YARN-7143
> URL: https://issues.apache.org/jira/browse/YARN-7143
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7143.002.patch, YARN-7143.003.patch, 
> YARN-7143.YARN-3926.001.patch
>
>
> When loading the resource-types.xml file, we warn and move on if it's not 
> found.  When loading the node-resource.xml file, we abort loading resource 
> types if the file isn't found.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7399) Yarn services metadata storage improvement


[ 
https://issues.apache.org/jira/browse/YARN-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244883#comment-16244883
 ] 

Eric Yang commented on YARN-7399:
-

The purpose of metadata storage API is to provide a low latency, simple 
key/value lookup for Yarnfiles.  We will call this api "application catalog" as 
a generic term to represent this function.  The feature of application catalog 
are:

1.  Register an application record for deployment.
2.  Update configuration of existing application.
3.  Decommission an application record.
4.  Retrieve information about the application record.
5.  Search application record by user, or application name.


> Yarn services metadata storage improvement
> --
>
> Key: YARN-7399
> URL: https://issues.apache.org/jira/browse/YARN-7399
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7399.png
>
>
> In Slider, metadata is stored in user's home directory. Slider command line 
> interface interacts with HDFS directly to list deployed applications and 
> invoke YARN API or HDFS API to provide information to user. This design works 
> for a single user manage his/her own applications. When this design has been 
> ported to Yarn services, it becomes apparent that this design is difficult to 
> list all deployed applications on Hadoop cluster for administrator to manage 
> applications. Resource Manager needs to crawl through every user's home 
> directory to compile metadata about deployed applications. This can trigger 
> high load on namenode to list hundreds or thousands of list directory calls 
> owned by different users. Hence, it might be best to centralize the metadata 
> storage to Solr or HBase to reduce number of IO calls to namenode for manage 
> applications.
> In Slider, one application is composed of metainfo, specifications in json, 
> and payload of zip file that contains application code and deployment code. 
> Both meta information, and zip file payload are stored in the same 
> application directory in HDFS. This works well for distributed applications 
> without central application manager that oversee all application.
> In the next generation of application management, we like to centralize 
> metainfo and specifications in json to a centralized storage managed by YARN 
> user, and keep the payload zip file in user's home directory or in docker 
> registry. This arrangement can provide a faster lookup for metainfo when we 
> list all deployed applications and services on YARN dashboard.
> When we centralize metainfo to YARN user, we also need to build ACL to 
> enforce who can manage applications, and make update. The current proposal is:
> yarn.admin.acl - list of groups that can submit/reconfigure/pause/kill all 
> applications
> normal users - submit/reconfigure/pause/kill his/her own applications



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7465) start-yarn.sh fails to start ResourceManager unless running as root

2017-11-08 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244896#comment-16244896
 ] 

Sean Mackrory commented on YARN-7465:
-

I suspect Yetus will complain that there are no tests - but this is a trivial 
typo introduced by a major rewrite of the script that is caught by the tests 
I'm trying to commit in HDFS-11096.

> start-yarn.sh fails to start ResourceManager unless running as root
> ---
>
> Key: YARN-7465
> URL: https://issues.apache.org/jira/browse/YARN-7465
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Sean Mackrory
>Priority: Blocker
> Attachments: YARN-7465.001.patch
>
>
> This was found when testing rolling upgrades in HDFS-11096. It manifests as 
> the following:
> {quote}Starting resourcemanagers on [ container-8.docker container-9.docker]
> /home/hadoop/hadoop-3.0.0-SNAPSHOT/sbin/../libexec/hadoop-functions.sh: line 
> 298: --config: command not found{quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7458) TestContainerManagerSecurity is still flakey


[ 
https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244901#comment-16244901
 ] 

Daniel Templeton commented on YARN-7458:


Couple more issues to fix while you're in there:

# Probably safer to call {{ContainerState.COMPLETE.equals(...)}} on L416-417
# That catch on L421 is bad.  It means that if we interrupt this test, it will 
ignore it and keep waiting.  Probably better to put the catch outside the loop.

> TestContainerManagerSecurity is still flakey
> 
>
> Key: YARN-7458
> URL: https://issues.apache.org/jira/browse/YARN-7458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-7458.001.patch
>
>
> YARN-6150 made this less flakey, but we're still seeing an occasional issue 
> here:
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244948#comment-16244948
 ] 

Eric Yang commented on YARN-7430:
-

[~ebadger] In a unix box, when a user run sudo commands, all logs are written 
to syslog or /var/log/messages.  They are owned by root.  There are Enterprise 
log aggregation tools that can search and filter out segment of syslog and 
/var/log/messages belong to certain user by using terminal id, and audit id.  
The log viewer identify user base on terminal id, and audit id to determine if 
user have rights to see the log.  Hadoop doesn't have to be different from 
existing design.  

The information generated by root container should belong to root in the event 
user is revoked of sudo rights.  He will not have access to the logs later.  
Docker console output is already appended to container log if we don't detach 
container, then all logs goes into container log.  Therefore, we have logs that 
is compiled with application id and container id.  We have information 
available to determine if the user is allowed to see the logs.

What log aggregation are we doing in addition to capture the docker console 
output?

If the application is writing to file system directly without tracking, there 
will be no accurate way to identify the origin of the log.  However, this is 
not a special case.  This problem exist today for any shared service user, and 
it is up to the developer to generate logs that either have user name/host name 
in the log filename to support log tracking.  I am not clear on how removing 
{{\-\-user}} flag would result in log aggregation not working.  Could you 
clarify?

[~shaneku...@gmail.com] . If passing --user=0.0 with --privileged flag can keep 
log aggregation to work.  I have no objection with this.  Is there a design of 
how log aggregation works for Yarn Services which is different from classic 
yarn containers?

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Comment Edited] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244970#comment-16244970
 ] 

Eric Yang edited comment on YARN-7430 at 11/9/17 12:13 AM:
---

[~ebadger] They are two separate problems.  A lot of conversation here belongs 
to YARN-7446.  This issue is to tackle the problem that we have a implicit 
privilege escalation security hole in the default shipped configuration when 
the following condition is met:

# Privileged container is enabled.
# Deploy docker container with user mapping to a different uid:gid than host 
OS, or using a numeric username to launch app.
# Data output from container is written as someone else or with root group 
ownership.

In summary, to prevent privileges escalation, we should always pass in primary 
group to improve security.



was (Author: eyang):
[~ebadger] They are two separate problems.  A lot of conversation here belongs 
to YARN-7446.  This issue is to tackle the problem that we have a implicit 
privilege escalation security hole in the default shipped configuration when 
the following condition is met:

# Privileged container is enabled.
# Deploy docker container with user mapping to a different uid:gid than host 
OS, or using a numeric username to launch app.
# Data output from container is written with as someone else or root group.

In summary, to prevent privileges escalation, we should always pass in primary 
group to improve security.


> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245016#comment-16245016
 ] 

Shane Kumpf commented on YARN-7430:
---

IMO, I think this issue can be closed as invalid. Most of this does belong in 
YARN-7446 regarding the use of {{\-\-user}} and {{\-\-privileged}}, sorry for 
derailing the conversation.

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7386) Duplicate Strings in various places in Yarn memory

2017-11-08 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245027#comment-16245027
 ] 

Robert Kanter commented on YARN-7386:
-

The patch looks good to me.  The Jenkins is too old and the details lost, so 
I've kicked off another run.

> Duplicate Strings in various places in Yarn memory
> --
>
> Key: YARN-7386
> URL: https://issues.apache.org/jira/browse/YARN-7386
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: YARN-7386.01.patch, YARN-7386.02.patch
>
>
> Using jxray (www.jxray.com) I've analyzed a Yarn RM heap dump obtained in a 
> big cluster. The tool uncovered several sources of memory waste. One problem 
> is duplicate strings:
> {code}
> Total strings   Unique strings  Duplicate values   
> Overhead 
>  361,506   86,672  5,928  22,886K (7.6%)
> {code}
> They are spread across a number of locations. The biggest source of waste is 
> the following reference chain:
> {code}
> 7,416K (2.5%), 31292 / 62% dup strings (499 unique), 31292 dup backing arrays:
> ↖{j.u.HashMap}.values
> ↖org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.environment
> ↖org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl.amContainer
> ↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.submissionContext
> ↖{java.util.concurrent.ConcurrentHashMap}.values
> ↖org.apache.hadoop.yarn.server.resourcemanager.RMActiveServiceContext.applications
> ↖org.apache.hadoop.yarn.server.resourcemanager.RMContextImpl.activeServiceContext
> ↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor.rmContext
> ↖Java Local@3ed9ef820 
> (org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor)
> {code}
> However, there are also many others. Mostly they are strings in proto buffer 
> or proto buffer builder objects. I plan to get rid of at least the worst 
> offenders by inserting String.intern() calls. String.intern() used to consume 
> memory in PermGen and was not very scalable up until about the early JDK 7 
> versions, but has greatly improved since then, and I've used it many times 
> without any issues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7430) User and Group mapping are incorrect in docker container

2017-11-08 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244915#comment-16244915
 ] 

Eric Badger commented on YARN-7430:
---

Also, this conversation seems to have morphed into a dup of YARN-7446. Are 
there 2 distinct issues here or should we close one as a dup of the other?

> User and Group mapping are incorrect in docker container
> 
>
> Key: YARN-7430
> URL: https://issues.apache.org/jira/browse/YARN-7430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: security, yarn
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-7430.001.patch
>
>
> In YARN-4266, the recommendation was to use -u [uid]:[gid] numeric values to 
> enforce user and group for the running user.  In YARN-6623, this translated 
> to --user=test --group-add=group1.  The code no longer enforce group 
> correctly for launched process.  
> In addition, the implementation in YARN-6623 requires the user and group 
> information to exist in container to translate username and group to uid/gid. 
>  For users on LDAP, there is no good way to populate container with user and 
> group information. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent


[ 
https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244926#comment-16244926
 ] 

Yufei Gu commented on YARN-7143:


Looks good to me generally. Only one thing, the new {{initializedResources = 
true;}} isn't necessary since {{initializeResourcesMap()}} does that anyway.

> FileNotFound handling in ResourceUtils is inconsistent
> --
>
> Key: YARN-7143
> URL: https://issues.apache.org/jira/browse/YARN-7143
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7143.002.patch, YARN-7143.YARN-3926.001.patch
>
>
> When loading the resource-types.xml file, we warn and move on if it's not 
> found.  When loading the node-resource.xml file, we abort loading resource 
> types if the file isn't found.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7458) TestContainerManagerSecurity is still flakey

2017-11-08 Thread Robert Kanter (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-7458:

Attachment: YARN-7458.002.patch

Thanks for the reviews.  That all makes sense.  Uploading 002 patch:
- Replaces custom loop with {{GenericTestUitls#waitFor}} and lowered the check 
interval to 10msec.  This also makes it fail the test if the loop expires 
before the container completes and fixes the interrupt issue.
- Reversed the {{equals}} call
- Improved the log message to also print out the current container state for 
easier debuggability

> TestContainerManagerSecurity is still flakey
> 
>
> Key: YARN-7458
> URL: https://issues.apache.org/jira/browse/YARN-7458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-7458.001.patch, YARN-7458.002.patch
>
>
> YARN-6150 made this less flakey, but we're still seeing an occasional issue 
> here:
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7466) ResourceRequest has a different default for allocationRequestId than Container

2017-11-08 Thread Jian He (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244988#comment-16244988
 ] 

Jian He commented on YARN-7466:
---

[~leftnoteasy], [~subru], opinion on this ? we should make it consistent ?

> ResourceRequest has a different default for allocationRequestId than Container
> --
>
> Key: YARN-7466
> URL: https://issues.apache.org/jira/browse/YARN-7466
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>
> The default value of allocationRequestId is inconsistent.
> It is  -1 in {{ContainerProto}} but 0 in {{ResourceRequestProto}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7465) start-yarn.sh fails to start ResourceManager unless running as root


[ 
https://issues.apache.org/jira/browse/YARN-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245012#comment-16245012
 ] 

Hadoop QA commented on YARN-7465:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
13s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
50s{color} | {color:green} hadoop-yarn in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7465 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896758/YARN-7465.001.patch |
| Optional Tests |  asflicense  mvnsite  unit  shellcheck  shelldocs  |
| uname | Linux 7093dba60e35 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 
18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cb35a59 |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18406/testReport/ |
| Max. process+thread count | 339 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn U: 
hadoop-yarn-project/hadoop-yarn |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18406/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> start-yarn.sh fails to start ResourceManager unless running as root
> ---
>
> Key: YARN-7465
> URL: https://issues.apache.org/jira/browse/YARN-7465
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Sean Mackrory
>Priority: Blocker
> Attachments: YARN-7465.001.patch
>
>
> This was found when testing rolling upgrades in HDFS-11096. It manifests as 
> the following:
> {quote}Starting resourcemanagers on [ container-8.docker container-9.docker]
> /home/hadoop/hadoop-3.0.0-SNAPSHOT/sbin/../libexec/hadoop-functions.sh: line 
> 298: --config: command not found{quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7458) TestContainerManagerSecurity is still flakey


[ 
https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245021#comment-16245021
 ] 

Daniel Templeton commented on YARN-7458:


That's a lot of info level logging!  Do we need that message printed every 10ms?

> TestContainerManagerSecurity is still flakey
> 
>
> Key: YARN-7458
> URL: https://issues.apache.org/jira/browse/YARN-7458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-7458.001.patch, YARN-7458.002.patch
>
>
> YARN-6150 made this less flakey, but we're still seeing an occasional issue 
> here:
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7455) add_mounts can overrun temporary buffer


[ 
https://issues.apache.org/jira/browse/YARN-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244579#comment-16244579
 ] 

Eric Yang commented on YARN-7455:
-

There is a max size check in add_mounts to prevent buffer overflow.  The 
current size can contain source and target path of 510 characters deep.  Do we 
want to double it?  Given that we don't add black list into the tmp_buffer, do 
we still need this?

> add_mounts can overrun temporary buffer
> ---
>
> Key: YARN-7455
> URL: https://issues.apache.org/jira/browse/YARN-7455
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Jason Lowe
>
> While reviewing YARN-7197 I noticed that add_mounts in docker_util.c has a 
> potential buffer overflow since tmp_buffer is only 1024 bytes which may not 
> be sufficient to hold the specified mount path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic


[ 
https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244605#comment-16244605
 ] 

Hadoop QA commented on YARN-7388:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  9m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 254 unchanged - 4 fixed = 254 total (was 258) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 13s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m 32s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7388 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12896690/YARN-7388.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 97520fcb9a7b 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cb35a59 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/18402/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18402/testReport/ |
| Max. process+thread count | 853 (vs. ulimit of 5000) |
| modules | C:

[jira] [Commented] (YARN-7419) Implement Auto Queue Creation with modifications to queue mapping flow

2017-11-08 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244601#comment-16244601
 ] 

Wangda Tan commented on YARN-7419:
--

Thanks [~suma.shivaprasad] for updating the patch, more comments:

1) CapacityScheduler: 
1.1 Instead of fetching ApplicationPlacementContext from RMApp (to avoid 
perf/locking issue and code flow becomes more clear), you can add the 
ApplicationPlacementContext to {{AppAddedSchedulerEvent}}. And 
getPlacementContext API can be removed from RMApp.

1.2 Move following if ... to addApplication:
{code}
if (placementContext != null) {
  // ...
}
{code}
Like, 
{code}
if (queue == null && placementContext != null) {
  //Could be a potential auto-created leaf queue
}
{code}
Only enter the autoCreateLeafQueue function when necessary.

1.3 Following two catch exception can be merged by using {{YarnException | 
IOException e}} syntax.
{code}
catch (YarnException e) {
  LOG.error("Could not auto-create leaf queue due to : ", e);
  final String message =
  "Application " + applicationId + " submission by user " + user
  + " to  queue: " + queueName + " failed : " + e.getMessage();
  this.rmContext.getDispatcher().getEventHandler().handle(
  new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED,
  message));
} catch (IOException e) {
  final String message =
  "Application " + applicationId + " submission by user " + user
  + " to  queue: " + queueName + " failed : " + e.getMessage();
  LOG.error("Could not auto-create leaf queue due to : ", e);
  this.rmContext.getDispatcher().getEventHandler().handle(
  new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED,
  message));
}
{code}

1.4 Following message is not clear enough:
{code}
  String message =
  "Application " + applicationId + " submission by user " + user
  + " to queue: " + queueName + " failed : "
  + "Queue mapping does not exist for user";
{code} 
It should say directly specify a autocreated queue name is prohibited, it has 
to be automatically mapped, etc. 

1.5 I'm not sure if this check is necessary, I think previous logics should be 
enough to detect this correct? 
{code}
else if (!queue.getParent().getQueueName().equals(
placementContext.getParentQueue())) {
  String message =
  "Auto created Leaf queue " + placementContext.getQueue() + " 
already exists under " + queue
  .getParent().getQueuePath()
  + ".But Queue mapping has a different parent queue "
  + placementContext.getParentQueue()
  + " for the specified user : " + user;
  this.rmContext.getDispatcher().getEventHandler().handle(
  new RMAppEvent(applicationId, RMAppEventType.APP_REJECTED,
  message));
  return;
} 
{code}

1.6 clock is still here, move to a separate patch?

2) CapacitySchedulerConfiguration:
- getQueuePlacementRules is unused. 
- Make sure all new added methods/fields are {{@Private}}
- {{FAIL_AUTO_CREATION_ON_EXCEEDING_CAPACITY}} is this necessary? Should we 
just fail leaf queue creation when it exceeds parent queue's limit? 

Renames: 
- AutoCreatedLeafQueueTemplate.Builder#capacity => capacities 

Unecesssary changes:
- CapacitySchedulerContext
- AbstractCSQueue

Miscs:
- Did you accidentally included YARN-6124 in this patch? Could you revert that 
part? 

> Implement Auto Queue Creation with modifications to queue mapping flow
> --
>
> Key: YARN-7419
> URL: https://issues.apache.org/jira/browse/YARN-7419
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
> Attachments: YARN-7419.1.patch, YARN-7419.2.patch, YARN-7419.3.patch, 
> YARN-7419.patch
>
>
> This involves changes to queue mapping flow to pass along context information 
> for auto queue creation. Auto creation of queues will be part of Capacity 
> Scheduler flow while attempting to resolve queues during application 
> submission. The leaf queues which do not exist are auto created under parent 
> queues which have been explicitly enabled for auto queue creation . In order 
> to determine which parent queue to create the leaf queues under - parent 
> queues need to be specified in queue mapping configuration 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic


[ 
https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244622#comment-16244622
 ] 

Haibo Chen commented on YARN-7388:
--

I believe the OOM-led test failures are unrelated, let me retrigger the jenkins 
to double check

> TestAMRestart should be scheduler agnostic
> --
>
> Key: YARN-7388
> URL: https://issues.apache.org/jira/browse/YARN-7388
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7388.00.patch, YARN-7388.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Resolved] (YARN-7425) Failed to renew delegation token when RM user's TGT is expired

2017-11-08 Thread Shen Yinjie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie resolved YARN-7425.
---
Resolution: Won't Fix

> Failed to renew delegation token  when RM user's TGT is expired
> ---
>
> Key: YARN-7425
> URL: https://issues.apache.org/jira/browse/YARN-7425
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.2
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Critical
> Attachments: rm_log.png
>
>
> we have a secure hadoop cluster with namenode federation.
> submit job fails after kerberos TGT maxLifeTime expired(default 24h), client 
> log shows" failed to renew token: HDFS_DELEGATION_TOKEN...".
> check rm log, found rm tgt is expired but not triggers relogin(),just retry 
> and fail...
> (rm log see screenshot)
> digging in code:
> when rm tries to renewToken(),
> UserGroupInformation.getLoginUser()="rm",
> but UserGroupInformation.getCurrentUser()="testUser".
> this causes Client.shouldAuthenticateOverKrb() returns false, thus cant 
> trigger reloginFromKeytab() or reloginFromTicketCache().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7406) Moving logging APIs over to slf4j in hadoop-yarn-api

2017-11-08 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245217#comment-16245217
 ] 

Bibin A Chundatt commented on YARN-7406:


[~Cyl] Thank you for confirming .Still wondering how did i miss 
{{ResourceUtils}}.
Will commit patch today.

> Moving logging APIs over to slf4j in hadoop-yarn-api
> 
>
> Key: YARN-7406
> URL: https://issues.apache.org/jira/browse/YARN-7406
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Yeliang Cang
>Assignee: Yeliang Cang
> Attachments: YARN-7406.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent


 [ 
https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-7143:
---
Attachment: YARN-7143.003.patch

> FileNotFound handling in ResourceUtils is inconsistent
> --
>
> Key: YARN-7143
> URL: https://issues.apache.org/jira/browse/YARN-7143
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7143.002.patch, YARN-7143.003.patch, 
> YARN-7143.YARN-3926.001.patch
>
>
> When loading the resource-types.xml file, we warn and move on if it's not 
> found.  When loading the node-resource.xml file, we abort loading resource 
> types if the file isn't found.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent


 [ 
https://issues.apache.org/jira/browse/YARN-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-7143:
---
Attachment: (was: YARN-7143.003.patch)

> FileNotFound handling in ResourceUtils is inconsistent
> --
>
> Key: YARN-7143
> URL: https://issues.apache.org/jira/browse/YARN-7143
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7143.002.patch, YARN-7143.003.patch, 
> YARN-7143.YARN-3926.001.patch
>
>
> When loading the resource-types.xml file, we warn and move on if it's not 
> found.  When loading the node-resource.xml file, we abort loading resource 
> types if the file isn't found.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7437) Give SchedulingPlacementSet to a better name.

2017-11-08 Thread Konstantinos Karanasos (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-7437:
-
Attachment: YARN-7437.005.patch

Fixed some checkstyle issues before committing. Uploading patch here first to 
make sure Jenkins is OK.

> Give SchedulingPlacementSet to a better name.
> -
>
> Key: YARN-7437
> URL: https://issues.apache.org/jira/browse/YARN-7437
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-7437.001.patch, YARN-7437.002.patch, 
> YARN-7437.003.patch, YARN-7437.004.patch, YARN-7437.005.patch
>
>
> Currently, the SchedulingPlacementSet is very confusing. Here're its 
> responsibilities:
> 1) Store ResourceRequests. (Or SchedulingRequest after YARN-6592).
> 2) Decide order of nodes to allocate when there're multiple node candidates.
> 3) Decide if we should reject node for given requests.
> 4) Store any states/cache can help make decision for #2/#3



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7413) Support resource type in SLS


 [ 
https://issues.apache.org/jira/browse/YARN-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-7413:
---
Attachment: YARN-7413.003.patch

Uploaded patch v3 for the style issues and whitespace issues.

> Support resource type in SLS
> 
>
> Key: YARN-7413
> URL: https://issues.apache.org/jira/browse/YARN-7413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7413.001.patch, YARN-7413.002.patch, 
> YARN-7413.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic


[ 
https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245042#comment-16245042
 ] 

Haibo Chen commented on YARN-7388:
--

The unit test failure is unrelated, tracked at YARN-5684

> TestAMRestart should be scheduler agnostic
> --
>
> Key: YARN-7388
> URL: https://issues.apache.org/jira/browse/YARN-7388
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7388.00.patch, YARN-7388.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-7458) TestContainerManagerSecurity is still flakey

2017-11-08 Thread Robert Kanter (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-7458:

Attachment: YARN-7458.003.patch

It wasn't too spammy on my computer, but I guess it could be on something 
slower.  The 003 patch removes that log message.  Instead, we log once before 
starting the {{waitFor}} and also if there's a {{TimeoutException}}, to make 
things clearer.

> TestContainerManagerSecurity is still flakey
> 
>
> Key: YARN-7458
> URL: https://issues.apache.org/jira/browse/YARN-7458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-7458.001.patch, YARN-7458.002.patch, 
> YARN-7458.003.patch
>
>
> YARN-6150 made this less flakey, but we're still seeing an occasional issue 
> here:
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:420)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:356)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:167)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-7143) FileNotFound handling in ResourceUtils is inconsistent