[jira] [Commented] (YARN-8830) SLS tool not working in trunk

2018-09-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631390#comment-16631390
 ] 

Hadoop QA commented on YARN-8830:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 17s{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 4 
new + 7 unchanged - 0 fixed = 11 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
15s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8830 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941569/YARN-8830.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fc050ee2317c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5c8d907 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21998/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21998/testReport/ |
| Max. process+thread count | 448 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21998/console |
[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631387#comment-16631387
 ] 

Bibin A Chundatt commented on YARN-8829:


Committing patch shortly.

> Cluster metrics can fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
> Attachments: YARN-8829.001.patch
>
>
> If no sub-clusters are available in a Router-based federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is empty.
> *Exception details:*
> {noformat}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> {noformat}
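
For illustration, the kind of empty-list guard the description asks for might 
look like the sketch below; the helper name and exception message are 
hypothetical, not the actual patch:

{code:java}
// Illustrative sketch: guard against an empty sub-cluster list before
// indexing into it (the stack trace above fails on ArrayList.get(0)).
List<SubClusterId> subClusterIds = getActiveSubClusterIds(); // hypothetical helper
if (subClusterIds == null || subClusterIds.isEmpty()) {
  throw new YarnException(
      "No active sub-clusters available to serve the request");
}
// Only now is it safe to fan out the call and read results,
// e.g. subClusterIds.get(0).
{code}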



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631377#comment-16631377
 ] 

Hadoop QA commented on YARN-8829:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
34s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8829 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941526/YARN-8829.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 572be2144271 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5c8d907 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21997/testReport/ |
| Max. process+thread count | 706 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21997/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Cluster metrics can fail with IndexOutOfBound Exception
> 

[jira] [Assigned] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy

2018-09-27 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang reassigned YARN-8792:
-

Assignee: Shuai Zhang

> Revisit FairScheduler QueuePlacementPolicy 
> ---
>
> Key: YARN-8792
> URL: https://issues.apache.org/jira/browse/YARN-8792
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Assignee: Shuai Zhang
>Priority: Major
>
> Fair scheduler uses `QueuePlacementPolicy` to map a request to a queue. There 
> are several problems:
>  # The termination of the responsibility chain should be bound to the 
> assignment result instead of to the rule.
>  # It should provide a reason when rejecting a request.
>  # More useful rules are still needed:
>  ## RejectNonLeafQueue
>  ## RejectDefaultQueue
>  ## RejectUsers
>  ## RejectQueues
>  ## DefaultByUser



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8794) QueuePlacementPolicy add more rules

2018-09-27 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang reassigned YARN-8794:
-

Assignee: Shuai Zhang

> QueuePlacementPolicy add more rules
> ---
>
> Key: YARN-8794
> URL: https://issues.apache.org/jira/browse/YARN-8794
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Shuai Zhang
>Assignee: Shuai Zhang
>Priority: Major
> Attachments: YARN-8794.001.patch, YARN-8794.002.patch
>
>
> More useful rules are still needed:
>  # RejectNonLeafQueue
>  # RejectDefaultQueue
>  # RejectUsers
>  # RejectQueues
>  # DefaultByUser



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8795) QueuePlacementRule move to separate files

2018-09-27 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang reassigned YARN-8795:
-

Assignee: Shuai Zhang

> QueuePlacementRule move to separate files
> -
>
> Key: YARN-8795
> URL: https://issues.apache.org/jira/browse/YARN-8795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Assignee: Shuai Zhang
>Priority: Major
> Attachments: YARN-8795.002.patch, YARN-8795.003.patch, 
> YARN-8795.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8793) QueuePlacementPolicy bind more information to assigning result

2018-09-27 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang reassigned YARN-8793:
-

Assignee: Shuai Zhang

> QueuePlacementPolicy bind more information to assigning result
> --
>
> Key: YARN-8793
> URL: https://issues.apache.org/jira/browse/YARN-8793
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Assignee: Shuai Zhang
>Priority: Major
> Attachments: YARN-8793.001.patch, YARN-8793.002.patch, 
> YARN-8793.003.patch, YARN-8793.004.patch, YARN-8793.005.patch, 
> YARN-8793.006.patch, YARN-8793.007.patch, YARN-8793.008.patch
>
>
> Fair scheduler's QueuePlacementPolicy should bind more information to the 
> assignment result:
>  # Whether to terminate the chain of responsibility
>  # The reason for rejecting a request
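
For illustration, such an enriched assignment result might look like the sketch 
below; the class and accessor names are hypothetical, not taken from the 
attached patches:

{code:java}
// Illustrative sketch: a placement result that carries the decision plus
// the information the caller needs (chain termination and rejection reason).
public final class QueuePlacementResult {
  private final String queueName;    // null if no queue was assigned
  private final boolean terminal;    // stop the rule chain here?
  private final String rejectReason; // non-null when the request is rejected

  public QueuePlacementResult(String queueName, boolean terminal,
      String rejectReason) {
    this.queueName = queueName;
    this.terminal = terminal;
    this.rejectReason = rejectReason;
  }

  public String getQueueName() { return queueName; }
  public boolean isTerminal() { return terminal; }
  public String getRejectReason() { return rejectReason; }
}
{code}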



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8760) [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer

2018-09-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631364#comment-16631364
 ] 

Hadoop QA commented on YARN-8760:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
23s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 41s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 80m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestNMProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8760 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941607/YARN-8760.v1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 23dee196230c 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5c8d907 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Comment Edited] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription

2018-09-27 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631359#comment-16631359
 ] 

Arun Suresh edited comment on YARN-8808 at 9/28/18 5:15 AM:


bq. containersUtilization and nodeUtilization in SchedulerNode are always 
instantiated (to ResourceUtilization.newInstance(0, 0, 0f)), so 
getNodeUtilization() / getAggregatedContainersUtilization() should never return 
null, unless I am missing something
Again, I noticed the NPE in a testcase setup and I had to put the check. It is 
possible it might not happen in a real cluster setup (and also my branch was a 
bit stale). Maybe it's a good idea to put the check in there just as a safety measure. 

Am +1 on the patch otherwise


was (Author: asuresh):
bq. containersUtilization and nodeUtilization in SchedulerNode are always 
instantiated (to ResourceUtilization.newInstance(0, 0, 0f)), so 
getNodeUtilization() / getAggregatedContainersUtilization() should never return 
null, unless I am missing something
Again, I noticed the NPE in a testcase setup and I had to put the check. It is 
possible it might not happen in a real cluster.

> Use aggregate container utilization instead of node utilization to determine 
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch, 
> YARN-8808-YARN-1011.00.patch
>
>
> Resource oversubscription should be bound to the amount of the resources that 
> can be allocated to containers, hence the allocation threshold should be with 
> respect to aggregate container utilization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription

2018-09-27 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631359#comment-16631359
 ] 

Arun Suresh commented on YARN-8808:
---

bq. containersUtilization and nodeUtilization in SchedulerNode are always 
instantiated (to ResourceUtilization.newInstance(0, 0, 0f)), so 
getNodeUtilization() / getAggregatedContainersUtilization() should never return 
null, unless I am missing something
Again, I noticed the NPE in a testcase setup and I had to put the check. It is 
possible it might not happen in a real cluster.
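
A minimal sketch of the safety check under discussion, using the accessor and 
zero-utilization instance named in the quoted text ({{node}} is a hypothetical 
SchedulerNode reference):

{code:java}
// Defensive default, mirroring ResourceUtilization.newInstance(0, 0, 0f)
// quoted above; only needed if the accessor can ever return null.
ResourceUtilization util = node.getAggregatedContainersUtilization();
if (util == null) {
  util = ResourceUtilization.newInstance(0, 0, 0f);
}
{code}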

> Use aggregate container utilization instead of node utilization to determine 
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch, 
> YARN-8808-YARN-1011.00.patch
>
>
> Resource oversubscription should be bound to the amount of the resources that 
> can be allocated to containers, hence the allocation threshold should be with 
> respect to aggregate container utilization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription

2018-09-27 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631358#comment-16631358
 ] 

Arun Suresh commented on YARN-8808:
---

bq. The scheduler would see this 'Node' has 10GBs to allocate because that's 
what NM tells RM. I believe in this case, YARN should try to fully utilize just 
10GBs instead of the whole node (100 GBs), because YARN is entitled to use only 
10GBs.  If 10GBs is indeed fully utilized, the aggregate container utilization 
is 100%, but the nodeUtilization is 10% (Again, node utilization by default is 
detected by some plugin on NM side that reads from /proc and sees the remaining 
system-wide 90GBs as available). Don't think we shall check if nodeUtilization 
is low.
Makes sense... to be honest, in my testing I had also changed it from 
nodeUtilization to aggregateContainerUtilization :) I was just wondering if 
there is still a case where we might need to factor in nodeUtilization.

> Use aggregate container utilization instead of node utilization to determine 
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch, 
> YARN-8808-YARN-1011.00.patch
>
>
> Resource oversubscription should be bound to the amount of the resources that 
> can be allocated to containers, hence the allocation threshold should be with 
> respect to aggregate container utilization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader

2018-09-27 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631355#comment-16631355
 ] 

Rohith Sharma K S commented on YARN-8270:
-

I backported it to branch-3.1 as well and updated the Fix Version.

> Adding JMX Metrics for Timeline Collector and Reader
> 
>
> Key: YARN-8270
> URL: https://issues.apache.org/jira/browse/YARN-8270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineserver
>Reporter: Sushil Ks
>Assignee: Sushil Ks
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8270.001.patch, YARN-8270.002.patch, 
> YARN-8270.003.patch, YARN-8270.004.patch, YARN-8270.005.patch
>
>
> This Jira is for emitting JMX metrics for the ATSv2 Timeline Collector and 
> Timeline Reader. For the Timeline Collector, it captures success and failure 
> latencies for *putEntities* and *putEntitiesAsync* from 
> *TimelineCollectorWebService*; similarly, it captures each API's success and 
> failure latencies for fetching TimelineEntities from 
> *TimelineReaderWebServices*. This would help in monitoring and measuring 
> performance for ATSv2 at scale.
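
For context, per-API latencies like these are typically captured with Hadoop's 
metrics2 library. The sketch below is illustrative only (the actual patch adds 
TimelineReaderMetrics and PerNodeAggTimelineCollectorMetrics, per the commit 
log):

{code:java}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;

// Illustrative metrics2 sketch, not the actual patch code.
public class TimelineCollectorMetricsSketch {
  private final MetricsRegistry registry =
      new MetricsRegistry("TimelineCollectorMetricsSketch");
  // Tracks the distribution of successful putEntities latencies
  // over rolling 60-second windows.
  private final MutableQuantiles putEntitiesSuccessLatency =
      registry.newQuantiles("putEntitiesSuccessLatency",
          "latency of successful putEntities calls", "ops", "latencyMs", 60);

  // Called from the web service with the measured latency in milliseconds.
  public void recordPutEntitiesSuccess(long latencyMs) {
    putEntitiesSuccessLatency.add(latencyMs);
  }
}
{code}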



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader

2018-09-27 Thread Rohith Sharma K S (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8270:

Fix Version/s: 3.1.2

> Adding JMX Metrics for Timeline Collector and Reader
> 
>
> Key: YARN-8270
> URL: https://issues.apache.org/jira/browse/YARN-8270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineserver
>Reporter: Sushil Ks
>Assignee: Sushil Ks
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8270.001.patch, YARN-8270.002.patch, 
> YARN-8270.003.patch, YARN-8270.004.patch, YARN-8270.005.patch
>
>
> This Jira is for emitting JMX metrics for the ATSv2 Timeline Collector and 
> Timeline Reader. For the Timeline Collector, it captures success and failure 
> latencies for *putEntities* and *putEntitiesAsync* from 
> *TimelineCollectorWebService*; similarly, it captures each API's success and 
> failure latencies for fetching TimelineEntities from 
> *TimelineReaderWebServices*. This would help in monitoring and measuring 
> performance for ATSv2 at scale.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader

2018-09-27 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631353#comment-16631353
 ] 

Rohith Sharma K S commented on YARN-8270:
-

[~vrushalic] I updated the fix version to 3.2.0. The patch is committed to trunk 
only, which corresponds to version 3.2.0.

> Adding JMX Metrics for Timeline Collector and Reader
> 
>
> Key: YARN-8270
> URL: https://issues.apache.org/jira/browse/YARN-8270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineserver
>Reporter: Sushil Ks
>Assignee: Sushil Ks
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8270.001.patch, YARN-8270.002.patch, 
> YARN-8270.003.patch, YARN-8270.004.patch, YARN-8270.005.patch
>
>
> This Jira is for emitting JMX metrics for the ATSv2 Timeline Collector and 
> Timeline Reader. For the Timeline Collector, it captures success and failure 
> latencies for *putEntities* and *putEntitiesAsync* from 
> *TimelineCollectorWebService*; similarly, it captures each API's success and 
> failure latencies for fetching TimelineEntities from 
> *TimelineReaderWebServices*. This would help in monitoring and measuring 
> performance for ATSv2 at scale.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader

2018-09-27 Thread Rohith Sharma K S (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8270:

Fix Version/s: (was: 3.1.2)
   3.2.0

> Adding JMX Metrics for Timeline Collector and Reader
> 
>
> Key: YARN-8270
> URL: https://issues.apache.org/jira/browse/YARN-8270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineserver
>Reporter: Sushil Ks
>Assignee: Sushil Ks
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8270.001.patch, YARN-8270.002.patch, 
> YARN-8270.003.patch, YARN-8270.004.patch, YARN-8270.005.patch
>
>
> This Jira is for emitting JMX metrics for the ATSv2 Timeline Collector and 
> Timeline Reader. For the Timeline Collector, it captures success and failure 
> latencies for *putEntities* and *putEntitiesAsync* from 
> *TimelineCollectorWebService*; similarly, it captures each API's success and 
> failure latencies for fetching TimelineEntities from 
> *TimelineReaderWebServices*. This would help in monitoring and measuring 
> performance for ATSv2 at scale.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8832) Review of RMCommunicator Class

2018-09-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR reassigned YARN-8832:
-

Assignee: BELUGA BEHR

> Review of RMCommunicator Class
> --
>
> Key: YARN-8832
> URL: https://issues.apache.org/jira/browse/YARN-8832
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-88321.patch
>
>
> Various improvements to the {{RMCommunicator}} class.
>  
>  * Use SLF4J parameterized logging
>  * Use a switch statement instead of {{if}}-{{else}} statements
>  * Remove the anti-pattern of "log and throw" (just throw)
>  * Use a flag to stop the thread instead of an interrupt (it may be 
> interrupting the heartbeat code and not the thread loop)
>  * The main thread loops over the heartbeat callback queue until the queue is 
> empty.  It's technically possible that other threads could constantly put new 
> callbacks into the queue and therefore the main thread would never progress 
> past the callbacks.  Put a cap on the number of callbacks that will be 
> processed in any iteration.
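
For illustration, a minimal sketch of the last two bullets above (a volatile 
stop flag plus a per-iteration callback cap); the class, field, and method names 
are hypothetical, not the actual RMCommunicator code:

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch only: stop flag + bounded drain per iteration.
public abstract class HeartbeatLoopSketch {
  private static final int MAX_CALLBACKS_PER_ITERATION = 100; // illustrative cap
  private final Queue<Runnable> heartbeatCallbacks =
      new ConcurrentLinkedQueue<>();
  private volatile boolean shouldRun = true;

  protected abstract void doHeartbeat() throws Exception;

  public void heartbeatLoop() throws Exception {
    while (shouldRun) {
      doHeartbeat();
      // Drain at most N callbacks so busy producers cannot starve the loop.
      int processed = 0;
      Runnable callback;
      while (processed < MAX_CALLBACKS_PER_ITERATION
          && (callback = heartbeatCallbacks.poll()) != null) {
        callback.run();
        processed++;
      }
    }
  }

  public void stop() {
    shouldRun = false; // loop exits without interrupting doHeartbeat()
  }
}
{code}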



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8832) Review of RMCommunicator Class

2018-09-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8832:
--
Attachment: YARN-88321.patch

> Review of RMCommunicator Class
> --
>
> Key: YARN-8832
> URL: https://issues.apache.org/jira/browse/YARN-8832
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-88321.patch
>
>
> Various improvements to the {{RMCommunicator}} class.
>  
>  * Use SLF4J parameterized logging
>  * Use a switch statement instead of {{if}}-{{else}} statements
>  * Remove the anti-pattern of "log and throw" (just throw)
>  * Use a flag to stop the thread instead of an interrupt (it may be 
> interrupting the heartbeat code and not the thread loop)
>  * The main thread loops over the heartbeat callback queue until the queue is 
> empty.  It's technically possible that other threads could constantly put new 
> callbacks into the queue and therefore the main thread would never progress 
> past the callbacks.  Put a cap on the number of callbacks that will be 
> processed in any iteration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8832) Review of RMCommunicator Class

2018-09-27 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created YARN-8832:
-

 Summary: Review of RMCommunicator Class
 Key: YARN-8832
 URL: https://issues.apache.org/jira/browse/YARN-8832
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications
Affects Versions: 3.2.0
Reporter: BELUGA BEHR


Various improvements to the {{RMCommunicator}} class.

 
 * Use SLF4J parameterized logging
 * Use a switch statement instead of {{if}}-{{else}} statements
 * Remove the anti-pattern of "log and throw" (just throw)
 * Use a flag to stop the thread instead of an interrupt (it may be interrupting 
the heartbeat code and not the thread loop)
 * The main thread loops over the heartbeat callback queue until the queue is 
empty.  It's technically possible that other threads could constantly put new 
callbacks into the queue and therefore the main thread would never progress past 
the callbacks.  Put a cap on the number of callbacks that will be processed in 
any iteration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy

2018-09-27 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631207#comment-16631207
 ] 

Yufei Gu commented on YARN-8792:


[~HCOONa], I've added you as a contributor, so that you can assign these jiras 
to yourself.

> Revisit FairScheduler QueuePlacementPolicy 
> ---
>
> Key: YARN-8792
> URL: https://issues.apache.org/jira/browse/YARN-8792
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
>
> Fair scheduler uses `QueuePlacementPolicy` to map a request to a queue. There 
> are several problems:
>  # The termination of the responsibility chain should be bound to the 
> assignment result instead of to the rule.
>  # It should provide a reason when rejecting a request.
>  # More useful rules are still needed:
>  ## RejectNonLeafQueue
>  ## RejectDefaultQueue
>  ## RejectUsers
>  ## RejectQueues
>  ## DefaultByUser



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8831) Review of LocalContainerAllocator

2018-09-27 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created YARN-8831:
-

 Summary: Review of LocalContainerAllocator
 Key: YARN-8831
 URL: https://issues.apache.org/jira/browse/YARN-8831
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications
Affects Versions: 3.2.0
Reporter: BELUGA BEHR
 Attachments: YARN-8831.1.patch

Some trivial cleanup of class {{LocalContainerAllocator}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8831) Review of LocalContainerAllocator

2018-09-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8831:
--
Priority: Trivial  (was: Minor)

> Review of LocalContainerAllocator
> -
>
> Key: YARN-8831
> URL: https://issues.apache.org/jira/browse/YARN-8831
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Priority: Trivial
> Attachments: YARN-8831.1.patch
>
>
> Some trivial cleanup of class {{LocalContainerAllocator}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8831) Review of LocalContainerAllocator

2018-09-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR reassigned YARN-8831:
-

Assignee: BELUGA BEHR

> Review of LocalContainerAllocator
> -
>
> Key: YARN-8831
> URL: https://issues.apache.org/jira/browse/YARN-8831
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: YARN-8831.1.patch
>
>
> Some trivial cleanup of class {{LocalContainerAllocator}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8831) Review of LocalContainerAllocator

2018-09-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8831:
--
Attachment: YARN-8831.1.patch

> Review of LocalContainerAllocator
> -
>
> Key: YARN-8831
> URL: https://issues.apache.org/jira/browse/YARN-8831
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Priority: Trivial
> Attachments: YARN-8831.1.patch
>
>
> Some trivial cleanup of class {{LocalContainerAllocator}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631181#comment-16631181
 ] 

Hudson commented on YARN-8270:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15069 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15069/])
YARN-8270 Adding JMX Metrics for Timeline Collector and Reader. (vrushali: rev 
90e2e493b3dc8be54f655b957b98a4bc0e003684)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/reader/TimelineReaderWebServices.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/metrics/package-info.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/metrics/TimelineReaderMetrics.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderMetrics.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/metrics/PerNodeAggTimelineCollectorMetrics.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/collector/TestPerNodeAggTimelineCollectorMetrics.java


> Adding JMX Metrics for Timeline Collector and Reader
> 
>
> Key: YARN-8270
> URL: https://issues.apache.org/jira/browse/YARN-8270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineserver
>Reporter: Sushil Ks
>Assignee: Sushil Ks
>Priority: Major
> Fix For: 3.1.2
>
> Attachments: YARN-8270.001.patch, YARN-8270.002.patch, 
> YARN-8270.003.patch, YARN-8270.004.patch, YARN-8270.005.patch
>
>
> This Jira is for emitting JMX metrics for the ATSv2 Timeline Collector and 
> Timeline Reader. For the Timeline Collector, it captures success and failure 
> latencies for *putEntities* and *putEntitiesAsync* from 
> *TimelineCollectorWebService*; similarly, it captures each API's success and 
> failure latencies for fetching TimelineEntities from 
> *TimelineReaderWebServices*. This would help in monitoring and measuring 
> performance for ATSv2 at scale.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3525) Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores

2018-09-27 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631174#comment-16631174
 ] 

Haibo Chen commented on YARN-3525:
--

I'm not sure if we still need to address this.

The two properties have now been deprecated by 
yarn.resource-types.memory-mb.increment-allocation and 
yarn.resource-types.vcores.increment-allocation after resource types support was 
merged. Because memory and vcores are two built-in resource types, people should 
use the resource-types-prefixed configuration properties, which are scheduler 
agnostic.
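
For illustration, the scheduler-agnostic replacements would be set like any 
other yarn-site.xml properties (the values below are examples only):

{code:xml}
<!-- Example values only; these supersede the deprecated
     yarn.scheduler.increment-allocation-mb/-vcores properties. -->
<property>
  <name>yarn.resource-types.memory-mb.increment-allocation</name>
  <value>512</value>
</property>
<property>
  <name>yarn.resource-types.vcores.increment-allocation</name>
  <value>1</value>
</property>
{code}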

> Rename fair scheduler properties increment-allocation-mb and 
> increment-allocation-vcores
> 
>
> Key: YARN-3525
> URL: https://issues.apache.org/jira/browse/YARN-3525
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Zoltan Siegl
>Priority: Minor
> Attachments: YARN-3525.001.patch, YARN-3525.003.patch
>
>
> Rename the two properties below, since they are only used by the fair scheduler: 
> {color:blue}yarn.scheduler.increment-allocation-mb{color} to 
> {color:red}yarn.scheduler.fair.increment-allocation-mb{color}
> {color:blue}yarn.scheduler.increment-allocation-vcores{color} to 
> {color:red}yarn.scheduler.fair.increment-allocation-vcores{color}
> All other fair-scheduler-only properties use the 
> {color:red}yarn.scheduler.fair{color} prefix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription

2018-09-27 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631145#comment-16631145
 ] 

Arun Suresh commented on YARN-8808:
---

bq. Node has capacity for 4 1GB containers, but is currently running 2 
containers each using more than 1.9GB - in this case, overallocation should be 
allowed.
Sorry, I meant 2 containers, each allocated 1 GB but using 0.9 GB. This would 
result in a container utilization of 90% (1.8 / 2.0), 
but the node utilization would be 45% (1.8 / 4.0). Here container utilization 
is high but node utilization is low.
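
For illustration, the arithmetic above side by side (the numbers are exactly 
those from the comment; the snippet is a standalone sketch, not scheduler code):

{code:java}
// Sketch of the example: 2 containers, each allocated 1.0 GB but actually
// using 0.9 GB, on a node with 4.0 GB of capacity.
double usedGb = 2 * 0.9;      // 1.8 GB actually in use by containers
double allocatedGb = 2 * 1.0; // 2.0 GB allocated to containers
double nodeCapacityGb = 4.0;  // whole-node capacity

double containerUtilization = usedGb / allocatedGb;  // 1.8 / 2.0 = 0.90
double nodeUtilization = usedGb / nodeCapacityGb;    // 1.8 / 4.0 = 0.45
// Container utilization is high while node utilization is low, which is
// why the choice of signal matters for the oversubscription threshold.
{code}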

> Use aggregate container utilization instead of node utilization to determine 
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch, 
> YARN-8808-YARN-1011.00.patch
>
>
> Resource oversubscription should be bound to the amount of the resources that 
> can be allocated to containers, hence the allocation threshold should be with 
> respect to aggregate container utilization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8732) Add unit tests of min/max allocation for custom resource types in FairScheduler

2018-09-27 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8732:
-
Summary: Add unit tests of min/max allocation for custom resource types in 
FairScheduler  (was: Add unit tests of min/max allocation for FairScheduler)

> Add unit tests of min/max allocation for custom resource types in 
> FairScheduler
> ---
>
> Key: YARN-8732
> URL: https://issues.apache.org/jira/browse/YARN-8732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8732.001.patch
>
>
> Create testcase like this, but for FS: 
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService#testValidateRequestCapacityAgainstMinMaxAllocationFor3rdResourceTypes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8760) [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer

2018-09-27 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8760:
---
Attachment: YARN-8760.v1.patch

> [AMRMProxy] Fix concurrent re-register due to YarnRM failover in 
> AMRMClientRelayer
> --
>
> Key: YARN-8760
> URL: https://issues.apache.org/jira/browse/YARN-8760
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8760.v1.patch
>
>
> When the home YarnRM is failing over, the FinishApplicationMaster call from the 
> AM can have multiple retry threads outstanding in FederationInterceptor. When 
> the new YarnRM comes back up, all retry threads will re-register with it. The 
> first one will succeed, but the rest will get an "Application Master is already 
> registered" exception. We should catch and swallow this exception and move 
> on. 
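
For illustration, a sketch of the catch-and-swallow handling the description 
calls for; the exception type, message check, and {{rmClient}} reference are 
assumptions, not the attached patch:

{code:java}
// Illustrative sketch: tolerate a concurrent re-register after RM failover.
try {
  rmClient.registerApplicationMaster(request);
} catch (InvalidApplicationMasterRequestException e) {
  if (e.getMessage() != null
      && e.getMessage().contains("Application Master is already registered")) {
    // Another retry thread won the re-register race; safe to move on.
  } else {
    throw e;
  }
}
{code}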



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8732) Add unit tests of min/max allocation for FairScheduler

2018-09-27 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8732:
-
Summary: Add unit tests of min/max allocation for FairScheduler  (was: 
Create new testcase in TestApplicationMasterService that tests min/max 
allocation but for FairScheduler)

> Add unit tests of min/max allocation for FairScheduler
> --
>
> Key: YARN-8732
> URL: https://issues.apache.org/jira/browse/YARN-8732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8732.001.patch
>
>
> Create testcase like this, but for FS: 
> org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService#testValidateRequestCapacityAgainstMinMaxAllocationFor3rdResourceTypes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8826) Fix lingering timeline collector after serviceStop in TimelineCollectorManager

2018-09-27 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631136#comment-16631136
 ] 

Vrushali C commented on YARN-8826:
--

None of the pre-commit builds are running. All are failing with "could not apply 
patch to trunk".

> Fix lingering timeline collector after serviceStop in TimelineCollectorManager
> --
>
> Key: YARN-8826
> URL: https://issues.apache.org/jira/browse/YARN-8826
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Reporter: Prabha Manepalli
>Assignee: Prabha Manepalli
>Priority: Trivial
> Attachments: YARN-8826.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8826) Fix lingering timeline collector after serviceStop in TimelineCollectorManager

2018-09-27 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631133#comment-16631133
 ] 

Vrushali C commented on YARN-8826:
--

Hi [~prabham]
Here is the pre-commit build link:
https://builds.apache.org/job/PreCommit-YARN-Build/

I have retriggered the build: 
https://builds.apache.org/job/PreCommit-YARN-Build/21991/

> Fix lingering timeline collector after serviceStop in TimelineCollectorManager
> --
>
> Key: YARN-8826
> URL: https://issues.apache.org/jira/browse/YARN-8826
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Reporter: Prabha Manepalli
>Assignee: Prabha Manepalli
>Priority: Trivial
> Attachments: YARN-8826.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8775) TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications

2018-09-27 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631131#comment-16631131
 ] 

Haibo Chen commented on YARN-8775:
--

Thanks [~bsteinbach] for the patch. I think we can reduce the leakage of 
LocalDirsHandlerService implementation details into TestDiskFailures 
by disabling the periodic health check in LocalDirsHandlerService, and calling 
LocalDirsHandlerService.checkDirs() every time before we verify disk 
health.

checkDirs() is currently private, so we'll need to make it public (make sure to 
add '@VisibleForTesting').
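
For illustration, the suggested test flow might look like the sketch below; the 
configuration key is the standard NodeManager disk-health-checker switch, while 
the helper calls are assumptions about the test code:

{code:java}
// Illustrative sketch of the suggested flow: no periodic checker racing
// the assertions, so disk health is verified deterministically.
Configuration conf = new YarnConfiguration();
conf.setBoolean(YarnConfiguration.NM_DISK_HEALTH_CHECK_ENABLE, false);
// ... start the mini cluster with conf, then in the test body:
prepareDirToFail(localDir);   // existing TestDiskFailures helper (assumed)
dirsHandler.checkDirs();      // made public with @VisibleForTesting
// now assert on the handler's valid-dir lists without timing dependence
{code}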

 

One question I have is why do we need to retry inside prepareDirToFail()?

> TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File 
> modifications
> --
>
> Key: YARN-8775
> URL: https://issues.apache.org/jira/browse/YARN-8775
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 3.0.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-8775.001.patch, YARN-8775.002.patch
>
>
> The test can sometimes fail when file operations are performed during the check 
> done by the thread in _LocalDirsHandlerService_.
> {code:java}
> java.lang.AssertionError: NodeManager could not identify disk failure.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99)
> Stderr
> 2018-09-13 08:21:49,822 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:prepareDirToFail(277)) - Prepared 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
>  to fail.
> 2018-09-13 08:21:49,823 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:prepareDirToFail(277)) - Prepared 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
>  to fail.
> 2018-09-13 08:21:49,823 WARN [DiskHealthMonitor-Timer] 
> nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(283)) - 
> Directory 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
>  error, Not a directory: 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1,
>  removing from list of valid directories
> 2018-09-13 08:21:49,824 WARN [DiskHealthMonitor-Timer] 
> localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:initializeLogDir(1329)) - Could not 
> initialize log dir 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> java.io.FileNotFoundException: Destination exists and is not a directory: 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:515)
> at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:496)
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1081)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:178)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:205)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1324)
> at 
> 

[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-27 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: YARN-8789.10.patch

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.10.patch, 
> YARN-8789.2.patch, YARN-8789.3.patch, YARN-8789.4.patch, YARN-8789.5.patch, 
> YARN-8789.6.patch, YARN-8789.7.patch, YARN-8789.7.patch, YARN-8789.8.patch, 
> YARN-8789.9.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event-queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean-up, 
> simplification, and the ability to specify a bounded queue (sketched below) 
> so that any incoming events are throttled until they can be processed.  This 
> will protect the ApplicationMaster from a flood of events.
> Logging Message:
> Size of event-queue is xxx
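
As a rough illustration of the idea (a sketch only, not the attached patch; 
the capacity value is hypothetical), a bounded {{BlockingQueue}} makes 
producers block instead of letting the event queue grow without limit:
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedDispatcherSketch {

  // With a bound, put() blocks when the queue is full, throttling event
  // producers instead of letting the queue grow until the JVM runs out
  // of memory.
  private final BlockingQueue<Runnable> eventQueue =
      new LinkedBlockingQueue<>(10_000);

  public void dispatch(Runnable event) throws InterruptedException {
    eventQueue.put(event); // blocks while the queue is at capacity
  }

  public Runnable next() throws InterruptedException {
    return eventQueue.take();
  }
}
{code}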



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8793) QueuePlacementPolicy bind more information to assigning result

2018-09-27 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8793:
-
Summary: QueuePlacementPolicy bind more information to assigning result  
(was: QueuePlacementPolicy bind more information to assgining result)

> QueuePlacementPolicy bind more information to assigning result
> --
>
> Key: YARN-8793
> URL: https://issues.apache.org/jira/browse/YARN-8793
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
> Attachments: YARN-8793.001.patch, YARN-8793.002.patch, 
> YARN-8793.003.patch, YARN-8793.004.patch, YARN-8793.005.patch, 
> YARN-8793.006.patch, YARN-8793.007.patch, YARN-8793.008.patch
>
>
> Fair scheduler's QueuePlacementPolicy should bind more information to the 
> assigning result:
>  # Whether to terminate the chain of responsibility
>  # The reason for rejecting a request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy

2018-09-27 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8792:
-
Summary: Revisit FairScheduler QueuePlacementPolicy   (was: Revisit 
FiarScheduler QueuePlacementPolicy )

> Revisit FairScheduler QueuePlacementPolicy 
> ---
>
> Key: YARN-8792
> URL: https://issues.apache.org/jira/browse/YARN-8792
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
>
> Fair scheduler uses `QueuePlacementPolicy` to map a request to a queue. 
> There are several problems:
>  # The termination of the responsibility chain should bind to the assigning 
> result instead of the rule.
>  # It should provide a reason when rejecting a request.
>  # We still need more useful rules:
>  ## RejectNonLeafQueue
>  ## RejectDefaultQueue
>  ## RejectUsers
>  ## RejectQueues
>  ## DefaultByUser



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user

2018-09-27 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631083#comment-16631083
 ] 

Eric Yang commented on YARN-8734:
-

Precommit build failed due to an unrelated Jenkins problem.  Triggering the 
test job again.

> Readiness check for remote service belongs to the same user
> ---
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Component_dependencies.png, Dependency check vs.pdf, 
> Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, 
> YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch, 
> YARN-8734.006.patch
>
>
> When a service is deploying, there can be a remote service dependency.  It 
> would be nice to be able to describe, say, ZooKeeper as a dependent service 
> and, once that service has reached a stable state, then deploy HBase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription

2018-09-27 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631043#comment-16631043
 ] 

Haibo Chen commented on YARN-8808:
--

containersUtilization and nodeUtilization in SchedulerNode are always 
instantiated (to ResourceUtilization.newInstance(0, 0, 0f)), so 
getNodeUtilization() / getAggregatedContainersUtilization() should never return 
null, unless I am missing something.

> Use aggregate container utilization instead of node utilization to determine 
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch, 
> YARN-8808-YARN-1011.00.patch
>
>
> Resource oversubscription should be bound to the amount of the resources that 
> can be allocated to containers, hence the allocation threshold should be with 
> respect to aggregate container utilization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription

2018-09-27 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631039#comment-16631039
 ] 

Haibo Chen commented on YARN-8808:
--

Ah, got you.
{quote}I was just saying.. we need an additional check to see if either one of 
them (you are proposing to use the former in this JIRA) is {{0}}
{quote}
I am not sure checking nodeUtilization makes sense to me. Let's take a more 
extreme case as an example:

1) A node (hardware) has 100 GB capacity, and we're sharing the node with 
workloads other than YARN, so we'd configure NMs to limit aggregate container 
allocation on each node to 10 GB.

2) The scheduler would see that this 'Node' has 10 GB to allocate, because 
that's what the NM tells the RM. I believe in this case YARN should try to 
fully utilize just the 10 GB instead of the whole node (100 GB), because YARN 
is entitled to use only 10 GB. If the 10 GB is indeed fully utilized, the 
aggregate container utilization is 100%, but the nodeUtilization is 10% 
(again, node utilization by default is detected by a plugin on the NM side 
that reads from /proc and sees the remaining system-wide 90 GB as available). 
I don't think we should check whether nodeUtilization is low.
{quote}- Node has capacity for 4 1GB containers, but is currently running 2 
containers each using more than 1.9GB - in this case, overallocation should be 
allowed.
{quote}
I am not following here. The node has a capacity of 4 GB and 2 containers are 
each using 1.9 GB, so the aggregate container utilization and node utilization 
are both high, no? Node capacity and utilization don't have anything to do 
with the number of containers, do they?
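
To make the arithmetic concrete, a minimal sketch of the headroom computation 
under discussion (the names and MB-only signature are illustrative, not the 
patch's API): oversubscribable capacity comes from the NM-advertised capacity 
minus aggregate container utilization, with whole-node utilization 
deliberately ignored.
{code:java}
public final class OversubscriptionSketch {

  /**
   * Memory (MB) that could be offered to opportunistic containers.
   * nmCapacityMb     - what the NM advertises to the RM (e.g. 10 GB),
   *                    not the physical node size (e.g. 100 GB).
   * containersUsedMb - aggregate utilization of the running containers.
   */
  public static long oversubscribableMemoryMb(long nmCapacityMb,
                                              long containersUsedMb) {
    return Math.max(0L, nmCapacityMb - containersUsedMb);
  }

  public static void main(String[] args) {
    // The node is physically 100 GB but the NM is capped at 10 GB; if
    // containers already use the full 10 GB, nothing is oversubscribable,
    // regardless of the 90 GB the node itself still has free.
    System.out.println(oversubscribableMemoryMb(10L * 1024, 10L * 1024)); // 0
  }
}
{code}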

 

> Use aggregate container utilization instead of node utilization to determine 
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch, 
> YARN-8808-YARN-1011.00.patch
>
>
> Resource oversubscription should be bound to the amount of the resources that 
> can be allocated to containers, hence the allocation threshold should be with 
> respect to aggregate container utilization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6456) Allow administrators to control available container runtimes and set defaults for all containers

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630977#comment-16630977
 ] 

Hudson commented on YARN-6456:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15068 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15068/])
YARN-6456.  Added config to set default container runtimes. (eyang: 
rev b237a0dd44ab285941983648d7ef26b99b30d624)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DefaultLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java


> Allow administrators to control available container runtimes and set defaults 
> for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0
>
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, 
> YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, 
> YARN-6456.004.patch, YARN-6456.005.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers: default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option so that it is not the user but the admin who 
> requires the use of Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6456) Allow administrators to control available container runtimes and set defaults for all containers

2018-09-27 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630930#comment-16630930
 ] 

Eric Yang commented on YARN-6456:
-

+1 looks good to me for patch 005, will commit shortly.

> Allow administrators to control available container runtimes and set defaults 
> for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, 
> YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, 
> YARN-6456.004.patch, YARN-6456.005.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers: default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option so that it is not the user but the admin who 
> requires the use of Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6456) Allow administrators to control available container runtimes and set defaults for all containers

2018-09-27 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630892#comment-16630892
 ] 

Craig Condit commented on YARN-6456:


[~eyang], title updated.

> Allow administrators to control available container runtimes and set defaults 
> for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, 
> YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, 
> YARN-6456.004.patch, YARN-6456.005.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers: default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option so that it is not the user but the admin who 
> requires the use of Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6456) Allow administrators to control available container runtimes and set defaults for all containers

2018-09-27 Thread Craig Condit (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit updated YARN-6456:
---
Summary: Allow administrators to control available container runtimes and 
set defaults for all containers  (was: Allow administrators to set a single 
ContainerRuntime for all containers)

> Allow administrators to control available container runtimes and set defaults 
> for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, 
> YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, 
> YARN-6456.004.patch, YARN-6456.005.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers: default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option so that it is not the user but the admin who 
> requires the use of Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-27 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630857#comment-16630857
 ] 

Wangda Tan commented on YARN-8800:
--

Found that the images were problematic because of their size. Removed all 
images and verified the latest patch; it should work now. [~sunilg], mind 
reviewing again? 

> Updated documentation of Submarine with latest examples.
> 
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8800.001.patch, YARN-8800.002.patch, 
> YARN-8800.003.patch, YARN-8800.004.patch, YARN-8800.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-27 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8800:
-
Attachment: YARN-8800.005.patch

> Updated documentation of Submarine with latest examples.
> 
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8800.001.patch, YARN-8800.002.patch, 
> YARN-8800.003.patch, YARN-8800.004.patch, YARN-8800.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-09-27 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630838#comment-16630838
 ] 

Eric Yang commented on YARN-6456:
-

The title doesn't match the implementation though.  The implementation allows 
more than one runtime and sets a default when one is not explicitly defined, 
whereas the title says to enforce a single runtime for all containers.  The 
feature requested by the title can be accomplished with the existing 
yarn.nodemanager.runtime.linux.allowed-runtimes setting by setting it to a 
single runtime, without any code change.  Do we want to change the title to 
reflect the implementation, for correctness?


> Allow administrators to set a single ContainerRuntime for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, 
> YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, 
> YARN-6456.004.patch, YARN-6456.005.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers: default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option so that it is not the user but the admin who 
> requires the use of Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630814#comment-16630814
 ] 

Giovanni Matteo Fumarola commented on YARN-8829:


+1. LGTM.

> Cluster metrics can  fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
> Attachments: YARN-8829.001.patch
>
>
> If no sub-clusters are available in a Router-based Federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is 
> empty (a sketch of such a guard follows the stack trace below).
> *Exception details:*
> {noformat}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> {noformat}
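
A minimal sketch of the kind of guard implied above (the method shape is an 
assumption; the real logic lives in 
FederationClientInterceptor.invokeConcurrent()):
{code:java}
import java.io.IOException;
import java.util.List;

public final class SubClusterGuardSketch {

  // Fail fast with a clear message instead of letting results.get(0)
  // throw IndexOutOfBoundsException when no sub-cluster responded.
  public static <T> T firstResult(List<T> results) throws IOException {
    if (results == null || results.isEmpty()) {
      throw new IOException("No active sub-clusters available");
    }
    return results.get(0);
  }
}
{code}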



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-09-27 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630787#comment-16630787
 ] 

Eric Yang edited comment on YARN-6456 at 9/27/18 5:38 PM:
--

[~ccondit-target] Thank you for the explanation.  In YARN service, the user is 
not required to specify the runtime, and the container will run with 
DefaultLinuxContainerRuntime because it is the default.  I now understand more 
clearly that this JIRA focuses on setting a default other than 
DefaultLinuxContainerRuntime when the user does not specify one.  A user 
running a YARN service LLAP application that did not specify a runtime could 
then have the default overridden by the system administrator.  That might 
break existing users because the behavior changes, but the responsibility for 
the compatibility breakage lies with the system administrator rather than the 
code change.  Hence, this change is backward compatible by default, and it 
gives system admins more control to steer developers toward a specific 
runtime.  The documentation could explain this more clearly by showing 
yarn.nodemanager.runtime.linux.type and 
yarn.nodemanager.runtime.linux.allowed-runtimes side by side, or by pointing 
to this JIRA for the explanation.  I think the patch is ready.


was (Author: eyang):
[~ccondit-target] Thank you for the explanation.  In YARN service, user does 
not require to specify the runtime, and it will run the 
DefaultLinuxContainerRuntime because it is the default.  I understand more 
clearly that this JIRA is to focus on setting a default other than 
DefaultLinuxContainerRuntime, when user does not specify the default.  User who 
is using YARN service LLAP application which did not specify the default, and 
system administrator override the default.  It might break existing users 
because the behavior has changed.  The responsibility of the compatibility 
breakage depends on the system administrator rather than code change.  Hence, 
this change is backward compatible by default, and give system admin more 
control to steer the developer to work with specific runtime.  The 
documentation can explain this more clearly by showing 
yarn.nodemanager.runtime.linux.type and 
yarn.nodemanager.runtime.linux.allowed-runtimes side by side for clarity, or 
find this JIRA for the explaination.  I think the patch is ready.

> Allow administrators to set a single ContainerRuntime for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, 
> YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, 
> YARN-6456.004.patch, YARN-6456.005.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers: default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option so that it is not the user but the admin who 
> requires the use of Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-09-27 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630787#comment-16630787
 ] 

Eric Yang commented on YARN-6456:
-

[~ccondit-target] Thank you for the explanation.  In YARN service, user does 
not require to specify the runtime, and it will run the 
DefaultLinuxContainerRuntime because it is the default.  I understand more 
clearly that this JIRA is to focus on setting a default other than 
DefaultLinuxContainerRuntime, when user does not specify the default.  User who 
is using YARN service LLAP application which did not specify the default, and 
system administrator override the default.  It might break existing users 
because the behavior has changed.  The responsibility of the compatibility 
breakage depends on the system administrator rather than code change.  Hence, 
this change is backward compatible by default, and give system admin more 
control to steer the developer to work with specific runtime.  The 
documentation can explain this more clearly by showing 
yarn.nodemanager.runtime.linux.type and 
yarn.nodemanager.runtime.linux.allowed-runtimes side by side for clarity, or 
find this JIRA for the explaination.  I think the patch is ready.

> Allow administrators to set a single ContainerRuntime for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, 
> YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, 
> YARN-6456.004.patch, YARN-6456.005.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers: default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option so that it is not the user but the admin who 
> requires the use of Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8830) SLS tool not working in trunk

2018-09-27 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-8830:
--

Assignee: Bibin A Chundatt

> SLS tool not working in trunk
> -
>
> Key: YARN-8830
> URL: https://issues.apache.org/jira/browse/YARN-8830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8830.001.patch
>
>
> It seems NodeDetails hashCode() and equals() are causing too many node 
> registrations for large data sets.
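
For illustration, a hedged sketch of an equals()/hashCode() pair keyed on a 
stable node identity (the field name is an assumption; the real NodeDetails 
class lives in hadoop-sls):
{code:java}
import java.util.Objects;

public class NodeDetailsSketch {

  private final String hostName; // assumed stable identity of a simulated node

  public NodeDetailsSketch(String hostName) {
    this.hostName = hostName;
  }

  // If equals()/hashCode() are missing or include mutable state, two
  // lookups for the same node can disagree, so the SLS runner keeps
  // re-registering nodes on large traces. Keying on the host name alone
  // keeps set/map membership stable.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof NodeDetailsSketch)) {
      return false;
    }
    return hostName.equals(((NodeDetailsSketch) o).hostName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(hostName);
  }
}
{code}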



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8830) SLS tool not working in trunk

2018-09-27 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8830:
---
Attachment: YARN-8830.001.patch

> SLS tool not working in trunk
> -
>
> Key: YARN-8830
> URL: https://issues.apache.org/jira/browse/YARN-8830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8830.001.patch
>
>
> It seems NodeDetails hashCode() and equals() are causing too many node 
> registrations for large data sets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-27 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630739#comment-16630739
 ] 

Botong Huang commented on YARN-8696:


Thanks [~giovanni.fumarola] for the review and commit!

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, 
> YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, 
> YARN-8696.v5.patch, YARN-8696.v6.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is 
> synchronous. After the heartbeat is sent out to the home sub-cluster, it 
> waits for the home response to come back before merging and returning the 
> (merged) heartbeat result back to the AM. If the home sub-cluster is 
> suffering from connection issues, or is down during a YarnRM master-slave 
> switch, all heartbeat threads in _FederationInterceptor_ will be blocked 
> waiting for the home response. As a result, the successful UAM heartbeats 
> from secondary sub-clusters will not be returned to the AM at all. 
> Additionally, because we kept the same heartbeat responseId between the AM 
> and the home RM, lots of tricky handling is needed for the responseId resync 
> when it comes to _FederationInterceptor_ (part of AMRMProxy, NM) work 
> preserving restart (YARN-6127, YARN-1336), home RM master-slave switches, 
> etc. 
> In this patch, we change the heartbeat to the home sub-cluster to be 
> asynchronous, the same way we handle UAM heartbeats in the secondaries, so 
> that any sub-cluster outage or connection issue won't impact the AM getting 
> responses from other sub-clusters. The responseId is also managed separately 
> for the home sub-cluster and the AM, and they increment independently. The 
> resync logic becomes much cleaner (see the sketch below). 
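
Schematically (a plain-Java sketch under stated assumptions, not the actual 
FederationInterceptor API), moving the home heartbeat onto its own thread 
keeps a slow home RM from blocking responses to the AM, and the two responseId 
sequences increment independently:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncHomeHeartbeatSketch {

  private final ExecutorService homeExecutor =
      Executors.newSingleThreadExecutor();
  // Separate sequences: one towards the AM, another towards the home RM,
  // so a home RM failover no longer forces a responseId resync with the AM.
  private final AtomicInteger amResponseId = new AtomicInteger();
  private final AtomicInteger homeResponseId = new AtomicInteger();

  public int heartbeat(Runnable sendToHomeRm) {
    // Fire the home heartbeat asynchronously; do not wait for it.
    homeExecutor.submit(() -> {
      sendToHomeRm.run();
      homeResponseId.incrementAndGet();
    });
    // Return immediately to the AM with whatever merged (secondary)
    // results are available, using the AM-facing sequence.
    return amResponseId.incrementAndGet();
  }

  public void shutdown() {
    homeExecutor.shutdown();
  }
}
{code}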



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8819) Fix findbugs warnings in YarnServiceUtils

2018-09-27 Thread Vidura Bhathiya Mudalige (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630708#comment-16630708
 ] 

Vidura Bhathiya Mudalige commented on YARN-8819:


[~ajisakaa], can you please assign this Jira to me?

> Fix findbugs warnings in YarnServiceUtils
> -
>
> Key: YARN-8819
> URL: https://issues.apache.org/jira/browse/YARN-8819
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akira Ajisaka
>Priority: Minor
>  Labels: newbie
>
> {noformat}
> module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
>
> org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceUtils.getComponentArrayJson(String,
>  int, String) concatenates strings using + in a loop At 
> YarnServiceUtils.java:using + in a loop At YarnServiceUtils.java:[line 123]
> {noformat}
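
The usual fix for this findbugs pattern is to build the string with a 
{{StringBuilder}}; a minimal sketch (the loop body is illustrative, not the 
actual YarnServiceUtils code):
{code:java}
public final class ComponentJsonSketch {

  // String '+' in a loop copies the whole string on every iteration
  // (O(n^2)); a StringBuilder appends in place instead.
  public static String componentArrayJson(String prefix, int count) {
    StringBuilder json = new StringBuilder("[");
    for (int i = 0; i < count; i++) {
      if (i > 0) {
        json.append(',');
      }
      json.append('"').append(prefix).append(i).append('"');
    }
    return json.append(']').toString();
  }
}
{code}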



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8819) Fix findbugs warnings in YarnServiceUtils

2018-09-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630675#comment-16630675
 ] 

ASF GitHub Bot commented on YARN-8819:
--

GitHub user vbmudalige opened a pull request:

https://github.com/apache/hadoop/pull/419

YARN-8819. Fix findbugs warnings in YarnServiceUtils



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vbmudalige/hadoop YARN-8819

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #419


commit 57c872eb8bb7dd69599e899876417cc541018f0d
Author: Vidura Mudalige 
Date:   2018-09-27T16:10:13Z

YARN-8819. Fix findbugs warnings in YarnServiceUtils




> Fix findbugs warnings in YarnServiceUtils
> -
>
> Key: YARN-8819
> URL: https://issues.apache.org/jira/browse/YARN-8819
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akira Ajisaka
>Priority: Minor
>  Labels: newbie
>
> {noformat}
> module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
>
> org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceUtils.getComponentArrayJson(String,
>  int, String) concatenates strings using + in a loop At 
> YarnServiceUtils.java:using + in a loop At YarnServiceUtils.java:[line 123]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8303) YarnClient should contact TimelineReader for application/attempt/container report

2018-09-27 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8303:

Attachment: YARN-8303.001.patch

> YarnClient should contact TimelineReader for application/attempt/container 
> report
> -
>
> Key: YARN-8303
> URL: https://issues.apache.org/jira/browse/YARN-8303
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Abhishek Modi
>Priority: Critical
> Attachments: YARN-8303.001.patch, YARN-8303.poc.patch
>
>
> YarnClient gets app/attempt/container information from the RM. If the RM 
> doesn't have it, the ahsClient is queried. When only ATSv2 is enabled, 
> yarnClient will return empty results. 
> YarnClient is used by many users, which results in empty information for 
> app/attempt/container reports. 
> The proposal is to have an adapter in the yarn client so that 
> app/attempt/container reports can be generated from AHSv2Client, which calls 
> the TimelineReader REST API, gets the entity, and converts it into an 
> app/attempt/container report.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8830) SLS tool not working in trunk

2018-09-27 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-8830:
--

 Summary: SLS tool not working in trunk
 Key: YARN-8830
 URL: https://issues.apache.org/jira/browse/YARN-8830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt


It seems NodeDetails hashCode() and equals() are causing too many node 
registrations for large data sets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM

2018-09-27 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630610#comment-16630610
 ] 

Arun Suresh commented on YARN-8827:
---

I guess we just need per-app utilization, since the queue and user aggregates 
can be derived at the RM.

> Plumb per app, per user and per queue resource utilization from the NM to RM
> 
>
> Key: YARN-8827
> URL: https://issues.apache.org/jira/browse/YARN-8827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Major
>
> Opportunistic containers for overallocation need to be allocated to pending 
> applications in some fair manner. Rather than evaluating queue and user 
> resource usage (allocated resource usage) and comparing it against queue and 
> user limits to decide the allocation, it might make more sense to use a 
> snapshot of the actual resource utilization of the queue and user.
> To facilitate this, this JIRA proposes to aggregate per-user and per-app (and 
> maybe per-queue) resource utilization, in addition to the aggregated 
> Container and Node utilization, and send it along with the NM heartbeat. It 
> should be fairly inexpensive to aggregate, since it can be performed in the 
> same loop as the {{ContainersMonitorImpl}}'s monitoring thread (see the 
> sketch below).
> A snapshot aggregate can be made every couple of seconds in the RM. This 
> instantaneous resource utilization should be used to decide whether 
> opportunistic containers can be allocated to an app, queue or user.
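
A rough sketch of the aggregation step described above (types are simplified 
to strings and longs; the real loop would live in {{ContainersMonitorImpl}} 
and work with ResourceUtilization):
{code:java}
import java.util.HashMap;
import java.util.Map;

public final class PerAppUtilizationSketch {

  /**
   * containerMemMb maps "appId/containerId" to that container's current
   * memory use in MB; the result maps appId to its aggregate use.
   */
  public static Map<String, Long> aggregateByApp(
      Map<String, Long> containerMemMb) {
    Map<String, Long> perApp = new HashMap<>();
    // The same single pass the monitoring thread already makes over
    // containers; one extra map update per container is cheap.
    for (Map.Entry<String, Long> e : containerMemMb.entrySet()) {
      String appId = e.getKey().split("/")[0];
      perApp.merge(appId, e.getValue(), Long::sum);
    }
    return perApp;
  }
}
{code}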



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-27 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630553#comment-16630553
 ] 

Sunil Govindan commented on YARN-8800:
--

Yes. I was also seeing that. Images were not coming.

contents are all good.

> Updated documentation of Submarine with latest examples.
> 
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8800.001.patch, YARN-8800.002.patch, 
> YARN-8800.003.patch, YARN-8800.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-27 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630553#comment-16630553
 ] 

Sunil Govindan edited comment on YARN-8800 at 9/27/18 2:52 PM:
---

Yes, I was also seeing that. The images were not coming through; I thought it 
was some problem with my setup. :)

The contents are all good.


was (Author: sunilg):
Yes. I was also seeing that. Images were not coming.

contents are all good.

> Updated documentation of Submarine with latest examples.
> 
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8800.001.patch, YARN-8800.002.patch, 
> YARN-8800.003.patch, YARN-8800.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-27 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630534#comment-16630534
 ] 

Wangda Tan commented on YARN-8800:
--

Please hold on; there are a few issues with the images. Let me fix them and 
upload an updated patch today.

> Updated documentation of Submarine with latest examples.
> 
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8800.001.patch, YARN-8800.002.patch, 
> YARN-8800.003.patch, YARN-8800.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-09-27 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630442#comment-16630442
 ] 

Weiwei Yang commented on YARN-8468:
---

Hi [~bsteinbach]

I've read the patch again, and it almost looks fine to me. Just one more 
thing: I think we can remove 
{{TestAllocationFileLoaderService#initResourceTypes}} and replace it with

{code}
TestResourceUtils.addNewTypesToResources(A_CUSTOM_RESOURCE);
{code}

Some other minor issues:
TestRMServerUtils: lines 66, 105 and 107 exceed the 80-character limit, and 
there are some checkstyle issues in {{TestApplicationMasterServiceWithFS}} too; 
see the jenkins report for details. Please fix as many as you can.
Thanks


> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per-queue basis.
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps. The user wants to limit ad hoc jobs to small containers but 
> allow enterprise apps to request as many resources as needed: 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues, and the maximum resources per queue are set 
> with the “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting, and we should 
> not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue (see the sketch after this list)
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * enforce the use of the queue-based maximum allocation limit if it is 
> available; if not, use the general scheduler-level setting
>  ** use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request
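
Schematically, the per-queue fallback asked for in the list above could look 
like this (a sketch simplified to a single memory dimension; the real methods 
work with Resource objects):
{code:java}
public class QueueMaxAllocationSketch {

  private final long schedulerMaxMb; // yarn.scheduler.maximum-allocation-mb
  private final Long queueMaxMb;     // per-queue cap, null when unset

  public QueueMaxAllocationSketch(long schedulerMaxMb, Long queueMaxMb) {
    this.schedulerMaxMb = schedulerMaxMb;
    this.queueMaxMb = queueMaxMb;
  }

  // The per-queue limit wins when configured, but must never exceed the
  // scheduler-level cap, so the minimum of the two is returned.
  public long getMaximumAllocationMb() {
    if (queueMaxMb == null) {
      return schedulerMaxMb;
    }
    return Math.min(queueMaxMb, schedulerMaxMb);
  }
}
{code}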



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-09-27 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630368#comment-16630368
 ] 

Weiwei Yang commented on YARN-8468:
---

Hi [~bsteinbach]

Thanks for taking the effort to investigate this approach; I appreciate it.

I agree with most of your analysis, but some unavoidable changes were due to 
historic reasons. It looks like what I suggested earlier is difficult to 
achieve at this point and would not make the change any simpler or less risky. 
Given these facts, I agree to follow the approach in your patch, and I will 
review the latest patch again.

Thanks!

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers and cannot be limited by queue or and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per queue basis.
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb sets a default value for maximum 
> container size for all queues and setting maximum resources per queue with 
> “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * if it is set on the root it would override the scheduler setting, and we 
> should not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue (see the sketch below)
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * enforce the use of the queue-based maximum allocation limit if it is 
> available; if not, fall back to the general scheduler-level setting
>  ** use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request
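For illustration, the queue-side counterpart could be sketched as follows 
(field and method names are assumptions; the idea is that a queue returns its 
own cap when configured and otherwise defers to its parent, so dynamically 
created queues inherit the setting):

{code:java}
// Sketch only: per-queue lookup shared by FSParentQueue and FSLeafQueue.
public Resource getMaximumContainerAllocation() {
  if (maxContainerAllocation != null) {             // assumed per-queue field
    return maxContainerAllocation;                  // explicitly configured cap
  }
  if (parent != null) {
    return parent.getMaximumContainerAllocation();  // inherit from the parent
  }
  return scheduler.getMaximumResourceCapability();  // scheduler-wide default
}{code}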






[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-09-27 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630305#comment-16630305
 ] 

Craig Condit commented on YARN-6456:


[~eyang], your assessment of how the properties interact is correct. The 
{{yarn.nodemanager.runtime.linux.allowed-runtimes}} property dictates the set 
of runtimes which may be selected from, while 
{{yarn.nodemanager.runtime.linux.type}} sets the default. Without this, all 
application submissions would need to specify a runtime type or they would 
fail. This is also why a default docker image can be specified. The idea is 
that an administrator can allow jobs to run under any runtime without 
user-visible configuration.

The mapping between runtime type names and classes uses the same logic as the 
{{YARN_CONTAINER_RUNTIME_TYPE}} environment variable (and in fact uses the same 
code). The value of {{yarn.nodemanager.runtime.linux.type}} is used as a default 
for {{YARN_CONTAINER_RUNTIME_TYPE}} if it is not provided by the user. 
Similarly, {{yarn.nodemanager.runtime.linux.docker.image-name}} is used as a 
default for {{YARN_CONTAINER_RUNTIME_DOCKER_IMAGE}}.
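
As a rough sketch of that fallback (simplified and hypothetical; the real 
NodeManager code path differs), the selection amounts to:

{code:java}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

final class RuntimeSelectionSketch {
  // The per-application environment variable wins; otherwise the
  // NM-level default from yarn-site.xml applies.
  static String effectiveRuntimeType(Map<String, String> env,
      Configuration conf) {
    String requested = env.get("YARN_CONTAINER_RUNTIME_TYPE");
    return requested != null
        ? requested
        : conf.get("yarn.nodemanager.runtime.linux.type", "default");
  }
}{code}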

> Allow administrators to set a single ContainerRuntime for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, 
> YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, 
> YARN-6456.004.patch, YARN-6456.005.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers: default, docker, java sandbox. Admins should 
> have the ability to override the user's decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even, if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most of the cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option so that it is the admin, not the user, who requires 
> Docker to be used.
> {quote}






[jira] [Commented] (YARN-8788) mvn package -Pyarn-ui fails on JDK9

2018-09-27 Thread Vidura Bhathiya Mudalige (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630290#comment-16630290
 ] 

Vidura Bhathiya Mudalige commented on YARN-8788:


Hi [~ajisakaa],

I have reproduced the issue, but it seems we have to wait until wro4j 1.8.1 is 
released, right?

> mvn package -Pyarn-ui fails on JDK9
> ---
>
> Key: YARN-8788
> URL: https://issues.apache.org/jira/browse/YARN-8788
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: Java 9.0.4, CentOS 7.5
>Reporter: Akira Ajisaka
>Priority: Major
>  Labels: newbie
>
> {{mvn package -Pdist,native,yarn-ui -Dtar -DskipTests}} failed on trunk.
> {noformat}
> [ERROR] Failed to execute goal ro.isdc.wro4j:wro4j-maven-plugin:1.7.9:run 
> (default) on project hadoop-yarn-ui: Execution default of goal 
> ro.isdc.wro4j:wro4j-maven-plugin:1.7.9:run failed: An API incompatibility was 
> encountered while executing ro.isdc.wro4j:wro4j-maven-plugin:1.7.9:run: 
> java.lang.ExceptionInInitializerError: null
> [ERROR] -
> [ERROR] realm =plugin>ro.isdc.wro4j:wro4j-maven-plugin:1.7.9
> [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
> [ERROR] urls[0] = 
> file:/home/aajisaka/.m2/repository/ro/isdc/wro4j/wro4j-maven-plugin/1.7.9/wro4j-maven-plugin-1.7.9.jar
> [ERROR] urls[1] = 
> file:/home/aajisaka/.m2/repository/ro/isdc/wro4j/wro4j-core/1.7.9/wro4j-core-1.7.9.jar
> [ERROR] urls[2] = 
> file:/home/aajisaka/.m2/repository/org/apache/commons/commons-lang3/3.4/commons-lang3-3.4.jar
> (snip)
> {noformat}






[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630246#comment-16630246
 ] 

Bibin A Chundatt commented on YARN-8829:


[~akki261001]

This is also possible if the sub-clusters have not been started: if the set of 
active sub-clusters is empty, the exception can occur.


> Cluster metrics can  fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
> Attachments: YARN-8829.001.patch
>
>
> If no sub-clusters are available in a Router-based Federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is 
> empty.
> *Exception details:*
> {noformat}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> {noformat}
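
For illustration, the missing check could be a guard of roughly this shape, 
placed before the result list is indexed in 
FederationClientInterceptor.invokeConcurrent (hypothetical; the actual patch 
may differ):

{code:java}
// Sketch only: fail with a clear error instead of indexing into an
// empty list when no sub-clusters are active.
if (subClusterIds.isEmpty()) {  // assumed local list of active sub-clusters
  throw new YarnException(
      "No active sub-clusters available to serve the request");
}{code}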






[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Akshay Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630234#comment-16630234
 ] 

Akshay Agarwal commented on YARN-8829:
--

[~sunilg]

After setting up the cluster, I shut down all the sub-clusters; in that case it 
shows this behaviour.

> Cluster metrics can  fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
> Attachments: YARN-8829.001.patch
>
>
> If no sub-clusters are available in a Router-based Federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is 
> empty.
> *Exception details:*
> {noformat}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> {noformat}






[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630209#comment-16630209
 ] 

Sunil Govindan commented on YARN-8829:
--

[~akki261001]

In which case is "no sub clusters are available in Router Based Federation 
Setup" possible? Are we missing some validation while configuring?

It's interesting to see it breaking here.

> Cluster metrics can  fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
> Attachments: YARN-8829.001.patch
>
>
> If no sub-clusters are available in a Router-based Federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is 
> empty.
> *Exception details:*
> {noformat}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> {noformat}






[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630199#comment-16630199
 ] 

Bibin A Chundatt commented on YARN-8829:


+1, lgtm. Looks straightforward to me.

Will commit once Jenkins completes.

> Cluster metrics can  fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
> Attachments: YARN-8829.001.patch
>
>
> If no sub-clusters are available in a Router-based Federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is 
> empty.
> *Exception details:*
> {noformat}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> {noformat}






[jira] [Updated] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Akshay Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay Agarwal updated YARN-8829:
-
Attachment: YARN-8829.001.patch

> Cluster metrics can  fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
> Attachments: YARN-8829.001.patch
>
>
> If no sub-clusters are available in a Router-based Federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is 
> empty.
> *Exception details:*
> {noformat}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> {noformat}






[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Akshay Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630195#comment-16630195
 ] 

Akshay Agarwal commented on YARN-8829:
--

Attached the patch, please review.

> Cluster metrics can  fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
> Attachments: YARN-8829.001.patch
>
>
> If no sub-clusters are available in a Router-based Federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is 
> empty.
> *Exception details:*
> {noformat}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> {noformat}






[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-09-27 Thread Antal Bálint Steinbach (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630188#comment-16630188
 ] 

Antal Bálint Steinbach commented on YARN-8468:
--

Hi [~cheersyang], [~haibochen] ,

I spent some hours digging deeper and trying to make this change. I would say 
it is only partly possible. Let me share my findings so we can decide based on 
them. Sorry for the long post.

There are two things here.
h3. *Normalization on two levels*

As Weiwei called it: (*#1*) RMAppManager ({{ApplicationClientProtocol}}) / 
DefaultAMSProcessor, and (*#2*) {{FS/CS#allocate}}.

It turned out that it makes sense to do both because they do slightly 
different things. I found this out after I removed #2 and some tests failed 
because they were implicitly relying on this behaviour.

#1 _RMServerUtils.normalizeAndValidateRequests(...)_
Throws an exception if the requested allocation is higher than the allowed one.

#2 _ask.setCapability(SchedulerUtils.getNormalizedResource(...))_
Caps the requested allocation at the allowed maximum if it is higher than the 
maximum, or raises the requested allocation to the minimum allocation if it is 
lower than the minimum.
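
To make the difference concrete, here is a minimal sketch of the two 
behaviors, with Resource reduced to a single memory dimension (a hypothetical 
helper, not the real SchedulerUtils API):

{code:java}
final class NormalizationSketch {
  // #1: validation - reject asks above the allowed maximum.
  static void validate(long askMb, long maxMb) {
    if (askMb > maxMb) {
      throw new IllegalArgumentException(
          "Requested " + askMb + " MB exceeds the maximum of " + maxMb + " MB");
    }
  }

  // #2: normalization - raise to the minimum, round up to a multiple of
  // the increment, then cap at the maximum.
  static long normalize(long askMb, long minMb, long maxMb, long incrMb) {
    long normalized = Math.max(askMb, minMb);
    normalized = ((normalized + incrMb - 1) / incrMb) * incrMb;  // round up
    return Math.min(normalized, maxMb);
  }
}{code}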

The solution could be to merge them into level #1 as you suggested, but then 
we have to decide whether we want to keep the exception-throwing behavior, the 
fixing behavior, or both (throw on higher, fix on lower).
h3. *Protect the YarnScheduler API*

I would say we cannot do this.
 * We have to calculate maxResourceAllocation for 
_SchedulerUtils.getNormalizedResource(...)_, and for that we need the 
queueName. There is no workaround for that.
 * We would like to calculate maxResourceAllocation once per allocate(); 
therefore we have to calculate it before iterating over the asks and do the 
normalization like this (RMAppManager):

 
{code:java}
Resource maxAllocation = scheduler.getMaximumResourceCapability(queue);
for (ResourceRequest amReq : amReqs) {
  SchedulerUtils.normalizeAndValidateRequest(amReq, maxAllocation,
      queue, scheduler, isRecovery, rmContext, null);

  amReq.setCapability(scheduler.getNormalizedResource(
      amReq.getCapability(), maxAllocation));
}{code}
 

If I understand correctly, your suggestion was to skip 
_scheduler.getNormalizedResource_ and use the static 
_SchedulerUtils.getNormalizedResource(...)_ method directly. That way we don't 
have to change the YarnScheduler API, because we don't have to pass around 
maxResourceAllocation.

The problem with this is that scheduler.getNormalizedResource is overridden in 
FairScheduler.

FS:

 
{code:java}
@Override
public Resource getNormalizedResource(Resource requestedResource,
    Resource maxResourceCapability) {
  return SchedulerUtils.getNormalizedResource(requestedResource,
      DOMINANT_RESOURCE_CALCULATOR,
      minimumAllocation,
      maxResourceCapability,
      incrAllocation);
}{code}
 

AbstractYarnScheduler:

 
{code:java}
@Override
public Resource getNormalizedResource(Resource requestedResource,
    Resource maxResourceCapability) {
  return SchedulerUtils.getNormalizedResource(requestedResource,
      getResourceCalculator(),
      getMinimumResourceCapability(),
      maxResourceCapability,
      getMinimumResourceCapability());
}{code}
 

This means that if I would like to call 
_SchedulerUtils.getNormalizedResource(...)_ from, for example, RMAppManager, I 
still need the scheduler to get the parameters, like this:
 
{code:java}
Resource maxAllocation = scheduler.getMaximumResourceCapability(queue);
for (ResourceRequest amReq : amReqs) {
  SchedulerUtils.normalizeAndValidateRequest(amReq, maxAllocation,
      queue, scheduler, isRecovery, rmContext, null);

  Resource normalizedCapability = SchedulerUtils.getNormalizedResource(
      amReq.getCapability(), scheduler.getResourceCalculator(),
      scheduler.getMinimumResourceCapability(), maxAllocation,
      scheduler.getIncrementAllocation());

  amReq.setCapability(normalizedCapability);
}{code}
 

This means I have to introduce a new method in the YarnScheduler API called 
getIncrementAllocation, which is against our starting point. Also, we could 
then delete the _getNormalizedResource_ method because it would never be used, 
which is yet another change to the API.

Furthermore, _scheduler.getNormalizedResource_ is used in several tests as a 
mocked method. If we replace that call with a static method call, we will be 
in trouble in those tests as well.

  

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> 

[jira] [Updated] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Akshay Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay Agarwal updated YARN-8829:
-
Description: 
If no sub-clusters are available in a Router-based Federation setup, cluster 
metrics can throw an "IndexOutOfBoundsException".

An additional check is required for the case where the sub-cluster list is 
empty.

*Exception details:*
{noformat}
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:653)
 at java.util.ArrayList.get(ArrayList.java:429)
 at 
org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
 at 
org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
 at 
org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
 at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
 at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
{noformat}

  was:
If no sub-clusters are available in a Router-based Federation setup, cluster 
metrics can throw an "IndexOutOfBoundsException".

An additional check is required for the case where the sub-cluster list is 
empty.

*Exception details:*
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:653)
 at java.util.ArrayList.get(ArrayList.java:429)
 at 
org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
 at 
org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
 at 
org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
 at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
 at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)


> Cluster metrics can  fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
>
> If no sub-clusters are available in a Router-based Federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is 
> empty.
> *Exception details:*
> {noformat}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> {noformat}






[jira] [Created] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Akshay Agarwal (JIRA)
Akshay Agarwal created YARN-8829:


 Summary: Cluster metrics can  fail with IndexOutOfBound Exception
 Key: YARN-8829
 URL: https://issues.apache.org/jira/browse/YARN-8829
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Akshay Agarwal


If no sub-clusters are available in a Router-based Federation setup, cluster 
metrics can throw an "IndexOutOfBoundsException".

An additional check is required for the case where the sub-cluster list is 
empty.

*Exception details:*
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:653)
 at java.util.ArrayList.get(ArrayList.java:429)
 at 
org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
 at 
org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
 at 
org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
 at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
 at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
 at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)






[jira] [Assigned] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception

2018-09-27 Thread Akshay Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay Agarwal reassigned YARN-8829:


Assignee: Akshay Agarwal

> Cluster metrics can  fail with IndexOutOfBound Exception
> 
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
>
> If no sub-clusters are available in a Router-based Federation setup, cluster 
> metrics can throw an "IndexOutOfBoundsException".
> An additional check is required for the case where the sub-cluster list is 
> empty.
> *Exception details:*
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, 
> Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>  at java.util.ArrayList.get(ArrayList.java:429)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)






[jira] [Updated] (YARN-8828) When config ReservationSystem ,the RM start failed.

2018-09-27 Thread yimeng (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yimeng updated YARN-8828:
-
Attachment: capacity-scheduler.xml
yarn-site.xml

> When config ReservationSystem ,the RM start failed.
> ---
>
> Key: YARN-8828
> URL: https://issues.apache.org/jira/browse/YARN-8828
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: yimeng
>Priority: Major
>  Labels: usability
> Attachments: capacity-scheduler.xml, yarn-site.xml
>
>
> I tested the ReservationSystem in Hadoop 3.0, but it seems to have a problem.
> 1. Configure yarn.resourcemanager.reservation-system.enable = true in the RM 
> yarn-site.xml.
> 2. Select a leaf queue "bbb" and configure 
> yarn.scheduler.capacity.root.bbb.reservable = true in capacity-scheduler.xml, 
> as follows:
> <property>
>   <name>yarn.scheduler.capacity.root.bbb.reservable</name>
>   <value>true</value>
> </property>
> 3. Then restart the RM; the RM fails to start. The error stack log is as 
> follows:
> 2018-09-27 11:30:15,691 | FATAL | main | Error starting ResourceManager | 
> ResourceManager.java:1517
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: mapping 
> contains invalid or non-leaf queue : bbb
>  at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>  at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:813)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1214)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:315)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1510)
> Caused by: java.io.IOException: mapping contains invalid or non-leaf queue : 
> bbb
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetQueueMapping(UserGroupMappingPlacementRule.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.get(UserGroupMappingPlacementRule.java:280)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:668)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:689)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:716)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:360)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:425)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>  ... 7 more
>  
> I am sure the queue "bbb" is a leaf queue.


