[jira] [Commented] (YARN-8692) Support node utilization metrics for SLS

2018-08-22 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589730#comment-16589730
 ] 

Tao Yang commented on YARN-8692:


Thanks [~cheersyang] for the feedback.
Attached the v1 patch; please review it when you have time.

> Support node utilization metrics for SLS
> 
>
> Key: YARN-8692
> URL: https://issues.apache.org/jira/browse/YARN-8692
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8692.001.patch, image-2018-08-21-18-04-22-749.png
>
>
> The distribution of node utilization is an important health factor for the
> YARN cluster; related metrics in SLS can be used to evaluate the scheduling
> effects and optimize related configurations.
> To implement this improvement, we need to do the following:
> (1) Add input configurations (containing avg and stddev for the cpu/memory
> utilization ratio) and generate utilization samples for tasks, excluding the AM
> container since I think its utilization is negligible.
> (2) Simulate container and node utilization within the node status.
> (3) Calculate and generate the distribution metrics, and use the standard
> deviation metric (stddev for short) to evaluate the effects (smaller is
> better).
> (4) Show these metrics on the SLS simulator page like this:
> !image-2018-08-21-18-04-22-749.png!
> For the node memory/CPU utilization distribution graphs, the Y-axis is the
> number of nodes, and P0 represents a 0%~9% utilization ratio
> (containers-utilization / node-total-resource), P1 represents 10%~19%, P2
> represents 20%~29%, ..., and finally P9 represents 90%~100%.
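
For illustration, here is a minimal Java sketch of how the P0~P9 buckets and the stddev metric described above could be computed; the class and method names are assumptions for this sketch, not code from the patch:

{code:java}
// Bucket each node's utilization ratio into P0..P9 and compute the standard
// deviation that the description proposes as the evaluation metric.
public final class UtilizationDistribution {

  // ratio = containers-utilization / node-total-resource, in [0.0, 1.0]
  public static int[] buckets(double[] nodeUtilizationRatios) {
    int[] buckets = new int[10]; // P0 = 0%~9%, ..., P9 = 90%~100%
    for (double ratio : nodeUtilizationRatios) {
      int index = Math.min((int) (ratio * 10), 9); // ratio 1.0 falls into P9
      buckets[index]++;
    }
    return buckets;
  }

  public static double stddev(double[] ratios) {
    double mean = 0;
    for (double r : ratios) {
      mean += r;
    }
    mean /= ratios.length;
    double variance = 0;
    for (double r : ratios) {
      variance += (r - mean) * (r - mean);
    }
    return Math.sqrt(variance / ratios.length); // smaller is better
  }
}
{code}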






[jira] [Updated] (YARN-8692) Support node utilization metrics for SLS

2018-08-22 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8692:
---
Attachment: YARN-8692.001.patch

> Support node utilization metrics for SLS
> 
>
> Key: YARN-8692
> URL: https://issues.apache.org/jira/browse/YARN-8692
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8692.001.patch, image-2018-08-21-18-04-22-749.png
>
>
> The distribution of node utilization is an important health factor for the
> YARN cluster; related metrics in SLS can be used to evaluate the scheduling
> effects and optimize related configurations.
> To implement this improvement, we need to do the following:
> (1) Add input configurations (containing avg and stddev for the cpu/memory
> utilization ratio) and generate utilization samples for tasks, excluding the AM
> container since I think its utilization is negligible.
> (2) Simulate container and node utilization within the node status.
> (3) Calculate and generate the distribution metrics, and use the standard
> deviation metric (stddev for short) to evaluate the effects (smaller is
> better).
> (4) Show these metrics on the SLS simulator page like this:
> !image-2018-08-21-18-04-22-749.png!
> For the node memory/CPU utilization distribution graphs, the Y-axis is the
> number of nodes, and P0 represents a 0%~9% utilization ratio
> (containers-utilization / node-total-resource), P1 represents 10%~19%, P2
> represents 20%~29%, ..., and finally P9 represents 90%~100%.






[jira] [Comment Edited] (YARN-8692) Support node utilization metrics for SLS

2018-08-22 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589719#comment-16589719
 ] 

Tao Yang edited comment on YARN-8692 at 8/23/18 5:48 AM:
-

{quote}
I am curious how node memory/cpu is calculated here? Is it based on the 
allocated memory/cpu? 
{quote}
Yes, it's based on allocated memory/cpu. 
The detailed calculation is as follows:
{noformat}
container-utilization = $container-allocated-resource * $task-utilization-ratio
node-utilization = sum($container-utilization)
{noformat}
{{$task-utilization-ratio}} can be configured with an average and a standard
deviation, so that we can generate different task-utilization-ratio samples for
containers as needed.
For example, we can configured "memory_utilization_ratio":{ "val": 0.5, "std": 
0.01} for map tasks so that the memory utilization for map containers will be 
calculated as below: 
{noformat}
allocated-memory = 1000
memory-utilization-ratio-sample is a random double value from 0.49 to 0.51
memory-utilization-of-map-container = $allocated-memory * 
$memory-utilization-ratio-sample
{noformat}
As a result, the utilization of a map container can be 490, 491, 492, ..., 508, 509,
or 510.


was (Author: tao yang):
{quote}
I am curious how node memory/cpu is calculated here? Is it based on the 
allocated memory/cpu? 
{quote}
Yes, it's based on allocated memory/cpu. 
The detailed calculation is as follows:
{noformat}
node-utilization = sum(container-utilization)
container-utilization = container-allocated-resource * task-utilization-ratio
{noformat}
{{task-utilization-ratio}} can be configured with an average and a standard
deviation, so that we can generate different task-utilization-ratio samples for
containers as needed.
For example, we can configure {{"memory_utilization_ratio": { "val": 0.5,
"std": 0.01}}} for map tasks, so that we can calculate the memory utilization
of map containers as below:
{noformat}
allocated-memory = 1000
memory-utilization-ratio-sample is a random double value from 0.49 to 0.51
memory-utilization-of-map-container = $allocated-memory * 
$memory-utilization-ratio-sample
{noformat}
so the utilization of a map container can be 490, 491, 492, ..., 508, 509, or 510

> Support node utilization metrics for SLS
> 
>
> Key: YARN-8692
> URL: https://issues.apache.org/jira/browse/YARN-8692
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: image-2018-08-21-18-04-22-749.png
>
>
> The distribution of node utilization is an important health factor for the
> YARN cluster; related metrics in SLS can be used to evaluate the scheduling
> effects and optimize related configurations.
> To implement this improvement, we need to do the following:
> (1) Add input configurations (containing avg and stddev for the cpu/memory
> utilization ratio) and generate utilization samples for tasks, excluding the AM
> container since I think its utilization is negligible.
> (2) Simulate container and node utilization within the node status.
> (3) Calculate and generate the distribution metrics, and use the standard
> deviation metric (stddev for short) to evaluate the effects (smaller is
> better).
> (4) Show these metrics on the SLS simulator page like this:
> !image-2018-08-21-18-04-22-749.png!
> For the node memory/CPU utilization distribution graphs, the Y-axis is the
> number of nodes, and P0 represents a 0%~9% utilization ratio
> (containers-utilization / node-total-resource), P1 represents 10%~19%, P2
> represents 20%~29%, ..., and finally P9 represents 90%~100%.






[jira] [Commented] (YARN-8692) Support node utilization metrics for SLS

2018-08-22 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589719#comment-16589719
 ] 

Tao Yang commented on YARN-8692:


{quote}
I am curious how node memory/cpu is calculated here? Is it based on the 
allocated memory/cpu? 
{quote}
Yes, it's based on allocated memory/cpu. 
The detailed calculation is as follows:
{noformat}
node-utilization = sum(container-utilization)
container-utilization = container-allocated-resource * task-utilization-ratio
{noformat}
{{task-utilization-ratio}} can be configured with an average and a standard
deviation, so that we can generate different task-utilization-ratio samples for
containers as needed.
For example, we can configure {{"memory_utilization_ratio": { "val": 0.5,
"std": 0.01}}} for map tasks, so that we can calculate the memory utilization
of map containers as below:
{noformat}
allocated-memory = 1000
memory-utilization-ratio-sample is a random double value from 0.49 to 0.51
memory-utilization-of-map-container = $allocated-memory * 
$memory-utilization-ratio-sample
{noformat}
so the utilization of a map container can be 490, 491, 492, ..., 508, 509, or 510
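
As a rough illustration of the sampling described above, here is a minimal Java sketch. It assumes the "0.49 to 0.51" example means a uniform draw over [val - std, val + std]; the actual patch may use a different distribution, and the class name is invented for this sketch:

{code:java}
import java.util.Random;

// Samples a task utilization ratio from the configured (val, std) pair and
// derives a container's utilization from its allocated resource.
public final class TaskUtilizationSampler {
  private final double val;   // configured average ratio, e.g. 0.5
  private final double std;   // configured spread, e.g. 0.01
  private final Random random = new Random();

  public TaskUtilizationSampler(double val, double std) {
    this.val = val;
    this.std = std;
  }

  // uniform draw over [val - std, val + std] (an assumption, see above)
  public double sampleRatio() {
    return val - std + 2 * std * random.nextDouble();
  }

  // container-utilization = container-allocated-resource * task-utilization-ratio
  public long containerUtilization(long allocatedResource) {
    return Math.round(allocatedResource * sampleRatio());
  }

  public static void main(String[] args) {
    TaskUtilizationSampler sampler = new TaskUtilizationSampler(0.5, 0.01);
    // with allocated-memory = 1000, this prints a value in 490..510
    System.out.println(sampler.containerUtilization(1000));
  }
}
{code}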

> Support node utilization metrics for SLS
> 
>
> Key: YARN-8692
> URL: https://issues.apache.org/jira/browse/YARN-8692
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: image-2018-08-21-18-04-22-749.png
>
>
> The distribution of node utilization is an important health factor for the
> YARN cluster; related metrics in SLS can be used to evaluate the scheduling
> effects and optimize related configurations.
> To implement this improvement, we need to do the following:
> (1) Add input configurations (containing avg and stddev for the cpu/memory
> utilization ratio) and generate utilization samples for tasks, excluding the AM
> container since I think its utilization is negligible.
> (2) Simulate container and node utilization within the node status.
> (3) Calculate and generate the distribution metrics, and use the standard
> deviation metric (stddev for short) to evaluate the effects (smaller is
> better).
> (4) Show these metrics on the SLS simulator page like this:
> !image-2018-08-21-18-04-22-749.png!
> For the node memory/CPU utilization distribution graphs, the Y-axis is the
> number of nodes, and P0 represents a 0%~9% utilization ratio
> (containers-utilization / node-total-resource), P1 represents 10%~19%, P2
> represents 20%~29%, ..., and finally P9 represents 90%~100%.






[jira] [Commented] (YARN-8015) Support all types of placement constraint support for Capacity Scheduler

2018-08-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589704#comment-16589704
 ] 

Weiwei Yang commented on YARN-8015:
---

Thanks [~sunilg]!

> Support all types of placement constraint support for Capacity Scheduler
> 
>
> Key: YARN-8015
> URL: https://issues.apache.org/jira/browse/YARN-8015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Fix For: 3.2.0
>
> Attachments: YARN-8015.001.patch, YARN-8015.002.patch, 
> YARN-8015.003.patch, YARN-8015.004.patch
>
>
> AppPlacementAllocator currently only supports intra-app anti-affinity 
> placement constraints, once YARN-8002 and YARN-8013 are resolved, it needs to 
> support inter-app constraints too. Also, this may require some refactoring on 
> the existing code logic. Use this JIRA to track.






[jira] [Commented] (YARN-8015) Support all types of placement constraint support for Capacity Scheduler

2018-08-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589692#comment-16589692
 ] 

Hudson commented on YARN-8015:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14818 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14818/])
YARN-8015. Support all types of placement constraint support for (sunilg: rev 
1ac01444a24faee6f74f2e83d9521eb4e0be651b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/SingleConstraintAppPlacementAllocator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestSchedulingRequestContainerAllocation.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/placement/TestSingleConstraintAppPlacementAllocator.java


> Support all types of placement constraint support for Capacity Scheduler
> 
>
> Key: YARN-8015
> URL: https://issues.apache.org/jira/browse/YARN-8015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Fix For: 3.2.0
>
> Attachments: YARN-8015.001.patch, YARN-8015.002.patch, 
> YARN-8015.003.patch, YARN-8015.004.patch
>
>
> AppPlacementAllocator currently only supports intra-app anti-affinity 
> placement constraints, once YARN-8002 and YARN-8013 are resolved, it needs to 
> support inter-app constraints too. Also, this may require some refactoring on 
> the existing code logic. Use this JIRA to track.






[jira] [Updated] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1

2018-08-22 Thread Sen Zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao updated YARN-8701:
---
Description: 
If I configure *MaxResources* in fair-scheduler.xml, like this:
{code}resource1=50{code}
In the queue, the *MaxResources* value will change to 
{code}Max Resources: {code}

I think the value of VCores should be *CLUSTER_VCORES*.

  was:
If I configure *MaxResources* in fair-scheduler.xml, like this:
{code}resource1=50{code}
In the queue, the *MaxResources* value will change to 
 {code}memory:CLUSTER_MEMORY, VCores:0, 
resource1:50{code}

I think the value of VCores should be *CLUSTER_VCORES*.


> If the single parameter in Resources#createResourceWithSameValue is greater 
> than Integer.MAX_VALUE, then the value of vcores will be -1
> ---
>
> Key: YARN-8701
> URL: https://issues.apache.org/jira/browse/YARN-8701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-8701.001.patch
>
>
> If I configure *MaxResources* in fair-scheduler.xml, like this:
> {code}resource1=50{code}
> In the queue, the *MaxResources* value will change to 
> {code}Max Resources: {code}
> I think the value of VCores should be *CLUSTER_VCORES*.
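
As a standalone illustration of the mechanism the summary suggests (a long above Integer.MAX_VALUE surfacing as a negative vcores value after a narrowing cast), here is a minimal Java demo. It is an assumption about the cause, not code from the patch:

{code:java}
// A plain long-to-int cast keeps only the low 32 bits, so large values wrap
// around; Long.MAX_VALUE in particular narrows to exactly -1.
public final class NarrowingDemo {
  public static void main(String[] args) {
    long value = Long.MAX_VALUE;        // 0x7FFFFFFFFFFFFFFF
    int vcores = (int) value;           // low 32 bits are 0xFFFFFFFF
    System.out.println(vcores);         // prints -1
    System.out.println((int) 50_000_000_000L); // another wrapped, negative value
  }
}
{code}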






[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1

2018-08-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589689#comment-16589689
 ] 

genericqa commented on YARN-8701:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
5s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8701 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936755/YARN-8701.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 42f5e9a8913d 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b021249 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21664/testReport/ |
| Max. process+thread count | 329 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21664/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> If the single parameter in Resources#createResourceWithSameValue is greater 
> than Integer.MAX_VALUE, then the value of vcores 

[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589688#comment-16589688
 ] 

genericqa commented on YARN-8649:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m  
7s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8649 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936754/YARN-8649_5.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d74f059f170e 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b021249 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21663/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21663/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Similar as YARN-4355:NPE while processing localizer 

[jira] [Updated] (YARN-8015) Support affinity placement constraint support for Capacity Scheduler

2018-08-22 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8015:
-
Summary: Support affinity placement constraint support for Capacity 
Scheduler  (was: Complete placement constraint support for Capacity Scheduler)

> Support affinity placement constraint support for Capacity Scheduler
> 
>
> Key: YARN-8015
> URL: https://issues.apache.org/jira/browse/YARN-8015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: YARN-8015.001.patch, YARN-8015.002.patch, 
> YARN-8015.003.patch, YARN-8015.004.patch
>
>
> AppPlacementAllocator currently only supports intra-app anti-affinity 
> placement constraints, once YARN-8002 and YARN-8013 are resolved, it needs to 
> support inter-app constraints too. Also, this may require some refactoring on 
> the existing code logic. Use this JIRA to track.






[jira] [Updated] (YARN-8015) Support all types of placement constraint support for Capacity Scheduler

2018-08-22 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8015:
-
Summary: Support all types of placement constraint support for Capacity 
Scheduler  (was: Support affinity placement constraint support for Capacity 
Scheduler)

> Support all types of placement constraint support for Capacity Scheduler
> 
>
> Key: YARN-8015
> URL: https://issues.apache.org/jira/browse/YARN-8015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Critical
> Attachments: YARN-8015.001.patch, YARN-8015.002.patch, 
> YARN-8015.003.patch, YARN-8015.004.patch
>
>
> AppPlacementAllocator currently only supports intra-app anti-affinity 
> placement constraints, once YARN-8002 and YARN-8013 are resolved, it needs to 
> support inter-app constraints too. Also, this may require some refactoring on 
> the existing code logic. Use this JIRA to track.






[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-22 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589682#comment-16589682
 ] 

Zhankun Tang commented on YARN-8698:


[~yuan_zac] Yeah, a wrong HADOOP_COMMON_HOME env will cause "hadoop classpath"
to fail.

But did you set "DOCKER_HADOOP_HDFS_HOME" to the hadoop home directory
in your Docker image? I guess if that is set, at least
run-PRIMARY_WORKER.sh won't fail?

 
{code:java}
yarn jar path-to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar job run \
--env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/ \
--env DOCKER_HADOOP_HDFS_HOME=/hadoop-3.1.0 ...
{code}
 

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to the hadoop classpath.
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   






[jira] [Commented] (YARN-8692) Support node utilization metrics for SLS

2018-08-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589680#comment-16589680
 ] 

Weiwei Yang commented on YARN-8692:
---

+1 for the idea; it will be very helpful for testing load distribution. I am
curious how node memory/cpu is calculated here? Is it based on the allocated
memory/cpu?

> Support node utilization metrics for SLS
> 
>
> Key: YARN-8692
> URL: https://issues.apache.org/jira/browse/YARN-8692
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: image-2018-08-21-18-04-22-749.png
>
>
> The distribution of node utilization is an important health factor for the
> YARN cluster; related metrics in SLS can be used to evaluate the scheduling
> effects and optimize related configurations.
> To implement this improvement, we need to do the following:
> (1) Add input configurations (containing avg and stddev for the cpu/memory
> utilization ratio) and generate utilization samples for tasks, excluding the AM
> container since I think its utilization is negligible.
> (2) Simulate container and node utilization within the node status.
> (3) Calculate and generate the distribution metrics, and use the standard
> deviation metric (stddev for short) to evaluate the effects (smaller is
> better).
> (4) Show these metrics on the SLS simulator page like this:
> !image-2018-08-21-18-04-22-749.png!
> For the node memory/CPU utilization distribution graphs, the Y-axis is the
> number of nodes, and P0 represents a 0%~9% utilization ratio
> (containers-utilization / node-total-resource), P1 represents 10%~19%, P2
> represents 20%~29%, ..., and finally P9 represents 90%~100%.






[jira] [Assigned] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1

2018-08-22 Thread Sen Zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao reassigned YARN-8701:
--

  Assignee: Sen Zhao
Attachment: YARN-8701.001.patch

> If the single parameter in Resources#createResourceWithSameValue is greater 
> than Integer.MAX_VALUE, then the value of vcores will be -1
> ---
>
> Key: YARN-8701
> URL: https://issues.apache.org/jira/browse/YARN-8701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-8701.001.patch
>
>
> If I configure *MaxResources* in fair-scheduler.xml, like this:
> {code}resource1=50{code}
> In the queue, the *MaxResources* value will change to 
>  {code}memory:CLUSTER_MEMORY, VCores:0, 
> resource1:50{code}
> I think the value of VCores should be *CLUSTER_VCORES*.






[jira] [Updated] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1

2018-08-22 Thread Sen Zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao updated YARN-8701:
---
Description: 
If I configure *MaxResources* in fair-scheduler.xml, like this:
{code}resource1=50{code}
In the queue, the *MaxResources* value will change to 
 {code}memory:CLUSTER_MEMORY, VCores:0, 
resource1:50{code}

I think the value of VCores should be *CLUSTER_VCORES*.

> If the single parameter in Resources#createResourceWithSameValue is greater 
> than Integer.MAX_VALUE, then the value of vcores will be -1
> ---
>
> Key: YARN-8701
> URL: https://issues.apache.org/jira/browse/YARN-8701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Reporter: Sen Zhao
>Priority: Major
>
> If I configure *MaxResources* in fair-scheduler.xml, like this:
> {code}resource1=50{code}
> In the queue, the *MaxResources* value will change to 
>  {code}memory:CLUSTER_MEMORY, VCores:0, 
> resource1:50{code}
> I think the value of VCores should be *CLUSTER_VCORES*.






[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589652#comment-16589652
 ] 

lujie commented on YARN-8649:
-

[~jlowe]

I have improved the log per your suggestion. Thanks for your nice review.

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> YARN-8649_4.patch, YARN-8649_5.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by [# Jason Lowe].






[jira] [Comment Edited] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread lujie (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589652#comment-16589652
 ] 

lujie edited comment on YARN-8649 at 8/23/18 3:32 AM:
--

[~jlowe]

I have improved the log per your suggestion. Thanks for your nice review.


was (Author: xiaoheipangzi):
@ [~jlowe]

I have improved the log as your suggestion. thanks for your cice review

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> YARN-8649_4.patch, YARN-8649_5.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by [# Jason Lowe].






[jira] [Updated] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8649:

Attachment: YARN-8649_5.patch

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> YARN-8649_4.patch, YARN-8649_5.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The
> reason may be similar to YARN-4355, which was reported by [# Jason Lowe].






[jira] [Assigned] (YARN-8691) AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum size

2018-08-22 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan reassigned YARN-8691:


Assignee: Yicong Cai

> AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum 
> size
> --
>
> Key: YARN-8691
> URL: https://issues.apache.org/jira/browse/YARN-8691
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Critical
> Fix For: 2.7.7
>
>
> When a SparkSQL AM Codegen ERROR occurs, the AM calls the unregister AM API and
> sends the error message to the RM; the RM receives the AM state and updates the
> RMStateStore. The Codegen error message can be huge (our case is about 200MB).
> If the RMStateStore is ZKRMStateStore, it causes the same exception as
> YARN-6125, but YARN-6125 doesn't cover truncating the
> unregisterApplicationMaster message.
>  
> The SparkSQL Codegen error message is shown below:
> 18/08/18 08:34:54 ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM 
> limit of 0x
>  /* 001 */ public java.lang.Object generate(Object[] references)
> { /* 002 */ return new SpecificSafeProjection(references); /* 003 */ }
> /* 004 */
>  /* 005 */ class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
>  ..
> about 2 million lines.
> ..
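
A minimal sketch of the kind of guard this issue asks for: bounding the unregister diagnostics before they are persisted. The limit constant and class name are assumptions for illustration, not the actual patch:

{code:java}
// Truncate an oversized application diagnostics message so huge strings
// (e.g. a 200MB Codegen error) never reach the state store.
public final class DiagnosticsLimiter {
  // hypothetical cap; a real limit would likely be configurable
  private static final int MAX_APP_MESSAGE_CHARS = 64 * 1024;

  public static String truncate(String appMessage) {
    if (appMessage == null || appMessage.length() <= MAX_APP_MESSAGE_CHARS) {
      return appMessage;
    }
    return appMessage.substring(0, MAX_APP_MESSAGE_CHARS) + "... (truncated)";
  }
}
{code}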






[jira] [Commented] (YARN-8691) AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum size

2018-08-22 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589647#comment-16589647
 ] 

Sunil Govindan commented on YARN-8691:
--

Thanks [~caiyicong], assigned to you.

> AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum 
> size
> --
>
> Key: YARN-8691
> URL: https://issues.apache.org/jira/browse/YARN-8691
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Critical
> Fix For: 2.7.7
>
>
> When a SparkSQL AM Codegen ERROR occurs, the AM calls the unregister AM API and
> sends the error message to the RM; the RM receives the AM state and updates the
> RMStateStore. The Codegen error message can be huge (our case is about 200MB).
> If the RMStateStore is ZKRMStateStore, it causes the same exception as
> YARN-6125, but YARN-6125 doesn't cover truncating the
> unregisterApplicationMaster message.
>  
> The SparkSQL Codegen error message is shown below:
> 18/08/18 08:34:54 ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM 
> limit of 0x
>  /* 001 */ public java.lang.Object generate(Object[] references)
> { /* 002 */ return new SpecificSafeProjection(references); /* 003 */ }
> /* 004 */
>  /* 005 */ class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
>  ..
> about 2 million lines.
> ..






[jira] [Created] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1

2018-08-22 Thread Sen Zhao (JIRA)
Sen Zhao created YARN-8701:
--

 Summary: If the single parameter in 
Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then 
the value of vcores will be -1
 Key: YARN-8701
 URL: https://issues.apache.org/jira/browse/YARN-8701
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Reporter: Sen Zhao









[jira] [Created] (YARN-8700) Application cannot un-registered

2018-08-22 Thread fox (JIRA)
fox created YARN-8700:
-

 Summary: Application cannot un-registered
 Key: YARN-8700
 URL: https://issues.apache.org/jira/browse/YARN-8700
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.3
Reporter: fox


Dear all, 

I found a problem with application unregistration in an AWS EMR environment
(emr-5.8.0, hadoop 2.7.3, spark 2.2.0).

Application Type: Both Yarn and Spark

State: RUNNING

Inside the job logs, I got 

07:00:07.190 [main] INFO c.w.c.e.a.n.b.AbstractNormalBatchMain - [EDP2] Ready 
to run Tear Down
07:00:07.192 [main] INFO c.w.c.e.a.n.b.AbstractNormalBatchMain - [EDP2] Ready 
to run Tear Down
07:00:07.192 [main] INFO c.w.c.e.a.n.b.AbstractNormalBatchMain - [EDP2] Job 
Finish
07:00:07.195 [main] INFO o.s.c.a.AnnotationConfigApplicationContext - Closing 
org.springframework.context.annotation.AnnotationConfigApplicationContext@144ab54:
 startup date [Tue Aug 21 06:59:23 UTC 2018]; root of context hierarchy
07:00:07.306 [main] INFO o.s.s.c.ThreadPoolTaskExecutor - Shutting down 
ExecutorService 'redisClusterExecutor'
07:00:07.551 [main] INFO o.a.k.clients.producer.KafkaProducer - Closing the 
Kafka producer with timeoutMillis = 9223372036854775807 ms.
07:00:07.565 [main] INFO c.w.c.f.m.MessageQueueKafkaProducerImpl - Closed all 
the producer's connections for tenant: 7fd0356c-1258-11e8-abfd-0242ac110002.
07:00:09.869 [main] INFO c.w.c.edp2.normal.batch.AppMaster - finish run main 
method
07:00:09.870 [main] INFO c.w.c.edp2.normal.batch.AppMaster - delete temp file 
/tmp/aa33f388-f591-44a8-9aa3-13e2f8427c5d2802069659156113885.jar
07:00:10.112 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for 
application to be successfully unregistered.
07:00:10.215 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for 
application to be successfully unregistered.
07:00:10.319 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for 
application to be successfully unregistered.
07:00:10.422 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for 
application to be successfully unregistered.
07:00:10.528 [main] INFO o.a.h.y.c.api.impl.AMRMClientImpl - Waiting for 
application to be successfully unregistered.

 

and it kept going for more than one day until I stopped the whole cluster.

I also tried to kill the application with the yarn command, which also waits
forever for the application to be killed.

[hadoop@ip-10-100-2-124 ~]$ yarn application -kill application_1534810852740_0721
18/08/22 12:24:28 INFO impl.TimelineClientImpl: Timeline service address: 
http://ip-10-100-2-124.ap-northeast-1.compute.internal:8188/ws/v1/timeline/
18/08/22 12:24:29 INFO client.RMProxy: Connecting to ResourceManager at 
ip-10-100-2-124.ap-northeast-1.compute.internal/10.100.2.124:8032
Killing application application_1534810852740_0721
18/08/22 12:24:32 INFO impl.YarnClientImpl: Waiting for application 
application_1534810852740_0721 to be killed.
18/08/22 12:24:34 INFO impl.YarnClientImpl: Waiting for application 
application_1534810852740_0721 to be killed.
18/08/22 12:24:36 INFO impl.YarnClientImpl: Waiting for application 
application_1534810852740_0721 to be killed.

 






[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-08-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589605#comment-16589605
 ] 

Weiwei Yang commented on YARN-7863:
---

Hi [~Naganarasimha]

Unlike partitions, a PC is not associated with resources.
{quote}there is no way to find out for a given queue how much pending resources 
are there in each partition it can access.
{quote}
Because PCs are not associated with resources, a PC acts like an extra check after all
other checks are done. The scheduler still calculates how much resource is available
in a partition for a given queue and assigns resources from a node in that partition
to a request, but if the PC is not satisfied then the allocation proposal is
rejected. Partition support in PCs is not ready; to be honest, I am not sure
everything aligns with existing label-based scheduling. I suggested in
YARN-8015 opening a separate task to further enhance that.
{quote}And also i am not able to envisage the scenario where in partition needs 
to be OR'd with Allocation tags or Attributes.
{quote}
Agreed, it won't make sense to put an OR between a partition constraint and an
allocation-tag/attribute constraint, but other combinations are useful. We
support this; however, whether a PC is really meaningful is up to the user.

 

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, 
> YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, 
> YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, 
> YARN-7863.v0.patch
>
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*






[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-22 Thread Zac Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589597#comment-16589597
 ] 

Zac Zhou commented on YARN-8698:


Thanks a lot, [~leftnoteasy] :)

Hi [~tangzhankun], I think this issue is related to the hadoop classpath.

The hadoop path on the nodemanager is different from the one in the Docker container.

launch_container.sh sets HADOOP_COMMON_HOME to a path which doesn't
exist in the docker container.

run-PRIMARY_WORKER.sh failed to execute the command:

export CLASSPATH=`$HADOOP_HDFS_HOME/bin/hadoop classpath --glob`

so the classpath can't be generated correctly.

I validated this issue with the following steps:
 # move the hadoop package to some path, like A.
 # set HADOOP_COMMON_HOME to some other path, like B, which is not the hadoop
package location: export HADOOP_COMMON_HOME=B
 # execute the command: ${A}/bin/hadoop classpath --glob

We will get the following error:

 Error: Could not find or load main class org.apache.hadoop.util.Classpath

If any more info is needed, feel free to let me know~

Thanks

 

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to the hadoop classpath.
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   






[jira] [Commented] (YARN-8561) [Submarine] Initial implementation: Training job submission and job history retrieval

2018-08-22 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589561#comment-16589561
 ] 

Zhankun Tang commented on YARN-8561:


[~leftnoteasy]

I'm going through the code, and I have a minor question about the code below:

 
{code:java}
boolean lackingEnvs = false;
...// set lackingEnvs to true based on some conditions
if (lackingEnvs) {
 LOG.error("When hdfs is being used to read/write models/data. Following"
 + "envs are required: 1) DOCKER_HADOOP_HDFS_HOME= 2) DOCKER_JAVA_HOME=. You can use --env to pass these envars.");
 throw new IOException("Failed to detect HDFS-related environments.");
}
{code}
It seems that if users don't specify these two required environment variables,
the error message won't be thrown. Is that expected?
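
For reference, a minimal sketch of the explicit check this comment seems to expect. The env names match the message above, but the class and method are invented for illustration:

{code:java}
import java.io.IOException;
import java.util.Map;

// Fails fast when either required env is missing from the user-supplied envs.
public final class HdfsEnvCheck {
  public static void check(Map<String, String> envs) throws IOException {
    boolean lackingEnvs = !envs.containsKey("DOCKER_HADOOP_HDFS_HOME")
        || !envs.containsKey("DOCKER_JAVA_HOME");
    if (lackingEnvs) {
      throw new IOException("When hdfs is being used to read/write models/data,"
          + " envs DOCKER_HADOOP_HDFS_HOME and DOCKER_JAVA_HOME are required;"
          + " you can use --env to pass them.");
    }
  }
}
{code}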

 

> [Submarine] Initial implementation: Training job submission and job history 
> retrieval
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8561.001.patch, YARN-8561.002.patch, 
> YARN-8561.003.patch, YARN-8561.004.patch, YARN-8561.005.patch
>
>
> Added the following parts:
> 1) A new subcomponent of YARN, under the applications/ project. 
> 2) Tensorflow training job submission, including training (single node and 
> distributed). 
> - Support for Docker containers. 
> - Support for GPU isolation. 
> - Support for YARN registry DNS.
> 3) Retrieve job history.






[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-08-22 Thread Naganarasimha G R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589559#comment-16589559
 ] 

Naganarasimha G R commented on YARN-7863:
-

Hi [~sunilg],

    A few other nits in the patch:
 * NodeAttributesManagerImpl ln no 205: as I was mentioning, there are both a debug
and an info log here; I think we can remove the debug log.
 * NodeAttributesManagerImpl ln no 212-226: here you are sending out a complete
update of the node collections, not just the modified NMs. This has multiple
impacts: in a large cluster we unnecessarily send a lot of updates to the
scheduler, and removed attributes will not be captured this way, where the
latter is the more important issue.
 * NodeAttributesManagerImpl ln no 212-226: in general, the earlier idea was to
have the scheduler make use of the AttributeValue so that the converted value is
stored and used for comparison. But if we have the flexibility later on to
change the scheduler event which is pushed from NAM to the schedulers, then I
am fine with the event being sent out; otherwise I would suggest sending the
AttributeValue itself.
 * Are test cases for AND and OR covered? I could see that AND is not
declaratively covered in TestPlacementConstraintParser; better to cover
AND and OR explicitly.
 * PlacementSpec ln no 51: typo "teh"

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, 
> YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, 
> YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, 
> YARN-7863.v0.patch
>
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8685) Add containers query support for nodes/node REST API in RMWebServices

2018-08-22 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589551#comment-16589551
 ] 

Tao Yang commented on YARN-8685:


{quote}
How about my suggestion about adding a new endpoint in RM? Does that make sense 
to you?
{quote}
Yes, it makes sense to me.  : )

> Add containers query support for nodes/node REST API in RMWebServices
> -
>
> Key: YARN-8685
> URL: https://issues.apache.org/jira/browse/YARN-8685
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8685.001.patch
>
>
> Currently we can only query running containers from NM containers REST API, 
> but can't get the valid containers which are in ALLOCATED/ACQUIRED state. We 
> have the requirements to get all containers allocated on specified nodes for 
> debugging. I want to add a "includeContainers" query param (default false) 
> for nodes/node REST API in RMWebServices, so that we can get valid containers 
> on nodes if "includeContainers=true" specified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8697) LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource

2018-08-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589541#comment-16589541
 ] 

genericqa commented on YARN-8697:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: 
The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
16s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8697 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936716/YARN-8697.v1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4eba3d4b80df 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / af4b705 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21662/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21662/testReport/ |
| Max. process+thread count | 440 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: 

[jira] [Commented] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-08-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589528#comment-16589528
 ] 

genericqa commented on YARN-8696:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
14s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
23s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
35s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 29s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}192m 55s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 |
|   | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8696 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-22 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589498#comment-16589498
 ] 

Zhankun Tang commented on YARN-8698:


[~yuan_zac] Thanks for the path! And I have a question that does the patch 
solve the issue you met?
And if possible, can you post more information on your Tensorflow environment 
and job script so that I can help reproduce your issue and double-confirm? 
Thanks.

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error is got :
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to hadoop classpath
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-08-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589484#comment-16589484
 ] 

genericqa commented on YARN-8696:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 24m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
40s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
10s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
55s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
26s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
34s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 35s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}220m 47s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8696 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936691/YARN-8696.v2.patch |
| Optional Tests |  dupname  

[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-22 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589478#comment-16589478
 ] 

Wangda Tan commented on YARN-8638:
--

[~ccondit-target], 

Thanks for working on this ticket. 

It will be very clean if we can make runc/containerd to be a separate 
ContainerRuntime implementation. But not sure that if all the common logics 
like ContainerLaunch/LinuxContainerExecutor works fine for containerd/runc. If 
involved changes required, we may have to consider to move the abstraction to 
ContainerExecutor level, etc. 

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> 
>  yarn.nodemanager.runtime.linux.allowed-runtimes
>  default,docker,experimental
> 
>  
> 
>  yarn.nodemanager.runtime.linux.experimental.class
>  com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8697) LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource

2018-08-22 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8697:
---
Attachment: YARN-8697.v1.patch

> LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when 
> cannot resolve resource
> ---
>
> Key: YARN-8697
> URL: https://issues.apache.org/jira/browse/YARN-8697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8697.v1.patch
>
>
> Right now in LocalityMulticastAMRMProxyPolicy, whenever we cannot resolve the 
> resource name (node or rack), we always route the request to home 
> sub-cluster. However, home sub-cluster might not be always be ready to use 
> (timed out YARN-8581) or enabled (by AMRMProxyPolicy weights). It might also 
> be overwhelmed by the requests if sub-cluster resolver has some issue. In 
> this Jira, we are changing it to pick a random active and enabled sub-cluster 
> for resource request we cannot resolve. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589432#comment-16589432
 ] 

Jason Lowe commented on YARN-8649:
--

Thanks for updating the patch!  Logic looks good overall, but I have some 
concerns on the logging that was added.

I think it's misleading to assume the NM is shutting down when this situation 
occurs.  As I understand it, the main trigger for this scenario is a container 
getting killed while it is still localizing.  That can happen when the NM shuts 
down, but it can also happen without the NM shutting down.  Therefore it seems 
inappropriate to assume this scenario means the NM is shutting down.  There are 
already separate logs when the NM decides to shut down so probably best to keep 
this logging to just the fact that the resource was removed before we got 
around to localizing it and therefore will no longer be localized.

The warning log should show the source resource, similar to what is done in the 
public localization debug code that was added, rather than the local path.  The 
local path won't mean as much as the resource that was requested, as that 
source resource path was logged when it was initially requested by the 
container.

There is debug logging in the public localizer case but not the private case 
which is inconsistent.  Arguably if it's useful for the public case it would be 
useful for the private case.  Given there's a loud warning log already in the 
common getPathForLocalization code, I'm not sure the debug log in the public 
path adds any value, especially if we change the loud warning log to show the 
source path.


> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> YARN-8649_4.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The 
> reason maybe  similar to YARN-4355 which is reported by [# Jason Lowe]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-08-22 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589430#comment-16589430
 ] 

Eric Yang commented on YARN-8569:
-

>From today's docker meetup discussion, distributed cache is not ideal 
>interface for replicate frequently changing cluster information.  If the file 
>checksum changes due to cluster information update, the file may not get 
>replicated to distributed cache.  This information is more similar to token 
>generation and population instead of jar file distribution.  I will change the 
>population mechanism to align with token population instead of distributed 
>cache.

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569.001.patch
>
>
> Some program requires container hostnames to be known for application to run. 
>  For example, distributed tensorflow requires launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell, or YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with YARN flex command.  This is the classic pain point for application 
> developer attempt to automate system environment settings as parameter to end 
> user application.
> It would be great if YARN Docker integration can provide a simple option to 
> expose hostnames of the yarn service via a mounted file.  The file content 
> gets updated when flex command is performed.  This allows application 
> developer to consume system environment settings via a standard interface.  
> It is like /proc/devices for Linux, but for Hadoop.  This may involve 
> updating a file in distributed cache, and allow mounting of the file via 
> container-executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-22 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589403#comment-16589403
 ] 

Chandni Singh commented on YARN-8638:
-

[~ccondit-target] The change looks good to me.

I have a question about the pluggable class. Will there by any plugin discovery 
mechanism? or the plugin class should be in NM's classpath?

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> 
>  yarn.nodemanager.runtime.linux.allowed-runtimes
>  default,docker,experimental
> 
>  
> 
>  yarn.nodemanager.runtime.linux.experimental.class
>  com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-22 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang resolved YARN-8675.
-
Resolution: Not A Problem

Reopen by accident during docker meeting.  Close again.

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service and 
> setting hostname breaks driver/executor communication if docker version 
> >=1.13.1 , especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could 
> have a mix of YARN service/native Applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-22 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reopened YARN-8675:
---

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service and 
> setting hostname breaks driver/executor communication if docker version 
> >=1.13.1 , especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could 
> have a mix of YARN service/native Applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-22 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-8698:


Assignee: Zac Zhou

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error is got :
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to hadoop classpath
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-22 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8698:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-8135

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error is got :
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to hadoop classpath
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-22 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8698:
-
Summary: [Submarine] Failed to add hadoop dependencies in docker container 
when submitting a submarine job  (was: Failed to add hadoop dependencies in 
docker container when submitting a submarine job)

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error is got :
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to hadoop classpath
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8509) Total pending resource calculation in preemption should use user-limit factor instead of minimum-user-limit-percent

2018-08-22 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589226#comment-16589226
 ] 

Eric Payne commented on YARN-8509:
--

{code:title=UsersManager#computeUserLimit}
-Resource userLimitResource = Resources.max(resourceCalculator,
-partitionResource,
-Resources.divideAndCeil(resourceCalculator, resourceUsed,
-usersSummedByWeight),
-Resources.divideAndCeil(resourceCalculator,
-Resources.multiplyAndRoundDown(currentCapacity, getUserLimit()),
-100));
+Resource userLimitResource = Resources.multiplyAndRoundDown(queueCapacity,
+getUserLimitFactor());
 {code}
This is a drastic change that affects more than just preemption (the title of 
this JIRA is "Total pending resource calculation in preemption should use 
user-limit factor instead of minimum-user-limit-percent"). Forgive me if I 
didn't understand that this JIRA is trying to change the way the capacity 
scheduler calculates user limits.

[~leftnoteasy], I thought that the idea goal of the algorithm within 
{{computeUserLimit}} is to slowly grow each queue once the queue is over it's 
capacity so that resources can be assigned evenly. Are you okay with this 
change?

> Total pending resource calculation in preemption should use user-limit factor 
> instead of minimum-user-limit-percent
> ---
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: capacityscheduler
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch, 
> YARN-8509.003.patch, YARN-8509.004.patch, YARN-8509.005.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total 
> pending resource based on user-limit percent and user-limit factor which will 
> cap pending resource for each user to the minimum of user-limit pending and 
> actual pending. This will prevent queue from taking more pending resource to 
> achieve queue balance after all queue satisfied with its ideal allocation.
>   
>  We need to change the logic to let queue pending can go beyond userlimit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-08-22 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589217#comment-16589217
 ] 

Bibin A Chundatt commented on YARN-8699:


Spark applcation submission depends in yarnclusterMetrics . 

{code}
  logInfo("Requesting a new application from cluster with %d NodeManagers"
.format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
{code}

https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

> Add Yarnclient#yarnclusterMetrics API implementation in router
> --
>
> Key: YARN-8699
> URL: https://issues.apache.org/jira/browse/YARN-8699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Priority: Major
>
> Implement YarnclusterMetrics API in FederationClientInterceptor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router

2018-08-22 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-8699:
--

 Summary: Add Yarnclient#yarnclusterMetrics API implementation in 
router
 Key: YARN-8699
 URL: https://issues.apache.org/jira/browse/YARN-8699
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bibin A Chundatt


Implement YarnclusterMetrics API in FederationClientInterceptor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6972) Adding RM ClusterId in AppInfo

2018-08-22 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589211#comment-16589211
 ] 

Bibin A Chundatt commented on YARN-6972:


Thank you [~tanujnay] for patch

MInor comments

# Changes to 
TestRMWebServiceAppsNodelabel,TestRMHA,TestRMWebServicesAppsModification,TestRMWebServicesNodeLabels
 might not be required. With out changes  testcases is passing. If rmcluster id 
is null response will not have rmclusterId field rt ??
# TestRMWebServicesApps is checking field count .Add validation for rmclusterId 
value too.

> Adding RM ClusterId in AppInfo
> --
>
> Key: YARN-6972
> URL: https://issues.apache.org/jira/browse/YARN-6972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Tanuj Nayak
>Priority: Major
> Attachments: YARN-6972.001.patch, YARN-6972.002.patch, 
> YARN-6972.003.patch, YARN-6972.004.patch, YARN-6972.005.patch, 
> YARN-6972.006.patch, YARN-6972.007.patch, YARN-6972.008.patch, 
> YARN-6972.009.patch, YARN-6972.010.patch, YARN-6972.011.patch, 
> YARN-6972.012.patch, YARN-6972.013.patch, YARN-6972.014.patch, 
> YARN-6972.015.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8696) FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-08-22 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8696:
---
Attachment: YARN-8696.v2.patch

> FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to home sub-cluster is 
> synchronous. After the heartbeat is sent out to home sub-cluster, it waits 
> for the home response to come back before merging and returning the (merged) 
> heartbeat result to back AM. If home sub-cluster is suffering from connection 
> issues, or down during an YarnRM master-slave switch, all heartbeat threads 
> in _FederationInterceptor_ will be blocked waiting for home response. As a 
> result, the successful UAM heartbeats from secondary sub-clusters will not be 
> returned to AM at all. Additionally, because of the fact that we kept the 
> same heartbeat responseId between AM and home RM, lots of tricky handling are 
> needed regarding the responseId resync when it comes to 
> _FederationInterceptor_ (part of AMRMProxy, NM) work preserving restart 
> (YARN-6127, YARN-1336), home RM master-slave switch etc. 
> In this patch, we change the heartbeat to home sub-cluster to asynchronous, 
> same as the way we handle UAM heartbeats in secondaries. So that any 
> sub-cluster down or connection issues won't impact AM getting responses from 
> other sub-clusters. The responseId is also managed separately for home 
> sub-cluster and AM, and they increment independently. The resync logic 
> becomes much cleaner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-08-22 Thread Naganarasimha G R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589171#comment-16589171
 ] 

Naganarasimha G R commented on YARN-7863:
-

Thanks [~cheersyang], for some clarifications but at the same time was 
discussing with  Sunil too.
{quote} PC doesn't affect any logic how scheduler selects requests, it is still 
how it is handled now. A PC is simply checked twice when 1) creating an 
allocation proposal on a node and 2) at commit phase against a specific node.
{quote}
I am not specifying that changes in  this Jira alone introducing anything 
implications but my point was, earlier when partitions was introduced it was 
easy to determine pending resources per partition per queue in the earlier api 
using resource request. Now with the New api introduced with PC(not just from 
this Jira alone) there is no way to find out for a given queue how much pending 
resources are there in each partition it can access. This is because PC can 
have an Partition OR'd with Allocation TAG or with this Jira  OR'd with 
Attributes. Impact of this would be that Cluster admin will not be able to plan 
resources for the partition per queue. And also i am not able to envisage the 
scenario where in partition needs to be OR'd with Allocation tags or Attributes.
{quote}No, not necessarily. Just to keep the changes clean and incremental, we 
can allow this form for now. Because this is the spec we used for distributed 
shell. Since we don't have an --allocationTags argument. A "foo=3" is the only 
way to specify allocation tags right now.
{quote}
Agree, My bad but the reason i got confused is all the parsing logic for the  
Distributed shell's expression is present in  "hadoop-yarn-api" project which 
more so implies that this the API to be used by other apps too.
{quote}I want to reinforce the phase-target (for the merge), we want 
node-attributes can be integrated into PC and support simple operation "=" and 
"!=".
{quote}
Yes I concur with you on this, as long as we are able to clearly capture for 
others what kind of Java API needs to be used to specify Node attributes.

[~sunilg],

Could there be a simple test case or example which captures how to write a java 
api where in node attributes can be specified for a given SchedulingRequest  
i.e. without any of these expression DSL's ?

Also i could not see a test case where in CS is handling scheduling of 
containers with PC having attributes. So that  modifications in 
PlacementConstraintsUtil is tested.

 

 

 

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, 
> YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, 
> YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, 
> YARN-7863.v0.patch
>
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-08-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589116#comment-16589116
 ] 

Weiwei Yang commented on YARN-7863:
---

Hi [~Naganarasimha]

Regarding some of your comments
{quote}IIUC we can specify allocation tags, attribute expression  and partition 
label all in single expression ?
{quote}
Yes, this is supported. User can specify constraints against allocation tags, 
node attributes and partition in a single expression (a conjunction expression 
composed by AND or OR). E.g place a container on a node has allocation tag X 
and javaVersion (node-attribute)  = 1.8.
{quote}If its the case then how are we going select the outstanding/ pending 
requests for a given queue for a given partition, as there could be OR with 
allocation tags or attribute expression right ? 
{quote}
PC doesn't affect any logic how scheduler selects requests, it is still how it 
is handled now. A PC is simply checked twice when 1) creating an allocation 
proposal on a node and 2) at commit phase against a specific node.
{quote}Allocation tags are created even if we want to specify attribute 
expression ?
{quote}
No, not necessarily. Just to keep the changes clean and incremental, we can 
allow this form for now. Because this is the spec we used for distributed 
shell. Since we don't have an --allocationTags argument. A "foo=3" is the only 
way to specify allocation tags right now. A follow-up task is to make that 
optional. But I am not sure why even a single tag without container number is 
supported, maybe [~sunilg] can comment more.

I want to reinforce the phase-target (for the merge), we want node-attributes 
can be integrated into PC and support simple operation "=" and "!=". We want to 
extend DS to support node-attributes PC expressions for testing, but with 
minimal changes to the existing placement spec. For real users, their interface 
will be java API or native service spec, not this spec in DS.

Hope that makes sense. 

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, 
> YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, 
> YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, 
> YARN-7863.v0.patch
>
>
> This Jira tracks the work to *Modify existing placement constraints to 
> support node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-08-22 Thread Naganarasimha G R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589075#comment-16589075
 ] 

Naganarasimha G R commented on YARN-7863:
-

Hi [~sunilg],

Now I am confused when we have everything in one single expression.
 * IIUC we can specify allocation tags, attribute expressions and partition 
label all in a single expression?
 * If that's the case, then how are we going to select the outstanding/pending 
requests for a given queue for a given partition, as there could be OR-ing 
with allocation tags or attribute expressions, right? I think this issue 
should already exist when we have OR-ing involved with allocation tags?
 * Based on the test cases in 
TestPlacementConstraintParser.testParseNodeAttributeSpec, allocation tags are 
created even if we only want to specify an attribute expression? E.g. in 
"xyz,in,rm.yarn.io/foo=true", xyz is an allocation tag? If so, users will be 
really confused about what to pass and what not!
 * If I specify "IN" and have an attribute expression like 
"xyz,in,rm.yarn.io/foo{color:#d04437}*!=*{color}true", is it valid? In future, 
when we come up with more operators, it would not make sense. I would suggest 
going with "," as the separator.

I am not sure how many cases there would be where we want to specify all the 
constraints in the same expression, but we are making users' lives complex by 
having such a complex DSL.
> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, 
> YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, 
> YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, 
> YARN-7863.v0.patch
>
>
> This Jira tracks the work to *Modify existing placement constraints to 
> support node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-22 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589066#comment-16589066
 ] 

Eric Yang commented on YARN-8675:
-

Further analysis of the problem indicates that the registry DNS domain is set 
to mycluster.com, while the host-level domain is example.com. If the hostname 
is host1.example.com, a Spark-on-YARN workload will start the container as 
host1.mycluster.com. host1.mycluster.com is unresolvable because no 
RegistryDNS entry is written to ZooKeeper, and without the YARN service API 
there is no AM logic that handles registering the hostname-to-IP mapping. 
This is the reason it failed. To handle the net=host situation properly 
without the YARN service API, the registry DNS domain must be set to the same 
value as the host-level domain, which is example.com. The system 
administrator must configure the RegistryDNS domain properly to permit 
applications to use the host-level domain. This preserves the decoupling of 
the infrastructure cluster (YARN) and the workload clusters (YARN apps): an 
application does not impersonate the infrastructure cluster unless explicitly 
allowed. This is a feature, not a bug.
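For administrators, a minimal sketch of that alignment in yarn-site.xml 
(assuming the documented RegistryDNS domain property name; adjust to your 
deployment):
{code:xml}
<!-- Align the RegistryDNS domain with the host-level domain so that
     net=host containers resolve without going through the YARN service API. -->
<property>
  <name>hadoop.registry.dns.domain-name</name>
  <value>example.com</value>
</property>
{code}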

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service, and 
> setting the hostname breaks driver/executor communication with docker 
> versions >= 1.13.1, especially with wire encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled, but the cluster could 
> have a mix of YARN service and native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-22 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang resolved YARN-8675.
-
Resolution: Not A Problem

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service, and 
> setting the hostname breaks driver/executor communication with docker 
> versions >= 1.13.1, especially with wire encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled, but the cluster could 
> have a mix of YARN service and native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-22 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588998#comment-16588998
 ] 

Billie Rinaldi commented on YARN-8675:
--

Perhaps we should always set the hostname when the AM has provided one through 
the YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME env var, but place 
conditions on when the DockerLinuxContainerRuntime makes up a default hostname 
to set. We could remove the default hostname entirely, or just set it when net 
!= host.
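A minimal sketch of that second option (variable and helper names below are 
hypothetical, not the actual DockerLinuxContainerRuntime code):
{code:java}
// Always honor a hostname the AM explicitly provided; otherwise only make up
// a default hostname when the container is not on the host network.
String hostname =
    environment.get("YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME");
if (hostname != null) {
  runCommand.setHostname(hostname);
} else if (!"host".equals(network)) {
  runCommand.setHostname(defaultHostname(containerId)); // hypothetical helper
}
{code}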

Another probably ill-advised option would be to have the runtime populate the 
registry when the runtime sets a default hostname and RegistryDNS is enabled. 
But then we'd have to figure out a way to clean up the registry later.

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service, and 
> setting the hostname breaks driver/executor communication with docker 
> versions >= 1.13.1, especially with wire encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled, but the cluster could 
> have a mix of YARN service and native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-08-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588996#comment-16588996
 ] 

Weiwei Yang commented on YARN-7863:
---

Hi [~sunilg]

Thanks for the updates. I think the v8 patch has addressed most of my 
concerns. Two comments:

PlacementConstraintsUtil.java

canSatisfyNodeConstraintExpresssion: it looks like the logic only supports 
affinity to node attributes; does it support anti-affinity? E.g. with 
{{targetNotIn(NODE, nodeAttribute("java", "1.8"))}}, can a container with 
such a PC be allocated to nodes where {{java != 1.8}}? If it is not 
straightforward to support this with {{targetIn}} and {{targetNotIn}}, I 
think we can add an API, something like {{targetNodeAttribute(Operator.EQ, 
"java", "1.8")}}? For this version, we can claim to support only {{EQ}} and 
{{NE}}. What do you think?
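For clarity, a sketch of how the suggested API might read (purely 
illustrative; neither the method nor the Operator enum exists yet):
{code:java}
// Anti-affinity spelled with an explicit operator instead of targetNotIn:
// only allocate to nodes where the node attribute java is NOT 1.8.
PlacementConstraint pc =
    targetNodeAttribute(Operator.NE, "java", "1.8").build();
{code}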

TestPlacementConstraints.java

lines 72 and 85: why is it still {{nodeAttribute("java", "java=1.8")}}? The 
value should be "1.8" alone, right?

Thanks
> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, 
> YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, 
> YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, 
> YARN-7863.v0.patch
>
>
> This Jira tracks the work to *Modify existing placement constraints to 
> support node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7644) NM gets backed up deleting docker containers

2018-08-22 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reassigned YARN-7644:
-

Assignee: Chandni Singh  (was: Eric Badger)

[~csingh], assigned to you. Thanks for picking this up.

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 
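(Illustrative only: one way to avoid tying up the event handler would be to 
offload the blocking stop to a dedicated pool; executeDockerStop() below is a 
hypothetical stand-in for the existing blocking call.)
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: run the blocking "docker stop" off the ContainerLaunch
// event-handler thread so queued kill/launch events are not delayed.
private final ExecutorService dockerStopPool = Executors.newFixedThreadPool(10);

void stopContainerAsync(String containerId) {
  dockerStopPool.submit(() -> {
    // May block for up to the 10-second docker stop timeout, but off-thread.
    executeDockerStop(containerId, 10); // hypothetical blocking helper
  });
}
{code}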



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS

2018-08-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588925#comment-16588925
 ] 

Weiwei Yang edited comment on YARN-8670 at 8/22/18 2:36 PM:


Hi [~Sichen Zhao]
{quote}It seems all the improvements depend on YARN-3409?
{quote}
Only if you want to specify node attributes; that part depends on YARN-3409. 
The good news is that the branch is close to merge; we are trying to get it 
done in 2-3 weeks. Simple PCs with allocation tags are supported in trunk 
code.
{quote}The PC and allocation tags shouldn't be added at the container level.
{quote}
The PC is specified in the SchedulingRequest, so it is request level. That 
should be enough, right?
{quote}So maybe we need to create a new Task class, where the PC, allocation 
tags and TaskContainer are members of the Task class.
{quote}
Before going into the implementation details, I would love to know your idea 
of how to specify a PC at the request level. The current SLS supports 
launching workloads from SLS/Rumen/Synth traces. YARN-8007 added support only 
in Synth traces, so it was only able to test simple affinity/anti-affinity 
PCs. Therefore, could you please share your idea for supporting a large 
number of jobs/requests with PCs in traces?

Thanks


was (Author: cheersyang):
Hi [~Sichen Zhao]
{quote}It seems all the improvements depend on YARN-3409?
{quote}
Only if you want to specify node attributes; that part depends on YARN-3409. 
The good news is that the branch is close to merge; we are trying to get it 
done in 2-3 weeks.
{quote}The PC and allocation tags shouldn't be added at the container level.
{quote}
The PC is specified in the SchedulingRequest, so it is request level. That 
should be enough, right?
{quote}So maybe we need to create a new Task class, where the PC, allocation 
tags and TaskContainer are members of the Task class.
{quote}
Before going into the implementation details, I would love to know your idea 
of how to specify a PC at the request level. The current SLS supports 
launching workloads from SLS/Rumen/Synth traces. YARN-8007 added support only 
in Synth traces, so it was only able to test simple affinity/anti-affinity 
PCs. Therefore, could you please share your idea for supporting a large 
number of jobs/requests with PCs in traces?

Thanks

> Support scheduling request for SLS input and attributes for Node in SLS
> ---
>
> Key: YARN-8670
> URL: https://issues.apache.org/jira/browse/YARN-8670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: YARN-3409
>Reporter: Sichen zhao
>Priority: Major
> Fix For: YARN-3409
>
>
> YARN-3409 introduces placement constraints. Currently, SLS does not support 
> specifying placement constraints. 
> YARN-8007 supports specifying placement constraints for task containers in 
> SLS, but there is still some room for improvement:
>  # YARN-8007 only supports placement constraints at the job level. In fact, 
> the more flexible way is to support placement constraints at the task level.
>  # In most scenarios, a node itself has some characteristics, called 
> attributes, which are not supported in SLS. So we can add attributes on nodes.
>  # YARN-8007 uses SYNTH as the scheduling-request input, but SYNTH can't 
> create a large number of specific resource requests. We want to create a new 
> scheduling-request input format (like the sls format) for more authentic 
> input. We can add some fields to the sls format to make it the 
> scheduling-request input format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS

2018-08-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588925#comment-16588925
 ] 

Weiwei Yang commented on YARN-8670:
---

Hi [~Sichen Zhao]
{quote}It seems all the improvements depend on YARN-3409?
{quote}
Only if you want to specify node attributes; that part depends on YARN-3409. 
The good news is that the branch is close to merge; we are trying to get it 
done in 2-3 weeks.
{quote}The PC and allocation tags shouldn't be added at the container level.
{quote}
The PC is specified in the SchedulingRequest, so it is request level. That 
should be enough, right?
{quote}So maybe we need to create a new Task class, where the PC, allocation 
tags and TaskContainer are members of the Task class.
{quote}
Before going into the implementation details, I would love to know your idea 
of how to specify a PC at the request level. The current SLS supports 
launching workloads from SLS/Rumen/Synth traces. YARN-8007 added support only 
in Synth traces, so it was only able to test simple affinity/anti-affinity 
PCs. Therefore, could you please share your idea for supporting a large 
number of jobs/requests with PCs in traces?

Thanks

> Support scheduling request for SLS input and attributes for Node in SLS
> ---
>
> Key: YARN-8670
> URL: https://issues.apache.org/jira/browse/YARN-8670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: YARN-3409
>Reporter: Sichen zhao
>Priority: Major
> Fix For: YARN-3409
>
>
> YARN-3409 introduces placement constraints. Currently, SLS does not support 
> specifying placement constraints. 
> YARN-8007 supports specifying placement constraints for task containers in 
> SLS, but there is still some room for improvement:
>  # YARN-8007 only supports placement constraints at the job level. In fact, 
> the more flexible way is to support placement constraints at the task level.
>  # In most scenarios, a node itself has some characteristics, called 
> attributes, which are not supported in SLS. So we can add attributes on nodes.
>  # YARN-8007 uses SYNTH as the scheduling-request input, but SYNTH can't 
> create a large number of specific resource requests. We want to create a new 
> scheduling-request input format (like the sls format) for more authentic 
> input. We can add some fields to the sls format to make it the 
> scheduling-request input format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS

2018-08-22 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8670:
--
Target Version/s:   (was: YARN-3409)

> Support scheduling request for SLS input and attributes for Node in SLS
> ---
>
> Key: YARN-8670
> URL: https://issues.apache.org/jira/browse/YARN-8670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: YARN-3409
>Reporter: Sichen zhao
>Priority: Major
> Fix For: YARN-3409
>
>
> YARN-3409 introduces placement constraints. Currently, SLS does not support 
> specifying placement constraints. 
> YARN-8007 supports specifying placement constraints for task containers in 
> SLS, but there is still some room for improvement:
>  # YARN-8007 only supports placement constraints at the job level. In fact, 
> the more flexible way is to support placement constraints at the task level.
>  # In most scenarios, a node itself has some characteristics, called 
> attributes, which are not supported in SLS. So we can add attributes on nodes.
>  # YARN-8007 uses SYNTH as the scheduling-request input, but SYNTH can't 
> create a large number of specific resource requests. We want to create a new 
> scheduling-request input format (like the sls format) for more authentic 
> input. We can add some fields to the sls format to make it the 
> scheduling-request input format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8685) Add containers query support for nodes/node REST API in RMWebServices

2018-08-22 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588896#comment-16588896
 ] 

Weiwei Yang commented on YARN-8685:
---

Hi [~Tao Yang]
{quote}There is a ContainerInfo class in the hadoop-yarn-server-common 
module; the patch can share this class by adding several fields like 
allocationRequestId/version/allocationTags.
{quote}
Correct.

How about my suggestion about adding a new endpoint in RM? Does that make sense 
to you?

Thanks

> Add containers query support for nodes/node REST API in RMWebServices
> -
>
> Key: YARN-8685
> URL: https://issues.apache.org/jira/browse/YARN-8685
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8685.001.patch
>
>
> Currently we can only query running containers from the NM containers REST 
> API, but can't get the valid containers which are in ALLOCATED/ACQUIRED 
> state. We have a requirement to get all containers allocated on specified 
> nodes for debugging. I want to add an "includeContainers" query param 
> (default false) for the nodes/node REST API in RMWebServices, so that we 
> can get the valid containers on nodes if "includeContainers=true" is 
> specified.
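(For illustration, with placeholder host and node id, the proposed query 
would look like:)
{code}
GET http://<rm-address>:8088/ws/v1/cluster/nodes/<node-id>?includeContainers=true
{code}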



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-08-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588777#comment-16588777
 ] 

Antal Bálint Steinbach commented on YARN-8468:
--

Hi [~haibochen],

Thank you for your feedback. All of the points you suggested are fixed.

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but 
> allow enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues, while the maximum resources per queue are set 
> with the "maxContainerResources" queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * if we set it on the root, we override the scheduler setting, and we 
> should not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler.
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue.
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.
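(A minimal illustrative sketch of the suggested 
getMaximumResourceCapability(String) override, under the naming in this 
description; the per-queue accessor getMaxContainerAllocation() is 
hypothetical:)
{code:java}
// Sketch only: fall back to the scheduler-wide cap when the queue has no
// per-queue maximum configured.
public Resource getMaximumResourceCapability(String queueName) {
  FSQueue queue = queueManager.getQueue(queueName);
  if (queue == null || queue.getMaxContainerAllocation() == null) {
    return getMaximumResourceCapability(); // global yarn.scheduler cap
  }
  return queue.getMaxContainerAllocation(); // hypothetical per-queue accessor
}
{code}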



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588775#comment-16588775
 ] 

genericqa commented on YARN-8649:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
44s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8649 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936636/YARN-8649_4.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1f7cbdde302c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8184739 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21659/testReport/ |
| Max. process+thread count | 335 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21659/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Similar as YARN-4355:NPE while processing localizer 

[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-08-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588772#comment-16588772
 ] 

Antal Bálint Steinbach commented on YARN-8468:
--

The failing test is a flaky test:

https://issues.apache.org/jira/browse/YARN-8433
> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but 
> allow enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues, while the maximum resources per queue are set 
> with the "maxContainerResources" queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * if we set it on the root, we override the scheduler setting, and we 
> should not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler.
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue.
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-08-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588748#comment-16588748
 ] 

genericqa commented on YARN-8468:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
11s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 31s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 41 new + 602 unchanged - 15 fixed = 643 total (was 617) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 46s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8468 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936607/YARN-8468.006.patch |
| Optional 

[jira] [Updated] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8649:

Attachment: YARN-8649_4.patch

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> YARN-8649_4.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by [# Jason Lowe]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS

2018-08-22 Thread Sichen zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588645#comment-16588645
 ] 

Sichen zhao edited comment on YARN-8670 at 8/22/18 10:07 AM:
-

Hi, sorry for the late reply.

It seems all the improvements depend on YARN-3409? The master branch does not 
support scheduling requests yet.

And for the 1st improvement, what I thought was to add the PC and allocation 
tags on TaskContainer, but I read YARN-8007, and the PC and allocation tags 
shouldn't be added at the container level.

So maybe we need to create a new Task class, where the PC, allocation tags 
and TaskContainer are members of the Task class. What do you think?


was (Author: sichen zhao):
Hi, sorry for reply late.

it seems all the improvements is depending on YARN-3409? the master branch do 
not support scheduling request yet.

> Support scheduling request for SLS input and attributes for Node in SLS
> ---
>
> Key: YARN-8670
> URL: https://issues.apache.org/jira/browse/YARN-8670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: YARN-3409
>Reporter: Sichen zhao
>Priority: Major
> Fix For: YARN-3409
>
>
> YARN-3409 introduces placement constraints. Currently, SLS does not support 
> specifying placement constraints. 
> YARN-8007 supports specifying placement constraints for task containers in 
> SLS, but there is still some room for improvement:
>  # YARN-8007 only supports placement constraints at the job level. In fact, 
> the more flexible way is to support placement constraints at the task level.
>  # In most scenarios, a node itself has some characteristics, called 
> attributes, which are not supported in SLS. So we can add attributes on nodes.
>  # YARN-8007 uses SYNTH as the scheduling-request input, but SYNTH can't 
> create a large number of specific resource requests. We want to create a new 
> scheduling-request input format (like the sls format) for more authentic 
> input. We can add some fields to the sls format to make it the 
> scheduling-request input format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS

2018-08-22 Thread Sichen zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588645#comment-16588645
 ] 

Sichen zhao commented on YARN-8670:
---

Sorry for the late reply.

It seems all the improvements depend on YARN-3409? The master branch does not 
support scheduling requests yet.

> Support scheduling request for SLS input and attributes for Node in SLS
> ---
>
> Key: YARN-8670
> URL: https://issues.apache.org/jira/browse/YARN-8670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: YARN-3409
>Reporter: Sichen zhao
>Priority: Major
> Fix For: YARN-3409
>
>
> YARN-3409 introduces placement constraints. Currently, SLS does not support 
> specifying placement constraints. 
> YARN-8007 supports specifying placement constraints for task containers in 
> SLS, but there is still some room for improvement:
>  # YARN-8007 only supports placement constraints at the job level. In fact, 
> the more flexible way is to support placement constraints at the task level.
>  # In most scenarios, a node itself has some characteristics, called 
> attributes, which are not supported in SLS. So we can add attributes on nodes.
>  # YARN-8007 uses SYNTH as the scheduling-request input, but SYNTH can't 
> create a large number of specific resource requests. We want to create a new 
> scheduling-request input format (like the sls format) for more authentic 
> input. We can add some fields to the sls format to make it the 
> scheduling-request input format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS

2018-08-22 Thread Sichen zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588645#comment-16588645
 ] 

Sichen zhao edited comment on YARN-8670 at 8/22/18 9:57 AM:


Hi, sorry for the late reply.

It seems all the improvements depend on YARN-3409? The master branch does not 
support scheduling requests yet.


was (Author: sichen zhao):
Sorry for the late reply.

It seems all the improvements depend on YARN-3409? The master branch does not 
support scheduling requests yet.

> Support scheduling request for SLS input and attributes for Node in SLS
> ---
>
> Key: YARN-8670
> URL: https://issues.apache.org/jira/browse/YARN-8670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: YARN-3409
>Reporter: Sichen zhao
>Priority: Major
> Fix For: YARN-3409
>
>
> YARN-3409 introduces placement constraints. Currently, SLS does not support 
> specifying placement constraints. 
> YARN-8007 supports specifying placement constraints for task containers in 
> SLS, but there is still some room for improvement:
>  # YARN-8007 only supports placement constraints at the job level. In fact, 
> the more flexible way is to support placement constraints at the task level.
>  # In most scenarios, a node itself has some characteristics, called 
> attributes, which are not supported in SLS. So we can add attributes on nodes.
>  # YARN-8007 uses SYNTH as the scheduling-request input, but SYNTH can't 
> create a large number of specific resource requests. We want to create a new 
> scheduling-request input format (like the sls format) for more authentic 
> input. We can add some fields to the sls format to make it the 
> scheduling-request input format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-08-22 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-8468:
-
Attachment: YARN-8468.006.patch

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but 
> allow enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues, while the maximum resources per queue are set 
> with the "maxContainerResources" queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * if we set it on the root, we override the scheduler setting, and we 
> should not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler.
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue.
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588477#comment-16588477
 ] 

genericqa commented on YARN-8649:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m  
1s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8649 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936581/YARN-8649_3.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux da047436d45a 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8184739 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/21657/artifact/out/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21657/testReport/ |
| Max. process+thread count | 440 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 

[jira] [Updated] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-22 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8649:

Attachment: YARN-8649_3.patch

> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by [# Jason Lowe]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org