[jira] [Commented] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644491#comment-16644491
 ] 

Hadoop QA commented on YARN-8858:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m  
8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.8 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
 5s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} branch-2.8 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
19s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}116m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ae3769f |
| JIRA Issue | YARN-8858 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943159/YARN-8858-branch-2.8.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cd1bdee5529b 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.8 / 94f4b5b |
| maven | version: Apache Maven 3.0.5 |
| Default Java | 1.7.0_181 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/22128/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22128/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-YARN-Build/22128/artifact/out/patch-asflicense-problems.txt
 |
| Max. process+thread count | 668 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/22128/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.

[jira] [Commented] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644482#comment-16644482
 ] 

Hadoop QA commented on YARN-8842:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 26s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 37 new + 185 unchanged - 2 fixed = 222 total (was 187) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
18s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}122m 29s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}211m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterServiceCapacity |
|   | 
hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue |

[jira] [Comment Edited] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-09 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644473#comment-16644473
 ] 

Wilfred Spiegelenburg edited comment on YARN-8842 at 10/10/18 5:29 AM:
---

Thank you [~snemeth]. I am glad we gave this its own jira separate from the 
pre-emption changes. It has become quite large on its own.

The patch overall looks good. I have a couple of remarks:
* {{getMaxAllocationUtilization}} is new and not used outside of the patch; why 
would we add it if we're only going to leverage it in YARN-8059?
* With that simplification we also do not need the tests for 
{{getMaxAllocationUtilization}} in the {{TestQueueMetricsForCustomResources}}.
* There was a possible NPE in {{getAllocatedResources}} until v7, now fixed. 
Beyond that, I do not understand the need for 
{{isThereAnyAllocatedResource}}: the way the newInstance is created, we can 
pass in an empty list or even null. I would expect the method to look like this:
{code}
if (queueMetricsForCustomResources != null) {
  return Resource.newInstance(allocatedMB.value(), allocatedVCores.value(),
      queueMetricsForCustomResources.getAllocatedValues());
}
return Resource.newInstance(allocatedMB.value(), allocatedVCores.value());
{code}
That does bring the {{QueueMetricsAllocatedCustomResources}} class back in line 
with all the other custom resource metrics classes: it can just extend the 
abstract class.
* Please look at the checkstyle issues mentioned.

A nit: on my system a number of indent inconsistencies showed up (mainly in 
TestQueueMetricsForCustomResources, QueueMetricsTestcase & TestQueueMetrics).



was (Author: wilfreds):
Thank you [~snemeth]. I am glad we gave this its own jira separate from the 
pre-emption changes. It has become quite large on its own.

The patch overall looks good. I have a couple of remarks:
* {{getMaxAllocationUtilization}} is new and not used outside of the patch; why 
would we add it if we're only going to leverage it in YARN-8059?
* With that simplification we also do not need the tests for 
{{getMaxAllocationUtilization}} in the {{TestQueueMetricsForCustomResources}}.
* There is a possible NPE in {{getAllocatedResources}}. Beyond that, I do not 
understand the need for {{isThereAnyAllocatedResource}}: the way the 
newInstance is created, we can pass in an empty list or even null. I would 
expect the method to look like this:
{code}
if (queueMetricsForCustomResources != null) {
  return Resource.newInstance(allocatedMB.value(), allocatedVCores.value(),
      queueMetricsForCustomResources.getAllocatedValues());
}
return Resource.newInstance(allocatedMB.value(), allocatedVCores.value());
{code}
That does bring the {{QueueMetricsAllocatedCustomResources}} class back in line 
with all the other custom resource metrics classes: it can just extend the 
abstract class.
* Please look at the checkstyle issues mentioned.

A nit: on my system a number of indent inconsistencies showed up (mainly in 
TestQueueMetricsForCustomResources, QueueMetricsTestcase & TestQueueMetrics).


> Update QueueMetrics with custom resource values 
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch, YARN-8842.007.patch, YARN-8842.008.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-09 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644473#comment-16644473
 ] 

Wilfred Spiegelenburg commented on YARN-8842:
-

Thank you [~snemeth]. I am glad we gave this its own jira separate from the 
pre-emption changes. It has become quite large on its own.

The patch overall looks good. I have a couple of remarks:
* {{getMaxAllocationUtilization}} is new and not used outside of the patch; why 
would we add it if we're only going to leverage it in YARN-8059?
* With that simplification we also do not need the tests for 
{{getMaxAllocationUtilization}} in the {{TestQueueMetricsForCustomResources}}.
* There is a possible NPE in {{getAllocatedResources}}. Beyond that, I do not 
understand the need for {{isThereAnyAllocatedResource}}: the way the 
newInstance is created, we can pass in an empty list or even null. I would 
expect the method to look like this:
{code}
if (queueMetricsForCustomResources != null) {
  return Resource.newInstance(allocatedMB.value(), allocatedVCores.value(),
      queueMetricsForCustomResources.getAllocatedValues());
}
return Resource.newInstance(allocatedMB.value(), allocatedVCores.value());
{code}
That does bring the {{QueueMetricsAllocatedCustomResources}} class back in line 
with all the other custom resource metrics classes: it can just extend the 
abstract class (sketched below).
* Please look at the checkstyle issues mentioned.

A nit: on my system a number of indent inconsistencies showed up (mainly in 
TestQueueMetricsForCustomResources, QueueMetricsTestcase & TestQueueMetrics).
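
A minimal Java sketch of what "just extend the abstract" could look like; the 
class names mirror the ones discussed above, but the bodies are assumptions 
for illustration only, not the actual YARN-8842 patch code:
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the abstract custom resource metrics base class.
abstract class QueueMetricsCustomResource {
  private final Map<String, Long> values = new HashMap<>();

  // Shared bookkeeping that every custom resource metric reuses.
  void increment(String resourceName, long delta) {
    values.merge(resourceName, delta, Long::sum);
  }

  Map<String, Long> getValues() {
    return values;
  }
}

// With the null-guarded getAllocatedResources above, the allocated-resources
// metric needs no special casing and can be a plain subclass again.
class QueueMetricsAllocatedCustomResources extends QueueMetricsCustomResource {
}
{code}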


> Update QueueMetrics with custom resource values 
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch, YARN-8842.007.patch, YARN-8842.008.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644454#comment-16644454
 ] 

Hadoop QA commented on YARN-8753:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
33m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 38s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8753 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943083/YARN-8753.002.patch |
| Optional Tests |  dupname  asflicense  shadedclient  |
| uname | Linux 7a5bc4a3eed9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / edce866 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 341 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/22131/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, 
> YARN-8753.001.patch, YARN-8753.002.patch
>
>
> The Nodemanagers Chart is present on the Cluster overview and Nodes->Nodes 
> Status page. 
> This chart does not show nodemanagers if they are LOST. 
> Due to this issue, the Node information page and the Node status page show 
> different node manager counts. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5742) Serve aggregated logs of historical apps from timeline service

2018-10-09 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644453#comment-16644453
 ] 

Rohith Sharma K S edited comment on YARN-5742 at 10/10/18 4:51 AM:
---

[~vinodkv] [~vrushalic] [~sunilg] could you guys take a look at the patch for 
one round of review, please? 


was (Author: rohithsharma):
[~vinodkv] [~vrushalic] [~sunilg] could you guys take a look at the patch, please? 

> Serve aggregated logs of historical apps from timeline service
> --
>
> Key: YARN-5742
> URL: https://issues.apache.org/jira/browse/YARN-5742
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-5742-POC-v0.patch, YARN-5742.01.patch, 
> YARN-5742.v0.patch
>
>
> The ATSv1.5 daemon has a servlet to serve aggregated logs, but with only 
> ATSv2 enabled, logs are not served from the CLI and UI for completed 
> applications. The log serving story is completely broken in ATSv2.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5742) Serve aggregated logs of historical apps from timeline service

2018-10-09 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644453#comment-16644453
 ] 

Rohith Sharma K S commented on YARN-5742:
-

[~vinodkv] [~vrushalic] [~sunilg] could you guys take a look at the patch, please? 

> Serve aggregated logs of historical apps from timeline service
> --
>
> Key: YARN-5742
> URL: https://issues.apache.org/jira/browse/YARN-5742
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-5742-POC-v0.patch, YARN-5742.01.patch, 
> YARN-5742.v0.patch
>
>
> The ATSv1.5 daemon has a servlet to serve aggregated logs, but with only 
> ATSv2 enabled, logs are not served from the CLI and UI for completed 
> applications. The log serving story is completely broken in ATSv2.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5742) Serve aggregated logs of historical apps from timeline service

2018-10-09 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644452#comment-16644452
 ] 

Rohith Sharma K S commented on YARN-5742:
-

Thanks [~abmodi].

bq. throw new BadRequestException("invalid container id, " + containerIdStr); 
=> id, should be replaced by id:
Sorry, I didn't get it. 


> Serve aggregated logs of historical apps from timeline service
> --
>
> Key: YARN-5742
> URL: https://issues.apache.org/jira/browse/YARN-5742
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-5742-POC-v0.patch, YARN-5742.01.patch, 
> YARN-5742.v0.patch
>
>
> The ATSv1.5 daemon has a servlet to serve aggregated logs, but with only 
> ATSv2 enabled, logs are not served from the CLI and UI for completed 
> applications. The log serving story is completely broken in ATSv2.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8468:
--
Attachment: YARN-8468-branch-3.1.020.patch

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8468-branch-3.1.018.patch, 
> YARN-8468-branch-3.1.019.patch, YARN-8468-branch-3.1.020.patch, 
> YARN-8468.000.patch, YARN-8468.001.patch, YARN-8468.002.patch, 
> YARN-8468.003.patch, YARN-8468.004.patch, YARN-8468.005.patch, 
> YARN-8468.006.patch, YARN-8468.007.patch, YARN-8468.008.patch, 
> YARN-8468.009.patch, YARN-8468.010.patch, YARN-8468.011.patch, 
> YARN-8468.012.patch, YARN-8468.013.patch, YARN-8468.014.patch, 
> YARN-8468.015.patch, YARN-8468.016.patch, YARN-8468.017.patch, 
> YARN-8468.018.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited by queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per queue basis.
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues; the maximum resources per queue are set with 
> the “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * Enforce the use of the queue based maximum allocation limit if it is 
> available; if not, use the general scheduler level setting
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request
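
A minimal sketch of the fallback logic the description asks for: use the 
per-queue cap when configured, otherwise the scheduler-wide cap, and never let 
a queue cap exceed the scheduler cap. The simplified resource type and field 
names below are assumptions for illustration, not the actual YARN-8468 code.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for org.apache.hadoop.yarn.api.records.Resource.
final class MaxAllocResource {
  final long memoryMb;
  final int vcores;

  MaxAllocResource(long memoryMb, int vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }

  static MaxAllocResource componentwiseMin(MaxAllocResource a, MaxAllocResource b) {
    return new MaxAllocResource(Math.min(a.memoryMb, b.memoryMb),
        Math.min(a.vcores, b.vcores));
  }
}

class PerQueueMaxAllocationSketch {
  // Scheduler-wide cap from yarn.scheduler.maximum-allocation-mb / -vcores.
  private final MaxAllocResource schedulerMax = new MaxAllocResource(8192, 4);
  // Per-queue caps, e.g. from a maxContainerResources-style queue setting.
  private final Map<String, MaxAllocResource> queueMax = new ConcurrentHashMap<>();

  // Queue cap wins when present, but is clipped to the scheduler cap.
  MaxAllocResource getMaximumResourceCapability(String queueName) {
    MaxAllocResource q = queueMax.get(queueName);
    return (q == null)
        ? schedulerMax
        : MaxAllocResource.componentwiseMin(q, schedulerMax);
  }
}
{code}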



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened YARN-8468:
---

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8468-branch-3.1.018.patch, 
> YARN-8468-branch-3.1.019.patch, YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch, YARN-8468.018.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited by queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per queue basis.
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues; the maximum resources per queue are set with 
> the “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * Enforce the use of the queue based maximum allocation limit if it is 
> available; if not, use the general scheduler level setting
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-10-09 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1660#comment-1660
 ] 

Weiwei Yang commented on YARN-8468:
---

Manually triggering the build won't work as the patch is already committed. 
Since the last patch only fixed one UT failure, I don't think we need to revert 
it and re-run Jenkins. Instead, we can re-open this JIRA, create some fake code 
changes, and trigger the branch-3.1 UT run again to see if there is any issue.

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8468-branch-3.1.018.patch, 
> YARN-8468-branch-3.1.019.patch, YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch, YARN-8468.018.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited by queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per queue basis.
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues; the maximum resources per queue are set with 
> the “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * Enforce the use of the queue based maximum allocation limit if it is 
> available; if not, use the general scheduler level setting
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened YARN-8468:
---

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8468-branch-3.1.018.patch, 
> YARN-8468-branch-3.1.019.patch, YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch, YARN-8468.018.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited by queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per queue basis.
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues; the maximum resources per queue are set with 
> the “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * Enforce the use of the queue based maximum allocation limit if it is 
> available; if not, use the general scheduler level setting
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YARN-8468.
---
Resolution: Fixed

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8468-branch-3.1.018.patch, 
> YARN-8468-branch-3.1.019.patch, YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch, YARN-8468.018.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited by queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per queue basis.
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues; the maximum resources per queue are set with 
> the “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * Enforce the use of the queue based maximum allocation limit if it is 
> available; if not, use the general scheduler level setting
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-10-09 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644427#comment-16644427
 ] 

Wangda Tan commented on YARN-8569:
--

[~eyang], 

 
{quote}In sync_yarn_sysfs, check_nm_local_dir make sure the nm_local_dir is 
owned by yarn user. 
{quote}
If you do #1, IIUC, #2 has to be done as well; otherwise, as a normal user, we 
cannot read from the nmPrivate dir. 

 
{quote}I am concerned that tokens are localized in the same nmPrivate directory.
{quote}
We already limit reads to files under .../nmPrivate/app../sys/fs/, correct? 
How is it possible to read a token file from that directory? 
{quote}Point 4 is a bit concerning to me. I am error on side of caution.. ..
{quote}
Given that the file hierarchy is part of the API, we should make it correct in 
the first cut. Any future change will be an incompatible change, so I prefer 
to get the API correct in the first patch. 
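
To make the read-scope point concrete: a hypothetical path guard in the spirit 
of this discussion. The real enforcement lives in container-executor (native 
code); the class and paths below are illustrative assumptions only.
{code}
import java.nio.file.Path;
import java.nio.file.Paths;

// Only allow reads whose normalized path stays strictly under the per-app
// .../sys/fs/ directory, so sibling token files under nmPrivate can never
// be reached through this code path.
final class SysFsPathGuard {
  static boolean isAllowed(Path appSysFsDir, String requestedName) {
    Path base = appSysFsDir.toAbsolutePath().normalize();
    Path target = base.resolve(requestedName).normalize();
    return target.startsWith(base) && !target.equals(base);
  }

  public static void main(String[] args) {
    Path base = Paths.get("/nm-local/nmPrivate/app_01/sys/fs");
    System.out.println(isAllowed(base, "app.json"));             // true
    System.out.println(isAllowed(base, "../container.tokens"));  // false
  }
}
{code}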

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, 
> YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch
>
>
> Some programs require container hostnames to be known for the application to 
> run.  For example, distributed TensorFlow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple option 
> to expose the hostnames of the YARN service via a mounted file.  The file 
> content gets updated when a flex command is performed.  This allows 
> application developers to consume system environment settings via a standard 
> interface.  It is like /proc/devices for Linux, but for Hadoop.  This may 
> involve updating a file in the distributed cache, and allowing mounting of 
> the file via container-executor.
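
A consumer-side sketch of the mounted-file idea: the application polls the 
file and reacts when a flex rewrites it. The mount path and file name are 
assumptions for illustration; the actual interface is defined by this JIRA.
{code}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical consumer: poll the mounted cluster-info file and re-read the
// component hostnames whenever its content changes.
public class ClusterInfoWatcher {
  private static final Path SYSFS = Paths.get("/hadoop/yarn/sysfs/app.json");

  public static void main(String[] args) throws Exception {
    String last = "";
    while (true) {
      String current = Files.exists(SYSFS)
          ? new String(Files.readAllBytes(SYSFS), StandardCharsets.UTF_8)
          : "";
      if (!current.equals(last)) {
        System.out.println("cluster membership changed:\n" + current);
        last = current;
      }
      Thread.sleep(5000);  // poll interval; file-watch APIs would also work
    }
  }
}
{code}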



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8827) Plumb aggregated application resource utilization from the NM to RM

2018-10-09 Thread Arun Suresh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-8827:
--
Fix Version/s: YARN-1011

> Plumb aggregated application resource utilization from the NM to RM
> ---
>
> Key: YARN-8827
> URL: https://issues.apache.org/jira/browse/YARN-8827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Major
> Fix For: YARN-1011
>
> Attachments: YARN-8827-YARN-1011.01.patch, 
> YARN-8827-YARN-1011.02.patch, YARN-8827-YARN-1011.03.patch, 
> YARN-8827-YARN-1011.04.patch, YARN-8827-YARN-1011.05.patch, 
> YARN-8827-YARN-1011.06.patch, YARN-8827-YARN-1011.07.patch
>
>
> Opportunistic Containers for OverAllocation need to be allocated to pending 
> applications in some fair manner. Rather than evaluating queue and user 
> resource usage (allocated resource usage) and comparing against queue and 
> user limits to decide the allocation, it might make more sense to use a 
> snapshot of actual resource utilization of the queue and user.
> To facilitate this, this JIRA proposes to aggregate per user, per app (and 
> maybe per queue) resource utilization in addition to aggregated Container and 
> Node Utilization and send it along with the NM heartbeat. It should be fairly 
> inexpensive to aggregate, since it can be performed in the same loop of the 
> {{ContainersMonitorImpl}}'s Monitoring thread.
> A snapshot aggregate can be made every couple of seconds in the RM. This 
> instantaneous resource utilization should be used to decide if Opportunistic 
> containers can be allocated to an App, Queue or User.
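
A minimal sketch of the aggregation idea under simplified types: fold each 
container's sampled utilization into a per-application total inside the 
monitoring loop, then snapshot it for the heartbeat. The names below are 
illustrative assumptions, not the actual YARN-8827 patch.
{code}
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a per-application utilization aggregate.
final class AppUtilization {
  float cpuVcoresUsed;
  long physicalMemoryMb;

  void add(float cpu, long memMb) {
    cpuVcoresUsed += cpu;
    physicalMemoryMb += memMb;
  }
}

class PerAppUtilizationTracker {
  private final Map<String, AppUtilization> perApp = new HashMap<>();

  // Called once per container in each pass of the monitoring loop.
  synchronized void record(String appId, float containerCpu, long containerMemMb) {
    perApp.computeIfAbsent(appId, k -> new AppUtilization())
        .add(containerCpu, containerMemMb);
  }

  // Snapshot shipped with the next NM heartbeat, then reset for the next pass.
  synchronized Map<String, AppUtilization> snapshotAndReset() {
    Map<String, AppUtilization> snapshot = new HashMap<>(perApp);
    perApp.clear();
    return snapshot;
  }
}
{code}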



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8827) Plumb aggregated application resource utilization from the NM to RM

2018-10-09 Thread Arun Suresh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-8827:
--
Affects Version/s: YARN-1011

> Plumb aggregated application resource utilization from the NM to RM
> ---
>
> Key: YARN-8827
> URL: https://issues.apache.org/jira/browse/YARN-8827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Major
> Fix For: YARN-1011
>
> Attachments: YARN-8827-YARN-1011.01.patch, 
> YARN-8827-YARN-1011.02.patch, YARN-8827-YARN-1011.03.patch, 
> YARN-8827-YARN-1011.04.patch, YARN-8827-YARN-1011.05.patch, 
> YARN-8827-YARN-1011.06.patch, YARN-8827-YARN-1011.07.patch
>
>
> Opportunistic Containers for OverAllocation need to be allocated to pending 
> applications in some fair manner. Rather than evaluating queue and user 
> resource usage (allocated resource usage) and comparing against queue and 
> user limits to decide the allocation, it might make more sense to use a 
> snapshot of actual resource utilization of the queue and user.
> To facilitate this, this JIRA proposes to aggregate per user, per app (and 
> maybe per queue) resource utilization in addition to aggregated Container and 
> Node Utilization and send it along with the NM heartbeat. It should be fairly 
> inexpensive to aggregate, since it can be performed in the same loop of the 
> {{ContainersMonitorImpl}}'s Monitoring thread.
> A snapshot aggregate can be made every couple of seconds in the RM. This 
> instantaneous resource utilization should be used to decide if Opportunistic 
> containers can be allocated to an App, Queue or User.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8827) Plumb aggregated application resource utilization from the NM to RM

2018-10-09 Thread Arun Suresh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-8827:
--
Summary: Plumb aggregated application resource utilization from the NM to 
RM  (was: Plumb per application resource utilization from the NM to RM)

> Plumb aggregated application resource utilization from the NM to RM
> ---
>
> Key: YARN-8827
> URL: https://issues.apache.org/jira/browse/YARN-8827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-8827-YARN-1011.01.patch, 
> YARN-8827-YARN-1011.02.patch, YARN-8827-YARN-1011.03.patch, 
> YARN-8827-YARN-1011.04.patch, YARN-8827-YARN-1011.05.patch, 
> YARN-8827-YARN-1011.06.patch, YARN-8827-YARN-1011.07.patch
>
>
> Opportunistic Containers for OverAllocation need to be allocated to pending 
> applications in some fair manner. Rather than evaluating queue and user 
> resource usage (allocated resource usage) and comparing against queue and 
> user limits to decide the allocation, it might make more sense to use a 
> snapshot of actual resource utilization of the queue and user.
> To facilitate this, this JIRA proposes to aggregate per user, per app (and 
> maybe per queue) resource utilization in addition to aggregated Container and 
> Node Utilization and send it along with the NM heartbeat. It should be fairly 
> inexpensive to aggregate, since it can be performed in the same loop of the 
> {{ContainersMonitorImpl}}'s Monitoring thread.
> A snapshot aggregate can be made every couple of seconds in the RM. This 
> instantaneous resource utilization should be used to decide if Opportunistic 
> containers can be allocated to an App, Queue or User.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8827) Plumb per application resource utilization from the NM to RM

2018-10-09 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644410#comment-16644410
 ] 

Arun Suresh commented on YARN-8827:
---

Thanks for the review, [~elgoiri]. Committing this.

> Plumb per application resource utilization from the NM to RM
> 
>
> Key: YARN-8827
> URL: https://issues.apache.org/jira/browse/YARN-8827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-8827-YARN-1011.01.patch, 
> YARN-8827-YARN-1011.02.patch, YARN-8827-YARN-1011.03.patch, 
> YARN-8827-YARN-1011.04.patch, YARN-8827-YARN-1011.05.patch, 
> YARN-8827-YARN-1011.06.patch, YARN-8827-YARN-1011.07.patch
>
>
> Opportunistic Containers for OverAllocation need to be allocated to pending 
> applications in some fair manner. Rather than evaluating queue and user 
> resource usage (allocated resource usage) and comparing against queue and 
> user limits to decide the allocation, it might make more sense to use a 
> snapshot of actual resource utilization of the queue and user.
> To facilitate this, this JIRA proposes to aggregate per user, per app (and 
> maybe per queue) resource utilization in addition to aggregated Container and 
> Node Utilization and send it along with the NM heartbeat. It should be fairly 
> inexpensive to aggregate, since it can be performed in the same loop of the 
> {{ContainersMonitorImpl}}'s Monitoring thread.
> A snapshot aggregate can be made every couple of seconds in the RM. This 
> instantaneous resource utilization should be used to decide if Opportunistic 
> containers can be allocated to an App, Queue or User.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8827) Plumb per application resource utilization from the NM to RM

2018-10-09 Thread Arun Suresh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-8827:
--
Summary: Plumb per application resource utilization from the NM to RM  
(was: Plumb per app, per user and per queue resource utilization from the NM to 
RM)

> Plumb per application resource utilization from the NM to RM
> 
>
> Key: YARN-8827
> URL: https://issues.apache.org/jira/browse/YARN-8827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-8827-YARN-1011.01.patch, 
> YARN-8827-YARN-1011.02.patch, YARN-8827-YARN-1011.03.patch, 
> YARN-8827-YARN-1011.04.patch, YARN-8827-YARN-1011.05.patch, 
> YARN-8827-YARN-1011.06.patch, YARN-8827-YARN-1011.07.patch
>
>
> Opportunistic Containers for OverAllocation need to be allocated to pending 
> applications in some fair manner. Rather than evaluating queue and user 
> resource usage (allocated resource usage) and comparing against queue and 
> user limits to decide the allocation, it might make more sense to use a 
> snapshot of actual resource utilization of the queue and user.
> To facilitate this, this JIRA proposes to aggregate per user, per app (and 
> maybe per queue) resource utilization in addition to aggregated Container and 
> Node Utilization and send it along with the NM heartbeat. It should be fairly 
> inexpensive to aggregate, since it can be performed in the same loop of the 
> {{ContainersMonitorImpl}}'s Monitoring thread.
> A snapshot aggregate can be made every couple of seconds in the RM. This 
> instantaneous resource utilization should be used to decide if Opportunistic 
> containers can be allocated to an App, Queue or User.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-09 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8842:
-
Attachment: YARN-8842.008.patch

> Update QueueMetrics with custom resource values 
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch, YARN-8842.007.patch, YARN-8842.008.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-09 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644398#comment-16644398
 ] 

Szilard Nemeth commented on YARN-8842:
--

Patch 008 fixes a missing null check.

> Update QueueMetrics with custom resource values 
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch, YARN-8842.007.patch, YARN-8842.008.patch
>
>
> This is the second dependent JIRA of YARN-8059.
> As updating the metrics is independent of handling preemption, this JIRA 
> deals only with updating the queue metrics for custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8862) [GPG] add Yarn Registry cleanup in ApplicationCleaner

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644397#comment-16644397
 ] 

Hadoop QA commented on YARN-8862:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-7402 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
51s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
0s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} YARN-7402 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 58s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 1 new + 
0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
19s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
31s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
38s{color} | {color:green} hadoop-yarn-server-globalpolicygenerator in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8862 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943149/YARN-8862-YARN-7402.v2.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4bc5dc2ef7f9 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644380#comment-16644380
 ] 

Hadoop QA commented on YARN-8842:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 41s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 42 new + 185 unchanged - 2 fixed = 227 total (was 187) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
48s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}135m 55s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 5s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}231m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterServiceCapacity |
|   | 
hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | 

[jira] [Commented] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644375#comment-16644375
 ] 

Weiwei Yang commented on YARN-8858:
---

Hi [~leftnoteasy], the patch conflicts with branch-2.8. Could you please take a 
look at the patch I just uploaded for branch-2.8 and let me know if it is good 
to port back to that branch? Thanks.

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 3.3.0
>
> Attachments: YARN-8858-branch-2.8.001.patch, YARN-8858.001.patch, 
> YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.
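
A minimal sketch of the proposed capping, assuming {{Resources.componentwiseMin}} 
from hadoop-yarn-common; the method and parameter names are illustrative:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class QueueMaxCapSketch {
  // The effective per-queue maximum is the component-wise minimum of the
  // queue's configured maximum-allocation and the scheduler-wide maximum,
  // which is backed by the node tracker and so reflects real node sizes.
  static Resource effectiveQueueMaximum(Resource queueConfiguredMax,
      Resource schedulerMax) {
    return Resources.componentwiseMin(queueConfiguredMax, schedulerMax);
  }
}
{code}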



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened YARN-8858:
---

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 3.3.0
>
> Attachments: YARN-8858-branch-2.8.001.patch, YARN-8858.001.patch, 
> YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8858:
--
Attachment: YARN-8858-branch-2.8.001.patch

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 3.3.0
>
> Attachments: YARN-8858-branch-2.8.001.patch, YARN-8858.001.patch, 
> YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644374#comment-16644374
 ] 

Weiwei Yang commented on YARN-8858:
---

Hi [~leftnoteasy]

I have committed the patch to trunk and cherry-picked it to branch-3.2, 
branch-3.1, branch-3.0, branch-2, and branch-2.9.

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 3.3.0
>
> Attachments: YARN-8858-branch-2.8.001.patch, YARN-8858.001.patch, 
> YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8858:
--
Fix Version/s: 2.10.0

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 3.3.0
>
> Attachments: YARN-8858.001.patch, YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8720) CapacityScheduler does not enforce max resource allocation check at queue level

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8720:
--
Fix Version/s: 2.10.0

> CapacityScheduler does not enforce max resource allocation check at queue 
> level
> ---
>
> Key: YARN-8720
> URL: https://issues.apache.org/jira/browse/YARN-8720
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2, 2.8.6
>
> Attachments: YARN-8720-branch-2.8.001.patch, YARN-8720.001.patch, 
> YARN-8720.002.patch
>
>
> The value of 
> yarn.scheduler.capacity..maximum-allocation-mb/vcores is not 
> strictly enforced when applications request containers. An 
> InvalidResourceRequestException is thrown only when the ResourceRequest is 
> greater than the global value of yarn.scheduler.maximum-allocation-mb/vcores. 
> So for an example configuration such as the following,
>  
> {code:java}
> yarn.scheduler.maximum-allocation-mb=4096
> yarn.scheduler.capacity.root.test.maximum-allocation-mb=2048
> {code}
>  
> The DSShell command below runs successfully and asks for an AM container of 
> size 4096MB, which is greater than the 2048MB maximum configured for the test 
> queue.
> {code:java}
> yarn jar $YARN_HOME/hadoop-yarn-applications-distributedshell.jar 
> -num_containers 1 -jar 
> $YARN_HOME/hadoop-yarn-applications-distributedshell.jar -shell_command 
> "sleep 60" -container_memory=4096 -master_memory=4096 -queue=test{code}
> Instead, it should fail to launch the application with an 
> InvalidResourceRequestException. The child container, however, will be 
> requested with size 2048MB, as the DSShell AppMaster does the below check 
> before making the ResourceRequest ask to the RM.
> {code:java}
> // A resource ask cannot exceed the max.
> if (containerMemory > maxMem) {
>  LOG.info("Container memory specified above max threshold of cluster."
>  + " Using max value." + ", specified=" + containerMemory + ", max="
>  + maxMem);
>  containerMemory = maxMem;
> }{code}
>  
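
A minimal sketch of the missing queue-level enforcement, with illustrative 
names; only {{InvalidResourceRequestException}} and the {{Resource}} getters 
are the real API:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;

class QueueMaxValidationSketch {
  // Reject an ask that exceeds the queue-level maximum instead of only
  // checking the global yarn.scheduler.maximum-allocation-* values.
  static void validate(Resource requested, Resource queueMax)
      throws InvalidResourceRequestException {
    if (requested.getMemorySize() > queueMax.getMemorySize()
        || requested.getVirtualCores() > queueMax.getVirtualCores()) {
      throw new InvalidResourceRequestException("Resource request " + requested
          + " exceeds the queue maximum-allocation " + queueMax);
    }
  }
}
{code}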



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644359#comment-16644359
 ] 

Hudson commented on YARN-8858:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15164 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15164/])
YARN-8858. CapacityScheduler should respect maximum node resource when (wwei: 
rev edce866489d83744f3f47a3b884b0c6136885e4a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ClusterNodeTracker.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java


> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 2.9.2, 3.0.4, 3.1.2, 3.3.0
>
> Attachments: YARN-8858.001.patch, YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8858:
--
Fix Version/s: 2.9.2

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 2.9.2, 3.0.4, 3.1.2, 3.3.0
>
> Attachments: YARN-8858.001.patch, YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8858:
--
Fix Version/s: 3.0.4

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.0.4, 3.1.2, 3.3.0
>
> Attachments: YARN-8858.001.patch, YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8858:
--
Fix Version/s: 3.1.2

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.2, 3.3.0
>
> Attachments: YARN-8858.001.patch, YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-09 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8842:
-
Attachment: YARN-8842.007.patch

> Update QueueMetrics with custom resource values 
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch, YARN-8842.007.patch
>
>
> This is the second dependent JIRA of YARN-8059.
> As updating the metrics is independent of handling preemption, this JIRA 
> deals only with updating the queue metrics for custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644330#comment-16644330
 ] 

Hadoop QA commented on YARN-8827:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-1011 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  5m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
55s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m  
6s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
56s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
41s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
58s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
13s{color} | {color:green} YARN-1011 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 14m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  1s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
36s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 43s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 78m 
55s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
46s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}225m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestNMProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8827 |
| JIRA 

[jira] [Comment Edited] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644322#comment-16644322
 ] 

Weiwei Yang edited comment on YARN-8858 at 10/10/18 1:31 AM:
-

+1. The UT failure should not be related. I applied the patch and tested it 
locally; the tests work fine. I will commit this shortly. BTW, the checkstyle 
issues were caused by some unused imports in the test class; I will remove them 
during the commit.


was (Author: cheersyang):
+1. The UT failure should not be related. I applied the patch and tested it 
locally; the tests work fine. I will commit this shortly.

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8858.001.patch, YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644322#comment-16644322
 ] 

Weiwei Yang commented on YARN-8858:
---

+1. The UT failure should not be related. I applied the patch and tested it 
locally; the tests work fine. I will commit this shortly.

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8858.001.patch, YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after that, it uses LeafQueue.getMaximumAllocation. The 
> scheduler method is capped by nodeTracker.getMaximumAllocation, but the 
> LeafQueue one is not.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8862) [GPG] add Yarn Registry cleanup in ApplicationCleaner

2018-10-09 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8862:
---
Attachment: YARN-8862-YARN-7402.v2.patch

> [GPG] add Yarn Registry cleanup in ApplicationCleaner
> -
>
> Key: YARN-8862
> URL: https://issues.apache.org/jira/browse/YARN-8862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8862-YARN-7402.v1.patch, 
> YARN-8862-YARN-7402.v2.patch
>
>
> In Yarn Federation, we use the Yarn Registry to store the AMTokens for UAMs 
> in secondary sub-clusters. Because there may be more app attempts later, 
> AMRMProxy cannot kill the UAM and delete the tokens when one local attempt 
> finishes. So, similar to the StateStore application table, we need an 
> ApplicationCleaner in GPG to clean up the finished app entries in the Yarn 
> Registry.
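
A minimal sketch of the cleanup pass; all helper names are hypothetical, and 
the real implementation would go through the RegistryOperations API and the 
GPG's view of running applications:
{code:java}
import java.util.List;
import java.util.Set;

class RegistryCleanupSketch {
  // Remove every Yarn Registry app entry whose application is no longer
  // known to be running anywhere in the federation.
  static void cleanup(List<String> registryAppIds, Set<String> runningApps) {
    for (String appId : registryAppIds) {
      if (!runningApps.contains(appId)) {
        deleteRegistryEntry(appId);
      }
    }
  }

  static void deleteRegistryEntry(String appId) {
    // Placeholder: the real implementation would delete the app's znode
    // (and its AMToken entries) via the RegistryOperations API.
  }
}
{code}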



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8569) Create an interface to provide cluster information to application

2018-10-09 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644277#comment-16644277
 ] 

Eric Yang edited comment on YARN-8569 at 10/10/18 12:25 AM:


{quote}
1) We should use set_user before the method to drop privileges.
{quote}

Good catch, check_user will be applied in the next patch.

{quote}
2) Root privilege should be only used by open_file_as_nm.
{quote}

In sync_yarn_sysfs, check_nm_local_dir makes sure the nm_local_dir is owned by 
the yarn user.  This ensures the copy operation is safeguarded against possible 
path exploits.  This is equivalent to the next point: ensuring the NM local 
dirs are indeed owned by yarn.

{quote}
3) We should check all in/out file paths under local dirs defined by 
container_executor.cfg.
{quote}

Initialize container, launch container, and launch docker container all use 
the nm-local-dirs path passed from the node manager to the container executor. 
There is no definition of yarn.nodemanager.local-dirs in the 
container-executor.cfg file.  To contain the scope of this JIRA, I opened 
YARN-8863 to track the proposed global pattern change.

{quote}
4) Regarding the API, I think we should allow the filename to be specified.
{quote}

I am concerned that tokens are localized in the same nmPrivate directory.  If 
a filename can be specified, it may increase the attack surface by letting the 
REST API copy the HDFS token into the sysfs directory.  This seemingly innocent 
action is unsafe.  For untrusted docker containers, we prevent mounting the 
token file into the container and run it as a sandbox.  If we allowed this 
action, an untrusted container could acquire access to HDFS by calling the 
sync yarn sysfs REST API.

{quote}5) For the mounted dir for container, for app-level information, I would 
prefer to put under ...sys/fs/app/ instead of ...sys/fs/ since we 
want to support per-container information in the future.{quote}

Point 4 is a bit concerning to me, so I am erring on the side of caution. 
Therefore, I do not plan to make the arbitrary filename change in this JIRA. 
Please open a separate one if you feel strongly about supporting custom 
filenames in the REST API.


was (Author: eyang):
{quote}
1) We should use set_user before the method to drop privileges.
{quote}

Good catch, check_user will be applied in the next patch.

{quote}
2) Root privilege should be only used by open_file_as_nm.
{quote}

In sync_yarn_sysfs, check_nm_local_dir makes sure the nm_local_dir is owned by 
the yarn user.  This ensures the copy operation is safeguarded against possible 
path exploits.  This is equivalent to the next point: ensuring the NM local 
dirs are indeed owned by yarn.

{quote}
3) We should check all in/out file paths under local dirs defined by 
container_executor.cfg.
{quote}

Initialize container, launch container, and launch docker container all use 
the nm-local-dirs path passed from the node manager to the container executor. 
There is no definition of yarn.nodemanager.local-dirs in the 
container-executor.cfg file.  To contain the scope of this JIRA, I opened 
YARN-8863 to track the proposed global pattern change.

{quote}
4) Regarding the API, I think we should allow the filename to be specified.
{quote}

I am concerned that tokens are localized in the same nmPrivate directory.  If 
a filename can be specified, it may increase the attack surface by letting the 
REST API copy the HDFS token into the sysfs directory.  I am not sure this 
seemingly innocent action is safe.

{quote}5) For the mounted dir for container, for app-level information, I would 
prefer to put under ...sys/fs/app/ instead of ...sys/fs/ since we 
want to support per-container information in the future.{quote}

Point 4 is a bit concerning to me, so I am erring on the side of caution. 
Therefore, I do not plan to make the arbitrary filename change in this JIRA. 
Please open a separate one if you feel strongly about supporting custom 
filenames in the REST API.

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, 
> YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch
>
>
> Some programs require container hostnames to be known for the application to 
> run.  For example, distributed tensorflow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:

[jira] [Commented] (YARN-8862) [GPG] add Yarn Registry cleanup in ApplicationCleaner

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644283#comment-16644283
 ] 

Hadoop QA commented on YARN-8862:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-7402 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m  
6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
52s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
12s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
39s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} YARN-7402 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 55s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 4 new + 
1 unchanged - 0 fixed = 5 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
17s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
39s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
39s{color} | {color:green} hadoop-yarn-server-globalpolicygenerator in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 92m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8862 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943132/YARN-8862-YARN-7402.v1.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0cdf0b6f88cf 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-10-09 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644277#comment-16644277
 ] 

Eric Yang commented on YARN-8569:
-

{quote}
1) We should use set_user before the method to drop privileges.
{quote}

Good catch, check_user will be applied in the next patch.

{quote}
2) Root privilege should be only used by open_file_as_nm.
{quote}

In sync_yarn_sysfs, check_nm_local_dir makes sure the nm_local_dir is owned by 
the yarn user.  This ensures the copy operation is safeguarded against possible 
path exploits.  This is equivalent to the next point: ensuring the NM local 
dirs are indeed owned by yarn.

{quote}
3) We should check all in/out file paths under local dirs defined by 
container_executor.cfg.
{quote}

Initialize container, launch container, and launch docker container all use 
the nm-local-dirs path passed from the node manager to the container executor. 
There is no definition of yarn.nodemanager.local-dirs in the 
container-executor.cfg file.  To contain the scope of this JIRA, I opened 
YARN-8863 to track the proposed global pattern change.

{quote}
4) Regarding the API, I think we should allow the filename to be specified.
{quote}

I am concerned that tokens are localized in the same nmPrivate directory.  If 
a filename can be specified, it may increase the attack surface by letting the 
REST API copy the HDFS token into the sysfs directory.  I am not sure this 
seemingly innocent action is safe.

{quote}5) For the mounted dir for container, for app-level information, I would 
prefer to put under ...sys/fs/app/ instead of ...sys/fs/ since we 
want to support per-container information in the future.{quote}

Point 4 is a bit concerning to me, so I am erring on the side of caution. 
Therefore, I do not plan to make the arbitrary filename change in this JIRA. 
Please open a separate one if you feel strongly about supporting custom 
filenames in the REST API.

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, 
> YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch
>
>
> Some programs require container hostnames to be known for the application to 
> run.  For example, distributed tensorflow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is a classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple 
> option to expose the hostnames of the yarn service via a mounted file.  The 
> file content gets updated when a flex command is performed.  This allows 
> application developers to consume system environment settings via a standard 
> interface.  It is like /proc/devices for Linux, but for Hadoop.  This may 
> involve updating a file in the distributed cache and allowing the file to be 
> mounted via the container-executor.
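
For illustration, an application inside the container might consume the 
mounted file as in the sketch below; the path and filename are assumptions, 
not the final interface:
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class SysfsClientSketch {
  // Read the mounted cluster-information file from inside the container.
  // The path below is illustrative only.
  static String readClusterSpec() throws IOException {
    return new String(
        Files.readAllBytes(Paths.get("/hadoop/yarn/sysfs/app.json")),
        StandardCharsets.UTF_8);
  }
}
{code}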



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM

2018-10-09 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644275#comment-16644275
 ] 

Íñigo Goiri commented on YARN-8827:
---

I assume that Yetus will come back without the javadoc -1 for 
[^YARN-8827-YARN-1011.07.patch].
[^YARN-8827-YARN-1011.07.patch] LGTM.
+1 pending the output of Yetus for double-checking.

> Plumb per app, per user and per queue resource utilization from the NM to RM
> 
>
> Key: YARN-8827
> URL: https://issues.apache.org/jira/browse/YARN-8827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-8827-YARN-1011.01.patch, 
> YARN-8827-YARN-1011.02.patch, YARN-8827-YARN-1011.03.patch, 
> YARN-8827-YARN-1011.04.patch, YARN-8827-YARN-1011.05.patch, 
> YARN-8827-YARN-1011.06.patch, YARN-8827-YARN-1011.07.patch
>
>
> Opportunistic Containers for OverAllocation need to be allocated to pending 
> applications in some fair manner. Rather than evaluating queue and user 
> resource usage (allocated resource usage) and comparing against queue and 
> user limits to decide the allocation, it might make more sense to use a 
> snapshot of actual resource utilization of the queue and user.
> To facilitate this, this JIRA proposes to aggregate per user, per app (and 
> maybe per queue) resource utilization in addition to aggregated Container and 
> Node Utilization and send it along with the NM heartbeat. It should be fairly 
> inexpensive to aggregate - since it can be performed in the same loop of the 
> {{ContainersMonitorImpl}}'s Monitoring thread.
> A snapshot aggregate can be made every couple of seconds in the RM. This 
> instantaneous resource utilization should be used to decide if Opportunistic 
> containers can be allocated to an App, Queue or User.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8864) NM incorrectly logs container user as the user who sent a stop container request in its audit log

2018-10-09 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8864:
-
Description: 
As in ContainerManagerImpl.java:
{code:java}
protected void stopContainerInternal(ContainerId containerID)
    throws YarnException, IOException {
  ...
  NMAuditLogger.logSuccess(container.getUser(), AuditConstants.STOP_CONTAINER,
      "ContainerManageImpl",
      containerID.getApplicationAttemptId().getApplicationId(), containerID);
}{code}

  was:
As in ContainerManagerImpl.java:

protected void stopContainerInternal(ContainerId containerID)
    throws YarnException, IOException {
  ...
  NMAuditLogger.logSuccess(container.getUser(),
      AuditConstants.STOP_CONTAINER, "ContainerManageImpl",
      containerID.getApplicationAttemptId().getApplicationId(), containerID);
}


> NM incorrectly logs container user as the user who sent a stop container 
> request in its audit log
> -
>
> Key: YARN-8864
> URL: https://issues.apache.org/jira/browse/YARN-8864
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Haibo Chen
>Priority: Major
>
> As in ContainerManagerImpl.java:
> {code:java}
> protected void stopContainerInternal(ContainerId containerID)
>     throws YarnException, IOException {
>   ...
>   NMAuditLogger.logSuccess(container.getUser(), AuditConstants.STOP_CONTAINER,
>       "ContainerManageImpl",
>       containerID.getApplicationAttemptId().getApplicationId(), containerID);
> }{code}
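
A hedged sketch of one possible direction for a fix, assuming the intent is to 
audit the RPC caller rather than the container's user; everything except 
{{UserGroupInformation}} is illustrative:
{code:java}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

class AuditUserSketch {
  // Resolve the user who issued the stop request: inside an NM protocol
  // handler, getCurrentUser() reflects the RPC caller, whereas
  // container.getUser() is the user the container runs as.
  static String resolveAuditUser() {
    try {
      return UserGroupInformation.getCurrentUser().getShortUserName();
    } catch (IOException e) {
      return "UNKNOWN";
    }
  }
}
{code}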



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8864) NM incorrectly logs container user as the user who sent a stop container request in its audit log

2018-10-09 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8864:


 Summary: NM incorrectly logs container user as the user who sent a 
stop container request in its audit log
 Key: YARN-8864
 URL: https://issues.apache.org/jira/browse/YARN-8864
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: Haibo Chen


As in ContainerManagerImpl.java:

{code:java}
protected void stopContainerInternal(ContainerId containerID)
    throws YarnException, IOException {
  ...
  NMAuditLogger.logSuccess(container.getUser(), AuditConstants.STOP_CONTAINER,
      "ContainerManageImpl",
      containerID.getApplicationAttemptId().getApplicationId(), containerID);
}
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644263#comment-16644263
 ] 

Hadoop QA commented on YARN-7644:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 24s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 4 new + 118 unchanged - 10 fixed = 122 total (was 128) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
29s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 72m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-7644 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943131/YARN-7644.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 61f37bac224f 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6a39739 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/22123/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22123/testReport/ |
| Max. process+thread count | 307 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Created] (YARN-8863) Define yarn node manager local dirs in container-executor.cfg

2018-10-09 Thread Eric Yang (JIRA)
Eric Yang created YARN-8863:
---

 Summary: Define yarn node manager local dirs in 
container-executor.cfg
 Key: YARN-8863
 URL: https://issues.apache.org/jira/browse/YARN-8863
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: security, yarn
Reporter: Eric Yang


The current implementation of container-executor accepts nm-local-dirs and 
nm-log-dirs as CLI arguments.  If the yarn user is compromised, a rogue yarn 
user could use container-executor to point nm-local-dirs at a user's home 
directory and modify user-owned files.  This JIRA is to enhance 
container-executor.cfg to allow yarn.nodemanager.local-dirs to be specified 
there, safeguarding against a rogue yarn user exploiting the nm-local-dirs 
paths.
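
Purely as illustration, the enhanced config might look like the following; the two proposed keys are assumptions of this JIRA, while the other entries are existing container-executor.cfg settings:

{code}
# container-executor.cfg (hypothetical sketch)
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=root
min.user.id=1000
# Proposed: pin the allowed dirs so a rogue yarn user cannot redirect
# container-executor to arbitrary paths via CLI arguments.
yarn.nodemanager.local-dirs=/hadoop/yarn/local
yarn.nodemanager.log-dirs=/hadoop/yarn/log
{code}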



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7225) Add queue and partition info to RM audit log

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644243#comment-16644243
 ] 

Hadoop QA commented on YARN-7225:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.8 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
 4s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} branch-2.8 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 16s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 2 new + 100 unchanged - 1 fixed = 102 total (was 101) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 33s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
24s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m 20s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Unreaped Processes | hadoop-yarn-server-resourcemanager:1 |
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ae3769f |
| JIRA Issue | YARN-7225 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943114/YARN-7225.branch-2.8.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 35bc6f3f2bba 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.8 / 94f4b5b |
| maven | version: Apache Maven 3.0.5 |
| Default Java | 1.7.0_181 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/22121/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| Unreaped Processes Log | 
https://builds.apache.org/job/PreCommit-YARN-Build/22121/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-reaper.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/22121/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test 

[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-10-09 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644223#comment-16644223
 ] 

Wangda Tan commented on YARN-8569:
--

Discussed with [~eyang]/[~suma.shivaprasad] offline. 

Several comments: 

1) We should use set_user before the method to drop privileges.

2) Root privilege should be used only by open_file_as_nm.

3) We should check that all in/out file paths are under the local dirs defined 
by container-executor.cfg.

4) Regarding the API, I think we should allow the filename to be specified.

5) For the dir mounted into the container, I would prefer to put app-level 
information under ...sys/fs/app/ instead of ...sys/fs/ since we want to 
support per-container information in the future.

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, 
> YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch
>
>
> Some programs require container hostnames to be known for the application to 
> run.  For example, distributed tensorflow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if YARN Docker integration could provide a simple option to 
> expose hostnames of the yarn service via a mounted file.  The file content 
> gets updated when a flex command is performed.  This allows application 
> developers to consume system environment settings via a standard interface.  
> It is like /proc/devices for Linux, but for Hadoop.  This may involve 
> updating a file in the distributed cache, and allowing the file to be mounted 
> via container-executor.
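
To make the intent concrete, an application inside the container might consume such a mounted file roughly as sketched below; the mount path and file name here are assumptions, not the final interface:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClusterInfoReader {
  public static void main(String[] args) throws IOException {
    // Assumed mount point; the actual path would be defined by the design doc.
    // Re-reading the file after a flex picks up the updated component hosts.
    Files.readAllLines(Paths.get("/hadoop/yarn/sysfs/app.json"))
        .forEach(System.out::println);
  }
}
{code}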



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8813) Improve debug messages for NM preemption of OPPORTUNISTIC containers

2018-10-09 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644221#comment-16644221
 ] 

Robert Kanter commented on YARN-8813:
-

Thanks [~haibochen].  Committed to YARN-1011 branch!

> Improve debug messages for NM preemption of OPPORTUNISTIC containers
> 
>
> Key: YARN-8813
> URL: https://issues.apache.org/jira/browse/YARN-8813
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Fix For: YARN-1011
>
> Attachments: YARN-8813-YARN-1011.00.patch, 
> YARN-8813-YARN-1011.01.patch, YARN-8813-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8813) Improve debug messages for NM preemption of OPPORTUNISTIC containers

2018-10-09 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644220#comment-16644220
 ] 

Robert Kanter commented on YARN-8813:
-

+1 LGTM

> Improve debug messages for NM preemption of OPPORTUNISTIC containers
> 
>
> Key: YARN-8813
> URL: https://issues.apache.org/jira/browse/YARN-8813
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8813-YARN-1011.00.patch, 
> YARN-8813-YARN-1011.01.patch, YARN-8813-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-10-09 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644219#comment-16644219
 ] 

Weiwei Yang commented on YARN-8468:
---

Thanks [~leftnoteasy], I overlooked that issue in the Jenkins report... Thanks 
for triggering it again; where can I find the new report?

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8468-branch-3.1.018.patch, 
> YARN-8468-branch-3.1.019.patch, YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch, YARN-8468.018.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per-queue basis.
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps, and wants to limit ad hoc jobs to small containers but allow enterprise 
> apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb would set the default maximum container 
> size for all queues, while the "maxContainerResources" queue config value 
> would set the per-queue maximum.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root, we override the scheduler setting, and we should 
> not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as follows (see the sketch after this list)
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * Enforce the use of queue based maximum allocation limit if it is 
> available, if not use the general scheduler level setting
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request
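
A hypothetical sketch of the per-queue fallback (the FSQueue accessor name here is an assumption):

{code:java}
public Resource getMaximumResourceCapability(String queueName) {
  FSQueue queue = queueManager.getQueue(queueName); // existing QueueManager lookup
  if (queue == null || queue.getMaxContainerAllocation() == null) {
    // Fall back to the scheduler-wide yarn.scheduler.maximum-allocation-*.
    return getMaximumResourceCapability();
  }
  return queue.getMaxContainerAllocation();
}
{code}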



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-3854) Add localization support for docker images

2018-10-09 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644218#comment-16644218
 ] 

Chandni Singh edited comment on YARN-3854 at 10/9/18 11:13 PM:
---

Discussed the requirements for localization of docker images with [~eyang] 
[~leftnoteasy] [~vinodkv] [~shaneku...@gmail.com].
I have attached a document which highlights the requirements, general design, 
and use case. 
Please take a look. Any feedback is appreciated. I will start working on these 
requirements soon and create the relevant jiras.


was (Author: csingh):
Discussed the requirements for localization of docker images with @eyang @wtan 
[~vinodkv] [~shaneku...@gmail.com].
I have attached a documents which highlights the requirements, general design 
and use case. 
Please take a look and any feedback is appreciated. I will start working on 
these requirements soon.

> Add localization support for docker images
> --
>
> Key: YARN-3854
> URL: https://issues.apache.org/jira/browse/YARN-3854
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: Localization Support For Docker Images.pdf, 
> YARN-3854-branch-2.8.001.patch, 
> YARN-3854_Localization_support_for_Docker_image_v1.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v2.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v3.pdf
>
>
> We need the ability to localize docker images when those images aren't 
> already available locally. There are various approaches that could be used 
> here with different trade-offs/issues: image archives on HDFS + docker load, 
> docker pull during the localization phase, or (automatic) docker pull 
> during the run/launch phase. 
> We also need the ability to clean up old/stale, unused images. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3854) Add localization support for docker images

2018-10-09 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644218#comment-16644218
 ] 

Chandni Singh commented on YARN-3854:
-

Discussed the requirements for localization of docker images with @eyang @wtan 
[~vinodkv] [~shaneku...@gmail.com].
I have attached a document which highlights the requirements, general design, 
and use case. 
Please take a look; any feedback is appreciated. I will start working on 
these requirements soon.

> Add localization support for docker images
> --
>
> Key: YARN-3854
> URL: https://issues.apache.org/jira/browse/YARN-3854
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: Localization Support For Docker Images.pdf, 
> YARN-3854-branch-2.8.001.patch, 
> YARN-3854_Localization_support_for_Docker_image_v1.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v2.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v3.pdf
>
>
> We need the ability to localize docker images when those images aren't 
> already available locally. There are various approaches that could be used 
> here with different trade-offs/issues: image archives on HDFS + docker load, 
> docker pull during the localization phase, or (automatic) docker pull 
> during the run/launch phase. 
> We also need the ability to clean up old/stale, unused images. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644213#comment-16644213
 ] 

Hadoop QA commented on YARN-8858:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 5 new + 106 unchanged - 0 fixed = 111 total (was 106) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}100m  9s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8858 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943105/YARN-8858.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bc461f70357c 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf04f19 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/22118/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Updated] (YARN-3854) Add localization support for docker images

2018-10-09 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-3854:

Attachment: Localization Support For Docker Images.pdf

> Add localization support for docker images
> --
>
> Key: YARN-3854
> URL: https://issues.apache.org/jira/browse/YARN-3854
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: Localization Support For Docker Images.pdf, 
> YARN-3854-branch-2.8.001.patch, 
> YARN-3854_Localization_support_for_Docker_image_v1.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v2.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v3.pdf
>
>
> We need the ability to localize docker images when those images aren't 
> already available locally. There are various approaches that could be used 
> here with different trade-offs/issues: image archives on HDFS + docker load, 
> docker pull during the localization phase, or (automatic) docker pull 
> during the run/launch phase. 
> We also need the ability to clean up old/stale, unused images. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-09 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8842:
-
Attachment: YARN-8842.006.patch

> Update QueueMetrics with custom resource values 
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-09 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644207#comment-16644207
 ] 

Szilard Nemeth commented on YARN-8842:
--

According to our offline discussion with [~haibochen], here are the changes 
incorporated in the latest patch (patch006).
Since I applied all the changes listed here as separate steps and ran the tests 
after every change, an unintended code change is quite unlikely, as the tests 
always passed.

1. Unified the resource decrease methods into one single method, so that 
withContainersToDecrease, withVCoresToDecrease, withMemoryMBToDecrease, and 
withCustomResToDecrease became withResourceToDecrease. This way, testcases are 
more concise and explicit.
2. Same as above, but for the declaration of resources rather than the 
decrease of resources.
3. Changed the QueueHierarchy class: instead of representing the hierarchy as a 
list of queue names, I created a Queue class that holds its name and a 
reference to its child queue. The validation code is also changed: queue names 
are now validated on demand when a child queue is added to the hierarchy.
4. The common parts of initializing QueueMetricsTestcase were extracted into a 
method.
5. Fixed test methods that only tested leaf queue metrics: now these methods 
get all the queue metrics and check the values on all queue levels. Example: 
testUpdatePreemptedSeconds previously only checked mqs.getLeafQueueSource; now 
it gets all the queue metrics sources and checks them.
6. Created separate classes for every type of custom resource: preempted 
seconds, allocated, available, pending, reserved.
Kept the QueueMetricsForCustomResources class as a wrapper of all types, so 
that QueueMetrics only needs to refer to this class, which delegates all 
alterations to the appropriate resource metric type objects.
With an abstract parent class for the metric type classes, named 
QueueMetricsCustomResourcesAbstract, I could keep the computation methods in 
one single place.
7. Added the performance optimization [~wilfreds] suggested: custom resource 
metric updates only happen if any custom resource is defined.

Please note that: 
1. The testcases in QueueMetricsForCustomResources would be more directly 
readable if the testXXX methods in QueueMetricsTestcase were moved there. 
The reason I don't do that is that QueueMetricsTestcase is intended to be a 
self-contained class that holds the data as well as the logic for all types of 
metrics testcases. Ideally, QueueMetricsTestcase could even be used from 
TestQueueMetrics, but I did not want to squeeze too many changes into this 
patch.

2. I don't think it is right to move all metrics for cores/memory to 
QueueMetricsForCustomResources, as the former are annotated Metrics while the 
latter are not real metrics but values stored in a map for every resource. 
All in all, I like the idea of having all resource metrics in one class per 
type, but I would do that in a follow-up jira. One more thing to keep in mind 
is that, AFAIK, the metrics subsystem from Hadoop-common is not ready to use 
metrics from dynamic fields (e.g. Maps), so this will be quite tricky to 
implement.
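
As a rough illustration of the wrapper-plus-delegates shape described in point 6 (class names follow the comment; the bodies are hypothetical):

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;

/** Shared computation lives in the abstract parent. */
abstract class QueueMetricsCustomResourcesAbstract {
  protected final Map<String, Long> values = new HashMap<>();

  void increase(Resource res) {
    // A real implementation would skip the mandatory memory/vcores entries.
    for (ResourceInformation ri : res.getResources()) {
      values.merge(ri.getName(), ri.getValue(), Long::sum);
    }
  }
}

class AllocatedCustomResources extends QueueMetricsCustomResourcesAbstract { }
class PendingCustomResources extends QueueMetricsCustomResourcesAbstract { }

/** QueueMetrics talks only to this wrapper, which delegates per type. */
class QueueMetricsForCustomResources {
  private final AllocatedCustomResources allocated = new AllocatedCustomResources();
  private final PendingCustomResources pending = new PendingCustomResources();

  void increaseAllocated(Resource res) { allocated.increase(res); }
  void increasePending(Resource res) { pending.increase(res); }
}
{code}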

> Update QueueMetrics with custom resource values 
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8862) [GPG] add Yarn Registry cleanup in ApplicationCleaner

2018-10-09 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8862:
---
Attachment: YARN-8862-YARN-7402.v1.patch

> [GPG] add Yarn Registry cleanup in ApplicationCleaner
> -
>
> Key: YARN-8862
> URL: https://issues.apache.org/jira/browse/YARN-8862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8862-YARN-7402.v1.patch
>
>
> In Yarn Federation, we use the Yarn Registry to store the AMToken for UAMs in 
> secondary sub-clusters. Because there may be more app attempts later, 
> AMRMProxy cannot kill the UAM and delete the tokens when one local attempt 
> finishes. So, similar to the StateStore application table, we need the 
> ApplicationCleaner in GPG to clean up the finished app entries in the Yarn 
> Registry. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8862) [GPG] add Yarn Registry cleanup in ApplicationCleaner

2018-10-09 Thread Botong Huang (JIRA)
Botong Huang created YARN-8862:
--

 Summary: [GPG] add Yarn Registry cleanup in ApplicationCleaner
 Key: YARN-8862
 URL: https://issues.apache.org/jira/browse/YARN-8862
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Botong Huang
Assignee: Botong Huang


In Yarn Federation, we use the Yarn Registry to store the AMToken for UAMs in 
secondary sub-clusters. Because there may be more app attempts later, AMRMProxy 
cannot kill the UAM and delete the tokens when one local attempt finishes. So, 
similar to the StateStore application table, we need the ApplicationCleaner in 
GPG to clean up the finished app entries in the Yarn Registry.
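
A minimal sketch of what the registry cleanup step might look like, assuming finished apps are identified by diffing the registry children against the apps still known to the StateStore (the layout and inputs here are assumptions):

{code:java}
import java.io.IOException;
import java.util.Set;

import org.apache.hadoop.registry.client.api.RegistryOperations;

public class RegistryAppCleaner {
  /** Deletes registry entries of apps no longer known to the StateStore. */
  static void cleanFinishedApps(RegistryOperations registry,
      String appsRoot, Set<String> knownApps) throws IOException {
    for (String child : registry.list(appsRoot)) {
      if (!knownApps.contains(child)) {
        registry.delete(appsRoot + "/" + child, true); // recursive delete
      }
    }
  }
}
{code}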



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8862) [GPG] add Yarn Registry cleanup in ApplicationCleaner

2018-10-09 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8862:
---
Issue Type: Sub-task  (was: Task)
Parent: YARN-7402

> [GPG] add Yarn Registry cleanup in ApplicationCleaner
> -
>
> Key: YARN-8862
> URL: https://issues.apache.org/jira/browse/YARN-8862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
>
> In Yarn Federation, we use the Yarn Registry to store the AMToken for UAMs in 
> secondary sub-clusters. Because there may be more app attempts later, 
> AMRMProxy cannot kill the UAM and delete the tokens when one local attempt 
> finishes. So, similar to the StateStore application table, we need the 
> ApplicationCleaner in GPG to clean up the finished app entries in the Yarn 
> Registry. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644179#comment-16644179
 ] 

Chandni Singh commented on YARN-7644:
-

Introduced a checkstyle warning in patch 5. Fixed it in patch 6.

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7644.001.patch, YARN-7644.002.patch, 
> YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch, 
> YARN-7644.006.patch
>
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 
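
One common remedy for this kind of head-of-line blocking, sketched below purely as illustration (whether the attached patches take exactly this route is not shown in this thread), is to hand the blocking stop off to a dedicated pool so the event handler returns immediately:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncDockerStopper {
  private final ExecutorService stopPool = Executors.newFixedThreadPool(8);

  /** Returns immediately; the blocking "docker stop" runs on the pool. */
  public void requestStop(String containerId) {
    stopPool.submit(() -> runDockerStop(containerId, 10));
  }

  private void runDockerStop(String containerId, int graceSecs) {
    try {
      // Blocks for up to graceSecs before docker sends SIGKILL itself.
      new ProcessBuilder("docker", "stop", "-t",
          String.valueOf(graceSecs), containerId)
          .inheritIO().start().waitFor();
    } catch (Exception e) {
      // Real code would log and escalate to a force kill.
    }
  }
}
{code}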



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7644:

Attachment: YARN-7644.006.patch

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7644.001.patch, YARN-7644.002.patch, 
> YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch, 
> YARN-7644.006.patch
>
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8813) Improve debug messages for NM preemption of OPPORTUNISTIC containers

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644178#comment-16644178
 ] 

Hadoop QA commented on YARN-8813:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} YARN-1011 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
45s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
21s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} YARN-1011 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 41s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 78m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
|   | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8813 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943110/YARN-8813-YARN-1011.02.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 44913f15179c 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-1011 / efd8524 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/22120/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 

[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644174#comment-16644174
 ] 

Hadoop QA commented on YARN-7644:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 5 new + 119 unchanged - 10 fixed = 124 total (was 129) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
57s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-7644 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943109/YARN-7644.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d30456ce4b86 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5b7ba48 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/22119/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22119/testReport/ |
| Max. process+thread count | 340 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Updated] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM

2018-10-09 Thread Arun Suresh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-8827:
--
Attachment: YARN-8827-YARN-1011.07.patch

> Plumb per app, per user and per queue resource utilization from the NM to RM
> 
>
> Key: YARN-8827
> URL: https://issues.apache.org/jira/browse/YARN-8827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-8827-YARN-1011.01.patch, 
> YARN-8827-YARN-1011.02.patch, YARN-8827-YARN-1011.03.patch, 
> YARN-8827-YARN-1011.04.patch, YARN-8827-YARN-1011.05.patch, 
> YARN-8827-YARN-1011.06.patch, YARN-8827-YARN-1011.07.patch
>
>
> Opportunistic Containers for OverAllocation need to be allocated to pending 
> applications in some fair manner. Rather than evaluating queue and user 
> resource usage (allocated resource usage) and comparing against queue and 
> user limits to decide the allocation, it might make more sense to use a 
> snapshot of actual resource utilization of the queue and user.
> To facilitate this, this JIRA proposes to aggregate per user, per app (and 
> maybe per queue) resource utilization in addition to aggregated Container and 
> Node Utilization and send it along with the NM heartbeat. It should be fairly 
> inexpensive to aggregate - since it can be performed in the same loop of the 
> {{ContainersMonitorImpl}}'s Monitoring thread.
> A snapshot aggregate can be made every couple of seconds in the RM. This 
> instantaneous resource utilization should be used to decide if Opportunistic 
> containers can be allocated to an App, Queue or User.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8807) FairScheduler crashes RM with oversubscription turned on if an application is killed.

2018-10-09 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644132#comment-16644132
 ] 

Haibo Chen commented on YARN-8807:
--

Thanks [~rkanter] and [~zsiegl] for the review!

> FairScheduler crashes RM with oversubscription turned on if an application is 
> killed.
> -
>
> Key: YARN-8807
> URL: https://issues.apache.org/jira/browse/YARN-8807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler, resourcemanager
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Fix For: YARN-1011
>
> Attachments: YARN-8807-YARN-1011.00.patch, 
> YARN-8807-YARN-1011.01.patch
>
>
> When an application that has opportunistic containers allocated is killed, 
> its containers are not released immediately.
> The Fair Scheduler would therefore continue trying to promote such orphaned 
> containers, which results in an NPE.
> {code:java}
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptToAssignReservedResourcesOrPromoteOpportunisticContainers(FairScheduler.java:1158)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1129)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:1001)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1275)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testKillingApplicationWithOpportunisticContainersAssigned(TestFairScheduler.java:4019){code}
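
A hypothetical sketch of the kind of guard that avoids the NPE: skip promotion for containers whose application attempt is already gone (the surrounding loop and the candidate collection are assumptions, not the actual patch):

{code:java}
// Inside FairScheduler's promotion pass; candidatesForPromotion is hypothetical.
for (RMContainer opportunistic : candidatesForPromotion) {
  FSAppAttempt app =
      getSchedulerApp(opportunistic.getApplicationAttemptId());
  if (app == null || app.isStopped()) {
    continue; // the app was killed; its containers will be released shortly
  }
  // ... attempt promotion of the opportunistic container ...
}
{code}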



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644125#comment-16644125
 ] 

Jason Lowe commented on YARN-7644:
--

Thanks for updating the patch!  +1 for patch 5 pending Jenkins.

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7644.001.patch, YARN-7644.002.patch, 
> YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch
>
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 
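
The general remedy is to take the blocking stop/kill sequence off the event 
handler thread; the patches here factor it into a separate ContainerCleanup 
task. Below is only a generic sketch of the idea, with hypothetical helper 
methods standing in for the real CLI invocations:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Generic sketch: run blocking stop/kill sequences on a dedicated pool so
// the ContainerLaunch event handler never waits on a slow `docker stop`.
public class BlockingStopOffload {
  private final ExecutorService cleanupPool = Executors.newFixedThreadPool(8);

  /** Queues the stop and returns to the event handler immediately. */
  public void stopContainerAsync(String containerId) {
    cleanupPool.submit(() -> {
      boolean stopped = dockerStop(containerId, 10); // blocks up to ~10s
      if (!stopped) {
        dockerKill(containerId); // force kill after the grace period
      }
    });
  }

  // Stand-ins for the real CLI invocations (hypothetical helpers).
  private boolean dockerStop(String id, int timeoutSecs) { return false; }
  private void dockerKill(String id) { }

  public void shutdown() throws InterruptedException {
    cleanupPool.shutdown();
    cleanupPool.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{code}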



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7225) Add queue and partition info to RM audit log

2018-10-09 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7225:
-
Attachment: YARN-7225.branch-2.8.001.patch

> Add queue and partition info to RM audit log
> 
>
> Key: YARN-7225
> URL: https://issues.apache.org/jira/browse/YARN-7225
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.9.1, 2.8.4, 3.0.2, 3.1.1
>Reporter: Jonathan Hung
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7225.001.patch, YARN-7225.002.patch, 
> YARN-7225.003.patch, YARN-7225.branch-2.8.001.patch
>
>
> Right now RM audit log has fields such as user, ip, resource, etc. Having 
> queue and partition  is useful for resource tracking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644106#comment-16644106
 ] 

Hadoop QA commented on YARN-8827:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-1011 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
41s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
15s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
21s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
55s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
45s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
49s{color} | {color:green} YARN-1011 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
34s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api 
generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
49s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
26s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 12s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m  
7s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 25s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}220m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestNMProxy |
|   | hadoop.yarn.sls.TestSLSStreamAMSynth |
\\
\\
|| Subsystem || 

[jira] [Commented] (YARN-8813) Improve debug messages for NM preemption of OPPORTUNISTIC containers

2018-10-09 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644096#comment-16644096
 ] 

Haibo Chen commented on YARN-8813:
--

That makes sense to me now. I have updated the patch accordingly.

> Improve debug messages for NM preemption of OPPORTUNISTIC containers
> 
>
> Key: YARN-8813
> URL: https://issues.apache.org/jira/browse/YARN-8813
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8813-YARN-1011.00.patch, 
> YARN-8813-YARN-1011.01.patch, YARN-8813-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8813) Improve debug messages for NM preemption of OPPORTUNISTIC containers

2018-10-09 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8813:
-
Attachment: YARN-8813-YARN-1011.02.patch

> Improve debug messages for NM preemption of OPPORTUNISTIC containers
> 
>
> Key: YARN-8813
> URL: https://issues.apache.org/jira/browse/YARN-8813
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8813-YARN-1011.00.patch, 
> YARN-8813-YARN-1011.01.patch, YARN-8813-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8807) FairScheduler crashes RM with oversubscription turned on if an application is killed.

2018-10-09 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644094#comment-16644094
 ] 

Robert Kanter commented on YARN-8807:
-

+1 LGTM

> FairScheduler crashes RM with oversubscription turned on if an application is 
> killed.
> -
>
> Key: YARN-8807
> URL: https://issues.apache.org/jira/browse/YARN-8807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler, resourcemanager
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8807-YARN-1011.00.patch, 
> YARN-8807-YARN-1011.01.patch
>
>
> When an application that has been allocated opportunistic containers is 
> killed, its containers are not released immediately.
> The Fair Scheduler therefore continues trying to promote such orphaned 
> containers, which results in an NPE.
> {code:java}
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptToAssignReservedResourcesOrPromoteOpportunisticContainers(FairScheduler.java:1158)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1129)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:1001)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1275)
>     at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testKillingApplicationWithOpportunisticContainersAssigned(TestFairScheduler.java:4019){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8710) Service AM should set a finite limit on NM container max retries

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644090#comment-16644090
 ] 

Hadoop QA commented on YARN-8710:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
29s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8710 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943093/YARN-8710.2.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bdc77b7f17af 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf04f19 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22116/testReport/ |
| Max. process+thread count | 770 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/22116/console |
| Powered 

[jira] [Commented] (YARN-8813) Improve debug messages for NM preemption of OPPORTUNISTIC containers

2018-10-09 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644091#comment-16644091
 ] 

Robert Kanter commented on YARN-8813:
-

It looks like the {{LOG}} is missing the {{final}} keyword.  +1 after that

> Improve debug messages for NM preemption of OPPORTUNISTIC containers
> 
>
> Key: YARN-8813
> URL: https://issues.apache.org/jira/browse/YARN-8813
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8813-YARN-1011.00.patch, 
> YARN-8813-YARN-1011.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644085#comment-16644085
 ] 

Chandni Singh commented on YARN-7644:
-

Made {{reapContainer}} package-private as well in Patch 5.

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7644.001.patch, YARN-7644.002.patch, 
> YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch
>
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7644:

Attachment: YARN-7644.005.patch

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7644.001.patch, YARN-7644.002.patch, 
> YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch
>
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644080#comment-16644080
 ] 

Chandni Singh commented on YARN-7644:
-

Addressed the review comments in Patch 4. 

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7644.001.patch, YARN-7644.002.patch, 
> YARN-7644.003.patch, YARN-7644.004.patch
>
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7644:

Attachment: YARN-7644.004.patch

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7644.001.patch, YARN-7644.002.patch, 
> YARN-7644.003.patch, YARN-7644.004.patch
>
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-10-09 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644071#comment-16644071
 ] 

Suma Shivaprasad commented on YARN-8569:


[~eyang] Thanks for updating the patch. +1

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch, 
> YARN-8569.009.patch, YARN-8569.010.patch, YARN-8569.011.patch
>
>
> Some programs require container hostnames to be known for the application to 
> run.  For example, distributed TensorFlow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple 
> option to expose the hostnames of the yarn service via a mounted file.  The 
> file content gets updated when a flex command is performed.  This allows 
> application developers to consume system environment settings via a standard 
> interface.  It is like /proc/devices for Linux, but for Hadoop.  This may 
> involve updating a file in the distributed cache and allowing the file to be 
> mounted via container-executor.
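
For the consuming side, a hypothetical sketch; the mount point and the 
re-read-on-every-call behavior are assumptions made for illustration, not the 
interface defined in the attached design pdf:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical consumer: the path below is invented for the example.
public class ClusterSpecReader {
  private static final String SPEC_PATH = "/hadoop/yarn/sysfs/app.json";

  /**
   * Re-read on each call: the file content is updated when the service is
   * flexed, so callers always observe the current component hostnames.
   */
  public static String readSpec() throws IOException {
    return new String(Files.readAllBytes(Paths.get(SPEC_PATH)));
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readSpec());
  }
}
{code}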



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8845) hadoop.registry.rm.enabled is not used

2018-10-09 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644063#comment-16644063
 ] 

Hudson commented on YARN-8845:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15156 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15156/])
YARN-8845.  Removed unused hadoop.registry.rm reference. (eyang: 
rev bf04f194568f9e81f5481b25a84ad903e3c307cf)
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-configuration.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/api/RegistryConstants.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md


> hadoop.registry.rm.enabled is not used
> --
>
> Key: YARN-8845
> URL: https://issues.apache.org/jira/browse/YARN-8845
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8845.000.patch, YARN-8845.001.patch
>
>
> YARN-2652 introduced "hadoop.registry.rm.enabled" because YARN-2571 was 
> supposed to initialize the registry, but that code is now gone. We should 
> remove all references to this configuration key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8448) AM HTTPS Support

2018-10-09 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644059#comment-16644059
 ] 

Robert Kanter commented on YARN-8448:
-

I discussed this with [~haibochen] offline and we agreed that it's fine how it 
is now, but we should rename the values because REQUIRED is confusing.  I've 
thought a bit about names, and how about: NONE, LENIENT, and STRICT.

> AM HTTPS Support
> 
>
> Key: YARN-8448
> URL: https://issues.apache.org/jira/browse/YARN-8448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Major
> Attachments: YARN-8448.001.patch, YARN-8448.002.patch, 
> YARN-8448.003.patch, YARN-8448.004.patch, YARN-8448.005.patch, 
> YARN-8448.006.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8852) Add documentation for submarine installation details

2018-10-09 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643766#comment-16643766
 ] 

Wangda Tan edited comment on YARN-8852 at 10/9/18 8:38 PM:
---

Committed to trunk/branch-3.2, thanks [~yuan_zac] and [~liuxun323].


was (Author: leftnoteasy):
Committed to trunk/branch-3.2, thanks [~yuan_zac].

> Add documentation for submarine installation details
> 
>
> Key: YARN-8852
> URL: https://issues.apache.org/jira/browse/YARN-8852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, submarine
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8852.001.patch, YARN-8852.002.patch, 
> YARN-8852.003.patch
>
>
> To help beginners install and use Submarine, a detailed guide is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8858:
-
Attachment: YARN-8858.002.patch

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8858.001.patch, YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that change, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after it, the AMS uses LeafQueue.getMaximumAllocation. The 
> scheduler method consults nodeTracker.getMaximumAllocation, but the 
> LeafQueue one doesn't.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.
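
The intended invariant is easy to state in code; a sketch of the idea (not 
the attached patch) using the existing Resources helper:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Sketch of the invariant: whatever the per-queue maximum-allocation says,
// never exceed the maximum the scheduler reports, which is backed by the
// node tracker's view of the largest schedulable node.
public final class MaxAllocationCap {
  private MaxAllocationCap() {}

  public static Resource effectiveQueueMaximum(Resource queueMaximumAllocation,
      Resource schedulerMaximumAllocation) {
    // Component-wise min caps memory and vcores independently.
    return Resources.componentwiseMin(
        queueMaximumAllocation, schedulerMaximumAllocation);
  }
}
{code}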



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8858) CapacityScheduler should respect maximum node resource when per-queue maximum-allocation is being used.

2018-10-09 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644058#comment-16644058
 ] 

Wangda Tan commented on YARN-8858:
--

Attached ver.2 patch, addressed all comments.

> CapacityScheduler should respect maximum node resource when per-queue 
> maximum-allocation is being used.
> ---
>
> Key: YARN-8858
> URL: https://issues.apache.org/jira/browse/YARN-8858
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8858.001.patch, YARN-8858.002.patch
>
>
> This issue happens after YARN-8720.
> Before that change, the AMS used scheduler.getMaximumAllocation to do the 
> normalization; after it, the AMS uses LeafQueue.getMaximumAllocation. The 
> scheduler method consults nodeTracker.getMaximumAllocation, but the 
> LeafQueue one doesn't.
> We should use scheduler.getMaximumAllocation to cap each queue's 
> maximum-allocation every time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644044#comment-16644044
 ] 

Jason Lowe commented on YARN-7644:
--

Thanks for updating the patch!

compareAndSetAlreadyLaunched is too explicit -- it essentially exposes the 
AtomicBoolean directly which defeats the point of encapsulation.  Something 
like setLaunched() or markLaunched() which returns false if it was already 
launched would be easier to read and also hide the fact that there's a CAS 
operation on an AtomicBoolean underneath.  In practice the boolean only goes 
one direction, so no need to expose it completely.

EXIT_CODE_FILE_SUFFIX should be package-private instead of protected.  
getContainerPid method also only needs to be package-private.

Nit: ContainerCleanup should cache the pid file path in a local rather than 
always calling the accessor method.

setPidFilePath added but never called.
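
The suggested encapsulation, sketched (following the comment above, not any 
particular patch revision):
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Callers never see the CAS or the AtomicBoolean; they just learn whether
// they were the first to mark the launch.
public class LaunchState {
  private final AtomicBoolean launched = new AtomicBoolean(false);

  /** @return false if the container was already marked launched. */
  public boolean markLaunched() {
    return launched.compareAndSet(false, true);
  }

  public boolean isLaunched() {
    return launched.get();
  }
}
{code}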


> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7644.001.patch, YARN-7644.002.patch, 
> YARN-7644.003.patch
>
>
> We are sending a {{docker stop}} to the docker container with a timeout of 10 
> seconds when we shut down a container. If the container does not stop after 
> 10 seconds then we force kill it. However, the {{docker stop}} command is a 
> blocking call. So in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to 
> return. This ties up the ContainerLaunch handler and so these kill events 
> back up. It also appears to be backing up new container launches as well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-10-09 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644040#comment-16644040
 ] 

Wangda Tan commented on YARN-8468:
--

[~bsteinbach], 

It could be caused by some test issue: 
{code:java}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
project hadoop-yarn-server-resourcemanager: There was a timeout or other error 
in the fork -> [Help 1]
{code}
I just retriggered Jenkins to run your patch.

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8468-branch-3.1.018.patch, 
> YARN-8468-branch-3.1.019.patch, YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch, YARN-8468.018.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per-queue basis.
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but 
> allow enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb sets a default maximum container size 
> for all queues, while the "maxContainerResources" queue config value sets 
> the maximum resources per queue.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * Enforce the use of the queue-based maximum allocation limit if it is 
> available; if not, use the general scheduler-level setting (see the 
> allocation-file sketch after this description)
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request
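
For illustration, an allocation-file snippet using the config name from this 
description; the final property name and the value syntax may differ in the 
patch:
{code:xml}
<allocations>
  <queue name="adhoc">
    <!-- Small containers only for ad hoc jobs. -->
    <maxContainerResources>2048 mb,2 vcores</maxContainerResources>
  </queue>
  <queue name="enterprise">
    <!-- No per-queue cap: falls back to the scheduler-wide
         yarn.scheduler.maximum-allocation-mb limit. -->
  </queue>
</allocations>
{code}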



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8845) hadoop.registry.rm.enabled is not used

2018-10-09 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644022#comment-16644022
 ] 

Íñigo Goiri commented on YARN-8845:
---

Thanks [~eyang], I'll rebase HADOOP-15821.

> hadoop.registry.rm.enabled is not used
> --
>
> Key: YARN-8845
> URL: https://issues.apache.org/jira/browse/YARN-8845
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8845.000.patch, YARN-8845.001.patch
>
>
> YARN-2652 introduced "hadoop.registry.rm.enabled" because YARN-2571 was 
> supposed to initialize the registry, but that code is now gone. We should 
> remove all references to this configuration key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8845) hadoop.registry.rm.enabled is not used

2018-10-09 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644014#comment-16644014
 ] 

Eric Yang commented on YARN-8845:
-

+1 on patch 001.

> hadoop.registry.rm.enabled is not used
> --
>
> Key: YARN-8845
> URL: https://issues.apache.org/jira/browse/YARN-8845
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: YARN-8845.000.patch, YARN-8845.001.patch
>
>
> YARN-2652 introduced "hadoop.registry.rm.enabled" because YARN-2571 was 
> supposed to initialize the registry, but that code is now gone. We should 
> remove all references to this configuration key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM

2018-10-09 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643991#comment-16643991
 ] 

Arun Suresh commented on YARN-8827:
---

The {{TestCapacityOverTimePolicy}} failure is unrelated and {{TestNMProxy}} 
should be fixed with a rebase.
[~elgoiri], good to go?

> Plumb per app, per user and per queue resource utilization from the NM to RM
> 
>
> Key: YARN-8827
> URL: https://issues.apache.org/jira/browse/YARN-8827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-8827-YARN-1011.01.patch, 
> YARN-8827-YARN-1011.02.patch, YARN-8827-YARN-1011.03.patch, 
> YARN-8827-YARN-1011.04.patch, YARN-8827-YARN-1011.05.patch, 
> YARN-8827-YARN-1011.06.patch
>
>
> Opportunistic Containers for OverAllocation need to be allocated to pending 
> applications in some fair manner. Rather than evaluating queue and user 
> resource usage (allocated resource usage) and comparing it against queue and 
> user limits to decide the allocation, it might make more sense to use a 
> snapshot of the actual resource utilization of the queue and user.
> To facilitate this, this JIRA proposes to aggregate per-user, per-app (and 
> maybe per-queue) resource utilization in addition to the aggregated Container 
> and Node Utilization, and send it along with the NM heartbeat. It should be 
> fairly inexpensive to aggregate, since it can be performed in the same loop 
> as the {{ContainersMonitorImpl}}'s monitoring thread.
> A snapshot aggregate can be made every couple of seconds in the RM. This 
> instantaneous resource utilization should be used to decide whether 
> Opportunistic containers can be allocated to an App, Queue or User.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643985#comment-16643985
 ] 

Hadoop QA commented on YARN-8827:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-1011 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  5m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
37s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
43s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
40s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
10s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
24s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
34s{color} | {color:green} YARN-1011 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 13m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
45s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m  7s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 55s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
26s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}229m 10s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestNMProxy |
|   | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce 

[jira] [Commented] (YARN-8710) Service AM should set a finite limit on NM container max retries

2018-10-09 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643975#comment-16643975
 ] 

Suma Shivaprasad commented on YARN-8710:


Thanks [~billie.rinaldi]. I have updated the patch to set a maximum of 10 
retries with a failure validity interval of 10 min.

> Service AM should set a finite limit on NM container max retries 
> -
>
> Key: YARN-8710
> URL: https://issues.apache.org/jira/browse/YARN-8710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8710.1.patch, YARN-8710.2.patch
>
>
> Container retries are currently set to a default of -1 in 
> AbstractProviderService.buildContainerRetry. If this is not overridden via 
> the service spec with a finite value for 
> yarn.service.container-failure.retry.max, this causes infinite NM retries of 
> the container under the ALWAYS/ON_FAILURE restart policy. Ideally it should 
> retry a finite number of times on the same NM, and subsequently the Service 
> AM can retry on another node.
> We can set this to a default value of 3.
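
A sketch of a finite retry spec with the values from the comment above; the 
exact {{ContainerRetryContext}} overload taking a failures-validity interval 
is assumed here:
{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.yarn.api.records.ContainerRetryContext;
import org.apache.hadoop.yarn.api.records.ContainerRetryPolicy;

// Illustrative only: 10 retries on the same NM, with failures outside a
// 10-minute window no longer counted against the limit.
public final class FiniteRetry {
  private FiniteRetry() {}

  static ContainerRetryContext finiteRetries() {
    return ContainerRetryContext.newInstance(
        ContainerRetryPolicy.RETRY_ON_ALL_ERRORS,
        null,                            // no specific error codes
        10,                              // max retries on the same NM
        0,                               // no delay between retries
        TimeUnit.MINUTES.toMillis(10));  // failures-validity interval (assumed overload)
  }
}
{code}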



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8710) Service AM should set a finite limit on NM container max retries

2018-10-09 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8710:
---
Attachment: YARN-8710.2.patch

> Service AM should set a finite limit on NM container max retries 
> -
>
> Key: YARN-8710
> URL: https://issues.apache.org/jira/browse/YARN-8710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8710.1.patch, YARN-8710.2.patch
>
>
> Container retries are currently set to a default of -1 in 
> AbstractProviderService.buildContainerRetry. If this is not overridden via 
> the service spec with a finite value for 
> yarn.service.container-failure.retry.max, this causes infinite NM retries of 
> the container under the ALWAYS/ON_FAILURE restart policy. Ideally it should 
> retry a finite number of times on the same NM, and subsequently the Service 
> AM can retry on another node.
> We can set this to a default value of 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643972#comment-16643972
 ] 

Hadoop QA commented on YARN-8569:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  7m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
27s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 148 unchanged - 1 fixed = 148 total (was 149) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
49s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
39s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
42s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
57s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} 

[jira] [Updated] (YARN-8710) Service AM should set a finite limit on NM container max retries

2018-10-09 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8710:
---
Attachment: (was: YARN-8710.2.patch)

> Service AM should set a finite limit on NM container max retries 
> -
>
> Key: YARN-8710
> URL: https://issues.apache.org/jira/browse/YARN-8710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8710.1.patch
>
>
> Container retries are currently set to a default of -1 in 
> AbstractProviderService.buildContainerRetry. If this is not overridden via 
> the service spec with a finite value for 
> yarn.service.container-failure.retry.max, this causes infinite NM retries 
> for the container under the ALWAYS/ON_FAILURE restart policies. Ideally the 
> NM should retry a finite number of times on the same node, after which the 
> Service AM can retry on another node.
> We can set this to a default value of 3.
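
A minimal sketch of that suggestion, assuming the fix amounts to substituting 
a finite default for -1 when building the retry context; the class, method 
and constant below are illustrative, not the attached patch:

{code:java}
import org.apache.hadoop.yarn.api.records.ContainerRetryContext;
import org.apache.hadoop.yarn.api.records.ContainerRetryPolicy;

public final class ContainerRetryDefaults {

  // The suggestion above: cap retries at 3 instead of
  // ContainerRetryContext.RETRY_FOREVER (-1).
  static final int DEFAULT_CONTAINER_FAILURE_RETRY_MAX = 3;

  /**
   * Hypothetical replacement for the -1 default in
   * AbstractProviderService.buildContainerRetry: an explicit
   * yarn.service.container-failure.retry.max from the service spec wins,
   * otherwise fall back to a finite default.
   */
  static ContainerRetryContext buildFiniteRetryContext(
      ContainerRetryPolicy policy, Integer specMaxRetries,
      int retryIntervalMs) {
    int maxRetries = specMaxRetries != null
        ? specMaxRetries : DEFAULT_CONTAINER_FAILURE_RETRY_MAX;
    // newInstance(retryPolicy, errorCodes, maxRetries, retryInterval)
    return ContainerRetryContext.newInstance(
        policy, null, maxRetries, retryIntervalMs);
  }
}
{code}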



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8710) Service AM should set a finite limit on NM container max retries

2018-10-09 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8710:
---
Attachment: YARN-8710.2.patch

> Service AM should set a finite limit on NM container max retries 
> -
>
> Key: YARN-8710
> URL: https://issues.apache.org/jira/browse/YARN-8710
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8710.1.patch
>
>
> Container retries are currently set to a default of -1 in 
> AbstractProviderService.buildContainerRetry. If this is not overridden via 
> the service spec with a finite value for 
> yarn.service.container-failure.retry.max, this causes infinite NM retries 
> for the container under the ALWAYS/ON_FAILURE restart policies. Ideally the 
> NM should retry a finite number of times on the same node, after which the 
> Service AM can retry on another node.
> We can set this to a default value of 3.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2018-10-09 Thread Subru Krishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643949#comment-16643949
 ] 

Subru Krishnan commented on YARN-7592:
--

I want to make sure I fully understand the proposal - we will revert the 
changes in RMProxy and create the {{FederationClientRMProxy}} (I feel 
we can skip custom) directly if *yarn.federation.enabled* is set?

I like the idea; can you ensure a couple of things:
 * This works both with and without HA enabled (for NM, Router and AMRMProxy).
 * Assuming the above is true, can we remove the 
*yarn.federation.failover.enabled* flag completely?

 

Thanks for working on this!

 

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}.
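
A hedged sketch of the behavior in question, using the real YarnConfiguration 
keys and defaults; the wrapper class and the combined check are assumptions 
for illustration, not RMProxy's actual code:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.HAUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class FederationFailoverProbe {

  /**
   * Illustrative stand-in for the check performed around
   * RMProxy#createRMProxy (the exact gating in RMProxy may differ):
   * the HA retry policy kicks in when RM HA is on, or when the federation
   * failover flag is left at its default of true. Under the proposal
   * above, only yarn.federation.enabled would matter and the failover
   * flag could be removed.
   */
  static boolean forcesHARetryPolicy(Configuration conf) {
    boolean rmHaEnabled = HAUtil.isHAEnabled(conf);
    boolean federationEnabled = conf.getBoolean(
        YarnConfiguration.FEDERATION_ENABLED,
        YarnConfiguration.DEFAULT_FEDERATION_ENABLED);          // false
    boolean federationFailover = conf.getBoolean(
        YarnConfiguration.FEDERATION_FAILOVER_ENABLED,
        YarnConfiguration.DEFAULT_FEDERATION_FAILOVER_ENABLED); // true
    return rmHaEnabled || (federationEnabled && federationFailover);
  }
}
{code}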



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-10-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643942#comment-16643942
 ] 

Hadoop QA commented on YARN-7644:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 24s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 5 new + 118 unchanged - 10 fixed = 123 total (was 128) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
50s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-7644 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12943076/YARN-7644.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1a9f38d626b9 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c3d22d3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/22115/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22115/testReport/ |
| Max. process+thread count | 415 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-10-09 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643931#comment-16643931
 ] 

Antal Bálint Steinbach commented on YARN-8468:
--

Thanks a lot [~cheersyang]. What about the unit test failure? No unit tests 
actually failed, yet the build still voted -1. Is this a common thing? :)

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8468-branch-3.1.018.patch, 
> YARN-8468-branch-3.1.019.patch, YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch, YARN-8468.018.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per-queue basis.
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb would then provide the default maximum 
> container size for all queues, while the per-queue maximum would be set with 
> a “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting, and we should 
> not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as well (a fallback sketch follows this list)
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * enforce the queue-based maximum allocation limit if it is available; if 
> not, use the general scheduler-level setting
>  ** use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request
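
A minimal sketch of that per-queue fallback, under the assumption of a 
hypothetical queueMaxContainerAllocation lookup populated from the allocation 
file; the class and field names below are illustrative, not the committed 
patch:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.Resource;

public class QueueMaxAllocationSketch {

  // Hypothetical per-queue caps, e.g. parsed from "maxContainerResources"
  // entries in the fair-scheduler allocation file.
  private final Map<String, Resource> queueMaxContainerAllocation =
      new ConcurrentHashMap<>();

  // Stand-in for the scheduler-wide yarn.scheduler.maximum-allocation-*.
  private final Resource schedulerMaxAllocation;

  QueueMaxAllocationSketch(Resource schedulerMaxAllocation) {
    this.schedulerMaxAllocation = schedulerMaxAllocation;
  }

  /**
   * The fallback described above: a queue-level cap wins when configured,
   * otherwise the scheduler-level maximum applies. Request validation and
   * normalization would call this instead of the global getter.
   */
  Resource getMaximumResourceCapability(String queueName) {
    Resource queueMax = queueMaxContainerAllocation.get(queueName);
    return queueMax != null ? queueMax : schedulerMaxAllocation;
  }
}
{code}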



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-10-09 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8753:
-
Attachment: YARN-8753.002.patch

> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, 
> YARN-8753.001.patch, YARN-8753.002.patch
>
>
> The Nodemanagers chart is present on the Cluster Overview and Nodes -> Nodes 
> Status pages.
> This chart does not show node managers that are in the LOST state.
> Due to this issue, the node information page and the node status page show 
> different node manager counts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7652) Handle AM register requests asynchronously in FederationInterceptor

2018-10-09 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643901#comment-16643901
 ] 

Hudson commented on YARN-7652:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15152 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15152/])
YARN-7652. Handle AM register requests asynchronously in FederationInterceptor 
(inigoiri: rev c3d22d3b4569b7f87af4ee4abfcc284deebe90de)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/AMHeartbeatRequestHandler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/MockResourceManagerFacade.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/AMRMClientRelayer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestableFederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedApplicationManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/uam/TestUnmanagedApplicationManager.java


> Handle AM register requests asynchronously in FederationInterceptor
> ---
>
> Key: YARN-7652
> URL: https://issues.apache.org/jira/browse/YARN-7652
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Subru Krishnan
>Assignee: Botong Huang
>Priority: Major
> Fix For: 2.10.0, 3.3.0
>
> Attachments: YARN-7652.v1.patch, YARN-7652.v2.patch
>
>
> We (cc [~goiri]/[~botong]) observed that the {{FederationInterceptor}} in 
> {{AMRMProxy}} (and consequently the AM) is blocked if the _StateStore_ has 
> outdated info about a _SubCluster_. This is because we handle AM register 
> requests synchronously. This jira proposes to move to async, similar to how 
> we already operate for allocate invocations.
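
A hedged sketch of the async pattern this describes; UamHandle is a 
hypothetical stand-in for a per-sub-cluster UAM connection, and the real 
patch's threading model may differ:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;

public class AsyncRegisterSketch {

  private final ExecutorService registerExecutor =
      Executors.newCachedThreadPool();

  /**
   * Instead of blocking the AM's register call on every sub-cluster,
   * submit each UAM registration to a worker thread and return a Future;
   * a stale StateStore entry then only stalls its own worker.
   */
  Future<RegisterApplicationMasterResponse> registerAsync(
      UamHandle uamClient, RegisterApplicationMasterRequest request) {
    return registerExecutor.submit(
        () -> uamClient.registerApplicationMaster(request));
  }

  /** Hypothetical interface for a single sub-cluster's UAM connection. */
  interface UamHandle {
    RegisterApplicationMasterResponse registerApplicationMaster(
        RegisterApplicationMasterRequest request) throws Exception;
  }
}
{code}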



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


