[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398109#comment-16398109
 ] 

Rohith Sharma K S commented on YARN-7581:
-

Looking into the patch, I have one doubt: how does the filter look for a combination 
of metricstoretrieve and confstoretrieve? 
Does it look like the structure below?
{code}
FilterList AND (2/2): [
  FilterList AND (1/1): [
    FilterList OR (2/2): [
      SingleColumnValueFilter (c, config_param1, EQUAL, "value1"),
      SingleColumnValueFilter (c, config_param1, EQUAL, "value3")
    ],
    FilterList OR (2/2): [
      SingleColumnValueFilter (m, metric_param1, EQUAL, "metric_value1"),
      SingleColumnValueFilter (m, metric_param1, EQUAL, "metric_value3")
    ]
  ],
  FilterList OR (2/2): [
    FilterList AND (5/5): [
      FamilyFilter (EQUAL, i),
      QualifierFilter (NOT_EQUAL, e!),
      QualifierFilter (NOT_EQUAL, i!),
      QualifierFilter (NOT_EQUAL, s!),
      QualifierFilter (NOT_EQUAL, r!)
    ],
    FilterList AND (1/1): [
      FamilyFilter (EQUAL, c)
    ],
    FilterList AND (1/1): [
      FamilyFilter (EQUAL, m)
    ]
  ]
]
{code}
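
For anyone reconstructing this by hand, here is a minimal self-contained sketch of how a FilterList of that shape is typically assembled with the HBase client API (my own illustration, not code from the patch; family/qualifier names are made up and the older CompareOp-style constructors are used):
{code}
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FamilyFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterSketch {
  public static FilterList build() {
    // OR of the requested config values (column family "c").
    FilterList configValues = new FilterList(Operator.MUST_PASS_ONE,
        new SingleColumnValueFilter(Bytes.toBytes("c"),
            Bytes.toBytes("config_param1"), CompareOp.EQUAL, Bytes.toBytes("value1")),
        new SingleColumnValueFilter(Bytes.toBytes("c"),
            Bytes.toBytes("config_param1"), CompareOp.EQUAL, Bytes.toBytes("value3")));

    // OR of the requested metric values (column family "m").
    FilterList metricValues = new FilterList(Operator.MUST_PASS_ONE,
        new SingleColumnValueFilter(Bytes.toBytes("m"),
            Bytes.toBytes("metric_param1"), CompareOp.EQUAL, Bytes.toBytes("metric_value1")),
        new SingleColumnValueFilter(Bytes.toBytes("m"),
            Bytes.toBytes("metric_param1"), CompareOp.EQUAL, Bytes.toBytes("metric_value3")));

    // Both value filter lists must pass.
    FilterList valueFilters =
        new FilterList(Operator.MUST_PASS_ALL, configValues, metricValues);

    // Column selection: the info family minus some qualifier prefixes,
    // OR the config family, OR the metric family.
    FilterList infoColumns = new FilterList(Operator.MUST_PASS_ALL,
        new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("i"))),
        new QualifierFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes("e!"))));
    FilterList columnsToReturn = new FilterList(Operator.MUST_PASS_ONE,
        infoColumns,
        new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("c"))),
        new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("m"))));

    // Top-level AND combining value filtering with column selection.
    return new FilterList(Operator.MUST_PASS_ALL, valueFilters, columnsToReturn);
  }
}
{code}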

> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7581.00.patch, YARN-7581.01.patch, 
> YARN-7581.02.patch
>
>
> Post YARN-7346,
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters() and 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters() 
> start to fail when hbase.profile is set to 2.0.
> *Error Message*
>  [ERROR] Failures:
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters:1266 
> expected:<2> but was:<0>
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters:1523 
> expected:<1> but was:<0>






[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398088#comment-16398088
 ] 

Wangda Tan commented on YARN-8020:
--

[~eepayne], could you explain why preemption doesn't happen for the case you 
mentioned:

bq. The place where it seems to get stuck is when the containers in the 
preemptable queue are using one or more smaller Resource elements than the 
containers in the asking queue. For example, it will sometimes not preempt if 
the preemptable queue has containers using  and the 
asking queue has containers using .

[~sunilg] mentioned a related case before, YARN-6538, which also causes preemption not 
to happen.
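
As a side note, here is a small self-contained sketch (my own illustration with made-up numbers, not the scheduler code) of why a DRF-style min that returns one whole Resource by dominant share differs from a componentwise min, which may be related to the behavior above:
{code}
// Illustration only: under DRF, comparing two Resources by dominant share and
// returning one of them wholesale is not the same as taking min(memory) and
// min(vcores) separately. All numbers below are made up.
public class DrfMinSketch {
  static final long CLUSTER_MEM_MB = 100 * 1024;  // assumed 100 GB cluster
  static final long CLUSTER_VCORES = 100;         // assumed 100 vcores

  static final class Res {
    final long memMb;
    final long vcores;
    Res(long memMb, long vcores) { this.memMb = memMb; this.vcores = vcores; }
    @Override
    public String toString() { return "<memory:" + memMb + ", vCores:" + vcores + ">"; }
  }

  // Dominant share = max of the per-dimension shares, as in DRF.
  static double dominantShare(Res r) {
    return Math.max((double) r.memMb / CLUSTER_MEM_MB,
        (double) r.vcores / CLUSTER_VCORES);
  }

  // A DRF-style "min": returns the whole operand with the smaller dominant share.
  static Res drfMin(Res lhs, Res rhs) {
    return dominantShare(lhs) <= dominantShare(rhs) ? lhs : rhs;
  }

  public static void main(String[] args) {
    Res avail = new Res(10 * 1024, 80);   // little memory free, many vcores free
    Res demand = new Res(40 * 1024, 5);   // memory-heavy pending demand
    // Prints <memory:40960, vCores:5>: the "min" carries more memory than avail,
    // so a value derived from it can be skewed in one dimension.
    System.out.println(drfMin(avail, demand));
  }
}
{code}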

> when DRF is used, preemption does not trigger due to incorrect idealAssigned
> 
>
> Key: YARN-8020
> URL: https://issues.apache.org/jira/browse/YARN-8020
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Priority: Major
>
> I've hit a case where Inter Queue Preemption does not work.
> It happens when DRF is used and an application is submitted with a large number of 
> vcores.
> IMHO, idealAssigned can be set incorrectly by the following code.
> {code}
> // This function "accepts" all the resources it can (pending) and return
> // the unused ones
> Resource offer(Resource avail, ResourceCalculator rc,
> Resource clusterResource, boolean considersReservedResource) {
>   Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
>   Resources.subtract(getMax(), idealAssigned),
>   Resource.newInstance(0, 0));
>   // accepted = min{avail,
>   //   max - assigned,
>   //   current + pending - assigned,
>   //   # Make sure a queue will not get more than max of its
>   //   # used/guaranteed, this is to make sure preemption won't
>   //   # happen if all active queues are beyond their guaranteed
>   //   # This is for leaf queue only.
>   //   max(guaranteed, used) - assigned}
>   // remain = avail - accepted
>   Resource accepted = Resources.min(rc, clusterResource,
>   absMaxCapIdealAssignedDelta,
>   Resources.min(rc, clusterResource, avail, Resources
>   /*
>* When we're using FifoPreemptionSelector (considerReservedResource
>* = false).
>*
>* We should deduct reserved resource from pending to avoid 
> excessive
>* preemption:
>*
>* For example, if an under-utilized queue has used = reserved = 20.
>* Preemption policy will try to preempt 20 containers (which is not
>* satisfied) from different hosts.
>*
>* In FifoPreemptionSelector, there's no guarantee that preempted
>* resource can be used by pending request, so policy will preempt
>* resources repeatedly.
>*/
>   .subtract(Resources.add(getUsed(),
>   (considersReservedResource ? pending : pendingDeductReserved)),
>   idealAssigned)));
> {code}
> let’s say,
> * cluster resource : 
> * idealAssigned(assigned): 
> * avail: 
> * current: 
> * pending: 
> current + pending - assigned: 
> min ( avail, (current + pending - assigned) ) : 
> accepted: 
> as a result, idealAssigned will be , which does not 
> trigger preemption.






[jira] [Commented] (YARN-8022) ResourceManager UI cluster/app/ page fails to render

2018-03-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398065#comment-16398065
 ] 

Rohith Sharma K S commented on YARN-8022:
-

Committing shortly.

> ResourceManager UI cluster/app/ page fails to render
> 
>
> Key: YARN-8022
> URL: https://issues.apache.org/jira/browse/YARN-8022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Blocker
> Attachments: Screen Shot 2018-03-12 at 1.45.05 PM.png, 
> YARN-8022.001.patch, YARN-8022.002.patch
>
>
> The page displays the message "Failed to read the attempts of the application"
>  
> The following stack trace is observed in RM log.
> org.apache.hadoop.yarn.server.webapp.AppBlock: Failed to read the attempts of 
> the application application_1520597233415_0002.
> java.lang.NullPointerException
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:283)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:280)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>  at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>  at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>  at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>  at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>  at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>  at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54)






[jira] [Commented] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template

2018-03-13 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398032#comment-16398032
 ] 

genericqa commented on YARN-7574:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 64 new + 140 unchanged - 7 fixed = 204 total (was 147) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 14s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
22s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 1 new + 4 unchanged - 0 fixed = 5 total (was 4) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 47s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoQueueCreation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-7574 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914382/YARN-7574.4.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 469bead06a8e 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 
21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9714fc1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/19970/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 

[jira] [Commented] (YARN-7707) [GPG] Policy generator framework

2018-03-13 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397993#comment-16397993
 ] 

genericqa commented on YARN-7707:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-7402 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
59s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 4s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
33s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
29s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
16s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
YARN-7402 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
4s{color} | {color:green} YARN-7402 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  4s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 226 unchanged - 0 fixed = 227 total (was 226) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
41s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
5s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
10s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 35s{color} 
| {color:red} hadoop-yarn-server-globalpolicygenerator in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 90m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.globalpolicygenerator.policygenerator.TestPolicyGenerator |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce 

[jira] [Updated] (YARN-7708) [GPG] Load based policy generator

2018-03-13 Thread Young Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young Chen updated YARN-7708:
-
Attachment: YARN-7708-YARN-7402.07.cumulative.patch

> [GPG] Load based policy generator
> -
>
> Key: YARN-7708
> URL: https://issues.apache.org/jira/browse/YARN-7708
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-7708-YARN-7402.01.cumulative.patch, 
> YARN-7708-YARN-7402.02.cumulative.patch, 
> YARN-7708-YARN-7402.03.cumulative.patch, 
> YARN-7708-YARN-7402.04.cumulative.patch, 
> YARN-7708-YARN-7402.05.cumulative.patch, 
> YARN-7708-YARN-7402.06.cumulative.patch, 
> YARN-7708-YARN-7402.07.cumulative.patch
>
>
> This policy reads load from the "pendingQueueLength" metric and scales it into 
> a set of weights that influence the AMRMProxy and Router behaviors.
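
As a rough illustration of the idea only (class and method names and the scaling formula below are invented, not the patch's implementation), per-sub-cluster pending queue lengths can be normalized into weights that favor less-loaded sub-clusters:
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: shorter pending queue => larger routing weight.
public class LoadBasedWeightsSketch {
  static Map<String, Float> computeWeights(Map<String, Integer> pendingQueueLength) {
    Map<String, Float> raw = new HashMap<>();
    float total = 0f;
    for (Map.Entry<String, Integer> e : pendingQueueLength.entrySet()) {
      float w = 1.0f / (e.getValue() + 1);  // +1 avoids division by zero when idle
      raw.put(e.getKey(), w);
      total += w;
    }
    Map<String, Float> weights = new HashMap<>();
    for (Map.Entry<String, Float> e : raw.entrySet()) {
      weights.put(e.getKey(), e.getValue() / total);  // normalize so weights sum to 1
    }
    return weights;
  }

  public static void main(String[] args) {
    Map<String, Integer> pending = new HashMap<>();
    pending.put("subcluster-1", 0);  // idle
    pending.put("subcluster-2", 9);  // heavily loaded
    System.out.println(computeWeights(pending));  // subcluster-1 gets ~0.91
  }
}
{code}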






[jira] [Comment Edited] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template

2018-03-13 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397910#comment-16397910
 ] 

Suma Shivaprasad edited comment on YARN-7574 at 3/14/18 12:42 AM:
--

[~sunilg] Thanks for the review. Uploading a patch that addresses the following comments:

1. "Map of queueName -> partition -> \{leaf queue's state} seems not clean to me 
(GuaranteedOrZeroCapacityOverTimePolicy). We should ideally derive queues from 
partition."

 - Now maintaining a map of partition -> queueName -> \{leaf queue's state}.

2. As we discussed, 
AutoCreatedQueueManagementPolicy.computeQueueManagementChanges does not update 
scheduler state or LeafQueueState; it only suggests a list of queue entitlement 
changes per partition, which the scheduler later commits, after validation, 
through commitQueueManagementChanges.

3. Also added validation in 
GuaranteedOrZeroCapacityOverTimePolicy.initializeLeafQueueTemplate to check that 
the configured node labels are also configured on the ManagedParentQueue.
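
To make point 1 concrete, a tiny hypothetical sketch of the "partition -> queueName -> leaf queue state" shape (type and field names invented for illustration, not the patch's actual classes):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for whatever per-queue state the policy tracks.
class LeafQueueStateSketch {
  volatile boolean activated;
}

class PartitionQueueStateSketch {
  // outer key: node-label partition, inner key: auto-created leaf queue name
  private final Map<String, Map<String, LeafQueueStateSketch>> state =
      new ConcurrentHashMap<>();

  LeafQueueStateSketch getOrCreate(String partition, String queueName) {
    return state
        .computeIfAbsent(partition, p -> new ConcurrentHashMap<>())
        .computeIfAbsent(queueName, q -> new LeafQueueStateSketch());
  }
}
{code}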

 

 


was (Author: suma.shivaprasad):
[~sunilg] Thanks for the review. Uploading a patch that addresses the following comments:

1. "Map of queueName -> partition -> \{leaf queue's state} seems not clean to me 
(GuaranteedOrZeroCapacityOverTimePolicy). We should ideally derive queues from 
partition."

 - Now maintaining a map of partition -> queueName -> \{leaf queue's state}.

2. As we discussed, 
AutoCreatedQueueManagementPolicy.computeQueueManagementChanges does not update 
scheduler state or LeafQueueState; it only suggests a list of queue entitlement 
changes per partition, which the scheduler later commits, after validation, 
through commitQueueManagementChanges.

 

 

> Add support for Node Labels on Auto Created Leaf Queue Template
> ---
>
> Key: YARN-7574
> URL: https://issues.apache.org/jira/browse/YARN-7574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7574.1.patch, YARN-7574.2.patch, YARN-7574.3.patch, 
> YARN-7574.4.patch
>
>
> YARN-7473 adds support for auto created leaf queues to inherit node labels 
> capacities from parent queues. However, there is no support in the leaf queue 
> template for configuring different capacities for different node labels.






[jira] [Updated] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template

2018-03-13 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-7574:
---
Attachment: YARN-7574.4.patch

> Add support for Node Labels on Auto Created Leaf Queue Template
> ---
>
> Key: YARN-7574
> URL: https://issues.apache.org/jira/browse/YARN-7574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7574.1.patch, YARN-7574.2.patch, YARN-7574.3.patch, 
> YARN-7574.4.patch
>
>
> YARN-7473 adds support for auto created leaf queues to inherit node labels 
> capacities from parent queues. However, there is no support in the leaf queue 
> template for configuring different capacities for different node labels.






[jira] [Commented] (YARN-7574) Add support for Node Labels on Auto Created Leaf Queue Template

2018-03-13 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397910#comment-16397910
 ] 

Suma Shivaprasad commented on YARN-7574:


[~sunilg] Thanks for the review. Uploading a patch that addresses the following comments:

1. "Map of queueName -> partition -> \{leaf queue's state} seems not clean to me 
(GuaranteedOrZeroCapacityOverTimePolicy). We should ideally derive queues from 
partition."

 - Now maintaining a map of partition -> queueName -> \{leaf queue's state}.

2. As we discussed, 
AutoCreatedQueueManagementPolicy.computeQueueManagementChanges does not update 
scheduler state or LeafQueueState; it only suggests a list of queue entitlement 
changes per partition, which the scheduler later commits, after validation, 
through commitQueueManagementChanges.

 

 

> Add support for Node Labels on Auto Created Leaf Queue Template
> ---
>
> Key: YARN-7574
> URL: https://issues.apache.org/jira/browse/YARN-7574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7574.1.patch, YARN-7574.2.patch, YARN-7574.3.patch
>
>
> YARN-7473 adds support for auto created leaf queues to inherit node labels 
> capacities from parent queues. However, there is no support in the leaf queue 
> template for configuring different capacities for different node labels.






[jira] [Commented] (YARN-8027) Setting hostname of docker container breaks for --net=host in docker 1.13

2018-03-13 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397855#comment-16397855
 ] 

genericqa commented on YARN-8027:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 53s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8027 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914366/YARN-8027.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 932d8aafdf45 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 
19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9d6994d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/19968/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19968/testReport/ |
| Max. process+thread count | 290 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Commented] (YARN-7707) [GPG] Policy generator framework

2018-03-13 Thread Young Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397835#comment-16397835
 ] 

Young Chen commented on YARN-7707:
--

Thanks for taking a look [~giovanni.fumarola]. I've fixed all your suggestions, 
and also cleaned up the GlobalPolicy registerPaths structure a little. Let me 
know what you think.

> [GPG] Policy generator framework
> 
>
> Key: YARN-7707
> URL: https://issues.apache.org/jira/browse/YARN-7707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>Priority: Major
>  Labels: federation, gpg
> Attachments: YARN-7707-YARN-7402.01.patch, 
> YARN-7707-YARN-7402.02.patch, YARN-7707-YARN-7402.03.patch, 
> YARN-7707-YARN-7402.04.patch, YARN-7707-YARN-7402.05.patch, 
> YARN-7707-YARN-7402.06.patch, YARN-7707-YARN-7402.07.patch, 
> YARN-7707-YARN-7402.08.patch, YARN-7707-YARN-7402.09.patch, 
> YARN-7707-YARN-7402.10.patch
>
>
> This JIRA tracks the development of a generic framework for querying 
> sub-clusters for metrics, running policies, and updating them in the 
> FederationStateStore.






[jira] [Updated] (YARN-7707) [GPG] Policy generator framework

2018-03-13 Thread Young Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young Chen updated YARN-7707:
-
Attachment: YARN-7707-YARN-7402.10.patch

> [GPG] Policy generator framework
> 
>
> Key: YARN-7707
> URL: https://issues.apache.org/jira/browse/YARN-7707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Young Chen
>Priority: Major
>  Labels: federation, gpg
> Attachments: YARN-7707-YARN-7402.01.patch, 
> YARN-7707-YARN-7402.02.patch, YARN-7707-YARN-7402.03.patch, 
> YARN-7707-YARN-7402.04.patch, YARN-7707-YARN-7402.05.patch, 
> YARN-7707-YARN-7402.06.patch, YARN-7707-YARN-7402.07.patch, 
> YARN-7707-YARN-7402.08.patch, YARN-7707-YARN-7402.09.patch, 
> YARN-7707-YARN-7402.10.patch
>
>
> This JIRA tracks the development of a generic framework for querying 
> sub-clusters for metrics, running policies, and updating them in the 
> FederationStateStore.






[jira] [Updated] (YARN-8027) Setting hostname of docker container breaks for --net=host in docker 1.13

2018-03-13 Thread Jim Brennan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-8027:
--
Attachment: YARN-8027.001.patch

> Setting hostname of docker container breaks for --net=host in docker 1.13
> -
>
> Key: YARN-8027
> URL: https://issues.apache.org/jira/browse/YARN-8027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-8027.001.patch
>
>
> In DockerLinuxContainerRuntime:launchContainer, we are adding the --hostname 
> argument to the docker run command to set the hostname in the container to 
> something like:  ctr-e84-1520889172376-0001-01-01.
> This does not work when combined with the --net=host command line option in 
> Docker 1.13.1.  It causes multiple failures when clients try to resolve the 
> hostname and fail.
> We haven't seen this before because we were using Docker 1.12.6, which seems 
> to ignore --hostname when you are using --net=host.






[jira] [Commented] (YARN-8010) add config in FederationRMFailoverProxy to not bypass facade cache when failing over

2018-03-13 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397691#comment-16397691
 ] 

genericqa commented on YARN-8010:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
14s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
57s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
42s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
11s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 27m 
44s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | YARN-8010 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12914339/YARN-8010.v3.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 58b0a74cd01a 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 

[jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-13 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397685#comment-16397685
 ] 

Eric Yang commented on YARN-7999:
-

[~jlowe] Good catch on my mistake.  This was what happened.  I will commit patch 
02 barring objections.

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch, q3.log
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.






[jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397635#comment-16397635
 ] 

Jason Lowe commented on YARN-7999:
--

The problem is the version number.  The container-executor config says 
3.1.0-SNAPSHOT but the directory being created is 3.2.0-SNAPSHOT.  The 
container-executor configs are (correctly) rejecting the path.  They need to be 
updated because the logs root directory path apparently changed when you moved 
to trunk.

Assuming updating the container-executor configs fixes that issue, are you good 
with this patch going in?


> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch, q3.log
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.






[jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-13 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397627#comment-16397627
 ] 

Eric Yang commented on YARN-7999:
-

[~jlowe] I see this:

{code}
Exception message: Invalid docker rw mount 
'/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1520958619361_0003/container_1520958619361_0003_01_03:/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1520958619361_0003/container_1520958619361_0003_01_03',
 
realpath=/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1520958619361_0003/container_1520958619361_0003_01_03
Error constructing docker command, docker error code=14, error message='Invalid 
docker read-write mount'
{code}

With debug delay turned on, I can see that 
/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1520958619361_0003/container_1520958619361_0003_01_03
 exists.  Do I need to manually specify:

{code}
docker.allowed.rw-mounts=/tmp,/usr/local/hadoop-3.1.0-SNAPSHOT/logs/*
{code}

Instead of current:

{code}
docker.allowed.rw-mounts=/tmp,/usr/local/hadoop-3.1.0-SNAPSHOT/logs
{code}

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch, q3.log
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.






[jira] [Updated] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8028:
-
Description: Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
support a similar API in the REST API.
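
For context, a short sketch of the existing Java client path that exposes this information today (my own illustration; the REST-side equivalent is what this JIRA proposes):
{code}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.QueueACL;
import org.apache.hadoop.yarn.api.records.QueueUserACLInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class QueueAclsExample {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      // ACLs the current user holds on each queue.
      List<QueueUserACLInfo> acls = client.getQueueAclsInfo();
      for (QueueUserACLInfo info : acls) {
        for (QueueACL acl : info.getUserAcls()) {
          System.out.println(info.getQueueName() + " -> " + acl);
        }
      }
    } finally {
      client.stop();
    }
  }
}
{code}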

> Support authorizeUserAccessToQueue in RMWebServices
> ---
>
> Key: YARN-8028
> URL: https://issues.apache.org/jira/browse/YARN-8028
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Currently we have {{QueueUserACLInfo}} in ApplicationClient; we should 
> support a similar API in the REST API.






[jira] [Created] (YARN-8028) Support authorizeUserAccessToQueue in RMWebServices

2018-03-13 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8028:


 Summary: Support authorizeUserAccessToQueue in RMWebServices
 Key: YARN-8028
 URL: https://issues.apache.org/jira/browse/YARN-8028
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Wangda Tan









[jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397592#comment-16397592
 ] 

Jason Lowe commented on YARN-7999:
--

Thanks for the logs!  I don't see how this could be a race, the container 
executor is not multithreaded and doesn't start running the docker command for 
a container before it has completed creating the directories for the container.

From the logs it looks like we never got around to running a docker command at 
all; rather, the mount security checks within the container executor are 
failing.  The "Creating local dirs..." log implies that the local directories 
(including log directories per my previous comment) are being created, and 
that's just before it tries to construct the docker run command, which checks 
the mount permissions.

I don't see an error like "Could not determine real path of mount" or "Could 
not stat path" in the launch logs, so I'm guessing the log directory is 
actually being created.  You could try setting 
yarn.nodemanager.delete.debug-delay-sec to a large enough value to facilitate 
verifying the log directory is actually there.  Given it's not complaining 
about being unable to stat the mount path before complaining about it, I 
suspect it is there.  That leads me to believe that it doesn't think that path 
is allowed rather than not there, which implies it is either missing from the 
whitelisted paths in the container executor config or maybe something is wrong 
with YARN-7626 which did recently go into trunk.

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch, q3.log
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.






[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2018-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397580#comment-16397580
 ] 

Hudson commented on YARN-5764:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13830 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13830/])
YARN-5764. NUMA awareness support for launching containers. Contributed 
(szegedim: rev a82d4a2e3a6a5448e371cef0cb86d5dbe4871ccd)
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/numa/NumaNodeResource.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/numa/NumaResourceAllocation.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/numa/TestNumaResourceHandlerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/numa/NumaResourceAllocator.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/numa/NumaResourceHandlerImpl.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/numa/package-info.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/numa/TestNumaResourceAllocator.java


> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v10.patch, 
> YARN-5764-v11.patch, YARN-5764-v2.patch, YARN-5764-v3.patch, 
> YARN-5764-v4.patch, YARN-5764-v5.patch, YARN-5764-v6.patch, 
> YARN-5764-v7.patch, YARN-5764-v8.patch, YARN-5764-v9.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.
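For readers less familiar with NUMA pinning, the sketch below shows one common way such a binding is expressed: prepending a numactl prefix to the container launch command. It is purely illustrative; the class and method names are invented here, and this is not the NumaResourceHandlerImpl code committed above.
{code}
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative only: builds a numactl prefix that pins a container's CPUs and
 * memory allocations to a single NUMA node. Not the committed YARN-5764 code.
 */
public class NumaBindingSketch {
  private final String numaNodeId; // e.g. "0" for NUMA node 0

  public NumaBindingSketch(String numaNodeId) {
    this.numaNodeId = numaNodeId;
  }

  /** Arguments to prepend to the container launch command. */
  public List<String> numactlPrefix() {
    return Arrays.asList(
        "/usr/bin/numactl",
        "--cpunodebind=" + numaNodeId, // run only on the CPUs of this node
        "--membind=" + numaNodeId);    // serve allocations from this node's memory
  }

  public static void main(String[] args) {
    System.out.println(new NumaBindingSketch("0").numactlPrefix());
  }
}
{code}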



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8010) add config in FederationRMFailoverProxy to not bypass facade cache when failing over

2018-03-13 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8010:
---
Attachment: YARN-8010.v3.patch

> add config in FederationRMFailoverProxy to not bypass facade cache when 
> failing over
> 
>
> Key: YARN-8010
> URL: https://issues.apache.org/jira/browse/YARN-8010
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-8010.v1.patch, YARN-8010.v1.patch, 
> YARN-8010.v2.patch, YARN-8010.v3.patch
>
>
> Today when YarnRM is failing over, the FederationRMFailoverProxy running in 
> AMRMProxy will perform failover, try to get the latest subcluster info from 
> the FederationStateStore, and then retry connecting to the latest YarnRM 
> master. When calling getSubCluster() on FederationStateStoreFacade, it 
> bypasses the cache with a flush flag. When YarnRM is failing over, every AM 
> heartbeat thread creates a different thread inside FederationInterceptor, 
> each of which keeps performing failover several times. This leads to a big 
> spike of getSubCluster calls to the FederationStateStore. 
> Depending on the cluster setup (e.g. putting a VIP in front of all YarnRMs), 
> a YarnRM master-slave change might not result in an RM address change. In 
> other cases, a small delay in getting the latest subcluster information may 
> be acceptable. This patch thus adds a config option, so that it is possible 
> to ask the FederationRMFailoverProxy not to flush the cache when calling 
> getSubCluster(). 
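A minimal sketch of the proposed behavior, assuming a boolean configuration switch; the config key and class names here are placeholders, not necessarily those used in the attached patches.
{code}
import org.apache.hadoop.conf.Configuration;

/**
 * Illustrative sketch only. The config key below is a placeholder, not
 * necessarily the name used in YARN-8010.v3.patch.
 */
public class FailoverCacheFlushSketch {
  // Hypothetical switch: when false, failover reuses the cached subcluster info.
  static final String FLUSH_CACHE_ON_FAILOVER =
      "yarn.federation.failover.flush-subcluster-cache";

  private final Configuration conf;

  public FailoverCacheFlushSketch(Configuration conf) {
    this.conf = conf;
  }

  /**
   * Decide whether a failover should bypass the FederationStateStoreFacade
   * cache. Defaulting to true preserves today's behavior; setting the flag to
   * false lets every retrying AM heartbeat thread reuse the cached subcluster
   * info instead of hammering the FederationStateStore.
   */
  public boolean flushCacheOnFailover() {
    return conf.getBoolean(FLUSH_CACHE_ON_FAILOVER, true);
  }
}
{code}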



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5764) NUMA awareness support for launching containers

2018-03-13 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397531#comment-16397531
 ] 

Miklos Szegedi commented on YARN-5764:
--

Committed to trunk. Thank you [~devaraj.k] for the contribution, [~olasoji] for 
the report, and [~leftnoteasy], [~rajesh.balamohan], [~raviprak], [~sunilg] and 
[~rohithsharma] for the reviews.

> NUMA awareness support for launching containers
> ---
>
> Key: YARN-5764
> URL: https://issues.apache.org/jira/browse/YARN-5764
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Reporter: Olasoji
>Assignee: Devaraj K
>Priority: Major
> Attachments: NUMA Awareness for YARN Containers.pdf, NUMA Performance 
> Results.pdf, YARN-5764-v0.patch, YARN-5764-v1.patch, YARN-5764-v10.patch, 
> YARN-5764-v11.patch, YARN-5764-v2.patch, YARN-5764-v3.patch, 
> YARN-5764-v4.patch, YARN-5764-v5.patch, YARN-5764-v6.patch, 
> YARN-5764-v7.patch, YARN-5764-v8.patch, YARN-5764-v9.patch
>
>
> The purpose of this feature is to improve Hadoop performance by minimizing 
> costly remote memory accesses on non-SMP systems. YARN containers, on launch, 
> will be pinned to a specific NUMA node and all subsequent memory allocations 
> will be served by the same node, reducing remote memory accesses. The current 
> default behavior is to spread memory across all NUMA nodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8022) ResourceManager UI cluster/app/ page fails to render

2018-03-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397525#comment-16397525
 ] 

Wangda Tan commented on YARN-8022:
--

Thanks [~rohithsharma]/[~tarunparimi], 

Please commit the patch if you think it is ready.

> ResourceManager UI cluster/app/ page fails to render
> 
>
> Key: YARN-8022
> URL: https://issues.apache.org/jira/browse/YARN-8022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Blocker
> Attachments: Screen Shot 2018-03-12 at 1.45.05 PM.png, 
> YARN-8022.001.patch, YARN-8022.002.patch
>
>
> The page displays the message "Failed to read the attempts of the application"
>  
> The following stack trace is observed in RM log.
> org.apache.hadoop.yarn.server.webapp.AppBlock: Failed to read the attempts of 
> the application application_1520597233415_0002.
> java.lang.NullPointerException
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:283)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:280)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>  at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>  at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>  at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>  at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>  at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>  at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement

2018-03-13 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397388#comment-16397388
 ] 

Sunil G commented on YARN-7494:
---

This was a bit overdue. Sorry for the delay.

Updated as per the comments. However, instead of a CompositeService, we can use 
a model like SchedulerMonitor. It is simpler and also helps avoid creating 
custom sorters without extending AbstractService. Please check.
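To make the idea concrete, here is a minimal sketch of what a pluggable multi-node sorting policy could look like when it does not have to extend AbstractService. The interface, class, and field names are invented for illustration and are not taken from the attached patches.
{code}
import java.util.Comparator;
import java.util.List;

/**
 * Illustrative only: a pluggable policy that orders candidate nodes for a
 * partition, instantiated like a SchedulerMonitor policy rather than wired in
 * as a CompositeService child. Names are not from the YARN-7494 patches.
 */
interface MultiNodeSortingPolicy<N> {
  /** Return the candidate nodes in the order the scheduler should try them. */
  List<N> sortNodes(List<N> candidates, String partition);
}

/** Example policy: most available memory first (toy node type for the sketch). */
class MostFreeMemoryPolicy implements MultiNodeSortingPolicy<MostFreeMemoryPolicy.Node> {
  static class Node {
    final String host;
    final long availableMB;
    Node(String host, long availableMB) {
      this.host = host;
      this.availableMB = availableMB;
    }
  }

  @Override
  public List<Node> sortNodes(List<Node> candidates, String partition) {
    // Order by descending available memory; a real policy could use partition info.
    candidates.sort(Comparator.comparingLong((Node n) -> n.availableMB).reversed());
    return candidates;
  }
}
{code}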

> Add muti node lookup support for better placement
> -
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.v0.patch, YARN-7494.v1.patch, 
> multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node 
> lookup based on partition to start with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7494) Add muti node lookup support for better placement

2018-03-13 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-7494:
--
Attachment: YARN-7494.003.patch

> Add muti node lookup support for better placement
> -
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.v0.patch, YARN-7494.v1.patch, 
> multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node 
> lookup based on partition to start with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-13 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7999:

Attachment: q3.log

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch, q3.log
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-13 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397344#comment-16397344
 ] 

Eric Yang commented on YARN-7999:
-

[~jlowe] This is a non-entry_point docker image with a clean checkout of trunk 
code.  There is no shared storage between nodes.  I turned off the docker 
daemon on one of the nodes and this problem surfaced.  However, I think my 
earlier analysis is incorrect: the directory problem also exists on nodes that 
have a functional docker daemon, and it appears to be a race condition between 
log directory creation and docker run.  Please see the attached log file.

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-13 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397344#comment-16397344
 ] 

Eric Yang edited comment on YARN-7999 at 3/13/18 5:38 PM:
--

[~jlowe] This is a non-entry_point docker image with a clean checkout of trunk 
code.  There is no shared storage between nodes.  I turned off the docker 
daemon on one of the nodes and this problem surfaced.  However, I think my 
earlier analysis is incorrect: the directory problem also exists on nodes that 
have a functional docker daemon, and it appears to be a race condition between 
log directory creation and docker run.  Please see the attached log file (q3.log).


was (Author: eyang):
[~jlowe] This is a non-entry_point docker image with a clean checkout of trunk 
code.  There is no shared storage between nodes.  I turned off the docker 
daemon on one of the nodes and this problem surfaced.  However, I think my 
earlier analysis is incorrect: the directory problem also exists on nodes that 
have a functional docker daemon, and it appears to be a race condition between 
log directory creation and docker run.  Please see the attached log file.

> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7581) HBase filters are not constructed correctly in ATSv2

2018-03-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397342#comment-16397342
 ] 

Rohith Sharma K S commented on YARN-7581:
-

I ran the failing tests against the hbase-2 profile with this patch and all 
tests passed. I will look into more details tomorrow. 

> HBase filters are not constructed correctly in ATSv2
> 
>
> Key: YARN-7581
> URL: https://issues.apache.org/jira/browse/YARN-7581
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.0.0-beta1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7581.00.patch, YARN-7581.01.patch, 
> YARN-7581.02.patch
>
>
> Post YARN-7346,
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters() and 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters() 
> start to fail when hbase.profile is set to 2.0.
> *Error Message*
>  [ERROR] Failures:
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesConfigFilters:1266 
> expected:<2> but was:<0>
>  [ERROR] 
> TestTimelineReaderWebServicesHBaseStorage.testGetEntitiesMetricFilters:1523 
> expected:<1> but was:<0>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397262#comment-16397262
 ] 

Eric Payne commented on YARN-8020:
--

[~kyungwan nam], on what version of YARN are you seeing this problem? My 
experience with DRF is different from what is described above. I have 
investigated this on both 2.8 and 3.2 snapshot builds.

We are using the DRF calculator in large preemptable queues with containers of 
various sizes, some with large memory, some with large vcores, and some with 
both. Cross-queue preemption seems to be working well in general. I do see a 
corner case, but first I want to address your comments above.

bq. as a result, idealAssigned will be , which does 
not trigger preemption.
If one of the elements in the idealAssigned Resource is 0 or less than 0, 
preemption will not occur. This is so that preemption won't bring the queue too 
far below its guarantee for one of the elements. Having said that, it will 
preempt to a large extent even if it brings one of the elements below its 
guarantee, but if one of them goes to 0 or below in the idealAssigned Resource, 
it will stop preempting.

bq. avail: 
Cross-queue preemption will not preempt if there are available resources in the 
cluster or queue. It depends on how many resources are being requested by the 
other queue, but even with 1 available vcore, preemption may choose not to 
preempt in this case as well.

Now on to my corner case.

I do not see a problem using DRF if the containers in the preemptable queue 
have a larger Resource element and the containers in the asking queue have 
smaller Resource elements. For example, it seems to work fine if the 
preemptable queue is using  containers and the asking queue is using smaller 
containers, for example  containers.

The place where it seems to get stuck is when the containers in the preemptable 
queue are using one or more smaller Resource elements than the containers in 
the asking queue. For example, it will sometimes not preempt if the preemptable 
queue has containers using  and the asking queue has containers using .

Even in the latter case, preemption will sometimes still occur, depending on 
the ratio of the sizes of each element to the ones in the other queue.

It would be helpful if you could provide a more detailed use case describing 
exactly what you are seeing so I can try to reproduce it.
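As a toy illustration of the componentwise arithmetic under discussion (made-up numbers, not a reproduction of anyone's cluster), the snippet below shows how, with the DominantResourceCalculator, Resources.min() selects one Resource as a whole, so a 0-vcores component in the available resource can flow straight into the accepted amount and componentwiseMax cannot add it back. It only exercises the helper calls that offer() uses; it is not a claim about the exact failure being reported.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

/** Toy numbers only: exercises the min/componentwiseMax calls used by offer(). */
public class DrfOfferToy {
  public static void main(String[] args) {
    ResourceCalculator rc = new DominantResourceCalculator();
    Resource cluster = Resource.newInstance(1000 * 1024, 1000); // ~1000 GB, 1000 vcores

    // Unused capacity left in the partition: plenty of memory, no vcores.
    Resource avail = Resource.newInstance(100 * 1024, 0);
    // What the queue could still take: modest memory, many vcores.
    Resource maxMinusAssigned = Resource.newInstance(50 * 1024, 200);

    // Under DRF, avail's dominant share (0.1, memory) is smaller than
    // maxMinusAssigned's (0.2, vcores), so min() returns avail as a whole,
    // carrying its 0-vcores component into the accepted amount.
    Resource accepted = Resources.min(rc, cluster, avail, maxMinusAssigned);
    System.out.println("accepted = " + accepted); // roughly <memory:102400, vCores:0>

    // componentwiseMax against <0, 0> only clamps negative values; it cannot
    // add vcores back once a component has collapsed to zero.
    Resource clamped = Resources.componentwiseMax(accepted, Resource.newInstance(0, 0));
    System.out.println("clamped  = " + clamped);
  }
}
{code}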

> when DRF is used, preemption does not trigger due to incorrect idealAssigned
> 
>
> Key: YARN-8020
> URL: https://issues.apache.org/jira/browse/YARN-8020
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Priority: Major
>
> I've found that inter-queue preemption does not work.
> It happens when DRF is used and an application is submitted with a large 
> number of vcores.
> IMHO, idealAssigned can be set incorrectly by the following code.
> {code}
> // This function "accepts" all the resources it can (pending) and return
> // the unused ones
> Resource offer(Resource avail, ResourceCalculator rc,
> Resource clusterResource, boolean considersReservedResource) {
>   Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
>   Resources.subtract(getMax(), idealAssigned),
>   Resource.newInstance(0, 0));
>   // accepted = min{avail,
>   //   max - assigned,
>   //   current + pending - assigned,
>   //   # Make sure a queue will not get more than max of its
>   //   # used/guaranteed, this is to make sure preemption won't
>   //   # happen if all active queues are beyond their guaranteed
>   //   # This is for leaf queue only.
>   //   max(guaranteed, used) - assigned}
>   // remain = avail - accepted
>   Resource accepted = Resources.min(rc, clusterResource,
>   absMaxCapIdealAssignedDelta,
>   Resources.min(rc, clusterResource, avail, Resources
>   /*
>* When we're using FifoPreemptionSelector (considerReservedResource
>* = false).
>*
>* We should deduct reserved resource from pending to avoid 
> excessive
>* preemption:
>*
>* For example, if an under-utilized queue has used = reserved = 20.
>* Preemption policy will try to preempt 20 containers (which is not
>* satisfied) from different hosts.
>*
>* In FifoPreemptionSelector, there's no guarantee that preempted
>* resource can be used by pending request, so policy will preempt
>* resources repeatly.
>*/
>   .subtract(Resources.add(getUsed(),
>   

[jira] [Commented] (YARN-7999) Docker launch fails when user private filecache directory is missing

2018-03-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397068#comment-16397068
 ] 

Jason Lowe commented on YARN-7999:
--

The container executor should always create at least one log directory.  
{{launch_docker_container_as_user}} calls {{create_local_dirs}} which in turn 
calls {{create_container_dirs}} and that creates the container log directories 
(well, at least one after YARN-7590).  Docker has always been mounting the log 
directories, even before YARN-7815, so I can't readily explain how this failure 
is new.

Are you able to run any containers on trunk, entry point or not, and with or 
without this patch?  Do you have any details on how to reproduce this?  We were 
able to readily reproduce the original failure described in the JIRA, but we 
cannot reproduce this new failure mode, and I cannot explain it based on how 
the container executor code in trunk is written.

bq. The container attempted on the faulty node, and initialized logging 
directory on the faulty node. When the same attempt is started on other nodes, 
it does not initialize logging directory on other node which leads to the 
failure.

There's normally no state shared between nodes, so I can't explain how a faulty 
node could change container initialization behavior on another node unless 
they are sharing NM directories via NFS or a similarly odd setup.  Do you have 
any idea how one node's failure could affect the behavior of the other nodes?


> Docker launch fails when user private filecache directory is missing
> 
>
> Key: YARN-7999
> URL: https://issues.apache.org/jira/browse/YARN-7999
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7999.001.patch, YARN-7999.002.patch
>
>
> Docker container is failing to launch in trunk.  The root cause is:
> {code}
> [COMPINSTANCE sleeper-1 : container_1520032931921_0001_01_20]: 
> [2018-03-02 23:26:09.196]Exception from container-launch.
> Container id: container_1520032931921_0001_01_20
> Exit code: 29
> Exception message: image: hadoop/centos:latest is trusted in hadoop registry.
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Could not determine real path of mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache'
> Invalid docker mount 
> '/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache',
>  realpath=/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache
> Error constructing docker command, docker error code=12, error 
> message='Invalid docker mount'
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> [2018-03-02 23:26:09.240]Diagnostic message from attempt 0 : [2018-03-02 
> 23:26:09.240]
> [2018-03-02 23:26:09.240]Container exited with a non-zero exit code 29.
> [2018-03-02 23:26:39.278]Could not find 
> nmPrivate/application_1520032931921_0001/container_1520032931921_0001_01_20//container_1520032931921_0001_01_20.pid
>  in any of the directories
> [COMPONENT sleeper]: Failed 11 times, exceeded the limit - 10. Shutting down 
> now...
> {code}
> The filecache cannot be mounted because it doesn't exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8027) Setting hostname of docker container breaks for --net=host in docker 1.13

2018-03-13 Thread Jim Brennan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397007#comment-16397007
 ] 

Jim Brennan commented on YARN-8027:
---

{quote}We should look into whether it is a bug in that version of Docker. I see 
a couple of tickets regarding adding support for setting hostname when 
net=host, which would indicate that is a valid setting. I have not dug far 
enough to determine which versions are supposed to support it.
{quote}
[~billie.rinaldi], I think it is actually the opposite. Specifying --hostname 
with --net=host was broken before docker 1.13.1, which is why it didn't cause 
us a problem. In 1.13.1 though, it works, which breaks our ability to resolve 
the hostname, since we are not using Registry DNS.

I agree with [~jlowe] and [~shaneku...@gmail.com]: we should only set the 
hostname when Registry DNS is enabled, as long as that is indeed always the 
case. We haven't experimented with user-defined networks here - is it the case 
that Registry DNS must always be used for user-defined networks?

> Setting hostname of docker container breaks for --net=host in docker 1.13
> -
>
> Key: YARN-8027
> URL: https://issues.apache.org/jira/browse/YARN-8027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>
> In DockerLinuxContainerRuntime:launchContainer, we are adding the --hostname 
> argument to the docker run command to set the hostname in the container to 
> something like:  ctr-e84-1520889172376-0001-01-01.
> This does not work when combined with the --net=host command line option in 
> Docker 1.13.1.  It causes multiple failures when the client tries, and fails, 
> to resolve the hostname.
> We haven't seen this before because we were using docker 1.12.6 which seems 
> to ignore --hostname when you are using --net=host.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8027) Setting hostname of docker container breaks for --net=host in docker 1.13

2018-03-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396987#comment-16396987
 ] 

Jason Lowe commented on YARN-8027:
--

Overriding the hostname only makes sense if the new name is resolvable.  
Otherwise we're going to break any application that assumes it can look up the 
hostname, which seems like a reasonable assumption to me.  The names we want to 
use for the hostname are only going to be resolvable when Registry DNS is 
enabled, so the hostname override behavior should be tied to that.


> Setting hostname of docker container breaks for --net=host in docker 1.13
> -
>
> Key: YARN-8027
> URL: https://issues.apache.org/jira/browse/YARN-8027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>
> In DockerLinuxContainerRuntime:launchContainer, we are adding the --hostname 
> argument to the docker run command to set the hostname in the container to 
> something like:  ctr-e84-1520889172376-0001-01-01.
> This does not work when combined with the --net=host command line option in 
> Docker 1.13.1.  It causes multiple failures when the client tries, and fails, 
> to resolve the hostname.
> We haven't seen this before because we were using docker 1.12.6 which seems 
> to ignore --hostname when you are using --net=host.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8027) Setting hostname of docker container breaks for --net=host in docker 1.13

2018-03-13 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396967#comment-16396967
 ] 

Shane Kumpf commented on YARN-8027:
---

Honoring the YARN-provided hostname was discussed a bit in YARN-7797 and 
YARN-7935; however, it seems that the decision to keep the YARN-generated 
hostname breaks down in a few places, namely for deployments not running 
Registry DNS. Being able to validate Registry DNS, even when --net=host, does 
have value, but it may not be worth the trade-off.

A few options I can think of:
 # Check the configuration to determine if Registry DNS is enabled. If 
disabled, don't set the --hostname. If enabled, set the --hostname.
 # Don't set the --hostname if --net=host in all cases.
 # Add a configuration flag to toggle this behavior for --net=host containers 
(something like: 
yarn.nodemanager.runtime.linux.docker.network.host.use-yarn-generated-hostname),
 decoupling the dependency on the Registry DNS.

Option #2 would provide consistency in the case where Docker ignores the 
--hostname flag, but it makes it impossible for a user to achieve the current 
behavior. I'm struggling to come up with a use case where we would want to set 
--hostname without Registry DNS, so #1 seems more appropriate than adding the 
new config called out in #3, but maybe others can think of a case where 
decoupling the configuration from Registry DNS makes sense. A rough sketch of 
option #1 is below.
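The sketch assumes the Registry DNS switch is hadoop.registry.dns.enabled and simplifies how the runtime passes --hostname; neither detail is taken from the patch or from DockerLinuxContainerRuntime itself.
{code}
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch of option #1: only pass --hostname to docker run when Registry DNS is
 * enabled, so the generated name is actually resolvable. The key name and
 * default are assumptions; this is not the DockerLinuxContainerRuntime code.
 */
public class HostnameDecisionSketch {
  static final String REGISTRY_DNS_ENABLED = "hadoop.registry.dns.enabled"; // assumed key

  static String hostnameArgOrNull(Configuration conf, String generatedHostname) {
    boolean registryDnsEnabled = conf.getBoolean(REGISTRY_DNS_ENABLED, false);
    // If nothing can resolve the ctr-... names, do not override the hostname at all.
    return registryDnsEnabled ? "--hostname=" + generatedHostname : null;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    System.out.println(hostnameArgOrNull(conf, "ctr-e84-1520889172376-0001-01-01"));
    conf.setBoolean(REGISTRY_DNS_ENABLED, true);
    System.out.println(hostnameArgOrNull(conf, "ctr-e84-1520889172376-0001-01-01"));
  }
}
{code}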

> Setting hostname of docker container breaks for --net=host in docker 1.13
> -
>
> Key: YARN-8027
> URL: https://issues.apache.org/jira/browse/YARN-8027
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>
> In DockerLinuxContainerRuntime:launchContainer, we are adding the --hostname 
> argument to the docker run command to set the hostname in the container to 
> something like:  ctr-e84-1520889172376-0001-01-01.
> This does not work when combined with the --net=host command line option in 
> Docker 1.13.1.  It causes multiple failures when the client tries, and fails, 
> to resolve the hostname.
> We haven't seen this before because we were using docker 1.12.6 which seems 
> to ignore --hostname when you are using --net=host.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8022) ResourceManager UI cluster/app/ page fails to render

2018-03-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396813#comment-16396813
 ] 

Rohith Sharma K S commented on YARN-8022:
-

Thanks [~tarunparimi] for the verification. [~sunilg], [~leftnoteasy], do you 
have any comments? Shall I commit this patch? 

> ResourceManager UI cluster/app/ page fails to render
> 
>
> Key: YARN-8022
> URL: https://issues.apache.org/jira/browse/YARN-8022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Blocker
> Attachments: Screen Shot 2018-03-12 at 1.45.05 PM.png, 
> YARN-8022.001.patch, YARN-8022.002.patch
>
>
> The page displays the message "Failed to read the attempts of the application"
>  
> The following stack trace is observed in RM log.
> org.apache.hadoop.yarn.server.webapp.AppBlock: Failed to read the attempts of 
> the application application_1520597233415_0002.
> java.lang.NullPointerException
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:283)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:280)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>  at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>  at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>  at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>  at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>  at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>  at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8022) ResourceManager UI cluster/app/ page fails to render

2018-03-13 Thread Tarun Parimi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396715#comment-16396715
 ] 

Tarun Parimi commented on YARN-8022:


Managed to check the patch with SPNEGO auth as well. The application page is 
displayed fine for authorized users' applications. 

> ResourceManager UI cluster/app/ page fails to render
> 
>
> Key: YARN-8022
> URL: https://issues.apache.org/jira/browse/YARN-8022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Blocker
> Attachments: Screen Shot 2018-03-12 at 1.45.05 PM.png, 
> YARN-8022.001.patch, YARN-8022.002.patch
>
>
> The page displays the message "Failed to read the attempts of the application"
>  
> The following stack trace is observed in RM log.
> org.apache.hadoop.yarn.server.webapp.AppBlock: Failed to read the attempts of 
> the application application_1520597233415_0002.
> java.lang.NullPointerException
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:283)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:280)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>  at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>  at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>  at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>  at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>  at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>  at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8022) ResourceManager UI cluster/app/ page fails to render

2018-03-13 Thread Tarun Parimi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396604#comment-16396604
 ] 

Tarun Parimi commented on YARN-8022:


I tested the patch manually on trunk. The application page now renders properly 
for simple auth and displays the application info and attempts info as before.

Removed *org.apache.hadoop.http.lib.StaticUserWebFilter* from the 
*hadoop.http.filter.initializers* configuration to check the null callerUGI 
scenario, and I was able to access the same page properly in both unsecure and 
secure clusters with simple auth.

Haven't checked with SPNEGO auth enabled, but I expect the behavior should be 
fine since unauthorized users won't be able to access the page.
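For context on the null-callerUGI scenario mentioned above: the NPE in the stack trace points at the callerUGI.doAs() call in AppBlock, and a guard of the following shape is the kind of fix being exercised. This is an illustrative sketch only, not the attached patch.
{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

/**
 * Illustrative sketch of guarding against a null caller UGI before doAs(),
 * which is where the AppBlock NPE in the stack trace above originates when no
 * static/authenticated web user is available. Not the actual YARN-8022 patch.
 */
public class CallerUgiGuardSketch {
  static <T> T runAs(UserGroupInformation callerUGI,
      PrivilegedExceptionAction<T> action) throws IOException, InterruptedException {
    if (callerUGI == null) {
      // No authenticated or static web user: run the action directly instead
      // of dereferencing a null UGI.
      try {
        return action.run();
      } catch (Exception e) {
        throw new IOException(e);
      }
    }
    return callerUGI.doAs(action);
  }
}
{code}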

> ResourceManager UI cluster/app/ page fails to render
> 
>
> Key: YARN-8022
> URL: https://issues.apache.org/jira/browse/YARN-8022
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Blocker
> Attachments: Screen Shot 2018-03-12 at 1.45.05 PM.png, 
> YARN-8022.001.patch, YARN-8022.002.patch
>
>
> The page displays the message "Failed to read the attempts of the application"
>  
> The following stack trace is observed in RM log.
> org.apache.hadoop.yarn.server.webapp.AppBlock: Failed to read the attempts of 
> the application application_1520597233415_0002.
> java.lang.NullPointerException
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:283)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:280)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>  at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:279)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppBlock.render(RMAppBlock.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>  at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>  at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>  at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
>  at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
>  at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
>  at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>  at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.app(RmController.java:54)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org