[jira] [Commented] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.

2017-07-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088468#comment-16088468
 ] 

Rohith Sharma K S commented on YARN-6827:
-

Attaching the failure trace below. It shows that applications are recovered 
before the ATS services are started. 

{noformat}
2017-07-15 10:19:35,200 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to 
active state
2017-07-15 10:19:35,245 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery started
2017-07-15 10:19:35,253 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded RM 
state version info 1.4
2017-07-15 10:19:35,431 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Unknown 
child node with name: HIERARCHIES
2017-07-15 10:19:35,452 INFO 
org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager:
 recovering RMDelegationTokenSecretManager.
2017-07-15 10:19:35,455 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Recovering 16 
applications
2017-07-15 10:19:35,518 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Priority '0' is acceptable in queue : default for application: 
application_1499929227397_0001
2017-07-15 10:19:35,578 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher:
 Error when publishing entity [YARN_APPLICATION,application_1499929227397_0001]
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.appFinished(TimelineServiceV1Publisher.java:156)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$FinalTransition.transition(RMAppImpl.java:1472)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1073)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1062)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:887)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:383)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:590)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1372)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:749)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:317)
at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:143)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:472)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)

{noformat}

> [ATS1/1.5] NPE exception while publishing recovering applications into ATS 
> during RM restart.
> 

[jira] [Assigned] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.

2017-07-14 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-6827:
---

Assignee: Rohith Sharma K S

> [ATS1/1.5] NPE exception while publishing recovering applications into ATS 
> during RM restart.
> -
>
> Key: YARN-6827
> URL: https://issues.apache.org/jira/browse/YARN-6827
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> While recovering applications, an NPE is thrown as shown below.
> {noformat}
> 017-07-13 14:08:12,476 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher:
>  Error when publishing entity 
> [YARN_APPLICATION,application_1499929227397_0001]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
> {noformat}
> This is because, during RM service creation, the active services are created 
> before the ATS services, and consequently they are also started first. 
> This gives the active services time to recover applications, which try to 
> publish to ATS during recovery. Since the ATS services have not been started 
> yet, an NPE is thrown. 






[jira] [Updated] (YARN-6741) Deleting all children of a Parent Queue on refresh throws exception

2017-07-14 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-6741:

Attachment: YARN-6741.003.patch

Uploaded a new patch, as the old patch failed to apply on trunk.

> Deleting all children of a Parent Queue on refresh throws exception
> ---
>
> Key: YARN-6741
> URL: https://issues.apache.org/jira/browse/YARN-6741
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-6741.001.patch, YARN-6741.002.patch, 
> YARN-6741.003.patch
>
>
> If we configure CS such that all children of a parent queue are deleted and the 
> parent is made a leaf queue, then the {{refreshQueue}} operation fails when 
> re-initializing the parent queue
> {code}
>// Sanity check
>   if (!(newlyParsedQueue instanceof ParentQueue) || !newlyParsedQueue
>   .getQueuePath().equals(getQueuePath())) {
> throw new IOException(
> "Trying to reinitialize " + getQueuePath() + " from "
> + newlyParsedQueue.getQueuePath());
>   }
> {code}
> *Expected Behavior:*
> Converting a Parent Queue to leafQueue on refreshQueue needs to be supported.






[jira] [Updated] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.

2017-07-14 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6827:

Summary: [ATS1/1.5] NPE exception while publishing recovering applications 
into ATS during RM restart.  (was: NPE exception while publishing recovering 
applications into ATS during RM restart.)

> [ATS1/1.5] NPE exception while publishing recovering applications into ATS 
> during RM restart.
> -
>
> Key: YARN-6827
> URL: https://issues.apache.org/jira/browse/YARN-6827
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>
> While recovering applications, an NPE is thrown as shown below.
> {noformat}
> 017-07-13 14:08:12,476 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher:
>  Error when publishing entity 
> [YARN_APPLICATION,application_1499929227397_0001]
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
> {noformat}
> This is because, during RM service creation, the active services are created 
> before the ATS services, and consequently they are also started first. 
> This gives the active services time to recover applications, which try to 
> publish to ATS during recovery. Since the ATS services have not been started 
> yet, an NPE is thrown. 






[jira] [Created] (YARN-6827) NPE exception while publishing recovering applications into ATS during RM restart.

2017-07-14 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-6827:
---

 Summary: NPE exception while publishing recovering applications 
into ATS during RM restart.
 Key: YARN-6827
 URL: https://issues.apache.org/jira/browse/YARN-6827
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith Sharma K S


While recovering applications, an NPE is thrown as shown below.
{noformat}
017-07-13 14:08:12,476 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher:
 Error when publishing entity [YARN_APPLICATION,application_1499929227397_0001]
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
{noformat}

This is because, during RM service creation, the active services are created 
before the ATS services, and consequently they are also started first.

This gives the active services time to recover applications, which try to publish 
to ATS during recovery. Since the ATS services have not been started yet, an NPE 
is thrown. 
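
A minimal sketch of the failure mode (class, field, and method names here are 
hypothetical; the actual fix may instead reorder or gate service start-up): if the 
publisher's timeline client is only wired up when the ATS service starts, any 
app-finished event produced during recovery hits a null client, and the call into 
{{putEntities}} fails with an NPE.

{code}
// Hypothetical sketch only -- not the RM's actual publisher code.
public class TimelinePublisherSketch {

  interface TimelineClientLike {
    void putEntities(String entityId);
  }

  // Assigned only once the timeline/ATS service has started.
  private volatile TimelineClientLike client;

  void appFinished(String appId) {
    // Recovery can reach this point before the client is assigned; without a
    // guard (or a different start-up order) this is the NPE seen above.
    if (client == null) {
      return; // or buffer the event until the client is available
    }
    client.putEntities(appId);
  }
}
{code}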






[jira] [Commented] (YARN-6741) Deleting all children of a Parent Queue on refresh throws exception

2017-07-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088465#comment-16088465
 ] 

Naganarasimha G R commented on YARN-6741:
-

Thanks [~wangda] for looking into this issue. 
I have updated the description; below is the exception trace when only the 
test case is run.
{code}
2017-07-15 10:39:24,483 INFO  [main] capacity.ParentQueue 
(ParentQueue.java:reinitialize(335)) - root: re-configured queue: a: 
numChildQueue= 2, capacity=0.105, absoluteCapacity=0.105, 
usedResources=usedCapacity=0.0, numApps=0, numContainers=0
java.io.IOException: Failed to re-init queues : Trying to reinitialize root.b 
from root.b
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:405)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testRefreshQueuesWithAllChildQueuesDeleted(TestCapacityScheduler.java:4584)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
{code}
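
For reference, a configuration change of the following shape reproduces the 
scenario (a sketch using the test-side {{CapacitySchedulerConfiguration}} helpers; 
queue names and capacities are illustrative):

{code}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration;

public class ParentToLeafRefreshSketch {

  // Old configuration: root.b is a ParentQueue with a single child.
  static CapacitySchedulerConfiguration oldConf() {
    CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
    conf.setQueues("root", new String[] {"a", "b"});
    conf.setCapacity("root.a", 50f);
    conf.setCapacity("root.b", 50f);
    conf.setQueues("root.b", new String[] {"b1"});
    conf.setCapacity("root.b.b1", 100f);
    return conf;
  }

  // Refreshed configuration: root.b no longer lists children, so it is parsed as
  // a LeafQueue and the ParentQueue sanity check rejects it with
  // "Trying to reinitialize root.b from root.b".
  static CapacitySchedulerConfiguration newConf() {
    CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
    conf.setQueues("root", new String[] {"a", "b"});
    conf.setCapacity("root.a", 50f);
    conf.setCapacity("root.b", 50f);
    return conf;
  }
}
{code}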

> Deleting all children of a Parent Queue on refresh throws exception
> ---
>
> Key: YARN-6741
> URL: https://issues.apache.org/jira/browse/YARN-6741
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-6741.001.patch, YARN-6741.002.patch
>
>
> If we configure CS such that all children of a parent queue are deleted and the 
> parent is made a leaf queue, then the {{refreshQueue}} operation fails when 
> re-initializing the parent queue
> {code}
>// Sanity check
>   if (!(newlyParsedQueue instanceof ParentQueue) || !newlyParsedQueue
>   .getQueuePath().equals(getQueuePath())) {
> throw new IOException(
> "Trying to reinitialize " + getQueuePath() + " from "
> + newlyParsedQueue.getQueuePath());
>   }
> {code}
> *Expected Behavior:*
> Converting a Parent Queue to leafQueue on refreshQueue needs to be supported.






[jira] [Updated] (YARN-6741) Deleting all children of a Parent Queue on refresh throws exception

2017-07-14 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-6741:

Description: 
If we configure CS such that all children of a parent queue are deleted and the 
parent is made a leaf queue, then the {{refreshQueue}} operation fails when 
re-initializing the parent queue
{code}
   // Sanity check
  if (!(newlyParsedQueue instanceof ParentQueue) || !newlyParsedQueue
  .getQueuePath().equals(getQueuePath())) {
throw new IOException(
"Trying to reinitialize " + getQueuePath() + " from "
+ newlyParsedQueue.getQueuePath());
  }
{code}
*Expected Behavior:*
Converting a Parent Queue to leafQueue on refreshQueue needs to be supported.

  was:If we configure CS such that all  children of a parent queue are deleted 
and made as a leaf queue, then {{refreshQueue}} operation fails when validating 
the parent


> Deleting all children of a Parent Queue on refresh throws exception
> ---
>
> Key: YARN-6741
> URL: https://issues.apache.org/jira/browse/YARN-6741
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-6741.001.patch, YARN-6741.002.patch
>
>
> If we configure CS such that all children of a parent queue are deleted and the 
> parent is made a leaf queue, then the {{refreshQueue}} operation fails when 
> re-initializing the parent queue
> {code}
>// Sanity check
>   if (!(newlyParsedQueue instanceof ParentQueue) || !newlyParsedQueue
>   .getQueuePath().equals(getQueuePath())) {
> throw new IOException(
> "Trying to reinitialize " + getQueuePath() + " from "
> + newlyParsedQueue.getQueuePath());
>   }
> {code}
> *Expected Behavior:*
> Converting a Parent Queue to leafQueue on refreshQueue needs to be supported.






[jira] [Updated] (YARN-6741) Deleting all children of a Parent Queue on refresh throws exception

2017-07-14 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-6741:

Description: If we configure CS such that all  children of a parent queue 
are deleted and made as a leaf queue, then {{refreshQueue}} operation fails 
when validating the parent  (was: If we configure CS such that all  children of 
a parent queue are deleted and made as a leaf queue, then {{refreshQueue}} 
operation fails.)

> Deleting all children of a Parent Queue on refresh throws exception
> ---
>
> Key: YARN-6741
> URL: https://issues.apache.org/jira/browse/YARN-6741
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-6741.001.patch, YARN-6741.002.patch
>
>
> If we configure CS such that all children of a parent queue are deleted and the 
> parent is made a leaf queue, then the {{refreshQueue}} operation fails when 
> validating the parent






[jira] [Commented] (YARN-4455) Support fetching metrics by time range

2017-07-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088442#comment-16088442
 ] 

Rohith Sharma K S commented on YARN-4455:
-

I was just thinking about the REST API filters metricstimestart and metricstimeend 
that were added: these values have to be given in milliseconds, right? Doesn't that 
complicate things for users? Say a user wants to retrieve the time series for the 
last 24 hours; they would need to convert yesterday's time to milliseconds and pass 
it in. 
Thoughts? cc:/ [~vrushalic] [~varun_saxena]
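
To make the concern concrete, here is a small sketch of what a caller has to do 
today (parameter names as discussed in this JIRA; the endpoint is a placeholder):

{code}
// Query the last 24 hours of metrics: the caller must compute epoch milliseconds
// and pass them as metricstimestart / metricstimeend.
String baseUrl = "<timeline-reader-endpoint>"; // placeholder, not a real URL
long metricsTimeEnd = System.currentTimeMillis();
long metricsTimeStart = metricsTimeEnd - 24L * 60 * 60 * 1000;
String query = baseUrl + "?metricstimestart=" + metricsTimeStart
    + "&metricstimeend=" + metricsTimeEnd;
{code}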

> Support fetching metrics by time range
> --
>
> Key: YARN-4455
> URL: https://issues.apache.org/jira/browse/YARN-4455
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: YARN-5355, yarn-5355-merge-blocker
> Attachments: YARN-4455-YARN-5355.01.patch, 
> YARN-4455-YARN-5355.02.patch
>
>







[jira] [Commented] (YARN-5412) Create a proxy chain for ResourceManager REST API in the Router

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088371#comment-16088371
 ] 

Hadoop QA commented on YARN-5412:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-2915 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
28s{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
44s{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
25s{color} | {color:green} YARN-2915 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
24s{color} | {color:red} hadoop-yarn-server-router in YARN-2915 failed. {color} 
|
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} YARN-2915 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 54s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 4 new + 231 unchanged - 0 fixed = 235 total (was 231) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
5s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
23s{color} | {color:red} hadoop-yarn-server-router in the patch failed. {color} 
|
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
21s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router 
generated 17 new + 0 unchanged - 0 fixed = 17 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
33s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
34s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 45m  8s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
4s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 34s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |
|   | 
hadoop.yarn.server.resourcemanager.federation.TestFederationRMStateStoreService 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-5412 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877418/YARN-5412-YARN-2915.4.patch
 |
| Optional Tests |  asflicense  compile  

[jira] [Commented] (YARN-6704) Add Federation Interceptor restart when work preserving NM is enabled

2017-07-14 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088357#comment-16088357
 ] 

Subru Krishnan commented on YARN-6704:
--

Thanks [~botong] for the patch. I looked at it and have a few questions.

What's the rationale behind refactoring _createAndRegisterNewUAM_ into 
_launchUAM_? Additionally, why would a client launch a UAM by already providing 
the _UnmanagedAMIdentifier_? I don't even see the _UnmanagedAMIdentifier_ being 
used anywhere.

I feel we shouldn't be persisting the running containers in the NM store:
* This explodes the complexity from O(apps) --> O(apps*containers).
* Moreover, the running containers are recoverable as they are part of the 
{{RegisterApplicationMasterResponse}} (see the sketch below).

A nit: there seem to be multiple loops over Map data structures (even in tests) 
where a direct lookup by key looks possible.
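
A fragment-style sketch of the recovery path referred to above (standard YARN 
client API calls; the surrounding wiring and variables are assumed to be in scope):

{code}
// On re-register after a restart, the previously running containers come back in
// the RegisterApplicationMasterResponse, so they do not need to be persisted one
// by one. amRMClient, appHostName, appHostPort and trackingUrl are assumed here.
RegisterApplicationMasterResponse resp =
    amRMClient.registerApplicationMaster(appHostName, appHostPort, trackingUrl);
List<Container> running = resp.getContainersFromPreviousAttempts();
{code}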

> Add Federation Interceptor restart when work preserving NM is enabled
> -
>
> Key: YARN-6704
> URL: https://issues.apache.org/jira/browse/YARN-6704
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
> Attachments: YARN-6704-YARN-2915.v1.patch, 
> YARN-6704-YARN-2915.v2.patch
>
>
> YARN-1336 added the ability to restart NM without losing any running 
> containers. {{AMRMProxy}} restart is added in YARN-6127. In a Federated YARN 
> environment, there's additional state in the {{FederationInterceptor}} to 
> allow for spanning across multiple sub-clusters, so we need to enhance 
> {{FederationInterceptor}} to support work-preserving restart.






[jira] [Commented] (YARN-6307) Refactor FairShareComparator#compare

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088355#comment-16088355
 ] 

Hadoop QA commented on YARN-6307:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 1 unchanged - 6 fixed = 1 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 37s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6307 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877419/YARN-6307.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d8930868 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / f413ee3 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/16454/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16454/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Updated] (YARN-6307) Refactor FairShareComparator#compare

2017-07-14 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6307:
---
Attachment: YARN-6307.002.patch

Rebased it since YARN-6769 went in.

> Refactor FairShareComparator#compare
> 
>
> Key: YARN-6307
> URL: https://issues.apache.org/jira/browse/YARN-6307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6307.001.patch, YARN-6307.002.patch
>
>
> The method does three things: compare min share usage, compare fair share usage 
> by checking the weight ratio, and break ties by submit time and name. These are 
> mixed together, which makes the code hard to read and maintain. Additionally, 
> there are potential performance issues; for example, there is no need to check 
> the weight ratio if the min share usage comparison already determines the order. 
> It is worth improving, given the huge number of invocations in the scheduler.
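
To illustrate the decomposition described above (helper method names are 
hypothetical and do not exist in the current code), the comparator could 
short-circuit so that the weight-ratio check is skipped whenever the min share 
comparison already decides the order:

{code}
// Hypothetical structure only.
@Override
public int compare(Schedulable s1, Schedulable s2) {
  int res = compareMinShareUsage(s1, s2);
  if (res == 0) {
    res = compareFairShareUsage(s1, s2); // weight-ratio based comparison
  }
  if (res == 0) {
    res = compareStartTimeAndName(s1, s2); // tie-breaker
  }
  return res;
}
{code}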






[jira] [Commented] (YARN-6818) User limit per partition is not honored in branch-2.7 >=

2017-07-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088310#comment-16088310
 ] 

Naganarasimha G R commented on YARN-6818:
-

Thanks [~sunilg] for the reviews and [~jhung] for working on it.
+1 LGTM, patch can be committed.

> User limit per partition is not honored in branch-2.7 >=
> 
>
> Key: YARN-6818
> URL: https://issues.apache.org/jira/browse/YARN-6818
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>  Labels: release-blocker
> Attachments: YARN-6818-branch-2.7.001.patch, 
> YARN-6818-branch-2.7.002.patch
>
>
> We are seeing an issue where user limit factor does not cap the amount of 
> resources a user can consume in a queue in a partition. Suppose you have a 
> queue with access to partition X, used resources in default partition is 0, 
> and used resources in partition X is at the partition's user limit. This is 
> the problematic code as far as I can tell: (in LeafQueue.java){noformat}
> if (Resources
> .greaterThan(resourceCalculator, clusterResource,
> user.getUsed(label),
> limit)) {
>   // if enabled, check to see if could we potentially use this node 
> instead
>   // of a reserved node if the application has reserved containers
>   if (this.reservationsContinueLooking) {
> if (Resources.lessThanOrEqual(
> resourceCalculator,
> clusterResource,
> Resources.subtract(user.getUsed(), 
> application.getCurrentReservation()),
> limit)) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit based on reservations - " + " consumed: 
> "
> + user.getUsed() + " reserved: "
> + application.getCurrentReservation() + " limit: " + limit);
>   }
>   Resource amountNeededToUnreserve = 
> Resources.subtract(user.getUsed(label), limit);
>   // we can only acquire a new container if we unreserve first since 
> we ignored the
>   // user limit. Choose the max of user limit or what was previously 
> set by max
>   // capacity.
>   
> currentResoureLimits.setAmountNeededUnreserve(Resources.max(resourceCalculator,
>   clusterResource, 
> currentResoureLimits.getAmountNeededUnreserve(),
>   amountNeededToUnreserve));
>   return true;
> }
>   }
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit - " + " consumed: "
> + user.getUsed() + " limit: " + limit);
>   }
>   return false;
> }
> {noformat}
> First it sees that the used resources in partition X are greater than the 
> partition's user limit. Then the reservation check also succeeds because it is checking 
> {{user.getUsed() - application.getCurrentReservation() <= limit}} and returns 
> true.
> One fix is to just change {{Resources.subtract(user.getUsed(), 
> application.getCurrentReservation())}} to 
> {{Resources.subtract(user.getUsed(label), 
> application.getCurrentReservation())}} (see the sketch after this description).
> This doesn't seem to be a problem in branch-2.8 and higher since YARN-3356 
> introduces this check: {noformat}  if (this.reservationsContinueLooking 
> && checkReservations
>   && label.equals(CommonNodeLabelsManager.NO_LABEL)) {{noformat}
> so in this case getting the used resources in default partition seems to be 
> correct.
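
A sketch of the one-line change described above, in the context of the quoted 
block (surrounding code trimmed):

{code}
if (Resources.lessThanOrEqual(
    resourceCalculator,
    clusterResource,
    // compare the usage for the partition (label) under consideration, so the
    // reservation check matches the limit that was exceeded
    Resources.subtract(user.getUsed(label),
        application.getCurrentReservation()),
    limit)) {
  // ... same body as before
}
{code}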






[jira] [Commented] (YARN-5412) Create a proxy chain for ResourceManager REST API in the Router

2017-07-14 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088309#comment-16088309
 ] 

Giovanni Matteo Fumarola commented on YARN-5412:


Attached v4. It addresses the remaining Yetus warnings and it is a full rebase 
after YARN-6792 and YARN-6280.

> Create a proxy chain for ResourceManager REST API in the Router
> ---
>
> Key: YARN-5412
> URL: https://issues.apache.org/jira/browse/YARN-5412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-5412-YARN-2915.1.patch, 
> YARN-5412-YARN-2915.2.patch, YARN-5412-YARN-2915.3.patch, 
> YARN-5412-YARN-2915.4.patch, YARN-5412-YARN-2915.proto.patch
>
>
> As detailed in the proposal in the umbrella JIRA, we are introducing a new 
> component that routes client requests to the appropriate ResourceManager(s). This 
> JIRA tracks the creation of a proxy for the ResourceManager REST API in the 
> Router. This provides a placeholder for:
> 1) throttling mis-behaving clients (YARN-1546)
> 3) masking the access to multiple RMs (YARN-3659)
> We are planning to follow the interceptor pattern, as we did in YARN-2884, to 
> generalize the approach and have only dynamic coupling for Federation.






[jira] [Updated] (YARN-5412) Create a proxy chain for ResourceManager REST API in the Router

2017-07-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-5412:
---
Attachment: YARN-5412-YARN-2915.4.patch

> Create a proxy chain for ResourceManager REST API in the Router
> ---
>
> Key: YARN-5412
> URL: https://issues.apache.org/jira/browse/YARN-5412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-5412-YARN-2915.1.patch, 
> YARN-5412-YARN-2915.2.patch, YARN-5412-YARN-2915.3.patch, 
> YARN-5412-YARN-2915.4.patch, YARN-5412-YARN-2915.proto.patch
>
>
> As detailed in the proposal in the umbrella JIRA, we are introducing a new 
> component that routes client requests to the appropriate ResourceManager(s). This 
> JIRA tracks the creation of a proxy for the ResourceManager REST API in the 
> Router. This provides a placeholder for:
> 1) throttling mis-behaving clients (YARN-1546)
> 3) masking the access to multiple RMs (YARN-3659)
> We are planning to follow the interceptor pattern, as we did in YARN-2884, to 
> generalize the approach and have only dynamic coupling for Federation.






[jira] [Updated] (YARN-6223) [Umbrella] Natively support GPU configuration/discovery/scheduling/isolation on YARN

2017-07-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-6223:
-
Attachment: YARN-6223.wip.3.patch

Attached ver.3 patch, which organizes the container-executor code changes better. 

Work is based on YARN-6033, so all test cases are now based on gtest.

TODO: put all configuration items into sections.

> [Umbrella] Natively support GPU configuration/discovery/scheduling/isolation 
> on YARN
> 
>
> Key: YARN-6223
> URL: https://issues.apache.org/jira/browse/YARN-6223
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6223.Natively-support-GPU-on-YARN-v1.pdf, 
> YARN-6223.wip.1.patch, YARN-6223.wip.2.patch, YARN-6223.wip.3.patch
>
>
> A variety of workloads are moving to YARN, including machine learning / deep 
> learning, which can be sped up by leveraging GPU computation power. Workloads 
> should be able to request GPUs from YARN as simply as CPU and memory.
> *To make a complete GPU story, we should support following pieces:*
> 1) GPU discovery/configuration: Admin can either config GPU resources and 
> architectures on each node, or more advanced, NodeManager can automatically 
> discover GPU resources and architectures and report to ResourceManager 
> 2) GPU scheduling: YARN scheduler should account GPU as a resource type just 
> like CPU and memory.
> 3) GPU isolation/monitoring: once a task is launched with GPU resources, the 
> NodeManager should properly isolate and monitor the task's resource usage.
> For #2, YARN-3926 can support it natively. For #3, YARN-3611 has introduced 
> an extensible framework to support isolation for different resource types and 
> different runtimes.
> *Related JIRAs:*
> There're a couple of JIRAs (YARN-4122/YARN-5517) filed with similar goals but 
> different solutions:
> For scheduling:
> - YARN-4122/YARN-5517 are all adding a new GPU resource type to Resource 
> protocol instead of leveraging YARN-3926.
> For isolation:
> - And YARN-4122 proposed to use CGroups to do isolation which cannot solve 
> the problem listed at 
> https://github.com/NVIDIA/nvidia-docker/wiki/GPU-isolation#challenges such as 
> minor device number mapping; load nvidia_uvm module; mismatch of CUDA/driver 
> versions, etc.






[jira] [Commented] (YARN-6775) CapacityScheduler: Improvements to assignContainers, avoid unnecessary canAssignToUser/Queue calls

2017-07-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088169#comment-16088169
 ] 

Wangda Tan commented on YARN-6775:
--

[~nroberts], there are some conflicts when applying the patches to branch-2 (maybe 
2.8 as well); do you mind taking a look?

> CapacityScheduler: Improvements to assignContainers, avoid unnecessary 
> canAssignToUser/Queue calls
> --
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6775.001.patch, YARN-6775.002.patch, 
> YARN-6775.branch-2.002.patch, YARN-6775.branch-2.8.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. The patch includes a simple throughput test that 
> demonstrates that, when we have users at their user limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.
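
A hypothetical shape of the local caching mentioned above (names are illustrative; 
the actual patch may structure this differently): within a single 
{{assignContainers()}} pass the answer for a given user cannot change, so it can 
be memoized instead of recomputed for every application.

{code}
// Illustrative only. checkUserLimit is a stand-in for the existing user-limit check.
Map<String, Boolean> userLimitCache = new HashMap<>();
for (FiCaSchedulerApp app : orderedApps) {
  boolean canAssign = userLimitCache.computeIfAbsent(
      app.getUser(), user -> checkUserLimit(user));
  if (!canAssign) {
    continue; // skip without re-running the expensive check
  }
  // ... attempt allocation for this application
}
{code}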






[jira] [Commented] (YARN-6741) Deleting all children of a Parent Queue on refresh throws exception

2017-07-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088153#comment-16088153
 ] 

Wangda Tan commented on YARN-6741:
--

[~naganarasimha...@apache.org], could you add more description of the existing 
behavior and the expected behavior?

cc: [~sunilg].

> Deleting all children of a Parent Queue on refresh throws exception
> ---
>
> Key: YARN-6741
> URL: https://issues.apache.org/jira/browse/YARN-6741
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-6741.001.patch, YARN-6741.002.patch
>
>
> If we configure CS such that all children of a parent queue are deleted and the 
> parent is made a leaf queue, then the {{refreshQueue}} operation fails.






[jira] [Commented] (YARN-6775) CapacityScheduler: Improvements to assignContainers, avoid unnecessary canAssignToUser/Queue calls

2017-07-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088155#comment-16088155
 ] 

Wangda Tan commented on YARN-6775:
--

TestCapacityScheduler failures are tracked by: YARN-6240, will commit patch 
shortly.

> CapacityScheduler: Improvements to assignContainers, avoid unnecessary 
> canAssignToUser/Queue calls
> --
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6775.001.patch, YARN-6775.002.patch, 
> YARN-6775.branch-2.002.patch, YARN-6775.branch-2.8.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. The patch includes a simple throughput test that 
> demonstrates that, when we have users at their user limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.






[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-07-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088145#comment-16088145
 ] 

Hudson commented on YARN-6625:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12011 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12011/])
YARN-6625. yarn application -list returns a tracking URL for AM that (yufei: 
rev 9e0cde1469b8ffeb59619c64d6ece86b62424f04)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmIpFilter.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/amfilter/TestAmFilter.java


> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch, 
> YARN-6625.003.patch, YARN-6625.004.patch
>
>
> The tracking URL given at the command line should work whether the cluster is 
> secured or not. The tracking URLs look like http://node-2.abc.com:47014, and the 
> AM web server is supposed to redirect them to an RM address like 
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it 
> fails to do so because the connection is rejected when the AM talks to the RM 
> admin service to get the HA status.






[jira] [Commented] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088136#comment-16088136
 ] 

Wangda Tan commented on YARN-6808:
--

[~asuresh], 

If it uses the main scheduler, that will be much better than creating a separate 
scheduler.

Regarding feature development, since it is in the test phase and there could be 
other changes required, I think this should be done in a branch and merged back 
once it gets more stable and the improvements can be validated.

> Allow Schedulers to return OPPORTUNISTIC containers when queues go over 
> configured capacity
> ---
>
> Key: YARN-6808
> URL: https://issues.apache.org/jira/browse/YARN-6808
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-6808.001.patch, YARN-6808.002.patch
>
>
> This is based on discussions with [~kasha] and [~kkaranasos].
> Currently, when a queue goes over capacity, apps on starved queues must wait 
> either for containers to complete or for them to be preempted by the 
> scheduler to get resources.
> This JIRA proposes to allow Schedulers to:
> # Allocate all containers over the configured queue capacity/weight as 
> OPPORTUNISTIC.
> # Auto-promote running OPPORTUNISTIC containers of apps as and when their 
> GUARANTEED containers complete.






[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-07-14 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088138#comment-16088138
 ] 

Yufei Gu commented on YARN-6625:


Thanks [~rkanter] for the review and lots of off-line discussion.

> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch, 
> YARN-6625.003.patch, YARN-6625.004.patch
>
>
> The tracking URL given at the command line should work whether the cluster is 
> secured or not. The tracking URLs look like http://node-2.abc.com:47014, and the 
> AM web server is supposed to redirect them to an RM address like 
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it 
> fails to do so because the connection is rejected when the AM talks to the RM 
> admin service to get the HA status.






[jira] [Resolved] (YARN-6243) CapacityScheduler: canAssignToThisQueue should be called only when necessary

2017-07-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-6243.
--
Resolution: Duplicate

Duplicated to YARN-6775

> CapacityScheduler: canAssignToThisQueue should be called only when necessary
> 
>
> Key: YARN-6243
> URL: https://issues.apache.org/jira/browse/YARN-6243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>
> Currently canAssignToThisQueue is called while iterating over apps in a 
> LeafQueue; we should optimize this.






[jira] [Comment Edited] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088085#comment-16088085
 ] 

Ray Chiang edited comment on YARN-6798 at 7/14/17 9:17 PM:
---

Updating the version table:

|| Patch || LevelDBKey(s) || Hadoop Versions || Commit Date || NM LevelDB 
Version ||
| YARN-5049 | queued | (2.9.0, 3.0.0-alpha1) | May 11, 2016 | 1.2 |
| YARN-6127 | AMRMProxy/NextMasterKey | (2.9.0, 3.0.0-alpha4) | June 22, 2017 | 
1.1 |


was (Author: rchiang):
Updating the version table:

|| Patch || LevelDBKey(s) || Hadoop Versions || Commit Date || NM LevelDB 
Version ||
| YARN-5049 | queued | 3.0.0-alpha1 | May 11, 2016 | 1.2 |
| YARN-6127 | AMRMProxy/NextMasterKey | (2.9.0, 3.0.0-alpha4) | June 22, 2017 | 
1.1 |

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}
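
For context, a simplified sketch of the compatibility rule implied by the error 
above (the real check lives in {{NMLeveldbStateStoreService.checkVersion}}; this 
is only an illustration): the store is loadable only when the major versions 
match, so an NM expecting 3.0 refuses a store written as 2.0.

{code}
// Simplified sketch of a major-version compatibility check.
Version loaded = Version.newInstance(2, 0);
Version current = Version.newInstance(3, 0);
if (loaded.getMajorVersion() != current.getMajorVersion()) {
  throw new IOException("Incompatible version for NM state: expecting NM state version "
      + current + ", but loading version " + loaded);
}
{code}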






[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088088#comment-16088088
 ] 

Botong Huang commented on YARN-6798:


Sounds good, thx!

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088085#comment-16088085
 ] 

Ray Chiang commented on YARN-6798:
--

Updating the version table:

|| Patch || LevelDBKey(s) || Hadoop Versions || Commit Date || NM LevelDB Version ||
| YARN-5049 | queued | 3.0.0-alpha1 | May 11, 2016 | 1.2 |
| YARN-6127 | AMRMProxy/NextMasterKey | (2.9.0, 3.0.0-alpha4) | June 22, 2017 | 1.1 |
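
The fix being discussed relies on the NM state store's compatibility check 
being keyed on the major version: a bump from 1.x to 2.0 or 3.0 refuses to 
load an older store (the fatal startup error quoted below), while a minor bump 
such as 1.2 keeps old stores loadable. The following is a simplified, 
self-contained sketch of that rule, assuming illustrative class and method 
names rather than the actual NMLeveldbStateStoreService code:

{code:java}
// Simplified sketch only: same major version = compatible, different major
// version = refuse to load. Names here are illustrative, not Hadoop's API.
public final class StateVersionCheckSketch {

  static final class Version {
    final int major;
    final int minor;

    Version(int major, int minor) {
      this.major = major;
      this.minor = minor;
    }

    // Compatible when the major versions match; minor bumps are additive.
    boolean isCompatibleTo(Version other) {
      return this.major == other.major;
    }

    @Override
    public String toString() {
      return major + "." + minor;
    }
  }

  public static void main(String[] args) throws java.io.IOException {
    Version current = new Version(1, 2);  // e.g. branch-2 after the 1.2 bump
    Version loaded  = new Version(1, 0);  // what an older NM left on disk

    if (!current.isCompatibleTo(loaded)) {
      // With a 2.0 or 3.0 current version this is the path that produced the
      // fatal "Incompatible version for NM state" startup failure.
      throw new java.io.IOException("Incompatible version for NM state: expecting "
          + current + ", but loading " + loaded);
    }
    System.out.println("State store version " + loaded + " accepted by " + current);
  }
}
{code}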

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088081#comment-16088081
 ] 

Ray Chiang commented on YARN-6798:
--

Thanks [~asuresh]!  [~botong], it looks like we'll use 1.2 as our current 
version.

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang reassigned YARN-6798:


Assignee: Botong Huang  (was: Ray Chiang)

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-07-14 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088062#comment-16088062
 ] 

Robert Kanter commented on YARN-6625:
-

+1

> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch, 
> YARN-6625.003.patch, YARN-6625.004.patch
>
>
> The tracking URL given at the command line should work whether the cluster 
> is secured or not. The tracking URLs look like http://node-2.abc.com:47014, 
> and the AM web server is supposed to redirect them to an RM address such as 
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it 
> fails to do so because the connection is rejected when the AM talks to the 
> RM admin service to get the HA status.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088058#comment-16088058
 ] 

Arun Suresh commented on YARN-6798:
---

[~rchiang], I've committed YARN-5049 to branch-2 (cherry-picked and set the 
version to 1.2).

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5049) Extend NMStateStore to save queued container information

2017-07-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5049:
--
Fix Version/s: 2.9.0

> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.
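
For readers following along, below is a minimal, hypothetical sketch of the 
extension this description implies: persist a record when a container is 
queued and drop it once the container starts. The type and method names are 
illustrative only, not the actual NMStateStoreService API.

{code:java}
// Hypothetical in-memory stand-in for the state-store extension described
// above; names are illustrative, not the real NMStateStoreService methods.
import java.util.LinkedHashSet;
import java.util.Set;

public final class QueuedContainerStoreSketch {

  private final Set<String> queued = new LinkedHashSet<>();

  // Called when a container is added to the NM queue.
  void storeQueuedContainer(String containerId) {
    queued.add(containerId);
  }

  // Called once the queued container starts executing.
  void removeQueuedContainer(String containerId) {
    queued.remove(containerId);
  }

  Set<String> recoverQueuedContainers() {
    return queued;
  }

  public static void main(String[] args) {
    QueuedContainerStoreSketch store = new QueuedContainerStoreSketch();
    store.storeQueuedContainer("container_1_0001_01_000002");
    System.out.println(store.recoverQueuedContainers()); // still queued
    store.removeQueuedContainer("container_1_0001_01_000002");
    System.out.println(store.recoverQueuedContainers()); // started, removed
  }
}
{code}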



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5049) Extend NMStateStore to save queued container information

2017-07-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh resolved YARN-5049.
---
Resolution: Fixed

Committed to branch-2 as well

> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088052#comment-16088052
 ] 

Hadoop QA commented on YARN-6625:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6625 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877391/YARN-6625.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux dfeaf03a6734 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a29fe10 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16452/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
|
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16452/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch, 
> 

[jira] [Assigned] (YARN-5049) Extend NMStateStore to save queued container information

2017-07-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-5049:
-

Assignee: Konstantinos Karanasos  (was: Arun Suresh)

> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-07-14 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6625:
---
Attachment: YARN-6625.004.patch

Thanks [~rkanter]. Uploaded patch v4 for your comments.

> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch, 
> YARN-6625.003.patch, YARN-6625.004.patch
>
>
> The tracking URL given at the command line should work whether the cluster 
> is secured or not. The tracking URLs look like http://node-2.abc.com:47014, 
> and the AM web server is supposed to redirect them to an RM address such as 
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it 
> fails to do so because the connection is rejected when the AM talks to the 
> RM admin service to get the HA status.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088015#comment-16088015
 ] 

Ray Chiang commented on YARN-6798:
--

Thanks [~asuresh].  That's what I get for relying on JIRA and forgetting to 
check git.

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6818) User limit per partition is not honored in branch-2.7 >=

2017-07-14 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-6818:

Labels: release-blocker  (was: blocker)

> User limit per partition is not honored in branch-2.7 >=
> 
>
> Key: YARN-6818
> URL: https://issues.apache.org/jira/browse/YARN-6818
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>  Labels: release-blocker
> Attachments: YARN-6818-branch-2.7.001.patch, 
> YARN-6818-branch-2.7.002.patch
>
>
> We are seeing an issue where user limit factor does not cap the amount of 
> resources a user can consume in a queue in a partition. Suppose you have a 
> queue with access to partition X, used resources in default partition is 0, 
> and used resources in partition X is at the partition's user limit. This is 
> the problematic code as far as I can tell: (in LeafQueue.java){noformat}
> if (Resources
> .greaterThan(resourceCalculator, clusterResource,
> user.getUsed(label),
> limit)) {
>   // if enabled, check to see if could we potentially use this node 
> instead
>   // of a reserved node if the application has reserved containers
>   if (this.reservationsContinueLooking) {
> if (Resources.lessThanOrEqual(
> resourceCalculator,
> clusterResource,
> Resources.subtract(user.getUsed(), 
> application.getCurrentReservation()),
> limit)) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit based on reservations - " + " consumed: 
> "
> + user.getUsed() + " reserved: "
> + application.getCurrentReservation() + " limit: " + limit);
>   }
>   Resource amountNeededToUnreserve = 
> Resources.subtract(user.getUsed(label), limit);
>   // we can only acquire a new container if we unreserve first since 
> we ignored the
>   // user limit. Choose the max of user limit or what was previously 
> set by max
>   // capacity.
>   
> currentResoureLimits.setAmountNeededUnreserve(Resources.max(resourceCalculator,
>   clusterResource, 
> currentResoureLimits.getAmountNeededUnreserve(),
>   amountNeededToUnreserve));
>   return true;
> }
>   }
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit - " + " consumed: "
> + user.getUsed() + " limit: " + limit);
>   }
>   return false;
> }
> {noformat}
> First it sees that the used resources in partition X are greater than the 
> partition's user limit. Then the reservation check also succeeds, because it 
> checks {{user.getUsed() - application.getCurrentReservation() <= limit}} and 
> returns true.
> One fix is to just set {{Resources.subtract(user.getUsed(), 
> application.getCurrentReservation())}} to 
> {{Resources.subtract(user.getUsed(label), 
> application.getCurrentReservation())}}.
> This doesn't seem to be a problem in branch-2.8 and higher since YARN-3356 
> introduces this check: {noformat}  if (this.reservationsContinueLooking 
> && checkReservations
>   && label.equals(CommonNodeLabelsManager.NO_LABEL)) {{noformat}
> so in this case getting the used resources in default partition seems to be 
> correct.
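
To make the proposed change concrete, here is a minimal, self-contained model 
of the two checks. Plain longs stand in for Resource and the method names are 
illustrative, not LeafQueue's; it replays the scenario above (default 
partition unused, partition X over its user limit, no outstanding 
reservations).

{code:java}
// Simplified model of the branch-2.7 check and the proposed fix.
// "usedDefaultPartition" models user.getUsed() (default-partition usage),
// "usedInPartition" models user.getUsed(label).
public final class UserLimitReservationSketch {

  // Shape of the problematic check: the reservation adjustment is applied to
  // the default-partition usage, not the partition that was compared.
  static boolean canAssignCurrent(long usedDefaultPartition, long usedInPartition,
                                  long reserved, long limit) {
    if (usedInPartition > limit) {
      // reservationsContinueLooking branch
      return usedDefaultPartition - reserved <= limit;
    }
    return true;
  }

  // Shape of the proposed fix: subtract the reservation from the usage in the
  // same partition that was compared against the limit.
  static boolean canAssignProposed(long usedDefaultPartition, long usedInPartition,
                                   long reserved, long limit) {
    if (usedInPartition > limit) {
      return usedInPartition - reserved <= limit;
    }
    return true;
  }

  public static void main(String[] args) {
    long usedDefault = 0, usedInX = 120, reserved = 0, limit = 100;
    System.out.println("current check allows more: "
        + canAssignCurrent(usedDefault, usedInX, reserved, limit));   // true (bug)
    System.out.println("proposed check allows more: "
        + canAssignProposed(usedDefault, usedInX, reserved, limit));  // false
  }
}
{code}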



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6818) User limit per partition is not honored in branch-2.7 >=

2017-07-14 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-6818:

Labels: blocker  (was: )

> User limit per partition is not honored in branch-2.7 >=
> 
>
> Key: YARN-6818
> URL: https://issues.apache.org/jira/browse/YARN-6818
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>  Labels: release-blocker
> Attachments: YARN-6818-branch-2.7.001.patch, 
> YARN-6818-branch-2.7.002.patch
>
>
> We are seeing an issue where user limit factor does not cap the amount of 
> resources a user can consume in a queue in a partition. Suppose you have a 
> queue with access to partition X, used resources in default partition is 0, 
> and used resources in partition X is at the partition's user limit. This is 
> the problematic code as far as I can tell: (in LeafQueue.java){noformat}
> if (Resources
> .greaterThan(resourceCalculator, clusterResource,
> user.getUsed(label),
> limit)) {
>   // if enabled, check to see if could we potentially use this node 
> instead
>   // of a reserved node if the application has reserved containers
>   if (this.reservationsContinueLooking) {
> if (Resources.lessThanOrEqual(
> resourceCalculator,
> clusterResource,
> Resources.subtract(user.getUsed(), 
> application.getCurrentReservation()),
> limit)) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit based on reservations - " + " consumed: 
> "
> + user.getUsed() + " reserved: "
> + application.getCurrentReservation() + " limit: " + limit);
>   }
>   Resource amountNeededToUnreserve = 
> Resources.subtract(user.getUsed(label), limit);
>   // we can only acquire a new container if we unreserve first since 
> we ignored the
>   // user limit. Choose the max of user limit or what was previously 
> set by max
>   // capacity.
>   
> currentResoureLimits.setAmountNeededUnreserve(Resources.max(resourceCalculator,
>   clusterResource, 
> currentResoureLimits.getAmountNeededUnreserve(),
>   amountNeededToUnreserve));
>   return true;
> }
>   }
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit - " + " consumed: "
> + user.getUsed() + " limit: " + limit);
>   }
>   return false;
> }
> {noformat}
> First it sees that the used resources in partition X are greater than the 
> partition's user limit. Then the reservation check also succeeds, because it 
> checks {{user.getUsed() - application.getCurrentReservation() <= limit}} and 
> returns true.
> One fix is to just set {{Resources.subtract(user.getUsed(), 
> application.getCurrentReservation())}} to 
> {{Resources.subtract(user.getUsed(label), 
> application.getCurrentReservation())}}.
> This doesn't seem to be a problem in branch-2.8 and higher since YARN-3356 
> introduces this check: {noformat}  if (this.reservationsContinueLooking 
> && checkReservations
>   && label.equals(CommonNodeLabelsManager.NO_LABEL)) {{noformat}
> so in this case getting the used resources in default partition seems to be 
> correct.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3260) AM attempt fail to register before RM processes launch event

2017-07-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087996#comment-16087996
 ] 

Hudson commented on YARN-3260:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12008 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12008/])
YARN-3260. AM attempt fail to register before RM processes launch event. 
(jlowe: rev a5ae5ac50e97cf829c41dcf01655cd9bd4d36a00)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


> AM attempt fail to register before RM processes launch event
> 
>
> Key: YARN-3260
> URL: https://issues.apache.org/jira/browse/YARN-3260
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.9.0, 2.7.4, 3.0.0-beta1, 2.8.2
>
> Attachments: YARN-3260.001.patch
>
>
> The RM on one of our clusters was running behind on processing 
> AsyncDispatcher events, and this caused AMs to fail to register due to an 
> NPE.  The AM was launched and attempting to register before the 
> RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token 
> had not been generated yet.  The NPE occurred because the 
> ApplicationMasterService tried to encode the missing token.
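
As a rough illustration only (this is not the YARN-3260 patch), the failure 
mode above amounts to encoding a client-to-AM token that has not been created 
yet; a null guard of the following shape avoids the NPE while the LAUNCHED 
event is still pending. All names below are hypothetical.

{code:java}
// Hypothetical guard: return nothing instead of dereferencing a token that
// the attempt has not generated yet.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class ClientToAMTokenGuardSketch {

  static ByteBuffer encodeIfPresent(byte[] clientToAMTokenPassword) {
    if (clientToAMTokenPassword == null) {
      // Token not generated yet (LAUNCHED event still pending): skip encoding.
      return null;
    }
    return ByteBuffer.wrap(clientToAMTokenPassword);
  }

  public static void main(String[] args) {
    System.out.println(encodeIfPresent(null));                                  // null, no NPE
    System.out.println(encodeIfPresent("secret".getBytes(StandardCharsets.UTF_8)));
  }
}
{code}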



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-07-14 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087971#comment-16087971
 ] 

Robert Kanter commented on YARN-6625:
-

Two last minor things and then we're good to go:
# The {{isValidUrl}} method can be simpler and more compact - something like this:
{code:java}
private boolean isValidUrl(String url) {
  boolean isValid = false;
  try {
HttpURLConnection conn =
  (HttpURLConnection) new URL(url).openConnection();
conn.connect();
isValid = (conn.getResponseCode() == HttpURLConnection.HTTP_OK);
  } catch (Exception e) {
LOG.debug("Failed to connect to " + url + ": " + e.toString());
  }
  return isValid;
}
{code}
# Remove the * imports from TestAmFilter

> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch, 
> YARN-6625.003.patch
>
>
> The tracking URL given at the command line should work whether the cluster 
> is secured or not. The tracking URLs look like http://node-2.abc.com:47014, 
> and the AM web server is supposed to redirect them to an RM address such as 
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it 
> fails to do so because the connection is rejected when the AM talks to the 
> RM admin service to get the HA status.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-5049) Extend NMStateStore to save queued container information

2017-07-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reopened YARN-5049:
---
  Assignee: Arun Suresh  (was: Konstantinos Karanasos)

> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5049) Extend NMStateStore to save queued container information

2017-07-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087964#comment-16087964
 ] 

Arun Suresh commented on YARN-5049:
---

Re-opening for branch-2 commit

> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087965#comment-16087965
 ] 

Arun Suresh commented on YARN-6798:
---

I had rolled back YARN-5049 from branch-2 precisely because of the major 
version bump. Will cherry-pick and update it today with a minor version bump. 
You can mark this

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6471) Support to add min/max resource configuration for a queue

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087940#comment-16087940
 ] 

Hadoop QA commented on YARN-6471:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-5881 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
31s{color} | {color:green} YARN-5881 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
47s{color} | {color:green} YARN-5881 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
11s{color} | {color:green} YARN-5881 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
0s{color} | {color:green} YARN-5881 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
47s{color} | {color:green} YARN-5881 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
27s{color} | {color:green} YARN-5881 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m  
3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 15s{color} | {color:orange} root: The patch generated 68 new + 1194 
unchanged - 12 fixed = 1262 total (was 1206) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
49s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 23s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
2s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
35s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 49s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m 10s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Boxing/unboxing to parse a primitive 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.updateResourceValuesFromConfig(Set,
 Resource, String[])  At 
CapacitySchedulerConfiguration.java:org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.updateResourceValuesFromConfig(Set,
 Resource, String[])  At CapacitySchedulerConfiguration.java:[line 1604] |
| Failed junit tests | hadoop.fs.TestLocalDirAllocator |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
|   | 

[jira] [Commented] (YARN-6788) Improve performance of resource profile branch

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087910#comment-16087910
 ] 

Hadoop QA commented on YARN-6788:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} YARN-3926 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
37s{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
46s{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
20s{color} | {color:green} YARN-3926 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
30s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
YARN-3926 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
55s{color} | {color:green} YARN-3926 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  6s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 20 new + 98 unchanged - 15 fixed = 118 total (was 113) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1) {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  2m 56s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 46s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}134m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.util.resource.TestResources |
|   | hadoop.yarn.util.resource.TestResourceUtils |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterService |
|   | hadoop.yarn.server.resourcemanager.TestAppManager |
|   | 

[jira] [Commented] (YARN-6775) CapacityScheduler: Improvements to assignContainers, avoid unnecessary canAssignToUser/Queue calls

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087851#comment-16087851
 ] 

Hadoop QA commented on YARN-6775:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.8 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
24s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 56 unchanged - 1 fixed = 62 total (was 57) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 33s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:d946387 |
| JIRA Issue | YARN-6775 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877336/YARN-6775.branch-2.8.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | 

[jira] [Commented] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087835#comment-16087835
 ] 

Arun Suresh commented on YARN-6808:
---

[~leftnoteasy],

bq. Currently, opportunistic container allocation is too simple to follow what 
existing schedulers can do. For example, delayed scheduling, user limit. You 
might want to say that there's no guarantee for opportunistic container so we 
don't have to follow all of them.
Apologies if I did not articulate it correctly in my earlier comment; also, kindly 
take a look at the v002 patch again. The container allocation is still done by the 
main configured scheduler (Capacity / Fair); it is just downgraded to an OC before 
being sent to the AM. Everything you mentioned (delayed scheduling, user limit 
enforcement, etc.) still happens.

bq. If, like you said, we use this approach to replace existing preemption, I think 
this patch is far from that goal.
Agreed that this is probably not complete. For example, like I mentioned, we 
probably need to promote OCs to guaranteed containers as and when existing 
containers complete.

I am by no means claiming feature parity. My intention is to run tests and 
measure user-perceptible metrics, such as job runtimes, goodput, and cluster 
utilization. That should give us an objective measure of the viability of the 
feature; don't you agree?
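
To make the intent concrete, here is a minimal, self-contained sketch of the flow 
described above. It is not the actual patch; the types below are simplified 
placeholders rather than YARN classes, and the over-capacity test is a stand-in for 
the scheduler's real accounting.

{noformat}
// Hedged sketch (placeholder types, not the YARN-6808 patch): the configured
// scheduler's allocation decision is unchanged; the result is only tagged
// OPPORTUNISTIC when the queue is over its configured capacity.
enum ExecTypeSketch { GUARANTEED, OPPORTUNISTIC }

class AllocationSketch {
  ExecTypeSketch execType = ExecTypeSketch.GUARANTEED;
}

class QueueStateSketch {
  double usedCapacity; // e.g. 1.2 == 120% of configured capacity
  boolean overCapacity() { return usedCapacity > 1.0; }
}

class OpportunisticDowngradeSketch {
  // Called after the configured scheduler (Capacity/Fair) has already applied
  // delay scheduling, user limits, etc. and decided to allocate.
  AllocationSketch maybeDowngrade(AllocationSketch alloc, QueueStateSketch queue) {
    if (queue.overCapacity()) {
      alloc.execType = ExecTypeSketch.OPPORTUNISTIC; // downgraded before it reaches the AM
    }
    return alloc;
  }
}
{noformat}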

> Allow Schedulers to return OPPORTUNISTIC containers when queues go over 
> configured capacity
> ---
>
> Key: YARN-6808
> URL: https://issues.apache.org/jira/browse/YARN-6808
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-6808.001.patch, YARN-6808.002.patch
>
>
> This is based on discussions with [~kasha] and [~kkaranasos].
> Currently, when a queue goes over capacity, apps on starved queues must wait 
> either for containers to complete or for them to be preempted by the 
> scheduler to get resources.
> This JIRA proposes to allow Schedulers to:
> # Allocate all containers over the configured queue capacity/weight as 
> OPPORTUNISTIC.
> # Auto-promote running OPPORTUNISTIC containers of apps as and when their 
> GUARANTEED containers complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5146) [YARN-3368] Supports Fair Scheduler in new YARN UI

2017-07-14 Thread Abdullah Yousufi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087824#comment-16087824
 ] 

Abdullah Yousufi edited comment on YARN-5146 at 7/14/17 7:09 PM:
-

Hey [~sunilg], thanks for catching that! How can I reproduce and see this 
error? The classes are empty because they should be extending from the 
adapter/yarn-queue.js.


was (Author: ayousufi):
Hey [~sunilg]], thanks for catching that! How can I reproduce and see this 
error? The classes are empty because they should be extending from the 
adapter/yarn-queue.js.

> [YARN-3368] Supports Fair Scheduler in new YARN UI
> --
>
> Key: YARN-5146
> URL: https://issues.apache.org/jira/browse/YARN-5146
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Abdullah Yousufi
> Attachments: YARN-5146.001.patch, YARN-5146.002.patch, 
> YARN-5146.003.patch
>
>
> The current implementation in branch YARN-3368 only supports the capacity 
> scheduler; we want to make it support the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5146) [YARN-3368] Supports Fair Scheduler in new YARN UI

2017-07-14 Thread Abdullah Yousufi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087824#comment-16087824
 ] 

Abdullah Yousufi commented on YARN-5146:


Hey [~sunilg], thanks for catching that! How can I reproduce and see this 
error? The classes are empty because they should be extending from the 
adapter/yarn-queue.js.

> [YARN-3368] Supports Fair Scheduler in new YARN UI
> --
>
> Key: YARN-5146
> URL: https://issues.apache.org/jira/browse/YARN-5146
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Abdullah Yousufi
> Attachments: YARN-5146.001.patch, YARN-5146.002.patch, 
> YARN-5146.003.patch
>
>
> The current implementation in branch YARN-3368 only supports the capacity 
> scheduler; we want to make it support the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2017-07-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087794#comment-16087794
 ] 

Wangda Tan commented on YARN-1011:
--

Thanks [~kasha] / [~haibo.chen] for the proposal.  

I just left some comments in YARN-6808: 
https://issues.apache.org/jira/browse/YARN-6808?focusedCommentId=16087782=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16087782
It may be related to this feature as well, so could you please take a look when 
you have time? 

I think it might be better to move some JIRAs (YARN-1014/YARN-6674) to a new 
umbrella, "improve using opportunistic containers for normal use cases", like I 
suggested in YARN-6808.

Thoughts?

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
>Assignee: Karthik Kambatla
> Attachments: patch-for-yarn-1011.patch, yarn-1011-design-v0.pdf, 
> yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf, yarn-1011-design-v3.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures the utilization of allocated 
> containers and, if appropriate, allocates more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087782#comment-16087782
 ] 

Wangda Tan edited comment on YARN-6808 at 7/14/17 6:49 PM:
---

Thanks [~asuresh] for the additional explanations.

Frankly speaking, I'm a little bit worried about this direction.

Currently, opportunistic container allocation is too simple to follow what 
existing schedulers can do. For example, delayed scheduling, user limit. You 
might want to say that there's no guarantee for opportunistic container so we 
don't have to follow all of them. This is true under the context of YARN-2877, 
which targets running containers with short lifetimes that don't need any 
guarantees. However, if we move all containers beyond a queue's configured 
capacity to opportunistic, it is a big deal. I'm not sure if you have any plans 
regarding this.

One related topic is YARN-1011: opportunistic containers will be used when a node 
is overcommitted. YARN-1013/YARN-1015 are open to track changes on the scheduler 
side. To me it's better to reuse existing scheduler code instead of duplicating 
code to achieve this. I'm not sure if there are any concrete plans for this. cc: 
[~haibo.chen], [~kasha].

If, like you said, we use this approach to replace existing preemption, I think 
this patch is far from that goal. Even with the latest patch, it can only mark 
which containers are allocated below the queue limit and which are allocated 
above it (which could be wrong if the queue's configured capacity changed). In 
that case, I don't know how it is possible to support existing preemption 
features such as preempting large containers, intra-queue preemption for app 
priority / user limit, etc.

bq. Essentially, the extra delay will look just like localization delay to the 
AM. We have verified this is fine for MapReduce and Spark. 
To me this is too simple a view to solve the problem. Yes, existing MR/Spark can 
tolerate delays when containers are localizing, but in most cases localization 
happens only once on each node, because the files that need to be localized are 
mostly common between containers from the same app (or the same set of tasks, 
like mappers). It looks like in a large cluster, delays would need to be expected 
for every OC launch. This is not acceptable for SLA-sensitive jobs.

I don't want to stop or slow down this innovation, but from what I can see, the 
existing assumptions and plans need lots of time to be verified. In this case, 
like YARN-1011, do you think it is better to create an umbrella JIRA (use 
opportunistic containers to do preemption) and move the related work to a branch? 

Edit #1:

Probably the new umbrella should be: use OC for normal containers (YARN-2877 
is not targeted at using OCs for normal containers).


was (Author: leftnoteasy):
Thanks [~asuresh] for the additional explanations.

Frankly speaking, I'm a little bit worried about this direction.

Currently, opportunistic container allocation is too simple to follow what 
existing schedulers can do. For example, delayed scheduling, user limit. You 
might want to say that there's no guarantee for opportunistic container so we 
don't have to follow all of them. This is true under the context of YARN-2877, 
which targets running containers with short lifetimes that don't need any 
guarantees. However, if we move all containers beyond a queue's configured 
capacity to opportunistic, it is a big deal. I'm not sure if you have any plans 
regarding this.

One related topic is YARN-1011: opportunistic containers will be used when a node 
is overcommitted. YARN-1013/YARN-1015 are open to track changes on the scheduler 
side. To me it's better to reuse existing scheduler code instead of duplicating 
code to achieve this. I'm not sure if there are any concrete plans for this. cc: 
[~haibo.chen], [~kasha].

If, like you said, we use this approach to replace existing preemption, I think 
this patch is far from that goal. Even with the latest patch, it can only mark 
which containers are allocated below the queue limit and which are allocated 
above it (which could be wrong if the queue's configured capacity changed). In 
that case, I don't know how it is possible to support existing preemption 
features such as preempting large containers, intra-queue preemption for app 
priority / user limit, etc.

bq. Essentially, the extra delay will look just like localization delay to the 
AM. We have verified this is fine for MapReduce and Spark. 
To me this is too simple a view to solve the problem. Yes, existing MR/Spark can 
tolerate delays when containers are localizing, but in most cases localization 
happens only once on each node, because the files that need to be localized are 
mostly common between containers from the same app (or the same set of tasks, 
like mappers). It looks like in a large cluster, delays would need to be expected 
for every OC launch. This is not acceptable for SLA-sensitive jobs.

I 

[jira] [Commented] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087782#comment-16087782
 ] 

Wangda Tan commented on YARN-6808:
--

Thanks [~asuresh] for the additional explanations.

Frankly speaking, I'm a little bit worried about this direction.

Currently, opportunistic container allocation is too simple to follow what 
existing schedulers can do. For example, delayed scheduling, user limit. You 
might want to say that there's no guarantee for opportunistic container so we 
don't have to follow all of them. This is true under the context of YARN-2877, 
which targets running containers with short lifetimes that don't need any 
guarantees. However, if we move all containers beyond a queue's configured 
capacity to opportunistic, it is a big deal. I'm not sure if you have any plans 
regarding this.

One related topic is YARN-1011: opportunistic containers will be used when a node 
is overcommitted. YARN-1013/YARN-1015 are open to track changes on the scheduler 
side. To me it's better to reuse existing scheduler code instead of duplicating 
code to achieve this. I'm not sure if there are any concrete plans for this. cc: 
[~haibo.chen], [~kasha].

If, like you said, we use this approach to replace existing preemption, I think 
this patch is far from that goal. Even with the latest patch, it can only mark 
which containers are allocated below the queue limit and which are allocated 
above it (which could be wrong if the queue's configured capacity changed). In 
that case, I don't know how it is possible to support existing preemption 
features such as preempting large containers, intra-queue preemption for app 
priority / user limit, etc.

bq. Essentially, the extra delay will look just like localization delay to the 
AM. We have verified this is fine for MapReduce and Spark. 
To me this is too simple a view to solve the problem. Yes, existing MR/Spark can 
tolerate delays when containers are localizing, but in most cases localization 
happens only once on each node, because the files that need to be localized are 
mostly common between containers from the same app (or the same set of tasks, 
like mappers). It looks like in a large cluster, delays would need to be expected 
for every OC launch. This is not acceptable for SLA-sensitive jobs.

I don't want to stop or slow down this innovation, but from what I can see, the 
existing assumptions and plans need lots of time to be verified. In this case, 
like YARN-1011, do you think it is better to create an umbrella JIRA (use 
opportunistic containers to do preemption) and move the related work to a branch? 

> Allow Schedulers to return OPPORTUNISTIC containers when queues go over 
> configured capacity
> ---
>
> Key: YARN-6808
> URL: https://issues.apache.org/jira/browse/YARN-6808
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-6808.001.patch, YARN-6808.002.patch
>
>
> This is based on discussions with [~kasha] and [~kkaranasos].
> Currently, when a queue goes over capacity, apps on starved queues must wait 
> either for containers to complete or for them to be preempted by the 
> scheduler to get resources.
> This JIRA proposes to allow Schedulers to:
> # Allocate all containers over the configured queue capacity/weight as 
> OPPORTUNISTIC.
> # Auto-promote running OPPORTUNISTIC containers of apps as and when their 
> GUARANTEED containers complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6771) Use classloader inside configuration class to make new classes

2017-07-14 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087770#comment-16087770
 ] 

Sangjin Lee commented on YARN-6771:
---

Thanks for the contribution [~jongyoul]. It appears that the same pattern 
exists in {{RpcServerFactoryPBImpl}} too. To be consistent, we should change 
both or neither.

That said, these are pretty basic classes that underpin all YARN RPC communications. 
I don't see the full history, but it appears that using a clean configuration 
to load these classes was intentional. Could you please explain in a bit more 
detail why you'd need to provide a specific {{Configuration}} instance to 
create RPC clients/servers? Is there a JIRA or issue that describes the problem on 
your end?
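
For reference, a minimal sketch of the classloader difference under discussion, 
assuming a caller that has installed a custom classloader on its {{Configuration}}; 
the class and method names other than the {{Configuration}} API are illustrative 
only.

{noformat}
// Hedged sketch, not the actual RpcClientFactoryPBImpl code: a freshly created
// Configuration ("localConf") resolves classes with the default classloader,
// while the caller's Configuration honors conf.setClassLoader(...).
import org.apache.hadoop.conf.Configuration;

public class ClassLoaderLookupSketch {

  static Class<?> loadWithLocalConf(String className) throws ClassNotFoundException {
    Configuration localConf = new Configuration();
    // A class that is only visible to a custom (e.g. plugin) classloader
    // will not be found here.
    return localConf.getClassByName(className);
  }

  static Class<?> loadWithCallerConf(Configuration conf, String className)
      throws ClassNotFoundException {
    // Uses whatever classloader the caller stored on its Configuration,
    // which is the behavior YARN-6771 argues for.
    return conf.getClassByName(className);
  }
}
{noformat}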

> Use classloader inside configuration class to make new classes 
> ---
>
> Key: YARN-6771
> URL: https://issues.apache.org/jira/browse/YARN-6771
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.1, 3.0.0-alpha4
>Reporter: Jongyoul Lee
> Fix For: 2.8.2
>
> Attachments: YARN-6771-1.patch, YARN-6771-2.patch, YARN-6771.patch
>
>
> While running {{RpcClientFactoryPBImpl.getClient}}, 
> {{RpcClientFactoryPBImpl}} uses {{localConf.getClassByName}}. But when using a 
> custom classloader, we have to use {{conf.getClassByName}} because the 
> custom classloader is already stored in the {{Configuration}} instance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087760#comment-16087760
 ] 

Hadoop QA commented on YARN-6625:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m  9s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:
 The patch generated 1 new + 19 unchanged - 0 fixed = 20 total (was 19) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
23s{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 55s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6625 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877356/YARN-6625.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 88524ba7e0d2 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 75c0220 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/16451/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-web-proxy.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16451/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
|
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16451/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
>  

[jira] [Updated] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-07-14 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6625:
---
Attachment: YARN-6625.003.patch

Thanks [~rkanter] for the review. Uploaded patch v3 addressing all your comments.

> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch, 
> YARN-6625.003.patch
>
>
> The tracking URL given at the command line should work whether the cluster is 
> secured or not. The tracking URLs are like http://node-2.abc.com:47014, and the 
> AM web server is supposed to redirect them to an RM address like 
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it 
> fails to do that because the connection is rejected when the AM talks to the RM 
> admin service to get the HA status.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087719#comment-16087719
 ] 

Ray Chiang commented on YARN-6798:
--

Finally got a bit of time to look at the previous patches.  I see a minor issue.

|| Patch || LevelDBKey(s) || Hadoop Versions || Commit Date ||
| YARN-5049 | queued | 3.0.0-alpha1 | May 11, 2016 |
| YARN-6127 | AMRMProxy/NextMasterKey | (2.9.0, 3.0.0-alpha4) | June 22, 2017 |

So, branch-2 has just YARN-6127, while trunk has YARN-5049 and YARN-6127.  If 
we label YARN-5049 as 1.1 and YARN-6127 as 1.2, then branch-2's having a 1.2 
version won't quite be accurate.  If we do the reverse, we'd be chronologically 
backward (which seems okay to me, but I'd like a second opinion).
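
For illustration only, a small sketch of how a 1.x labelling would interact with 
the state store's compatibility rule (versions sharing a major number are treated 
as compatible); the 1.1/1.2 assignments follow the table above and this is not a 
patch.

{noformat}
// Hedged sketch: with a 1.x scheme both changes stay major-version compatible
// with each other, unlike the 2.0 -> 3.0 bump behind the startup failure below.
import org.apache.hadoop.yarn.server.records.Version;

public class NMStateVersionSketch {
  public static void main(String[] args) {
    Version yarn5049 = Version.newInstance(1, 1);  // "queued" keys (trunk only)
    Version yarn6127 = Version.newInstance(1, 2);  // AMRMProxy/NextMasterKey keys
    Version loadedOld = Version.newInstance(2, 0); // what the failing NM loaded

    System.out.println("1.1 vs 1.2 compatible: " + yarn5049.isCompatibleTo(yarn6127));  // true
    System.out.println("1.2 vs 2.0 compatible: " + yarn6127.isCompatibleTo(loadedOld)); // false
  }
}
{noformat}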


> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6714) IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087709#comment-16087709
 ] 

Hadoop QA commented on YARN-6714:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 1s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 45m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.7.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5e40efe |
| JIRA Issue | YARN-6714 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877334/YARN-6714.branch-2.006.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux a8047908572c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2 / af80d10 |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |  

[jira] [Commented] (YARN-6804) Allow custom hostname for docker containers in native services

2017-07-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087699#comment-16087699
 ] 

Jian He commented on YARN-6804:
---

bq. Do you think we should remove them all?
OK, I didn't see those. I don't think this is required; I don't think anywhere else 
in the codebase is doing this. It's only done for isDebugEnabled.
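
For context, the pattern being referred to is the usual guard around debug-level 
logging; a generic illustration (not code from this patch) looks like:

{noformat}
// Generic illustration of the debug-log guard pattern; not from the patch.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DebugGuardSketch {
  private static final Logger LOG = LoggerFactory.getLogger(DebugGuardSketch.class);

  void launch(String containerId, String hostname) {
    // isDebugEnabled() avoids building the message string when debug logging
    // is off; info-level messages are normally logged without such a guard.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Launching container " + containerId + " with hostname " + hostname);
    }
    LOG.info("Container {} launched", containerId);
  }
}
{noformat}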

> Allow custom hostname for docker containers in native services
> --
>
> Key: YARN-6804
> URL: https://issues.apache.org/jira/browse/YARN-6804
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: yarn-native-services
>
> Attachments: YARN-6804-yarn-native-services.001.patch, 
> YARN-6804-yarn-native-services.002.patch
>
>
> Instead of the default random docker container hostname, we could set a more 
> user-friendly hostname for the container. The default could be a hostname 
> based on the container ID, with an option for the AM to provide a different 
> hostname. In the case of the native services AM, we could provide the 
> hostname that would be created by the registry DNS server. Regardless of 
> whether or not registry DNS is enabled, this would be a more useful hostname 
> for the docker container.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6826) SLS NMSimulator support for Opportunistic Container Queuing

2017-07-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6826:
--
Issue Type: Improvement  (was: Bug)

> SLS NMSimulator support for Opportunistic Container Queuing
> ---
>
> Key: YARN-6826
> URL: https://issues.apache.org/jira/browse/YARN-6826
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6788) Improve performance of resource profile branch

2017-07-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-6788:
--
Attachment: YARN-6788-YARN-3926.005.patch

Thanks [~leftnoteasy].

Addressed the first set of comments. I am uploading an interim patch to check 
on jenkins; I'll address the remaining comments and update the patch later.

> Improve performance of resource profile branch
> --
>
> Key: YARN-6788
> URL: https://issues.apache.org/jira/browse/YARN-6788
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6788-YARN-3926.001.patch, 
> YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, 
> YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch
>
>
> Currently we see about a 15% performance delta with this branch. 
> A few performance improvements are needed to address this.
> Also this patch will handle 
> [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418]
>  from [~leftnoteasy].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6826) SLS NMSimulator support for Opportunistic Container Queuing

2017-07-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6826:
--
Summary: SLS NMSimulator support for Opportunistic Container Queuing  (was: 
SLS NMSimulator support for Opportunistic Container Queuing.)

> SLS NMSimulator support for Opportunistic Container Queuing
> ---
>
> Key: YARN-6826
> URL: https://issues.apache.org/jira/browse/YARN-6826
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6826) SLS NMSimulator support for Opportunistic Container Queuing.

2017-07-14 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-6826:
-

 Summary: SLS NMSimulator support for Opportunistic Container 
Queuing.
 Key: YARN-6826
 URL: https://issues.apache.org/jira/browse/YARN-6826
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler-load-simulator
Reporter: Arun Suresh
Assignee: Arun Suresh






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087615#comment-16087615
 ] 

Hadoop QA commented on YARN-6808:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 47s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 8 new + 
429 unchanged - 0 fixed = 437 total (was 429) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 47m 18s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 86m 57s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6808 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877325/YARN-6808.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 484a6ae1b1c3 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 75c0220 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Updated] (YARN-6471) Support to add min/max resource configuration for a queue

2017-07-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-6471:
--
Attachment: YARN-6471-YARN-5881.001.patch

Attaching a patch against the branch and kicking off jenkins.

> Support to add min/max resource configuration for a queue
> -
>
> Key: YARN-6471
> URL: https://issues.apache.org/jira/browse/YARN-6471
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-6471.001.patch, YARN-6471.002.patch, 
> YARN-6471.003.patch, YARN-6471.004.patch, YARN-6471.005.patch, 
> YARN-6471.006.patch, YARN-6471.007.patch, YARN-6471-YARN-5881.001.patch
>
>
> This jira will track the new configurations needed to configure the min 
> and max resources of various resource types in a queue.
> For eg: 
> {noformat}
> yarn.scheduler.capacity.root.default.memory.min-resource
> yarn.scheduler.capacity.root.default.memory.max-resource
> yarn.scheduler.capacity.root.default.vcores.min-resource
> yarn.scheduler.capacity.root.default.vcores.max-resource
> {noformat}
> Uploading a patch soon
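
A minimal sketch of how such keys could be set and read through the generic 
{{Configuration}} API; the key names are taken from the example above, while the 
values and the helper class are illustrative only.

{noformat}
// Hedged sketch (not part of the YARN-6471 patch): exercising the proposed
// per-resource-type min/max keys via the plain Configuration API.
import org.apache.hadoop.conf.Configuration;

public class QueueMinMaxConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.root.default.memory.min-resource", "1024");
    conf.set("yarn.scheduler.capacity.root.default.memory.max-resource", "8192");
    conf.set("yarn.scheduler.capacity.root.default.vcores.min-resource", "1");
    conf.set("yarn.scheduler.capacity.root.default.vcores.max-resource", "4");

    long minMemMb = conf.getLong(
        "yarn.scheduler.capacity.root.default.memory.min-resource", 0L);
    System.out.println("min memory for root.default = " + minMemMb + " MB");
  }
}
{noformat}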



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6818) User limit per partition is not honored in branch-2.7 >=

2017-07-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087580#comment-16087580
 ] 

Sunil G commented on YARN-6818:
---

Yes. Thanks [~jhung].
I will wait for [~naganarasimha...@apache.org] as well.

> User limit per partition is not honored in branch-2.7 >=
> 
>
> Key: YARN-6818
> URL: https://issues.apache.org/jira/browse/YARN-6818
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-6818-branch-2.7.001.patch, 
> YARN-6818-branch-2.7.002.patch
>
>
> We are seeing an issue where user limit factor does not cap the amount of 
> resources a user can consume in a queue in a partition. Suppose you have a 
> queue with access to partition X, the used resources in the default partition are 
> 0, and the used resources in partition X are at the partition's user limit. This is 
> the problematic code as far as I can tell: (in LeafQueue.java){noformat}
> if (Resources
> .greaterThan(resourceCalculator, clusterResource,
> user.getUsed(label),
> limit)) {
>   // if enabled, check to see if could we potentially use this node 
> instead
>   // of a reserved node if the application has reserved containers
>   if (this.reservationsContinueLooking) {
> if (Resources.lessThanOrEqual(
> resourceCalculator,
> clusterResource,
> Resources.subtract(user.getUsed(), 
> application.getCurrentReservation()),
> limit)) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit based on reservations - " + " consumed: 
> "
> + user.getUsed() + " reserved: "
> + application.getCurrentReservation() + " limit: " + limit);
>   }
>   Resource amountNeededToUnreserve = 
> Resources.subtract(user.getUsed(label), limit);
>   // we can only acquire a new container if we unreserve first since 
> we ignored the
>   // user limit. Choose the max of user limit or what was previously 
> set by max
>   // capacity.
>   
> currentResoureLimits.setAmountNeededUnreserve(Resources.max(resourceCalculator,
>   clusterResource, 
> currentResoureLimits.getAmountNeededUnreserve(),
>   amountNeededToUnreserve));
>   return true;
> }
>   }
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit - " + " consumed: "
> + user.getUsed() + " limit: " + limit);
>   }
>   return false;
> }
> {noformat}
> First it sees that the used resources in partition X are greater than the 
> partition's user limit. Then the reservation check also succeeds because it checks 
> {{user.getUsed() - application.getCurrentReservation() <= limit}} and returns 
> true.
> One fix is to just change {{Resources.subtract(user.getUsed(), 
> application.getCurrentReservation())}} to 
> {{Resources.subtract(user.getUsed(label), 
> application.getCurrentReservation())}}.
> This doesn't seem to be a problem in branch-2.8 and higher since YARN-3356 
> introduces this check: {noformat}  if (this.reservationsContinueLooking 
> && checkReservations
>   && label.equals(CommonNodeLabelsManager.NO_LABEL)) {{noformat}
> so in this case getting the used resources in default partition seems to be 
> correct.
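
A minimal sketch of the check being discussed, using plain long values instead of 
the {{Resource}}/{{Resources}} classes; it is only meant to contrast the current 
behaviour with the proposed label-aware subtraction, not to mirror the actual 
LeafQueue code.

{noformat}
// Hedged sketch with simplified types (not the actual LeafQueue code).
public class UserLimitCheckSketch {

  /** Current form: reservations are subtracted from default-partition usage. */
  static boolean canAssignCurrent(long usedDefault, long usedInLabel,
                                  long reserved, long limit) {
    if (usedInLabel > limit) {
      // usedDefault is 0 in the reported scenario, so this wrongly passes.
      return (usedDefault - reserved) <= limit;
    }
    return true;
  }

  /** Proposed fix: reservations are subtracted from the same label's usage. */
  static boolean canAssignFixed(long usedDefault, long usedInLabel,
                                long reserved, long limit) {
    if (usedInLabel > limit) {
      return (usedInLabel - reserved) <= limit;
    }
    return true;
  }

  public static void main(String[] args) {
    // Partition X usage (110) already exceeds the user limit (100); nothing reserved.
    System.out.println(canAssignCurrent(0, 110, 0, 100)); // true  -> limit not honored
    System.out.println(canAssignFixed(0, 110, 0, 100));   // false -> limit enforced
  }
}
{noformat}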



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6818) User limit per partition is not honored in branch-2.7 >=

2017-07-14 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-6818:

Affects Version/s: 2.7.4

> User limit per partition is not honored in branch-2.7 >=
> 
>
> Key: YARN-6818
> URL: https://issues.apache.org/jira/browse/YARN-6818
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-6818-branch-2.7.001.patch, 
> YARN-6818-branch-2.7.002.patch
>
>
> We are seeing an issue where user limit factor does not cap the amount of 
> resources a user can consume in a queue in a partition. Suppose you have a 
> queue with access to partition X, the used resources in the default partition are 
> 0, and the used resources in partition X are at the partition's user limit. This is 
> the problematic code as far as I can tell: (in LeafQueue.java){noformat}
> if (Resources
> .greaterThan(resourceCalculator, clusterResource,
> user.getUsed(label),
> limit)) {
>   // if enabled, check to see if could we potentially use this node 
> instead
>   // of a reserved node if the application has reserved containers
>   if (this.reservationsContinueLooking) {
> if (Resources.lessThanOrEqual(
> resourceCalculator,
> clusterResource,
> Resources.subtract(user.getUsed(), 
> application.getCurrentReservation()),
> limit)) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit based on reservations - " + " consumed: 
> "
> + user.getUsed() + " reserved: "
> + application.getCurrentReservation() + " limit: " + limit);
>   }
>   Resource amountNeededToUnreserve = 
> Resources.subtract(user.getUsed(label), limit);
>   // we can only acquire a new container if we unreserve first since 
> we ignored the
>   // user limit. Choose the max of user limit or what was previously 
> set by max
>   // capacity.
>   
> currentResoureLimits.setAmountNeededUnreserve(Resources.max(resourceCalculator,
>   clusterResource, 
> currentResoureLimits.getAmountNeededUnreserve(),
>   amountNeededToUnreserve));
>   return true;
> }
>   }
>   if (LOG.isDebugEnabled()) {
> LOG.debug("User " + userName + " in queue " + getQueueName()
> + " will exceed limit - " + " consumed: "
> + user.getUsed() + " limit: " + limit);
>   }
>   return false;
> }
> {noformat}
> First it sees that the used resources in partition X are greater than the 
> partition's user limit. Then the reservation check also succeeds because it checks 
> {{user.getUsed() - application.getCurrentReservation() <= limit}} and returns 
> true.
> One fix is to just change {{Resources.subtract(user.getUsed(), 
> application.getCurrentReservation())}} to 
> {{Resources.subtract(user.getUsed(label), 
> application.getCurrentReservation())}}.
> This doesn't seem to be a problem in branch-2.8 and higher since YARN-3356 
> introduces this check: {noformat}  if (this.reservationsContinueLooking 
> && checkReservations
>   && label.equals(CommonNodeLabelsManager.NO_LABEL)) {{noformat}
> so in this case getting the used resources in default partition seems to be 
> correct.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6775) CapacityScheduler: Improvements to assignContainers, avoid unnecessary canAssignToUser/Queue calls

2017-07-14 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6775:
-
Attachment: YARN-6775.branch-2.8.002.patch

> CapacityScheduler: Improvements to assignContainers, avoid unnecessary 
> canAssignToUser/Queue calls
> --
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6775.001.patch, YARN-6775.002.patch, 
> YARN-6775.branch-2.002.patch, YARN-6775.branch-2.8.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post a patch shortly. The patch includes a simple throughput test which 
> demonstrates that, when we have users at their user limit, the number of 
> NodeUpdateSchedulerEvents we can process improves from 13K/sec to 
> 50K/sec.
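For illustration, the kind of local caching being described is roughly the following (a sketch with hypothetical helper and field names, not the attached patch):
{code}
// Within a single assignContainers() pass the user-limit answer for a given
// user/partition cannot change, so remember it instead of recomputing it for
// every node heartbeat and resource request.
private final Map<String, Boolean> userLimitCache = new HashMap<>();

private boolean canAssignToUserCached(String userName, String partition) {
  // checkUserLimit() stands in for the existing (uncached) limit check.
  return userLimitCache.computeIfAbsent(userName + "/" + partition,
      key -> checkUserLimit(userName, partition));
}
// The cache would be cleared at the start of each assignContainers() pass.
{code}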



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6714) IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087542#comment-16087542
 ] 

Hadoop QA commented on YARN-6714:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
33s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m  5s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 50s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.7.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
| JDK v1.7.0_131 Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.reservation.TestFairSchedulerPlanFollower
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5e40efe |
| JIRA Issue | YARN-6714 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877308/YARN-6714.branch-2.005.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 2a62a8b82705 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Updated] (YARN-6714) IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-07-14 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6714:
---
Attachment: YARN-6714.branch-2.006.patch

> IllegalStateException while handling APP_ATTEMPT_REMOVED event when 
> async-scheduling enabled in CapacityScheduler
> -
>
> Key: YARN-6714
> URL: https://issues.apache.org/jira/browse/YARN-6714
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6714.001.patch, YARN-6714.002.patch, 
> YARN-6714.003.patch, YARN-6714.branch-2.003.patch, 
> YARN-6714.branch-2.004.patch, YARN-6714.branch-2.005.patch, 
> YARN-6714.branch-2.006.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, after an AM 
> fails over and all of its reserved containers are unreserved, there is still 
> a chance that the outdated reserve proposal of the failed app attempt gets 
> fetched and committed. This problem happened to an app in our cluster: when 
> the app stopped, it unreserved all reserved containers and compared their 
> appAttemptId with the current appAttemptId; on a mismatch it threw an 
> IllegalStateException and crashed the RM.
> Error log:
> {noformat}
> 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Trying to unreserve  for application 
> appattempt_1495188831758_0121_02 when currently reserved  for application 
> application_1495188831758_0121 on node host: node1:45454 #containers=2 
> available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt 
> and CapacityScheduler#tryCommit both need to acquire the write lock before 
> executing, so we can check the app attempt state in the commit process to 
> avoid committing outdated proposals.
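A rough sketch of that commit-time guard (method names here are assumptions for illustration, not quotes from the actual patch):
{code}
// In CapacityScheduler#tryCommit, before applying a proposal:
ApplicationAttemptId proposalAttemptId = proposal.getApplicationAttemptId();
FiCaSchedulerApp app = getApplicationAttempt(proposalAttemptId);  // assumed lookup
if (app == null || app.isStopped()) {
  // The attempt failed over or was removed after the proposal was generated;
  // drop the outdated proposal instead of committing it.
  return false;
}
{code}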



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6804) Allow custom hostname for docker containers in native services

2017-07-14 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087517#comment-16087517
 ] 

Billie Rinaldi commented on YARN-6804:
--

bq. I think the check for if “LOG.isInfoEnabled” is not required
Most of the LOG.info checks in DockerLinuxContainerRuntime have 
LOG.isInfoEnabled() before them (not sure why), so I was trying to match those. 
Do you think we should remove them all?

bq. the network parameter is not used, and the added NETWORK_TYPE_BRIDGE seems 
also not used
You're right, I had a version of the patch where I only set the hostname when 
the network type was bridge, but then I read that docker hostname can be set 
even if network is host, so I removed it.

bq. probably we can create a common util method in RegistryPathUtils
Good point. Actually there's already a method for that, we can just make 
encodeYarnID perform the ctr replacement as well.

> Allow custom hostname for docker containers in native services
> --
>
> Key: YARN-6804
> URL: https://issues.apache.org/jira/browse/YARN-6804
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: yarn-native-services
>
> Attachments: YARN-6804-yarn-native-services.001.patch, 
> YARN-6804-yarn-native-services.002.patch
>
>
> Instead of the default random docker container hostname, we could set a more 
> user-friendly hostname for the container. The default could be a hostname 
> based on the container ID, with an option for the AM to provide a different 
> hostname. In the case of the native services AM, we could provide the 
> hostname that would be created by the registry DNS server. Regardless of 
> whether or not registry DNS is enabled, this would be a more useful hostname 
> for the docker container.
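As a simple illustration of a container-ID-based default (a hypothetical helper, not the actual patch; underscores are not valid in DNS names, hence the substitutions):
{code}
// e.g. container_e03_1499929227397_0001_01_000002
//  ->  ctr-e03-1499929227397-0001-01-000002
static String defaultDockerHostname(String containerId) {
  return containerId.replace("container", "ctr").replace('_', '-');
}
{code}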



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087500#comment-16087500
 ] 

Arun Suresh edited comment on YARN-6808 at 7/14/17 3:55 PM:


[~leftnoteasy], good questions..

bq. Use opportunistic container to do lazy preemption in NM. (Is there any 
umbrella JIRA for this?)
Technically, this is the default behavior for opportunistic containers - as it 
is today. Opp containers are killed in the NM when a Guaranteed container is 
started by an AM - if the NM at that point does not have resources to start the 
guaranteed container. We are also working on YARN-5972 which adds some amount 
of work preservation to this by - instead of killing the Opp container, we 
PAUSE it. PAUSE will be supported using the cgroups 
[freezer|https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt]
 module in linux and Windows JobObjects (we are using this in production 
actually)

bq. Let's say app1 in an underutilized queue, which want to preempt containers 
from an over-utilized queue. Will preemption happens if app1 asks opportunistic 
container?
I am assuming that by under-utilized you mean starved. So currently, if app1 
SPECIFICALLY asks for Opp containers it will get them irrespective of whether 
the queue is underutilized or not. Opp container ALLOCATION today is not 
limited by queue/cluster capacity - it is just limited by the length of queued 
containers on Nodes (YARN-1011 will in time place stricter capacity limits by 
allocating only if the allocated resources are not being used). Opp containers 
EXECUTION is obviously bound by available resources on the NM, and like I 
mentioned earlier, running Opp containers will be killed to make room for any 
Guaranteed container.

bq. For target #1, who make the decision of moving guaranteed containers to 
opportunistic containers. If it is still decided by central RM, does that mean 
preemption logics in RM are same as today except kill operation is decided by 
NM side? 
Yes, it is the RM. Currently, in both the Schedulers, after a container is 
allocated, candidates for preemption are chosen from containers of apps from 
queues which are above capacity - then the RM asks the NM to preempt the 
containers. What the latest patch (002) here does is: Allocation of containers 
happen in the same code path - but right before handing the container to the 
AM, it checks if the queue capacity is exceeded - If so, downgrade the 
container to Opp. Thus technically, the same apps/containers that were a target 
for normal preemption will become candidates for preemption at the NM. There 
are obviously improvements that can be made - like that I mentioned in the 
phase 2 of the JIRA in the description - where, in addition to downgrading over 
cap containers to Opp, we can upgrade running Opp containers to Guaranteed for 
apps when some of their Guaranteed containers complete.
Like I mentioned, we are still prototyping - we are running tests now to 
collect data - will keep you guys posted on results.

bq. For overall opportunistic container execution: If OC launch request will be 
queued by NM, it may wait a long time before get executed. In this case, do we 
need to modify AM code to: a. expect longer delay before think the launch 
fails. b. asks more resource on different hosts since there's no guaranteed 
launch time for OC?
So, with YARN-4597, we had introduced a container state called SCHEDULED. A 
container is in the scheduled state while it is localizing or if it is in the 
queue. Essentially, the extra delay will look just like localization delay to 
the AM. We have verified this is fine for MapReduce and Spark.

bq. What happens if an app doesn't want to ask opportunistic container when go 
beyond headroom? (Such as online services). I think this should be a per-app 
config (give me OC when I'm go beyond headroom).
A per app config makes sense. But currently (as of YARN-5180), the 
ResourceRequest has a field called {{ExecutionTypeRequest}} which in addition 
to the {{ExecutionType}} also has an {{enforceExecutionType}} flag. By default, 
this is false - but if set to true, my latest patch ensures that only 
Guaranteed containers are returned. I have added a test case to ensure that as 
well.

bq. Existing patch makes static decision, which happens when new resource 
request added by AM. Should this be reconsidered when app's headroom changed 
over time?
So, my latest patch (002) kind of addresses this. The decision is now made 
after container allocation. Also, I am now ignoring the headroom: I downgrade 
only if, at the time of container allocation, the queue capacity is exceeded. 
The existing code paths ensure that the max-capacity of queues is never 
exceeded anyway.
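(For readers unfamiliar with the flag, a small sketch of how an AM would pin a request to guaranteed-only containers; API names are recalled from the YARN records classes and should be verified against your Hadoop version:)
{code}
// Ask for one container that must stay GUARANTEED, i.e. the scheduler should
// not hand back an OPPORTUNISTIC downgrade for it.
ResourceRequest req = ResourceRequest.newInstance(
    Priority.newInstance(1), ResourceRequest.ANY,
    Resource.newInstance(1024, 1), 1);
req.setExecutionTypeRequest(
    ExecutionTypeRequest.newInstance(ExecutionType.GUARANTEED,
        true /* enforce the execution type */));
{code}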




was (Author: asuresh):
[~leftnoteasy], good questions..

bq. Use opportunistic container to do lazy preemption in NM. (Is there any 

[jira] [Comment Edited] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087500#comment-16087500
 ] 

Arun Suresh edited comment on YARN-6808 at 7/14/17 3:50 PM:


[~leftnoteasy], good questions..

bq. Use opportunistic container to do lazy preemption in NM. (Is there any 
umbrella JIRA for this?)
Technically, this is the default behavior for opportunistic containers - as it 
is today. Opp containers are killed in the NM when a Guaranteed container is 
started by an AM - if the NM at that point does not have resources to start the 
guaranteed container. We are also working on YARN-5972 which adds some amount 
of work preservation to this by - instead of killing the Opp container, we 
PAUSE it. PAUSE will be supported using the cgroups 
[freezer|https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt]
 module in linux and Windows JobObjects (we are using this in production 
actually)

bq. Let's say app1 in an underutilized queue, which want to preempt containers 
from an over-utilized queue. Will preemption happens if app1 asks opportunistic 
container?
I am assuming that by under-utilized you mean starved. So currently, if app1 
SPECIFICALLY asks for Opp containers it will get them irrespective of whether 
the queue is underutilized or not. Opp container ALLOCATION today is not 
limited by queue/cluster capacity - it is just limited by the length of queued 
containers on Nodes (YARN-1011 will in time place stricter capacity limits by 
allocating only if the allocated resources are not being used). Opp containers 
EXECUTION is obviously bound by available resources on the NM, and like I 
mentioned earlier, running Opp containers will be killed to make room for any 
Guaranteed container.

bq. For target #1, who make the decision of moving guaranteed containers to 
opportunistic containers. If it is still decided by central RM, does that mean 
preemption logics in RM are same as today except kill operation is decided by 
NM side? 
Yes, it is the RM. Currently, in both the Schedulers, after a container is 
allocated, candidates for preemption are chosen from containers of apps from 
queues which are above capacity - then the RM asks the NM to preempt the 
containers. What the latest patch (002) here does is: Allocation of containers 
happen in the same code path - but right before handing the container to the 
AM, it checks if the queue capacity is exceeded - If so, downgrade the 
container to Opp. Thus technically, the same apps/containers that were a target 
for normal preemption will become candidates for preemption at the NM. There 
are obviously improvements that can be made - like that I mentioned in the 
phase 2 of the JIRA in the description - where, in addition to downgrading over 
cap containers to Opp, we can upgrade running Opp containers to Guaranteed for 
apps when some of their Guaranteed containers complete.
Like I mentioned, we are still prototyping - we are running tests now to 
collect data - will keep you guys posted on results.

bq. For overall opportunistic container execution: If OC launch request will be 
queued by NM, it may wait a long time before get executed. In this case, do we 
need to modify AM code to: a. expect longer delay before think the launch 
fails. b. asks more resource on different hosts since there's no guaranteed 
launch time for OC?
So, with YARN-4597, we had introduced a container state called SCHEDULED. A 
container is in the scheduled state while it is localizing or if it is in the 
queue. Essentially, the extra delay will look just like localization delay to 
the AM. We have verified this is fine for MapReduce and Spark.

bq. What happens if an app doesn't want to ask opportunistic container when go 
beyond headroom? (Such as online services). I think this should be a per-app 
config (give me OC when I'm go beyond headroom).
A per app config makes sense. But currently, the ResourceRequest has a 
field called {{ExecutionTypeRequest}} which in addition to the 
{{ExecutionType}} also has an {{enforceExecutionType}} flag. By default, this is 
false - but if set to true, my latest patch ensures that only Guaranteed 
containers are returned. I have added a test case to ensure that as well.

bq. Existing patch makes static decision, which happens when new resource 
request added by AM. Should this be reconsidered when app's headroom changed 
over time?
So, my latest patch (002) kind of addresses this. The decision is now made 
after container allocation. Also, I am now ignoring the headroom: I downgrade 
only if, at the time of container allocation, the queue capacity is exceeded. 
The existing code paths ensure that the max-capacity of queues is never 
exceeded anyway.




was (Author: asuresh):
[~leftnoteasy], good questions..

bq. Use opportunistic container to do lazy preemption in NM. (Is there any 
umbrella JIRA 

[jira] [Comment Edited] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087500#comment-16087500
 ] 

Arun Suresh edited comment on YARN-6808 at 7/14/17 3:49 PM:


[~leftnoteasy], good questions..

bq. Use opportunistic container to do lazy preemption in NM. (Is there any 
umbrella JIRA for this?)
Technically, this is the default behavior for opportunistic containers - as it 
is today. Opp containers are killed in the NM when a Guaranteed container is 
started by an AM - if the NM at that point does not have resources to start the 
guaranteed container. We are also working on YARN-5972 which adds some amount 
of work preservation to this by - instead of killing the Opp container, we 
PAUSE it. PAUSE will be supported using the cgroups 
[freezer|https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt]
 module in linux and Windows JobObjects (we are using this in production 
actually)

bq. Let's say app1 in an underutilized queue, which want to preempt containers 
from an over-utilized queue. Will preemption happens if app1 asks opportunistic 
container?
I am assuming that by under-utilized you mean starved. So currently, if app1 
SPECIFICALLY asks for Opp containers it will get them irrespective of whether 
the queue is underutilized or not. Opp container ALLOCATION today is not 
limited by queue/cluster capacity - it is just limited by the length of queued 
containers on Nodes (YARN-1011 will in time place stricter capacity limits by 
allocating only if the allocated resources are not being used). Opp containers 
EXECUTION is obviously bound by available resources on the NM, and like I 
mentioned earlier, running Opp containers will be killed to make room for any 
Guaranteed container.

bq. For target #1, who make the decision of moving guaranteed containers to 
opportunistic containers. If it is still decided by central RM, does that mean 
preemption logics in RM are same as today except kill operation is decided by 
NM side? 
Yes, it is the RM. Currently, in both the Schedulers, after a container is 
allocated, candidates for preemption are chosen from containers of apps from 
queues which are above capacity - then the RM asks the NM to preempt the 
containers. What the latest patch (002) here does is: Allocation of containers 
happen in the same code path - but right before handing the container to the 
AM, it checks if the queue capacity is exceeded - If so, downgrade the 
container to Opp. Thus technically, the same apps/containers that were a target 
for normal preemption will become candidates for preemption at the NM. There 
are obviously improvements - like that I mentioned in the phase 2 of the JIRA 
in the description - where, in addition to downgrading over cap containers to 
Opp, we can upgrade running Opp containers to Guaranteed for apps when some of 
their Guaranteed containers complete.
Like I mentioned, we are still prototyping - we are running tests now to 
collect data - will keep you guys posted on results.

bq. For overall opportunistic container execution: If OC launch request will be 
queued by NM, it may wait a long time before get executed. In this case, do we 
need to modify AM code to: a. expect longer delay before think the launch 
fails. b. asks more resource on different hosts since there's no guaranteed 
launch time for OC?
So, with YARN-4597, we had introduced a container state called SCHEDULED. A 
container is in the scheduled state while it is localizing or if it is in the 
queue. Essentially, the extra delay will look just like localization delay to 
the AM. We have verified this is fine for MapReduce and Spark.

bq. What happens if an app doesn't want to ask opportunistic container when go 
beyond headroom? (Such as online services). I think this should be a per-app 
config (give me OC when I'm go beyond headroom).
A per app config makes sense. But currently, the ResourceRequest has a 
field called {{ExecutionTypeRequest}} which in addition to the 
{{ExecutionType}} also has an {{enforceExecutionType}} flag. By default, this is 
false - but if set to true, my latest patch ensures that only Guaranteed 
containers are returned. I have added a test case to ensure that as well.

bq. Existing patch makes static decision, which happens when new resource 
request added by AM. Should this be reconsidered when app's headroom changed 
over time?
So, my latest patch (002) kind of addresses this. The decision is now made 
after container allocation. Also, I am now ignoring the headroom: I downgrade 
only if, at the time of container allocation, the queue capacity is exceeded. 
The existing code paths ensure that the max-capacity of queues is never 
exceeded anyway.




was (Author: asuresh):
[~leftnoteasy], good questions..

bq. Use opportunistic container to do lazy preemption in NM. (Is there any 
umbrella JIRA for this?)

[jira] [Updated] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6808:
--
Attachment: YARN-6808.002.patch

Attaching v002.

> Allow Schedulers to return OPPORTUNISTIC containers when queues go over 
> configured capacity
> ---
>
> Key: YARN-6808
> URL: https://issues.apache.org/jira/browse/YARN-6808
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-6808.001.patch, YARN-6808.002.patch
>
>
> This is based on discussions with [~kasha] and [~kkaranasos].
> Currently, when a queue goes over capacity, apps on starved queues must wait 
> either for containers to complete or for them to be preempted by the 
> scheduler in order to get resources.
> This JIRA proposes to allow Schedulers to:
> # Allocate all containers over the configured queue capacity/weight as 
> OPPORTUNISTIC.
> # Auto-promote running OPPORTUNISTIC containers of apps as and when their 
> GUARANTEED containers complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6808) Allow Schedulers to return OPPORTUNISTIC containers when queues go over configured capacity

2017-07-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087500#comment-16087500
 ] 

Arun Suresh commented on YARN-6808:
---

[~leftnoteasy], good questions..

bq. Use opportunistic container to do lazy preemption in NM. (Is there any 
umbrella JIRA for this?)
Technically, this is the default behavior for opportunistic containers - as it 
is today. Opp containers are killed in the NM when a Guaranteed container is 
started by an AM - if the NM at that point does not have resources to start the 
guaranteed container. We are also working on YARN-5972 which adds some amount 
of work preservation to this by - instead of killing the Opp container, we 
PAUSE it. PAUSE will be supported using the cgroups 
[freezer|https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt]
 module in linux and Windows JobObjects (we are using this in production 
actually)

bq. Let's say app1 in an underutilized queue, which want to preempt containers 
from an over-utilized queue. Will preemption happens if app1 asks opportunistic 
container?
I am assuming that by under-utilized you mean starved. So currently, if app1 
SPECIFICALLY asks for Opp containers it will get them irrespective of whether 
the queue is underutilized or not. Opp container ALLOCATION today is not 
limited by queue/cluster capacity - it is just limited by the length of queued 
containers on Nodes (YARN-1011 will in time place stricter capacity limits by 
allocating only if the allocated resources are not being used). Opp containers 
EXECUTION is obviously bound by available resources on the NM, and like I 
mentioned earlier, running Opp containers will be killed to make room for any 
Guaranteed container.

bq. For target #1, who make the decision of moving guaranteed containers to 
opportunistic containers. If it is still decided by central RM, does that mean 
preemption logics in RM are same as today except kill operation is decided by 
NM side? 
Yes, it is the RM. Currently, in both the Schedulers, after a container is 
allocated, candidates for preemption are chosen from containers of apps from 
queues which are above capacity - then the RM asks the NM to preempt the 
containers. What the latest patch (002) here does is: Allocation of containers 
happen in the same code path - but right before handing the container to the 
AM, it checks if the queue capacity is exceeded - If so, downgrade the 
container to Opp. Thus technically, the same apps/containers that were a target 
for normal preemption will become candidates for preemption at the NM. There 
are obviously improvements - like that I mentioned in the phase 2 of the JIRA 
in the description - where, in addition to downgrading over cap containers to 
Opp, we can upgrade running Opp containers to Guaranteed for apps when some of 
their Guaranteed containers complete.
Like I mentioned, we are still prototyping - we are running tests now to 
collect data - will keep you guys posted on results.

bq. For overall opportunistic container execution: If OC launch request will be 
queued by NM, it may wait a long time before get executed. In this case, do we 
need to modify AM code to: a. expect longer delay before think the launch 
fails. b. asks more resource on different hosts since there's no guaranteed 
launch time for OC?
So, with YARN-4597, we had introduced a container state called SCHEDULED. A 
container is in the scheduled state while it is localizing or if it is in the 
queue. Essentially, the extra delay will look just like localization delay to 
the AM. We have verified this is fine for MapReduce and Spark.

bq. What happens if an app doesn't want to ask opportunistic container when go 
beyond headroom? (Such as online services). I think this should be a per-app 
config (give me OC when I'm go beyond headroom).
A per app config makes sense. But currently, the ResourceRequest has a 
field called {{ExecutionTypeRequest}} which in addition to the 
{{ExecutionType}} also has an {{enforceExecutionType}} flag. By default, this is 
false - but if set to true, my latest patch ensures that only Guaranteed containers 
are returned. I have added a test case to ensure that as well.

bq. Existing patch makes static decision, which happens when new resource 
request added by AM. Should this be reconsidered when app's headroom changed 
over time?
So, my latest patch (002) kind of addresses this. The decision is now made 
after container allocation. Also, I am now ignoring the headroom: I downgrade 
only if, at the time of container allocation, the queue capacity is exceeded. 
The existing code paths ensure that the max-capacity of queues is never 
exceeded anyway.



> Allow Schedulers to return OPPORTUNISTIC containers when queues go over 
> configured capacity
> ---
>
> Key: 

[jira] [Commented] (YARN-6822) TestContainerManagerSecurity tests fail on trunk

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087482#comment-16087482
 ] 

Ray Chiang commented on YARN-6822:
--

[~Sonia], can you confirm whether this issue continues after applying YARN-6150?

> TestContainerManagerSecurity tests fail on trunk
> 
>
> Key: YARN-6822
> URL: https://issues.apache.org/jira/browse/YARN-6822
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
> Environment: Ubuntu 14.04 
> x86, ppc64le
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>
> {code}
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[0]
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
>   at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
> {code}
> {code}
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[1]
> java.lang.Exception: test timed out after 12 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
>   at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
> {code}
> Logs -
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/463/testReport/org.apache.hadoop.yarn.server/TestContainerManagerSecurity/testContainerManager_0_/
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/463/testReport/org.apache.hadoop.yarn.server/TestContainerManagerSecurity/testContainerManager_1_/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-07-14 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6678:
---
Attachment: YARN-6678.005.patch

Sure, uploading a new patch to resolve the conflict with YARN-6714 in 
TestCapacitySchedulerAsyncScheduling.

> Committer thread crashes with IllegalStateException in async-scheduling mode 
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6678.001.patch, YARN-6678.002.patch, 
> YARN-6678.003.patch, YARN-6678.004.patch, YARN-6678.005.patch
>
>
> Error log:
> {noformat}
> java.lang.IllegalStateException: Trying to reserve container 
> container_e10_1495599791406_7129_01_001453 for application 
> appattempt_1495599791406_7129_01 when currently reserved container 
> container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
> #containers=40 available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Reproduce this problem:
> 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1
> 2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and 
> allocated app-1/container-X2
> 3. nm1 reserved app-2/container-Y
> 4. proposal-1 was accepted but throw IllegalStateException when applying
> Currently the check code for reserve proposal in FiCaSchedulerApp#accept as 
> follows:
> {code}
>   // Container reserved first time will be NEW, after the container
>   // accepted & confirmed, it will become RESERVED state
>   if (schedulerContainer.getRmContainer().getState()
>       == RMContainerState.RESERVED) {
>     // Set reReservation == true
>     reReservation = true;
>   } else {
>     // When reserve a resource (state == NEW is for new container,
>     // state == RUNNING is for increase container).
>     // Just check if the node is not already reserved by someone
>     if (schedulerContainer.getSchedulerNode().getReservedContainer()
>         != null) {
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("Try to reserve a container, but the node is "
>             + "already reserved by another container="
>             + schedulerContainer.getSchedulerNode()
>                 .getReservedContainer().getContainerId());
>       }
>       return false;
>     }
>   }
> {code}
> The reserved container on the node of the reserve proposal is checked only 
> for a first-reserve container.
> We should also confirm that the container already reserved on this node is 
> equal to the container being re-reserved.
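A sketch of that extra guard, reusing the accessors from the snippet quoted above (idea only; the committed patch may differ):
{code}
if (schedulerContainer.getRmContainer().getState()
    == RMContainerState.RESERVED) {
  // Re-reservation: also make sure the node is still reserved for *this*
  // container and not for a container of some other app attempt.
  RMContainer nodeReserved =
      schedulerContainer.getSchedulerNode().getReservedContainer();
  if (nodeReserved == null || !nodeReserved.getContainerId().equals(
      schedulerContainer.getRmContainer().getContainerId())) {
    return false;
  }
  reReservation = true;
}
{code}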



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087389#comment-16087389
 ] 

Sunil G commented on YARN-2113:
---

Thank you very much [~eepayne]. I also just verified the same.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.3
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.0021.patch, YARN-2113.branch-2.8.0019.patch, 
> YARN-2113.branch-2.8.0020.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6714) IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-07-14 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6714:
---
Attachment: YARN-6714.branch-2.005.patch

> IllegalStateException while handling APP_ATTEMPT_REMOVED event when 
> async-scheduling enabled in CapacityScheduler
> -
>
> Key: YARN-6714
> URL: https://issues.apache.org/jira/browse/YARN-6714
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6714.001.patch, YARN-6714.002.patch, 
> YARN-6714.003.patch, YARN-6714.branch-2.003.patch, 
> YARN-6714.branch-2.004.patch, YARN-6714.branch-2.005.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, after an AM 
> fails over and all of its reserved containers are unreserved, there is still 
> a chance that the outdated reserve proposal of the failed app attempt gets 
> fetched and committed. This problem happened to an app in our cluster: when 
> the app stopped, it unreserved all reserved containers and compared their 
> appAttemptId with the current appAttemptId; on a mismatch it threw an 
> IllegalStateException and crashed the RM.
> Error log:
> {noformat}
> 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Trying to unreserve  for application 
> appattempt_1495188831758_0121_02 when currently reserved  for application 
> application_1495188831758_0121 on node host: node1:45454 #containers=2 
> available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt 
> and CapacityScheduler#tryCommit both need to acquire the write lock before 
> executing, so we can check the app attempt state in the commit process to 
> avoid committing outdated proposals.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6714) IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-07-14 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087164#comment-16087164
 ] 

Tao Yang edited comment on YARN-6714 at 7/14/17 2:14 PM:
-

Sorry to have misplaced the actual types; there are also more custom generic 
types that should be explicitly specified. Uploading a new patch.


was (Author: tao yang):
Sorry to have misplaced the actual types. Upload a new patch.

> IllegalStateException while handling APP_ATTEMPT_REMOVED event when 
> async-scheduling enabled in CapacityScheduler
> -
>
> Key: YARN-6714
> URL: https://issues.apache.org/jira/browse/YARN-6714
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6714.001.patch, YARN-6714.002.patch, 
> YARN-6714.003.patch, YARN-6714.branch-2.003.patch, 
> YARN-6714.branch-2.004.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, after an AM 
> fails over and all of its reserved containers are unreserved, there is still 
> a chance that the outdated reserve proposal of the failed app attempt gets 
> fetched and committed. This problem happened to an app in our cluster: when 
> the app stopped, it unreserved all reserved containers and compared their 
> appAttemptId with the current appAttemptId; on a mismatch it threw an 
> IllegalStateException and crashed the RM.
> Error log:
> {noformat}
> 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Trying to unreserve  for application 
> appattempt_1495188831758_0121_02 when currently reserved  for application 
> application_1495188831758_0121 on node host: node1:45454 #containers=2 
> available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt 
> and CapacityScheduler#tryCommit both need to acquire the write lock before 
> executing, so we can check the app attempt state in the commit process to 
> avoid committing outdated proposals.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6714) IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-07-14 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6714:
---
Attachment: (was: YARN-6714.branch-2.005.patch)

> IllegalStateException while handling APP_ATTEMPT_REMOVED event when 
> async-scheduling enabled in CapacityScheduler
> -
>
> Key: YARN-6714
> URL: https://issues.apache.org/jira/browse/YARN-6714
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6714.001.patch, YARN-6714.002.patch, 
> YARN-6714.003.patch, YARN-6714.branch-2.003.patch, 
> YARN-6714.branch-2.004.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, after an AM 
> fails over and all of its reserved containers are unreserved, there is still 
> a chance that the outdated reserve proposal of the failed app attempt gets 
> fetched and committed. This problem happened to an app in our cluster: when 
> the app stopped, it unreserved all reserved containers and compared their 
> appAttemptId with the current appAttemptId; on a mismatch it threw an 
> IllegalStateException and crashed the RM.
> Error log:
> {noformat}
> 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Trying to unreserve  for application 
> appattempt_1495188831758_0121_02 when currently reserved  for application 
> application_1495188831758_0121 on node host: node1:45454 #containers=2 
> available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt 
> and CapacityScheduler#tryCommit both need to acquire the write lock before 
> executing, so we can check the app attempt state in the commit process to 
> avoid committing outdated proposals.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6819) Application report fails if app rejected due to nodesize

2017-07-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087372#comment-16087372
 ] 

Rohith Sharma K S commented on YARN-6819:
-

+1 lgtm, would you mind fixing the checkstyle issues which are fixable? 

> Application report fails if app rejected due to nodesize
> 
>
> Key: YARN-6819
> URL: https://issues.apache.org/jira/browse/YARN-6819
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: YARN-6819.001.patch, YARN-6819.002.patch
>
>
> In YARN-5006, an application is rejected when the nodesize limit is exceeded. 
> {{FinalSavingTransition}} does not set stateBeforeFinalSaving after skipping 
> the save to the store, which causes the application report to fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4455) Support fetching metrics by time range

2017-07-14 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087323#comment-16087323
 ] 

Varun Saxena commented on YARN-4455:


Thanks [~rohithsharma] for the review.

bq. Just for curiosity, what are we writing and reading with supplemented 
Timestamp ?
It is primarily used to distinguish between metrics written to the flow run 
table from different apps, so that we have two different puts for different 
apps and one does not overwrite the metric of the other just because the 
timestamps were the same. It is not really required for the entity and app 
tables, but multiplying by a factor ensures that the code path is common while 
writing.

bq. Method setMetricsTimeRange has conditional statement in each line. Can it 
be optimized by implicitly assuming that tsBegin will not be less than 0 since 
TimelineDataManager validates it? Similarly for tsEnd?
The code below handles overflow. When we calculate the supplemented timestamp 
(i.e. multiply tsBegin and tsEnd by a factor of 1000), overflow can potentially 
occur. Actual timestamps would not be in that range, but if the input from the 
user is wrong, the value can become negative due to overflow after 
multiplication. Hence the check. 
{code}
  query.setColumnFamilyTimeRange(metricsCf,
  ((supplementedTsBegin < 0) ? 0 : supplementedTsBegin),
  ((supplementedTsEnd < 0) ? Long.MAX_VALUE : supplementedTsEnd));
{code}
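Spelled out with plain longs (a sketch; the multiplier variable below is an assumption, the real code uses its own constant): with a factor of 1000, any input above Long.MAX_VALUE / 1000 (about 9.2e15, far beyond any real epoch-millis timestamp) overflows to a negative value, which the clamp above maps back to the full range.
{code}
long factor = 1000L;                          // supplementing multiplier
long supplementedTsBegin = tsBegin * factor;  // negative if tsBegin overflowed
long supplementedTsEnd = tsEnd * factor;
long begin = (supplementedTsBegin < 0) ? 0 : supplementedTsBegin;
long end = (supplementedTsEnd < 0) ? Long.MAX_VALUE : supplementedTsEnd;
{code}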

Will fix the nit.

> Support fetching metrics by time range
> --
>
> Key: YARN-4455
> URL: https://issues.apache.org/jira/browse/YARN-4455
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: YARN-5355, yarn-5355-merge-blocker
> Attachments: YARN-4455-YARN-5355.01.patch, 
> YARN-4455-YARN-5355.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6825) RM quit due to ApplicationStateData exceed the limit size of znode in zk

2017-07-14 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087303#comment-16087303
 ] 

Bibin A Chundatt commented on YARN-6825:


{quote}
I think 80% of configured jute buffer should be taken implicitly rather than 
allowing admin to configure 80% of jute buffer. 
{quote}
Since the default value of the jute buffer size is currently 1MB, I agree we 
can make the default value {{.8MB}}. So does this solve the issues you have 
mentioned?
I had the cases mentioned in this jira in mind during YARN-5006, but missed 
mentioning them explicitly there.

Do you find any other scenario which couldn't be handled by configuration? 
So are we good to go ahead with YARN-6819? If it's fine with you, I will handle 
the default value change in the same jira as well.

> RM quit due to ApplicationStateData exceed the limit size of znode in zk
> 
>
> Key: YARN-6825
> URL: https://issues.apache.org/jira/browse/YARN-6825
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>
> YARN-5006 fixes this issue by strictly validating the ApplicationStateData 
> length against 1MB (the default max jute buffer) during application 
> submission only. There is a possibility of thrashing since a dead zone was 
> not properly defined/taken care of.
> But it does not consider scenarios where ApplicationStateData can grow at a 
> later point of time, i.e.: 
> # If an app is submitted with less than 1MB and later updated, e.g. the 
> queue name, lifetime value or priority is changed, the update call sent to 
> the state store causes the same issue because the ApplicationStateData 
> length has increased.
> # Even if there is no app update, the final state is stored in ZK. This adds 
> several fields such as finishTime, finalState and finalApplicationState, 
> which increases the size of ApplicationStateData.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6733) Add table for storing sub-application entities

2017-07-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087301#comment-16087301
 ] 

Rohith Sharma K S commented on YARN-6733:
-

Thanks Vrushali for updating the patch. I will take a look at it this weekend. 

> Add table for storing sub-application entities
> --
>
> Key: YARN-6733
> URL: https://issues.apache.org/jira/browse/YARN-6733
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: IMG_7040.JPG, YARN-6733-YARN-5355.001.patch, 
> YARN-6733-YARN-5355.002.patch, YARN-6733-YARN-5355.003.patch, 
> YARN-6733-YARN-5355.004.patch
>
>
> After a discussion with Tez folks, we have been thinking of introducing a 
> table to store sub-application information.
> For example, a Tez session may run for a certain period as User X and run a 
> few AMs. These AMs accept DAGs from other users, and Tez will execute these 
> DAGs with a doAs user. ATSv2 should store this information in a new table, 
> perhaps called the "sub_application" table. 
> This jira tracks the code changes needed for the table schema creation.
> I will file other jiras for writing to that table, updating the user name 
> fields to include the sub-application user, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6280) Introduce deselect query param to skip ResourceRequest from getApp/getApps REST API

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087269#comment-16087269
 ] 

Hadoop QA commented on YARN-6280:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
34s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
50s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
5s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
22s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 4 new + 107 unchanged - 5 fixed = 111 total (was 112) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m  
9s{color} | {color:green} hadoop-yarn-site in the patch passed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License 

[jira] [Commented] (YARN-6714) IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087252#comment-16087252
 ] 

Hadoop QA commented on YARN-6714:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
41s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 27s{color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_131
 with JDK v1.8.0_131 generated 1 new + 14 unchanged - 1 fixed = 15 total (was 
15) {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 30s{color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_131
 with JDK v1.7.0_131 generated 1 new + 13 unchanged - 1 fixed = 14 total (was 
14) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 30s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}103m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
| JDK v1.7.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5e40efe |
| JIRA Issue | YARN-6714 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877273/YARN-6714.branch-2.005.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  

[jira] [Commented] (YARN-4455) Support fetching metrics by time range

2017-07-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087233#comment-16087233
 ] 

Rohith Sharma K S commented on YARN-4455:
-

Thanks [~varun_saxena] for working on this jira. Overall the patch looks good 
to me.
nits:
TimelineReaderWebServices.java, line 614
# The whole stack trace is printed. I don't think the full message should be 
printed here. What is the necessity of it?

HBaseTimelineStorageUtils
# Just out of curiosity, what are we writing and reading with the supplemented 
timestamp? 
# Method setMetricsTimeRange has a conditional statement on each line. Can it 
be optimized by implicitly assuming that tsBegin will not be less than 0, 
since TimelineDataManager validates it? Similarly for tsEnd?

> Support fetching metrics by time range
> --
>
> Key: YARN-4455
> URL: https://issues.apache.org/jira/browse/YARN-4455
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: YARN-5355, yarn-5355-merge-blocker
> Attachments: YARN-4455-YARN-5355.01.patch, 
> YARN-4455-YARN-5355.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6822) TestContainerManagerSecurity tests fail on trunk

2017-07-14 Thread Sonia Garudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sonia Garudi updated YARN-6822:
---
Description: 
{code}
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[0]
java.lang.Exception: test timed out after 12 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
{code}

{code}
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[1]
java.lang.Exception: test timed out after 12 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
{code}

Logs -
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/463/testReport/org.apache.hadoop.yarn.server/TestContainerManagerSecurity/testContainerManager_0_/
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/463/testReport/org.apache.hadoop.yarn.server/TestContainerManagerSecurity/testContainerManager_1_/

  was:
{code}
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[0]
java.lang.Exception: test timed out after 12 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
{code}

{code}
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[1]
java.lang.Exception: test timed out after 12 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
at 

[jira] [Updated] (YARN-6822) TestContainerManagerSecurity tests fail on trunk

2017-07-14 Thread Sonia Garudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sonia Garudi updated YARN-6822:
---
Description: 
{code}
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[0]
java.lang.Exception: test timed out after 12 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
{code}

{code}
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[1]
java.lang.Exception: test timed out after 12 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
{code}

https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/463/testReport/org.apache.hadoop.yarn.server/TestContainerManagerSecurity/testContainerManager_0_/
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/463/testReport/org.apache.hadoop.yarn.server/TestContainerManagerSecurity/testContainerManager_1_/

  was:
{code}
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[0]
java.lang.Exception: test timed out after 12 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)

org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[1]
java.lang.Exception: test timed out after 12 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
at 

[jira] [Commented] (YARN-6825) RM quit due to ApplicationStateData exceed the limit size of znode in zk

2017-07-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087177#comment-16087177
 ] 

Rohith Sharma K S commented on YARN-6825:
-

I think 80% of the configured jute buffer should be taken implicitly rather 
than allowing the admin to configure 80% of the jute buffer. This leaves 
buffered space for additional information in ApplicationStateData. 

Also, the diagnostics message length limit, i.e. 64KB, should be a private 
configuration. Otherwise this issue will be there forever whenever the 
diagnostics length is misconfigured. 
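A rough sketch of what taking the 80% implicitly could look like: derive the ApplicationStateData size limit from ZooKeeper's standard jute.maxbuffer setting instead of exposing a separate admin knob. The constant and method names here are assumptions for illustration, not an actual patch:

{code}
final class ZnodeSizeLimitSketch {
  // ZooKeeper's default jute buffer is 1MB; jute.maxbuffer is the standard override.
  private static final int DEFAULT_JUTE_MAX_BUFFER = 1024 * 1024;
  // Implicit margin so later updates (queue, priority, final state, diagnostics)
  // still fit under the znode limit.
  private static final double SAFETY_FACTOR = 0.8;

  static int maxAppStateDataSize() {
    int juteMaxBuffer = Integer.getInteger("jute.maxbuffer", DEFAULT_JUTE_MAX_BUFFER);
    return (int) (juteMaxBuffer * SAFETY_FACTOR);
  }

  // Reject, or truncate the diagnostics of, a serialized blob that would not fit.
  static boolean fitsInZnode(byte[] serializedAppStateData) {
    return serializedAppStateData.length <= maxAppStateDataSize();
  }
}
{code}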

> RM quit due to ApplicationStateData exceed the limit size of znode in zk
> 
>
> Key: YARN-6825
> URL: https://issues.apache.org/jira/browse/YARN-6825
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>
> YARN-5006 fixes this issue by strictly validating the ApplicationStateData 
> length against 1MB (the default max jute buffer) during application 
> submission only. There is a possibility of thrashing since a dead zone was 
> not properly defined/taken care of.
> But it does not consider scenarios where ApplicationStateData can grow at a 
> later point of time, i.e.: 
> # If an app is submitted with less than 1MB and later updated, e.g. the 
> queue name, lifetime value or priority is changed, the update call sent to 
> the state store causes the same issue because the ApplicationStateData 
> length has increased.
> # Even if there is no app update, the final state is stored in ZK. This adds 
> several fields such as finishTime, finalState and finalApplicationState, 
> which increases the size of ApplicationStateData.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4455) Support fetching metrics by time range

2017-07-14 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087171#comment-16087171
 ] 

Varun Saxena commented on YARN-4455:


Checkstyle issues cannot be fixed as they relate to the number of method 
parameters.

> Support fetching metrics by time range
> --
>
> Key: YARN-4455
> URL: https://issues.apache.org/jira/browse/YARN-4455
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: YARN-5355, yarn-5355-merge-blocker
> Attachments: YARN-4455-YARN-5355.01.patch, 
> YARN-4455-YARN-5355.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6714) IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-07-14 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6714:
---
Attachment: YARN-6714.branch-2.005.patch

Sorry, I misplaced the actual types. Uploading a new patch.

> IllegalStateException while handling APP_ATTEMPT_REMOVED event when 
> async-scheduling enabled in CapacityScheduler
> -
>
> Key: YARN-6714
> URL: https://issues.apache.org/jira/browse/YARN-6714
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6714.001.patch, YARN-6714.002.patch, 
> YARN-6714.003.patch, YARN-6714.branch-2.003.patch, 
> YARN-6714.branch-2.004.patch, YARN-6714.branch-2.005.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, after an AM 
> failover unreserves all reserved containers, the scheduler still has a chance 
> to get and commit an outdated reserve proposal of the failed app attempt. 
> This problem happened on an app in our cluster: when this app stopped, it 
> unreserved all reserved containers and compared their appAttemptId with the 
> current appAttemptId; on a mismatch it threw an IllegalStateException and 
> crashed the RM.
> Error log:
> {noformat}
> 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Trying to unreserve  for application 
> appattempt_1495188831758_0121_02 when currently reserved  for application 
> application_1495188831758_0121 on node host: node1:45454 #containers=2 
> available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt 
> and CapacityScheduler#tryCommit both need to acquire the write lock before 
> executing, so we can check the app attempt state in the commit process to 
> avoid committing outdated proposals.
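A heavily simplified sketch of the guard described above: before applying a queued (possibly stale) reserve proposal, the commit path re-checks under the write lock that the proposal still belongs to the current, live app attempt. The types and method names are illustrative, not the real CapacityScheduler code:

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AsyncCommitSketch {
  static class Proposal { final String appAttemptId; Proposal(String id) { appAttemptId = id; } }
  static class AppAttempt { final String id; volatile boolean stopped; AppAttempt(String id) { this.id = id; } }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private volatile AppAttempt currentAttempt;  // replaced on AM failover / attempt removal

  boolean tryCommit(Proposal proposal) {
    lock.writeLock().lock();
    try {
      // Drop proposals generated for an attempt that has since been removed or
      // replaced, instead of letting them reach unreserveResource() and trip
      // the IllegalStateException described above.
      AppAttempt attempt = currentAttempt;
      if (attempt == null || attempt.stopped || !attempt.id.equals(proposal.appAttemptId)) {
        return false;
      }
      // ... apply the reserve/allocate proposal here ...
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}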



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6280) Introduce deselect query param to skip ResourceRequest from getApp/getApps REST API

2017-07-14 Thread Lantao Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087163#comment-16087163
 ] 

Lantao Jin commented on YARN-6280:
--

Thank you [~sunilg], re-uploaded.

> Introduce deselect query param to skip ResourceRequest from getApp/getApps 
> REST API
> ---
>
> Key: YARN-6280
> URL: https://issues.apache.org/jira/browse/YARN-6280
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, restapi
>Affects Versions: 2.7.3
>Reporter: Lantao Jin
>Assignee: Lantao Jin
> Fix For: 3.0.0-alpha4
>
> Attachments: YARN-6280.001.patch, YARN-6280.002.patch, 
> YARN-6280.003.patch, YARN-6280.004.patch, YARN-6280.005.patch, 
> YARN-6280.006.patch, YARN-6280.007.patch, YARN-6280.008.patch, 
> YARN-6280.009.patch, YARN-6280.010.patch, YARN-6280.011.patch, 
> YARN-6280-branch-2.001.patch, YARN-6280-branch-2.002.patch
>
>
> Beginning from v2.7, the ResourceManager Cluster Applications REST API 
> returns the ResourceRequest list. It is a very large construct in AppInfo.
> As a test, we used the URI below to query only 2 results:
> http://<rm address:port>/ws/v1/cluster/apps?states=running,accepted&limit=2
> The results are very different:
> ||Hadoop version||Total Characters||Total Words||Total Lines||Size||
> |2.4.1|1192|42|42|1.2 KB|
> |2.7.1|1222179|48740|48735|1.21 MB|
> Most RESTful API requesters don't know about this after upgrading, and their 
> old queries may cause more GC in the ResourceManager and slow it down. Even 
> if they know, they have no way to reduce the impact on the ResourceManager 
> other than slowing down their query frequency.
> The patch adds a query parameter "showResourceRequests" to help requesters 
> who don't need this information reduce the overhead. For interface 
> compatibility, the default value is true if the parameter is not set, so the 
> behaviour stays the same as now.
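For illustration, a tiny client showing the kind of request this is meant to slim down, using the showResourceRequests parameter name from the description above (the jira title suggests it may have evolved into a generic deselect parameter, so treat the exact name as an assumption) and an assumed RM address:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class GetAppsWithoutResourceRequests {
  public static void main(String[] args) throws Exception {
    // Assumed RM web address; replace with the real one.
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps"
        + "?states=running,accepted&limit=2&showResourceRequests=false");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in =
        new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // AppInfo entries without the ResourceRequest list
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}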



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6280) Introduce deselect query param to skip ResourceRequest from getApp/getApps REST API

2017-07-14 Thread Lantao Jin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated YARN-6280:
-
Attachment: YARN-6280-branch-2.002.patch

> Introduce deselect query param to skip ResourceRequest from getApp/getApps 
> REST API
> ---
>
> Key: YARN-6280
> URL: https://issues.apache.org/jira/browse/YARN-6280
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, restapi
>Affects Versions: 2.7.3
>Reporter: Lantao Jin
>Assignee: Lantao Jin
> Fix For: 3.0.0-alpha4
>
> Attachments: YARN-6280.001.patch, YARN-6280.002.patch, 
> YARN-6280.003.patch, YARN-6280.004.patch, YARN-6280.005.patch, 
> YARN-6280.006.patch, YARN-6280.007.patch, YARN-6280.008.patch, 
> YARN-6280.009.patch, YARN-6280.010.patch, YARN-6280.011.patch, 
> YARN-6280-branch-2.001.patch, YARN-6280-branch-2.002.patch
>
>
> Beginning from v2.7, the ResourceManager Cluster Applications REST API 
> returns the ResourceRequest list. It is a very large construct in AppInfo.
> As a test, we used the URI below to query only 2 results:
> http://<rm address:port>/ws/v1/cluster/apps?states=running,accepted&limit=2
> The results are very different:
> ||Hadoop version||Total Characters||Total Words||Total Lines||Size||
> |2.4.1|1192|42|42|1.2 KB|
> |2.7.1|1222179|48740|48735|1.21 MB|
> Most RESTful API requesters don't know about this after upgrading, and their 
> old queries may cause more GC in the ResourceManager and slow it down. Even 
> if they know, they have no way to reduce the impact on the ResourceManager 
> other than slowing down their query frequency.
> The patch adds a query parameter "showResourceRequests" to help requesters 
> who don't need this information reduce the overhead. For interface 
> compatibility, the default value is true if the parameter is not set, so the 
> behaviour stays the same as now.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6504) Add support for resource profiles in MapReduce

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087135#comment-16087135
 ] 

Hadoop QA commented on YARN-6504:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-6504 does not apply to YARN-3926. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6504 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12867166/YARN-6504-YARN-3926.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16441/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add support for resource profiles in MapReduce
> --
>
> Key: YARN-6504
> URL: https://issues.apache.org/jira/browse/YARN-6504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-6504-YARN-3926.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4455) Support fetching metrics by time range

2017-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087111#comment-16087111
 ] 

Hadoop QA commented on YARN-4455:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-5355 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
 7s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
47s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} YARN-5355 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
28s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase
 in YARN-5355 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} YARN-5355 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 2 new + 
42 unchanged - 5 fixed = 44 total (was 47) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests
 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
44s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
38s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-tests in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ac17dc |
| JIRA Issue | YARN-4455 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12877260/YARN-4455-YARN-5355.02.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  

[jira] [Comment Edited] (YARN-6825) RM quit due to ApplicationStateData exceed the limit size of znode in zk

2017-07-14 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087093#comment-16087093
 ] 

Bibin A Chundatt edited comment on YARN-6825 at 7/14/17 9:29 AM:
-

Hi [~rohithsharma]

IIUC this could be handled by configuring the value to 80% of the jute buffer 
size. 

# Exception logs could occupy space (YARN-6125 added a bytes limit for the 
diagnostics message) 
# Finish time, priority and queue name can be handled by configuring a lesser 
value

So in combination we should be able to handle it. 





was (Author: bibinchundatt):
Hi [~rohithsharma]

IIUC this could be handle by configuring the value about 80% for 1 MB(jute 
buffer size). 

# Exception log could occupy space (YARN-6125 Bytes limit is added for 
diagnostics message ) 
# Finish time , priority, queuename can be handle by configuring lesser value

So incombination we should be able handle. 




> RM quit due to ApplicationStateData exceed the limit size of znode in zk
> 
>
> Key: YARN-6825
> URL: https://issues.apache.org/jira/browse/YARN-6825
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>
> YARN-5006 fixes this issue by strictly validating the ApplicationStateData 
> length against 1MB (the default max jute buffer) during application 
> submission only. There is a possibility of thrashing since a dead zone was 
> not properly defined/taken care of.
> But it does not consider scenarios where ApplicationStateData can grow at a 
> later point of time, i.e.: 
> # If an app is submitted with less than 1MB and later updated, e.g. the 
> queue name, lifetime value or priority is changed, the update call sent to 
> the state store causes the same issue because the ApplicationStateData 
> length has increased.
> # Even if there is no app update, the final state is stored in ZK. This adds 
> several fields such as finishTime, finalState and finalApplicationState, 
> which increases the size of ApplicationStateData.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6825) RM quit due to ApplicationStateData exceed the limit size of znode in zk

2017-07-14 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087093#comment-16087093
 ] 

Bibin A Chundatt commented on YARN-6825:


Hi [~rohithsharma]

IIUC this could be handled by configuring the value to about 80% of 1 MB (the 
jute buffer size). 

# Exception logs could occupy space (YARN-6125 added a bytes limit for the 
diagnostics message) 
# Finish time, priority and queue name can be handled by configuring a lesser 
value

So in combination we should be able to handle it. 




> RM quit due to ApplicationStateData exceed the limit size of znode in zk
> 
>
> Key: YARN-6825
> URL: https://issues.apache.org/jira/browse/YARN-6825
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>
> YARN-5006 fixes this issue by strictly validating the ApplicationStateData 
> length against 1MB (the default max jute buffer) during application 
> submission only. There is a possibility of thrashing since a dead zone was 
> not properly defined/taken care of.
> But it does not consider scenarios where ApplicationStateData can grow at a 
> later point of time, i.e.: 
> # If an app is submitted with less than 1MB and later updated, e.g. the 
> queue name, lifetime value or priority is changed, the update call sent to 
> the state store causes the same issue because the ApplicationStateData 
> length has increased.
> # Even if there is no app update, the final state is stored in ZK. This adds 
> several fields such as finishTime, finalState and finalApplicationState, 
> which increases the size of ApplicationStateData.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


