[jira] [Commented] (YARN-6208) Improve the log when FinishAppEvent sent to the NodeManager which didn't run the application

2017-06-01 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034166#comment-16034166
 ] 

Akira Ajisaka commented on YARN-6208:
-

Hi [~templedf], would you review the latest patch?

> Improve the log when FinishAppEvent sent to the NodeManager which didn't run 
> the application
> 
>
> Key: YARN-6208
> URL: https://issues.apache.org/jira/browse/YARN-6208
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Minor
>  Labels: newbie, supportability
> Attachments: YARN-6208.01.patch, YARN-6208.02.patch, 
> YARN-6208.03.patch
>
>
> When a FinishAppEvent for an application is sent to a NodeManager on which 
> no containers of that application ran, we can see the following log:
> {code}
> 2015-12-28 11:59:18,725 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: FINISH_APPLICATION sent to absent application 
> application_1446103803043_9892
> {code}
> YARN-4520 changed the log message to the following:
> {code}
>   LOG.warn("couldn't find application " + appID + " while processing"
>   + " FINISH_APPS event");
> {code}
> and I'm thinking it can be improved:
> * Lower the log level from WARN to INFO, since this situation is expected.
> * Add why the NodeManager couldn't find the application, for example: 
> "because no containers of the application ran on this NodeManager."



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups

2017-06-01 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034125#comment-16034125
 ] 

Naganarasimha G R commented on YARN-6680:
-

Hi [~daryn],
{{"resulting in a 3X slowdown"}} is based on test results on a non partitioned 
cluster or based on a hunch ? if based on test results map be some thing can be 
uploaded ?
Agree in the normal flows its unnecessary to use this map but if the labels are 
enabled then i feel the locks atleast in few places are required to maintain 
consistency.
for ex. {{RMNodeLabelsManager.getResourceByLabel()}} though query is done for 
nolabel read lock is required as intermittently node's partition mapping could 
be changed, or node can be deactivated etc... all the ops where write lock is 
held ?



> Avoid locking overhead for NO_LABEL lookups
> ---
>
> Key: YARN-6680
> URL: https://issues.apache.org/jira/browse/YARN-6680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6680.patch
>
>
> Labels are managed via a hash that is protected with a read lock.  The lock 
> acquire and release are each just as expensive as the hash lookup itself - 
> resulting in a 3X slowdown.
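For context, a hedged sketch of the kind of NO_LABEL fast path the summary suggests; the field and method names are assumptions rather than the actual patch, and the comment above questions exactly whether skipping the lock here stays consistent with concurrent partition updates:

{code}
// Sketch only (not the actual patch): keep label resources in a
// ConcurrentHashMap so the common NO_LABEL lookup can skip the read lock,
// while non-default partitions still go through the locked path.
private final ConcurrentMap<String, Resource> labelToResource =
    new ConcurrentHashMap<>();
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

public Resource getResourceByLabel(String label) {
  if (label == null || label.isEmpty()) {
    // NO_LABEL fast path: plain ConcurrentHashMap read, no lock traffic.
    Resource r = labelToResource.get("");
    return r == null ? Resources.none() : r;
  }
  lock.readLock().lock();
  try {
    Resource r = labelToResource.get(label);
    return r == null ? Resources.none() : r;
  } finally {
    lock.readLock().unlock();
  }
}
{code}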



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6683) Invalid event: COLLECTOR_UPDATE at KILLED

2017-06-01 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-6683:
---

Assignee: Rohith Sharma K S

> Invalid event: COLLECTOR_UPDATE at KILLED
> -
>
> Key: YARN-6683
> URL: https://issues.apache.org/jira/browse/YARN-6683
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Rohith Sharma K S
>
> {code}
> 2017-06-01 20:01:22,686 ERROR rmapp.RMAppImpl (RMAppImpl.java:handle(905)) - 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> COLLECTOR_UPDATE at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:903)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:118)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:888)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
> {code}
> The code below already gets the RMApp instance and then sends an event to RMApp 
> to update the collector address. Instead of updating via an event, it could just 
> update via a method on RMApp. This also avoids state-machine changes.
> Also, are there any implications of COLLECTOR_UPDATE happening in the KILLED 
> state?
> {code}
>   } else {
> String previousCollectorAddr = rmApp.getCollectorAddr();
> if (previousCollectorAddr == null
> || !previousCollectorAddr.equals(collectorAddr)) {
>   // sending collector update event.
>   RMAppCollectorUpdateEvent event =
>   new RMAppCollectorUpdateEvent(appId, collectorAddr);
>   rmContext.getDispatcher().getEventHandler().handle(event);
> }
>   }
> {code}
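For illustration, a minimal sketch of the direct-update alternative the description suggests; {{updateCollectorAddr}} is a hypothetical method on RMApp, not an existing API:

{code}
  } else {
    String previousCollectorAddr = rmApp.getCollectorAddr();
    if (previousCollectorAddr == null
        || !previousCollectorAddr.equals(collectorAddr)) {
      // Sketch: update the RMApp directly instead of dispatching a
      // COLLECTOR_UPDATE event, so no state-machine transition is needed
      // and the KILLED-state problem above cannot occur.
      rmApp.updateCollectorAddr(collectorAddr);  // hypothetical setter
    }
  }
{code}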



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6467) CSQueueMetrics needs to update the current metrics for default partition only

2017-06-01 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034044#comment-16034044
 ] 

Naganarasimha G R commented on YARN-6467:
-

Thanks [~maniraj...@gmail.com] for incorporating the changes mentioned offline. 
This will make it simpler to support metrics for multiple partitions in 
YARN-6492.
The overall approach looks good, but a few nits on the patch are listed below:

# QueueMetrics ln no 353 - 356: setAvailableResourcesToQueue(Resource) needs to 
call setAvailableResourcesToQueue(String label, Resource limit); see the sketch 
after this list. Similarly, take care of any other overloaded methods.
# QueueMetrics: please rename label to partition throughout the class wherever 
it was added.
# QueueMetrics ln no 512 - 521: {{releaseResources(String, Resource)}} either 
needs to consider the label, or may be removed if it is not used anywhere 
(check the history for who added it).
# QueueMetrics: we need to document which metrics we will capture at the 
partition/queue/user level; otherwise it will be confusing. For example, for 
{{allocatedContainers and aggregateContainersAllocated}} we need to re-check 
whether they are tracked at the partition level only. We can discuss this further.
# QueueMetrics ln 341 - 345: the overloaded method setAvailableResourcesToQueue 
can be called here.
# FiCaSchedulerApp ln no 573: unnecessary new line.
# FiCaSchedulerApp ln no 574: it should be node.getPartition(), even though the 
container has the same partition; in some narrow cases (partition sharing) it 
might otherwise be wrong.
# FairScheduler ln no 1076 - 1077: changes not required.
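
For item 1, a hedged sketch of the delegation being asked for; the partition-aware overload is the one the patch introduces, and the constant used for the default partition is an assumption:

{code}
// Sketch only: the existing no-partition entry point delegates to the new
// partition-aware overload, using the default (empty) partition.
public void setAvailableResourcesToQueue(Resource limit) {
  setAvailableResourcesToQueue(RMNodeLabelsManager.NO_LABEL, limit);
}
{code}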


Note:
QueueMetrics: {{_decrPendingResources(int, Resource)}}, 
{{_incrPendingResources(int, Resource)}} & {{_incrPendingResources}} also need 
to consider the partition when updating pendingMB and pendingVcore. (Add a note 
so that this can be handled in the next jira.)

P.S. Please check and confirm that the checkstyle issues and the test case 
failure are not related to your patch modifications.

> CSQueueMetrics needs to update the current metrics for default partition only
> -
>
> Key: YARN-6467
> URL: https://issues.apache.org/jira/browse/YARN-6467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha2
>Reporter: Naganarasimha G R
>Assignee: Manikandan R
> Attachments: YARN-6467.001.patch, YARN-6467.001.patch, 
> YARN-6467.002.patch
>
>
> As a followup to YARN-6195, we need to update existing metrics to only 
> default Partition.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2017-06-01 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034043#comment-16034043
 ] 

Jonathan Hung commented on YARN-4925:
-

Created YARN-6684 for this issue.
The unit test failures seem to be unrelated to this patch.

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4925.patch, 0002-YARN-4925.patch, 
> YARN-4925-branch-2.7.001.patch
>
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> together with node/rack requests. For applications like Spark, NODE_LOCAL 
> requests cannot be made with a label expression.
> As per the check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
> (!containerRequest.getRacks().isEmpty()))
> || 
> (containerRequest.getNodes() != null && 
> (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>   "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those of 
> the OFF_SWITCH request. 
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.
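For context, a hedged example of the kind of ask that the quoted check rejects today; the six-argument {{ContainerRequest}} constructor with a node-label expression is assumed here, and the host and label names are made up:

{code}
// Sketch: a NODE_LOCAL ask that also carries a node label expression.
// With the quoted check in place, AMRMClientImpl throws
// InvalidContainerRequestException for this combination.
Resource capability = Resource.newInstance(1024, 1);
ContainerRequest request = new ContainerRequest(
    capability,
    new String[] {"node1.example.com"},  // specific node => NODE_LOCAL ask
    null,                                // racks
    Priority.newInstance(1),
    true,                                // relaxLocality
    "labelX");                           // node label expression
amrmClient.addContainerRequest(request);
{code}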



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6684) TestAMRMClient tests fail on branch-2.7

2017-06-01 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-6684:
---

 Summary: TestAMRMClient tests fail on branch-2.7
 Key: YARN-6684
 URL: https://issues.apache.org/jira/browse/YARN-6684
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung


{noformat}2017-06-01 19:10:44,362 INFO  capacity.CapacityScheduler 
(CapacityScheduler.java:addNode(1335)) - Added node 
jhung-ld2.linkedin.biz:58205 clusterResource: 
2017-06-01 19:10:44,370 INFO  server.MiniYARNCluster 
(MiniYARNCluster.java:waitForNodeManagersToConnect(657)) - All Node Managers 
connected in MiniYARNCluster
2017-06-01 19:10:44,376 INFO  client.RMProxy (RMProxy.java:createRMProxy(98)) - 
Connecting to ResourceManager at jhung-ld2.linkedin.biz/ipaddr:36167
2017-06-01 19:10:45,501 INFO  ipc.Client 
(Client.java:handleConnectionFailure(872)) - Retrying connect to server: 
jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-01 19:10:46,502 INFO  ipc.Client 
(Client.java:handleConnectionFailure(872)) - Retrying connect to server: 
jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 1 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-01 19:10:47,503 INFO  ipc.Client 
(Client.java:handleConnectionFailure(872)) - Retrying connect to server: 
jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-06-01 19:10:48,504 INFO  ipc.Client 
(Client.java:handleConnectionFailure(872)) - Retrying connect to server: 
jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 3 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS){noformat}

After some investigation, it seems to be the same issue as described in 
HDFS-11893.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034041#comment-16034041
 ] 

Hadoop QA commented on YARN-4140:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  3m  
0s{color} | {color:red} Docker failed to build yetus/hadoop:c420dfe. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-4140 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870904/YARN-4140-branch-2.7.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16077/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch, 
> YARN-4140-branch-2.7.001.patch
>
>
> While trying to run an application on a node-label partition, I found that the 
> application execution time is delayed by 5-10 minutes for 500 containers. 
> There were 3 machines in total, 2 machines were in the same partition, and the app was submitted to that partition.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # So, since I had about 500 containers, it took about 6 minutes 
> to allocate the first map after the AM allocation.
> # Tested with about 1K maps using the PI job; it took 17 minutes to allocate the next 
> container after the AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, 

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2017-06-01 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034039#comment-16034039
 ] 

Jonathan Hung commented on YARN-4140:
-

Attached a branch-2.7 patch.

There were a few conflicts:
# AppSchedulingInfo cleanup in YARN-4524
# TestNodeLabelContainerAllocation was created in YARN-3361
# YARN-3357 consolidated the two TestFifoScheduler classes

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch, 
> YARN-4140-branch-2.7.001.patch
>
>
> While trying to run an application on a node-label partition, I found that the 
> application execution time is delayed by 5-10 minutes for 500 containers. 
> There were 3 machines in total, 2 machines were in the same partition, and the app was submitted to that partition.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # So, since I had about 500 containers, it took about 6 minutes 
> to allocate the first map after the AM allocation.
> # Tested with about 1K maps using the PI job; it took 17 minutes to allocate the next 
> container after the AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: 

[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2017-06-01 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-4140:

Attachment: YARN-4140-branch-2.7.001.patch

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch, 
> YARN-4140-branch-2.7.001.patch
>
>
> While trying to run an application on a node-label partition, I found that the 
> application execution time is delayed by 5-10 minutes for 500 containers. 
> There were 3 machines in total, 2 machines were in the same partition, and the app was submitted to that partition.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # So, since I had about 500 containers, it took about 6 minutes 
> to allocate the first map after the AM allocation.
> # Tested with about 1K maps using the PI job; it took 17 minutes to allocate the next 
> container after the AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}

[jira] [Reopened] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2017-06-01 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung reopened YARN-4140:
-

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> While trying to run an application on a node-label partition, I found that the 
> application execution time is delayed by 5-10 minutes for 500 containers. 
> There were 3 machines in total, 2 machines were in the same partition, and the app was submitted to that partition.
> After enabling debug logging I was able to find the following:
> # From the AM, the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # So, since I had about 500 containers, it took about 6 minutes 
> to allocate the first map after the AM allocation.
> # Tested with about 1K maps using the PI job; it took 17 minutes to allocate the next 
> container after the AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next container 
> allocation is done as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
> {code}
> 

[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-01 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034017#comment-16034017
 ] 

Sunil G commented on YARN-2113:
---

HADOOP-14474 should help us there. It seems the Oracle Java 1.7 download link 
has changed.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6634) [API] Define an API for ResourceManager WebServices

2017-06-01 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033995#comment-16033995
 ] 

Carlo Curino commented on YARN-6634:


[~giovanni.fumarola] The patch generally looks good; it is a reasonable 
refactoring for YARN-6634. 

Please go over the javadoc and clean it up a bit. Examples of small issues:
* Some of the comments only make sense with the RM as the implementor, e.g., {{* 
@throws NotFoundException if the ResourceScheduler is null}}.
* Some of the text uses misleading capitalization, e.g. {{* @throws 
Exception in case of a BadRequest}}, where BadRequest is not a Java object. 
* Comments of the kind {{Use with @link RMWSConsts#SCHEDULER.}} are not very 
helpful. Briefly state what the method does, besides how to use it (see the 
sketch below). 

I understand you have in general removed exceptions that were never thrown 
(generally good). Why add {{NotFoundException}} to 
{{getSchedulerInfo}}, {{getNodes}}, etc.?

Other than these nits, I am sure yetus would have caught any other issues, so I 
am fine with the patch getting committed. [~leftnoteasy], since you reviewed it 
first, do you want to do the commit? (Please push to branch-2 as well if 
possible.)
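As an illustration of the javadoc style being asked for (the method and return type here are assumptions, not taken from the patch):

{code}
/**
 * Retrieves information about the active scheduler: the queue hierarchy,
 * configured capacities, and current usage. Served at the path defined by
 * {@link RMWSConsts#SCHEDULER}.
 *
 * @return a {@code SchedulerTypeInfo} describing the active scheduler
 */
SchedulerTypeInfo getSchedulerInfo();
{code}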

> [API] Define an API for ResourceManager WebServices
> ---
>
> Key: YARN-6634
> URL: https://issues.apache.org/jira/browse/YARN-6634
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Subru Krishnan
>Assignee: Giovanni Matteo Fumarola
>Priority: Critical
> Attachments: YARN-6634.proto.patch, YARN-6634.v1.patch, 
> YARN-6634.v2.patch, YARN-6634.v3.patch, YARN-6634.v4.patch, YARN-6634.v5.patch
>
>
> The RM exposes a few REST endpoints, but there is no clear API interface defined. 
> This makes it painful to build either clients or extension components like 
> Router (YARN-5412) that expose REST interfaces themselves. This jira proposes 
> adding an RM WebServices protocol similar to the one we have for RPC, i.e. 
> {{ApplicationClientProtocol}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6683) Invalid event: COLLECTOR_UPDATE at KILLED

2017-06-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-6683:
--
Description: 
{code}
2017-06-01 20:01:22,686 ERROR rmapp.RMAppImpl (RMAppImpl.java:handle(905)) - 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
COLLECTOR_UPDATE at KILLED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:903)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:118)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:904)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:888)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
{code}

The code below already gets the RMApp instance and then sends an event to RMApp 
to update the collector address. Instead of updating via an event, it could just 
update via a method on RMApp. This also avoids state-machine changes.
Also, are there any implications of COLLECTOR_UPDATE happening in the KILLED state?

{code}
  } else {
String previousCollectorAddr = rmApp.getCollectorAddr();
if (previousCollectorAddr == null
|| !previousCollectorAddr.equals(collectorAddr)) {
  // sending collector update event.
  RMAppCollectorUpdateEvent event =
  new RMAppCollectorUpdateEvent(appId, collectorAddr);
  rmContext.getDispatcher().getEventHandler().handle(event);
}
  }
{code}

  was:
{code}
2017-06-01 20:01:22,686 ERROR rmapp.RMAppImpl (RMAppImpl.java:handle(905)) - 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
COLLECTOR_UPDATE at KILLED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:903)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:118)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:904)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:888)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
{code}

Below code already gets the RMApp instance and then send an event to RMApp to 
update the collector address. Instead of updating via event, it could just 
update via a method of RMApp. This also avoids state-machine changes i

{code}
  } else {
String previousCollectorAddr = rmApp.getCollectorAddr();
if (previousCollectorAddr == null
|| !previousCollectorAddr.equals(collectorAddr)) {
  // sending collector update event.
  RMAppCollectorUpdateEvent event =
  new RMAppCollectorUpdateEvent(appId, collectorAddr);
  rmContext.getDispatcher().getEventHandler().handle(event);
}
  }
{code}


> Invalid event: COLLECTOR_UPDATE at KILLED
> -
>
> Key: YARN-6683
> URL: https://issues.apache.org/jira/browse/YARN-6683
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>
> {code}
> 2017-06-01 20:01:22,686 ERROR rmapp.RMAppImpl (RMAppImpl.java:handle(905)) - 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> COLLECTOR_UPDATE at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> 

[jira] [Created] (YARN-6683) Invalid event: COLLECTOR_UPDATE at KILLED

2017-06-01 Thread Jian He (JIRA)
Jian He created YARN-6683:
-

 Summary: Invalid event: COLLECTOR_UPDATE at KILLED
 Key: YARN-6683
 URL: https://issues.apache.org/jira/browse/YARN-6683
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He


{code}
2017-06-01 20:01:22,686 ERROR rmapp.RMAppImpl (RMAppImpl.java:handle(905)) - 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
COLLECTOR_UPDATE at KILLED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:903)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:118)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:904)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:888)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
{code}

Below code already gets the RMApp instance and then send an event to RMApp to 
update the collector address. Instead of updating via event, it could just 
update via a method of RMApp. This also avoids state-machine changes i

{code}
  } else {
String previousCollectorAddr = rmApp.getCollectorAddr();
if (previousCollectorAddr == null
|| !previousCollectorAddr.equals(collectorAddr)) {
  // sending collector update event.
  RMAppCollectorUpdateEvent event =
  new RMAppCollectorUpdateEvent(appId, collectorAddr);
  rmContext.getDispatcher().getEventHandler().handle(event);
}
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6669) Kerberos support for native service AM with the service REST API

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033944#comment-16033944
 ] 

Hadoop QA commented on YARN-6669:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
33s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
35s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} yarn-native-services passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} yarn-native-services passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications: The patch generated 
16 new + 390 unchanged - 26 fixed = 406 total (was 416) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
51s{color} | {color:green} hadoop-yarn-slider-core in the patch passed. {color} 
|
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
12s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 32m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6669 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870884/YARN-6669.yarn-native-services.01.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4c1b2156de09 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | yarn-native-services / c9d9c94 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/16076/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16076/testReport/ |
| modules | C: 

[jira] [Commented] (YARN-6316) Provide help information and documentation for TimelineSchemaCreator

2017-06-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033937#comment-16033937
 ] 

Haibo Chen commented on YARN-6316:
--

Agreed. Thanks [~vrushalic]!

> Provide help information and documentation for TimelineSchemaCreator
> 
>
> Key: YARN-6316
> URL: https://issues.apache.org/jira/browse/YARN-6316
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Haibo Chen
> Attachments: YARN-6316.00.patch, YARN-6316.prelim.patch
>
>
> Right now there is no help information for the timeline schema creator. We 
> probably want to provide an option to print help. Also, ideally, if users 
> pass in no arguments, we may want to print out the help instead of directly 
> creating the tables. This will simplify cluster operations and timeline v2 
> deployments. 
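A minimal sketch of the suggested behavior, assuming the commons-cli classes TimelineSchemaCreator already uses; the option name and usage string are illustrative:

{code}
// Sketch only: print usage when -help is passed or no arguments are given,
// instead of creating the tables right away.
Options options = new Options();
options.addOption("help", false, "print this help and exit");
// ... existing options such as table-name overrides go here ...
CommandLine cmd = new GnuParser().parse(options, args);  // throws ParseException
if (args.length == 0 || cmd.hasOption("help")) {
  new HelpFormatter().printHelp("TimelineSchemaCreator", options);
  return;
}
{code}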



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5094) some YARN container events have timestamp of -1

2017-06-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033935#comment-16033935
 ] 

Haibo Chen commented on YARN-5094:
--

Thanks [~gtCarrera9] for your comments. Yes, I have limited the change to only 
the events that we are interested in exposing to ATSv2, for the exact same 
reason that you pointed out.

> some YARN container events have timestamp of -1
> ---
>
> Key: YARN-5094
> URL: https://issues.apache.org/jira/browse/YARN-5094
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-5094.00.patch, YARN-5094-YARN-2928.001.patch
>
>
> Some events in the YARN container entities have timestamp of -1. The 
> RM-generated container events have proper timestamps. It appears that it's 
> the NM-generated events that have -1: YARN_CONTAINER_CREATED, 
> YARN_CONTAINER_FINISHED, YARN_NM_CONTAINER_LOCALIZATION_FINISHED, 
> YARN_NM_CONTAINER_LOCALIZATION_STARTED.
> In the YARN container page,
> {noformat}
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: -1,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: -1,
> info: { }
> }
> {noformat}
> I think the data itself is OK, but the values are not being populated in the 
> REST output?
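A hedged sketch of the kind of fix implied, assuming the NM publisher builds ATSv2 {{TimelineEvent}} objects directly; {{entity}} stands for the container's TimelineEntity being published, and the constant name is as I recall it, so treat it as illustrative:

{code}
// Sketch only: set an explicit timestamp when the NM creates the lifecycle
// event, so readers no longer see -1. Ideally the real event time is used
// rather than the publish time.
TimelineEvent tEvent = new TimelineEvent();
tEvent.setId(ContainerMetricsConstants.CREATED_EVENT_TYPE);
tEvent.setTimestamp(System.currentTimeMillis());
entity.addEvent(tEvent);
{code}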



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6670) Add separate NM overallocation thresholds for cpu and memory

2017-06-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033932#comment-16033932
 ] 

Haibo Chen commented on YARN-6670:
--

The Findbugs warnings are unrelated. I will update the patch to fix the 
checkstyle issues.

> Add separate NM overallocation thresholds for cpu and memory
> 
>
> Key: YARN-6670
> URL: https://issues.apache.org/jira/browse/YARN-6670
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6670-YARN-1011.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6316) Provide help information and documentation for TimelineSchemaCreator

2017-06-01 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033929#comment-16033929
 ] 

Vrushali C commented on YARN-6316:
--

I have pushed to YARN-5355, now pushing to YARN-5355-branch-2 as well.

I think this should also go into trunk; what do you think, [~haibo.chen]?

> Provide help information and documentation for TimelineSchemaCreator
> 
>
> Key: YARN-6316
> URL: https://issues.apache.org/jira/browse/YARN-6316
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Haibo Chen
> Attachments: YARN-6316.00.patch, YARN-6316.prelim.patch
>
>
> Right now there is no help information for the timeline schema creator. We 
> probably want to provide an option to print help. Also, ideally, if users 
> pass in no arguments, we may want to print out the help instead of directly 
> creating the tables. This will simplify cluster operations and timeline v2 
> deployments. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6670) Add separate NM overallocation thresholds for cpu and memory

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033930#comment-16033930
 ] 

Hadoop QA commented on YARN-6670:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
56s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
29s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
12s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
26s{color} | {color:green} YARN-1011 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
3s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
YARN-1011 has 1 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
49s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 in YARN-1011 has 5 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} YARN-1011 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 54s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 206 unchanged - 0 fixed = 208 total (was 206) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
34s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
25s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
34s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m  
1s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6670 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-5094) some YARN container events have timestamp of -1

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033917#comment-16033917
 ] 

Hadoop QA commented on YARN-5094:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
42s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 in trunk has 5 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m 56s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.application.TestApplication |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-5094 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870878/YARN-5094.00.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8490e8ffcae4 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 7101477 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/16075/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/16075/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16075/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Updated] (YARN-6669) Kerberos support for native service AM with the service REST API

2017-06-01 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-6669:
--
Attachment: YARN-6669.yarn-native-services.01.patch

Added a new field in the service API spec,
and also removed some unused methods and fields in the existing code.

> Kerberos support for native service AM with the service REST API
> 
>
> Key: YARN-6669
> URL: https://issues.apache.org/jira/browse/YARN-6669
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-6669.yarn-native-services.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5094) some YARN container events have timestamp of -1

2017-06-01 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033890#comment-16033890
 ] 

Li Lu commented on YARN-5094:
-

Sorry about the delay. Please feel free to take it. Let's not touch 
AbstractEvent as a whole, but treat the different events separately. Also, even 
for NM-related events we should be careful about the actual performance impact. 
I vaguely remember that my conclusion from a year ago was that it's fine to 
assign a timestamp to NM events, but let's be careful. 

> some YARN container events have timestamp of -1
> ---
>
> Key: YARN-5094
> URL: https://issues.apache.org/jira/browse/YARN-5094
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-5094.00.patch, YARN-5094-YARN-2928.001.patch
>
>
> Some events in the YARN container entities have timestamp of -1. The 
> RM-generated container events have proper timestamps. It appears that it's 
> the NM-generated events that have -1: YARN_CONTAINER_CREATED, 
> YARN_CONTAINER_FINISHED, YARN_NM_CONTAINER_LOCALIZATION_FINISHED, 
> YARN_NM_CONTAINER_LOCALIZATION_STARTED.
> In the YARN container page,
> {noformat}
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: -1,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: -1,
> info: { }
> }
> {noformat}
> I think the data itself is OK, but the values are not being populated in the 
> REST output?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5094) some YARN container events have timestamp of -1

2017-06-01 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5094:
-
Attachment: YARN-5094.00.patch

> some YARN container events have timestamp of -1
> ---
>
> Key: YARN-5094
> URL: https://issues.apache.org/jira/browse/YARN-5094
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-5094.00.patch, YARN-5094-YARN-2928.001.patch
>
>
> Some events in the YARN container entities have timestamp of -1. The 
> RM-generated container events have proper timestamps. It appears that it's 
> the NM-generated events that have -1: YARN_CONTAINER_CREATED, 
> YARN_CONTAINER_FINISHED, YARN_NM_CONTAINER_LOCALIZATION_FINISHED, 
> YARN_NM_CONTAINER_LOCALIZATION_STARTED.
> In the YARN container page,
> {noformat}
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: -1,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: -1,
> info: { }
> }
> {noformat}
> I think the data itself is OK, but the values are not being populated in the 
> REST output?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5094) some YARN container events have timestamp of -1

2017-06-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033878#comment-16033878
 ] 

Haibo Chen commented on YARN-5094:
--

Sorry, I just realized that AbstractEvent is used everywhere. To be 
conservative, I think we can set the timestamp only for the events where we 
actually need it (e.g. events exposed in ATSv2). 
Thus, I am uploading a patch here that only changes the NM 
container/localization/application event timestamps.
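
A minimal sketch of the pattern (the class and enum names below are made up for 
illustration, not the classes touched by the patch), assuming the existing 
two-argument {{AbstractEvent}} constructor that takes an explicit timestamp:
{code:java}
import org.apache.hadoop.yarn.event.AbstractEvent;

// Illustrative only: an NM-side event that records its creation time instead of
// relying on the one-argument AbstractEvent constructor, which leaves -1.
public class NMContainerEvent extends AbstractEvent<NMContainerEvent.Type> {

  public enum Type { INIT_CONTAINER, CONTAINER_DONE }

  public NMContainerEvent(Type type) {
    // Wall-clock time at event creation, so consumers such as ATSv2 publish a
    // real timestamp rather than -1.
    super(type, System.currentTimeMillis());
  }
}
{code}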

> some YARN container events have timestamp of -1
> ---
>
> Key: YARN-5094
> URL: https://issues.apache.org/jira/browse/YARN-5094
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-5094-YARN-2928.001.patch
>
>
> Some events in the YARN container entities have timestamp of -1. The 
> RM-generated container events have proper timestamps. It appears that it's 
> the NM-generated events that have -1: YARN_CONTAINER_CREATED, 
> YARN_CONTAINER_FINISHED, YARN_NM_CONTAINER_LOCALIZATION_FINISHED, 
> YARN_NM_CONTAINER_LOCALIZATION_STARTED.
> In the YARN container page,
> {noformat}
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: -1,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: -1,
> info: { }
> }
> {noformat}
> I think the data itself is OK, but the values are not being populated in the 
> REST output?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5094) some YARN container events have timestamp of -1

2017-06-01 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5094:
-
Summary: some YARN container events have timestamp of -1  (was: some YARN 
container events have timestamp of -1 in REST output)

> some YARN container events have timestamp of -1
> ---
>
> Key: YARN-5094
> URL: https://issues.apache.org/jira/browse/YARN-5094
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-5094-YARN-2928.001.patch
>
>
> Some events in the YARN container entities have timestamp of -1. The 
> RM-generated container events have proper timestamps. It appears that it's 
> the NM-generated events that have -1: YARN_CONTAINER_CREATED, 
> YARN_CONTAINER_FINISHED, YARN_NM_CONTAINER_LOCALIZATION_FINISHED, 
> YARN_NM_CONTAINER_LOCALIZATION_STARTED.
> In the YARN container page,
> {noformat}
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: -1,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: -1,
> info: { }
> }
> {noformat}
> I think the data itself is OK, but the values are not being populated in the 
> REST output?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6670) Add separate NM overallocation thresholds for cpu and memory

2017-06-01 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-6670:
-
Attachment: YARN-6670-YARN-1011.00.patch

> Add separate NM overallocation thresholds for cpu and memory
> 
>
> Key: YARN-6670
> URL: https://issues.apache.org/jira/browse/YARN-6670
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6670-YARN-1011.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6670) Add separate NM overallocation thresholds for cpu and memory

2017-06-01 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-6670:
-
Summary: Add separate NM overallocation thresholds for cpu and memory  
(was: Add separate oversubscription thresholds for cpu and memory)

> Add separate NM overallocation thresholds for cpu and memory
> 
>
> Key: YARN-6670
> URL: https://issues.apache.org/jira/browse/YARN-6670
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6670) Add separate oversubscription thresholds for cpu and memory

2017-06-01 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-6670:
-
Summary: Add separate oversubscription thresholds for cpu and memory  (was: 
Add separate configurations for cpu and memory oversubscription)

> Add separate oversubscription thresholds for cpu and memory
> ---
>
> Key: YARN-6670
> URL: https://issues.apache.org/jira/browse/YARN-6670
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6682) Improve performance of AssignmentInformation datastructures

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033806#comment-16033806
 ] 

Hadoop QA commented on YARN-6682:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
12s{color} | {color:red} Docker failed to build yetus/hadoop:5970e82. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6682 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870870/YARN-6682.branch-2.8.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16073/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Improve performance of AssignmentInformation datastructures
> ---
>
> Key: YARN-6682
> URL: https://issues.apache.org/jira/browse/YARN-6682
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6682.branch-2.8.patch, YARN-6682.branch-2.patch, 
> YARN-6682.trunk.patch
>
>
> {{AssignmentInformation}} is inefficient and creates lots of garbage that 
> increases gc pressure.  It creates 3 hashmaps that each contain only 2 
> enum-based keys. This requires wrapper node objects, boxing/unboxing of ints, 
> and more expensive lookups than simply using primitive arrays indexed by enum 
> ordinal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6682) Improve performance of AssignmentInformation datastructures

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6682:
--
Attachment: YARN-6682.branch-2.8.patch
YARN-6682.branch-2.patch
YARN-6682.trunk.patch

Simply uses primitive arrays indexed by enum ordinal.  The differences between 
the branch patches are trivial: e.g. generics removal, containerId vs rmContainer.
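
A sketch of the general technique (field and enum names here are made up; the 
real {{AssignmentInformation}} class differs):
{code:java}
// Illustrative only: counters keyed by a small enum, stored in primitive arrays
// indexed by ordinal() -- no HashMap nodes, no Integer boxing, no hashing.
public class AssignmentCounters {

  public enum Operation { ALLOCATION, RESERVATION }

  private final int[] counts = new int[Operation.values().length];
  private final long[] resources = new long[Operation.values().length];

  public void increment(Operation op, long resource) {
    counts[op.ordinal()]++;
    resources[op.ordinal()] += resource;
  }

  public int getCount(Operation op) {
    return counts[op.ordinal()];
  }

  public long getResource(Operation op) {
    return resources[op.ordinal()];
  }
}
{code}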

> Improve performance of AssignmentInformation datastructures
> ---
>
> Key: YARN-6682
> URL: https://issues.apache.org/jira/browse/YARN-6682
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6682.branch-2.8.patch, YARN-6682.branch-2.patch, 
> YARN-6682.trunk.patch
>
>
> {{AssignmentInformation}} is inefficient and creates lots of garbage that 
> increases gc pressure.  It creates 3 hashmaps that each contain only 2 
> enum-based keys. This requires wrapper node objects, boxing/unboxing of ints, 
> and more expensive lookups than simply using primitive arrays indexed by enum 
> ordinal.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6634) [API] Define an API for ResourceManager WebServices

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033784#comment-16033784
 ] 

Hadoop QA commented on YARN-6634:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 22s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 8 new + 4 unchanged - 36 fixed = 12 total (was 40) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 0 new + 852 unchanged - 23 fixed = 852 total (was 875) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 39m 
16s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6634 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870855/YARN-6634.v5.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux c4f1d16d16a2 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / ece3320 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/16071/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16071/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16071/console |
| Powered by | 

[jira] [Commented] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-06-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033720#comment-16033720
 ] 

Eric Payne commented on YARN-6585:
--

bq. i also had to add a newInstance method in addToClusterNodeLabelsRequest to 
accept labels as string.
I'm sorry, maybe I'm missing something [~sunilg], but I don't understand why we 
needed to add a deprecated {{newInstance}} method that is only called by the 
new unit test.

> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6585.0001.patch, YARN-6585.0002.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6682) Improve performance of AssignmentInformation datastructures

2017-06-01 Thread Daryn Sharp (JIRA)
Daryn Sharp created YARN-6682:
-

 Summary: Improve performance of AssignmentInformation 
datastructures
 Key: YARN-6682
 URL: https://issues.apache.org/jira/browse/YARN-6682
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


{{AssignmentInformation}} is inefficient and creates lots of garbage that 
increases gc pressure.  It creates 3 hashmaps that each contain only 2 
enum-based keys. This requires wrapper node objects, boxing/unboxing of ints, 
and more expensive lookups than simply using primitive arrays indexed by enum 
ordinal.
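
For context, a schematic of the kind of per-enum hashmap usage described above 
(illustrative only -- not the actual {{AssignmentInformation}} code):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a HashMap keyed by an enum with Integer values. Every
// increment allocates/boxes an Integer and walks hash buckets, which a plain
// int[] indexed by Enum.ordinal() avoids entirely.
public class BoxedCounters {

  public enum Operation { ALLOCATION, RESERVATION }

  private final Map<Operation, Integer> counts = new HashMap<>();

  public void increment(Operation op) {
    counts.put(op, counts.getOrDefault(op, 0) + 1);
  }

  public int get(Operation op) {
    return counts.getOrDefault(op, 0);
  }
}
{code}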



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6681) Eliminate double-copy of child queues in canAssignToThisQueue

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033699#comment-16033699
 ] 

Hadoop QA commented on YARN-6681:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
12s{color} | {color:red} Docker failed to build yetus/hadoop:8515d35. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6681 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870857/YARN-6681.branch-2.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16072/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Eliminate double-copy of child queues in canAssignToThisQueue
> -
>
> Key: YARN-6681
> URL: https://issues.apache.org/jira/browse/YARN-6681
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6681.branch-2.8.patch, YARN-6681.branch-2.patch, 
> YARN-6681.trunk.patch
>
>
> 20% of the time in {{AbstractCSQueue#canAssignToThisQueue}} is spent 
> performing two duplications of a treemap of child queues into a list - once to 
> test for null and a second time to see if it's empty.  Eliminating the dups 
> reduces the overhead to 2%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6681) Eliminate double-copy of child queues in canAssignToThisQueue

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6681:
--
Attachment: YARN-6681.branch-2.patch
YARN-6681.branch-2.8.patch

The previously attached branch-2 patch was really the branch-2.8 patch.  The 3 patches differ only in context.

> Eliminate double-copy of child queues in canAssignToThisQueue
> -
>
> Key: YARN-6681
> URL: https://issues.apache.org/jira/browse/YARN-6681
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6681.branch-2.8.patch, YARN-6681.branch-2.patch, 
> YARN-6681.trunk.patch
>
>
> 20% of the time in {{AbstractCSQueue#canAssignToThisQueue}} is spent 
> performing two duplications of a treemap of child queues into a list - once to 
> test for null and a second time to see if it's empty.  Eliminating the dups 
> reduces the overhead to 2%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6681) Eliminate double-copy of child queues in canAssignToThisQueue

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6681:
--
Attachment: (was: YARN-6681.branch-2.patch)

> Eliminate double-copy of child queues in canAssignToThisQueue
> -
>
> Key: YARN-6681
> URL: https://issues.apache.org/jira/browse/YARN-6681
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6681.trunk.patch
>
>
> 20% of the time in {{AbstractCSQueue#canAssignToThisQueue}} is spent 
> performing two duplications of a treemap of child queues into a list - once to 
> test for null and a second time to see if it's empty.  Eliminating the dups 
> reduces the overhead to 2%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6634) [API] Define an API for ResourceManager WebServices

2017-06-01 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-6634:
---
Attachment: YARN-6634.v5.patch

> [API] Define an API for ResourceManager WebServices
> ---
>
> Key: YARN-6634
> URL: https://issues.apache.org/jira/browse/YARN-6634
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Subru Krishnan
>Assignee: Giovanni Matteo Fumarola
>Priority: Critical
> Attachments: YARN-6634.proto.patch, YARN-6634.v1.patch, 
> YARN-6634.v2.patch, YARN-6634.v3.patch, YARN-6634.v4.patch, YARN-6634.v5.patch
>
>
> The RM exposes few REST queries but there's no clear API interface defined. 
> This makes it painful to build either clients or extension components like 
> Router (YARN-5412) that expose REST interfaces themselves. This jira proposes 
> adding a RM WebServices protocol similar to the one we have for RPC, i.e. 
> {{ApplicationClientProtocol}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups

2017-06-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033665#comment-16033665
 ] 

Daryn Sharp commented on YARN-6680:
---

The Findbugs warning is from {{AggregatedLogFormat}}, which is not part of this 
patch.

> Avoid locking overhead for NO_LABEL lookups
> ---
>
> Key: YARN-6680
> URL: https://issues.apache.org/jira/browse/YARN-6680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6680.patch
>
>
> Labels are managed via a hash that is protected with a read lock.  The lock 
> acquire and release are each just as expensive as the hash lookup itself - 
> resulting in a 3X slowdown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6679) Reduce Resource instance overhead via non-PBImpl

2017-06-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033655#comment-16033655
 ] 

Daryn Sharp commented on YARN-6679:
---

Thanks Daniel!

Using signum to handle a NaN (impossible, I hope?) may not be worth the cost.  
I only moved the pre-existing method from the pb impl to the base class, so I'd 
suggest another jira if you feel strongly that it should be changed.

bq. How thoroughly has this been tested?
I'd say very.  The first large/busy pre-production cluster was crippled by 2.8.  
The scheduler thread was constantly pegging a CPU and falling behind, and deploys 
were halted.  We deployed my collection of patches about 1.5 weeks ago.  CPU usage 
fluctuates a lot, but doesn't stay pegged anymore.

bq. Wanna add some unit tests to confirm the newInstance() methods and the PB 
conversion work as expected?
I would certainly hope the existing tests provide coverage! :)  I didn't expose 
any new methods to test but I'll concoct some rudimentary tests if need be.

> Reduce Resource instance overhead via non-PBImpl
> 
>
> Key: YARN-6679
> URL: https://issues.apache.org/jira/browse/YARN-6679
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6679.branch-2.patch, YARN-6679.trunk.patch
>
>
> Creating and using transient PB-based Resource instances during scheduling is 
> very expensive.  The overhead can be transparently reduced by internally 
> using lightweight non-PB based instances.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2017-06-01 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned YARN-5464:
---

Assignee: (was: Robert Kanter)

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Priority: Blocker
> Attachments: YARN-5464.wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2017-06-01 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned YARN-5464:
---

Assignee: Robert Kanter

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Blocker
> Attachments: YARN-5464.wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2017-06-01 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned YARN-5464:
---

Assignee: (was: Robert Kanter)

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Priority: Blocker
> Attachments: YARN-5464.wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2017-06-01 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033637#comment-16033637
 ] 

Robert Kanter commented on YARN-5464:
-

I won't have time to work on this for quite a while, so I'm unassigning myself 
in case someone else wants to pick it up.  I did manage to implement the state 
store logic (storing the decommissioning nodes data, updating the data, 
recovering the data, unit tests, etc).  However, it's incomplete because 
recovery doesn't work properly: the problem is that there's a timing issue 
between recovering the node data, loading in the include and exclude files, and 
the NMs heartbeating.  As it stands, when the data gets recovered, it can't 
find the {{NodeId}} because it's been removed due to the exclude file.  I'll 
attach the wip patch in case anyone picks this up.

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Blocker
> Attachments: YARN-5464.wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2017-06-01 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-5464:

Attachment: YARN-5464.wip.patch

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Priority: Blocker
> Attachments: YARN-5464.wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6680) Avoid locking overhead for NO_LABEL lookups

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033462#comment-16033462
 ] 

Hadoop QA commented on YARN-6680:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
8s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 53s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 97 unchanged - 0 fixed = 98 total (was 97) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
27s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 39m 
52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6680 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870820/YARN-6680.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b332be5b529a 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / ff31035 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/16068/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/16068/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
 |
|  Test Results | 

[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033429#comment-16033429
 ] 

Eric Payne commented on YARN-2113:
--

Hmm. Still getting the 404:
{noformat}
[0m[91mConnecting to download.oracle.com 
(download.oracle.com)|23.61.194.27|:80... [0m[91mconnected.
HTTP request sent, awaiting response... [0m[91m404 Not Found
[0m[91m2017-06-01 17:35:54 ERROR 404: Not Found.
{noformat}
I wonder what's up with the {{[0m[91m}} everywhere.

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033357#comment-16033357
 ] 

Hadoop QA commented on YARN-2113:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  4m 
10s{color} | {color:red} Docker failed to build yetus/hadoop:5970e82. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-2113 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869941/YARN-2113.branch-2.8.0019.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16070/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 3.0.0-alpha4
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.8.0019.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6681) Eliminate double-copy of child queues in canAssignToThisQueue

2017-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033344#comment-16033344
 ] 

Hadoop QA commented on YARN-6681:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 11s{color} 
| {color:red} YARN-6681 does not apply to branch-2. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6681 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12870822/YARN-6681.branch-2.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16069/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Eliminate double-copy of child queues in canAssignToThisQueue
> -
>
> Key: YARN-6681
> URL: https://issues.apache.org/jira/browse/YARN-6681
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6681.branch-2.patch, YARN-6681.trunk.patch
>
>
> 20% of the time in {{AbstractCSQueue#canAssignToThisQueue}} is spent 
> performing two duplications of a treemap of child queues into a list - once to 
> test for null and a second time to see if it's empty.  Eliminating the dups 
> reduces the overhead to 2%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6681) Eliminate double-copy of child queues in canAssignToThisQueue

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6681:
--
Attachment: YARN-6681.branch-2.patch
YARN-6681.trunk.patch

Add a {{hasChildQueues}} method that is overridden by {{ParentQueue}} to avoid 
the tree-to-list duplications.
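
A rough sketch of the shape of the change, using simplified class names rather 
than the actual CapacityScheduler classes:
{code:java}
import java.util.TreeMap;

// Illustrative only: a boolean query on the queue hierarchy replaces copying
// the child-queue collection into a list just to check for null/empty.
abstract class Queue {
  /** Leaf queues have no children, so the default answer is false. */
  public boolean hasChildQueues() {
    return false;
  }
}

class ParentQueue extends Queue {
  // The real class keeps children in a sorted structure; a TreeMap stands in here.
  private final TreeMap<String, Queue> childQueues = new TreeMap<>();

  @Override
  public boolean hasChildQueues() {
    // Answer the emptiness question directly instead of duplicating the tree
    // into a list twice (once for the null check, once for isEmpty()).
    return !childQueues.isEmpty();
  }
}
{code}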

> Eliminate double-copy of child queues in canAssignToThisQueue
> -
>
> Key: YARN-6681
> URL: https://issues.apache.org/jira/browse/YARN-6681
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6681.branch-2.patch, YARN-6681.trunk.patch
>
>
> 20% of the time in {{AbstractCSQueue#canAssignToThisQueue}} is spent 
> performing two duplications of a treemap of child queues into a list - once to 
> test for null and a second time to see if it's empty.  Eliminating the dups 
> reduces the overhead to 2%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6679) Reduce Resource instance overhead via non-PBImpl

2017-06-01 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033309#comment-16033309
 ] 

Daniel Templeton commented on YARN-6679:


Thanks, [~daryn].  One comment: in the {{Resource.compareTo()}} method, you can 
replace the final ternary with {{Math.signum()}}.

How thoroughly has this been tested?  Wanna add some unit tests to confirm the 
{{newInstance()}} methods and the PB conversion work as expected?
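
To illustrate the suggestion (the actual {{Resource.compareTo()}} body in the 
patch may differ; this just shows the two ways of mapping a long difference to 
-1/0/1):
{code:java}
// Illustrative only -- not the actual Resource.compareTo() implementation.
class CompareSketch {
  // Shape of the code under review, ending in a ternary:
  static int withTernary(long diff) {
    return diff == 0 ? 0 : (diff > 0 ? 1 : -1);
  }

  // The suggested alternative: the long widens to float for Math.signum(),
  // which preserves the sign. Whether that indirection is worth it is the
  // NaN/cost question raised in the follow-up reply.
  static int withSignum(long diff) {
    return (int) Math.signum(diff);
  }
}
{code}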

> Reduce Resource instance overhead via non-PBImpl
> 
>
> Key: YARN-6679
> URL: https://issues.apache.org/jira/browse/YARN-6679
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6679.branch-2.patch, YARN-6679.trunk.patch
>
>
> Creating and using transient PB-based Resource instances during scheduling is 
> very expensive.  The overhead can be transparently reduced by internally 
> using lightweight non-PB based instances.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-06-01 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-6585:
-
Target Version/s: 2.8.2  (was: 2.8.1)

> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6585.0001.patch, YARN-6585.0002.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6047) Documentation updates for TimelineService v2

2017-06-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033291#comment-16033291
 ] 

Haibo Chen commented on YARN-6047:
--

Linked YARN-6323 to document the first-time rolling upgrade case.

> Documentation updates for TimelineService v2
> 
>
> Key: YARN-6047
> URL: https://issues.apache.org/jira/browse/YARN-6047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, timelineserver
>Reporter: Varun Saxena
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-6047-YARN-5355.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6681) Eliminate double-copy of child queues in canAssignToThisQueue

2017-06-01 Thread Daryn Sharp (JIRA)
Daryn Sharp created YARN-6681:
-

 Summary: Eliminate double-copy of child queues in 
canAssignToThisQueue
 Key: YARN-6681
 URL: https://issues.apache.org/jira/browse/YARN-6681
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


20% of the time in {{AbstractCSQueue#canAssignToThisQueue}} is spent duplicating 
a treemap of child queues into a list twice - once to test for null, and a second 
time to see if it's empty.  Eliminating the duplicate copies reduces the overhead 
to 2%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6680) Avoid locking overhead for NO_LABEL lookups

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6680:
--
Attachment: YARN-6680.patch

The patch simply maintains a reference to the no-label instances that are already 
being seeded into the maps.  Lookups of the no-label key use that reference in 
lieu of the lock and hash lookup.

Not a perfect solution, but it provides a significant performance boost until the 
maps can be changed to concurrent ones.
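As a rough illustration of the idea (a self-contained sketch with made-up class 
and field names, not the patch itself):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// NO_LABEL stands in for YARN's no-label key; V is whatever the map stores.
final class LabelMap<V> {
  static final String NO_LABEL = "";

  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, V> map = new HashMap<>();
  private volatile V noLabelValue;    // cached reference to the seeded NO_LABEL entry

  void put(String label, V value) {
    lock.writeLock().lock();
    try {
      map.put(label, value);
      if (NO_LABEL.equals(label)) {
        noLabelValue = value;         // keep the cached reference in sync
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  V get(String label) {
    if (NO_LABEL.equals(label)) {
      return noLabelValue;            // fast path: no lock acquire/release, no hash lookup
    }
    lock.readLock().lock();
    try {
      return map.get(label);
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}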

> Avoid locking overhead for NO_LABEL lookups
> ---
>
> Key: YARN-6680
> URL: https://issues.apache.org/jira/browse/YARN-6680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6680.patch
>
>
> Labels are managed via a hash that is protected with a read lock.  The lock 
> acquire and release are each just as expensive as the hash lookup itself - 
> resulting in a 3X slowdown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6680) Avoid locking overhead for NO_LABEL lookups

2017-06-01 Thread Daryn Sharp (JIRA)
Daryn Sharp created YARN-6680:
-

 Summary: Avoid locking overhead for NO_LABEL lookups
 Key: YARN-6680
 URL: https://issues.apache.org/jira/browse/YARN-6680
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Labels are managed via a hash that is protected with a read lock.  The lock 
acquire and release are each just as expensive as the hash lookup itself - 
resulting in a 3X slowdown.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5094) some YARN container events have timestamp of -1 in REST output

2017-06-01 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033262#comment-16033262
 ] 

Haibo Chen commented on YARN-5094:
--

As indicated in the comments above, YARN-5169 is the real cause. I am planning 
to fix the issue there.

> some YARN container events have timestamp of -1 in REST output
> --
>
> Key: YARN-5094
> URL: https://issues.apache.org/jira/browse/YARN-5094
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-5094-YARN-2928.001.patch
>
>
> Some events in the YARN container entities have timestamp of -1. The 
> RM-generated container events have proper timestamps. It appears that it's 
> the NM-generated events that have -1: YARN_CONTAINER_CREATED, 
> YARN_CONTAINER_FINISHED, YARN_NM_CONTAINER_LOCALIZATION_FINISHED, 
> YARN_NM_CONTAINER_LOCALIZATION_STARTED.
> In the YARN container page,
> {noformat}
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: -1,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: -1,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: -1,
> info: { }
> }
> {noformat}
> I think the data itself is OK, but the values are not being populated in the 
> REST output?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6679) Reduce Resource instance overhead via non-PBImpl

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6679:
--
Attachment: YARN-6679.branch-2.patch

Same as trunk patch, plus 1-line changes to the deprecated and removed resource 
increase/decrease PBs and request PB.

> Reduce Resource instance overhead via non-PBImpl
> 
>
> Key: YARN-6679
> URL: https://issues.apache.org/jira/browse/YARN-6679
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6679.branch-2.patch, YARN-6679.trunk.patch
>
>
> Creating and using transient PB-based Resource instances during scheduling is 
> very expensive.  The overhead can be transparently reduced by internally 
> using lightweight non-PB based instances.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-6245) Add FinalResource object to reduce overhead of Resource class instancing

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6245:
--
Comment: was deleted

(was: Same as trunk patch, plus 1-line changes to the deprecated and removed 
resource increase/decrease PBs and request PB.)

> Add FinalResource object to reduce overhead of Resource class instancing
> 
>
> Key: YARN-6245
> URL: https://issues.apache.org/jira/browse/YARN-6245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
> Attachments: observable-resource.patch, 
> YARN-6245.preliminary-staled.1.patch
>
>
> There is a lot of Resource object creation in the YARN scheduler. Since the 
> Resource object is backed by protobuf, creating such objects is expensive and 
> becomes a bottleneck.
> To address the problem, we can introduce a FinalResource (is it better to 
> call it ImmutableResource?) object, which is not backed by PBImpl. We can use 
> this object in frequently invoked paths in the scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6245) Add FinalResource object to reduce overhead of Resource class instancing

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6245:
--
Attachment: YARN-6679.branch-2.patch

Same as trunk patch, plus 1-line changes to the deprecated and removed resource 
increase/decrease PBs and request PB.

> Add FinalResource object to reduce overhead of Resource class instancing
> 
>
> Key: YARN-6245
> URL: https://issues.apache.org/jira/browse/YARN-6245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
> Attachments: observable-resource.patch, 
> YARN-6245.preliminary-staled.1.patch
>
>
> There is a lot of Resource object creation in the YARN scheduler. Since the 
> Resource object is backed by protobuf, creating such objects is expensive and 
> becomes a bottleneck.
> To address the problem, we can introduce a FinalResource (is it better to 
> call it ImmutableResource?) object, which is not backed by PBImpl. We can use 
> this object in frequently invoked paths in the scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6245) Add FinalResource object to reduce overhead of Resource class instancing

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6245:
--
Attachment: (was: YARN-6679.branch-2.patch)

> Add FinalResource object to reduce overhead of Resource class instancing
> 
>
> Key: YARN-6245
> URL: https://issues.apache.org/jira/browse/YARN-6245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
> Attachments: observable-resource.patch, 
> YARN-6245.preliminary-staled.1.patch
>
>
> There is a lot of Resource object creation in the YARN scheduler. Since the 
> Resource object is backed by protobuf, creating such objects is expensive and 
> becomes a bottleneck.
> To address the problem, we can introduce a FinalResource (is it better to 
> call it ImmutableResource?) object, which is not backed by PBImpl. We can use 
> this object in frequently invoked paths in the scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6245) Add FinalResource object to reduce overhead of Resource class instancing

2017-06-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033233#comment-16033233
 ] 

Daryn Sharp commented on YARN-6245:
---

Posted to new YARN-6679 in case you wish to pursue the immutable resources, 
although the scheduler really should try to reuse instances when possible.

> Add FinalResource object to reduce overhead of Resource class instancing
> 
>
> Key: YARN-6245
> URL: https://issues.apache.org/jira/browse/YARN-6245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
> Attachments: observable-resource.patch, 
> YARN-6245.preliminary-staled.1.patch
>
>
> There is a lot of Resource object creation in the YARN scheduler. Since the 
> Resource object is backed by protobuf, creating such objects is expensive and 
> becomes a bottleneck.
> To address the problem, we can introduce a FinalResource (is it better to 
> call it ImmutableResource?) object, which is not backed by PBImpl. We can use 
> this object in frequently invoked paths in the scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6679) Reduce Resource instance overhead via non-PBImpl

2017-06-01 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated YARN-6679:
--
Attachment: YARN-6679.trunk.patch

# Add a private {{SimpleResource}} object with longs for memory/vcores.
# {{ResourcePBImpl#getProto(Resource)}} converts to a PBImpl as necessary to 
create a message.
# The bulk of the patch is a regexp change of {{((ResourcePBImpl) r).getProto()}} 
to {{ProtoUtils.convertToProtoFormat(r)}} or the local class conversion method.
# In a few places, just set the PB instead of creating and checking equality 
before setting.  Otherwise simple resources would be double converted.

The overall effect is that resources received via PBs remain PB-based.  The 
myriad of internal resource instances used during calculations become lightweight 
and are converted to a PB only when necessary.
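For readers following along, a self-contained illustration of what such a 
lightweight class boils down to (names and field widths follow the description 
above; the real class in the patch may be structured differently and hooks into 
the existing PB conversion utilities):
{code}
// Plain value object with longs for memory/vcores, so transient scheduler math
// never touches a protobuf builder.  Conversion to the wire format (e.g. via
// ProtoUtils.convertToProtoFormat) happens only when a proto is actually needed.
final class SimpleResource {
  private long memorySize;   // MB
  private long vcores;

  SimpleResource(long memorySize, long vcores) {
    this.memorySize = memorySize;
    this.vcores = vcores;
  }

  long getMemorySize() { return memorySize; }
  long getVirtualCores() { return vcores; }

  void setMemorySize(long memorySize) { this.memorySize = memorySize; }
  void setVirtualCores(long vcores) { this.vcores = vcores; }
}
{code}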

> Reduce Resource instance overhead via non-PBImpl
> 
>
> Key: YARN-6679
> URL: https://issues.apache.org/jira/browse/YARN-6679
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: YARN-6679.trunk.patch
>
>
> Creating and using transient PB-based Resource instances during scheduling is 
> very expensive.  The overhead can be transparently reduced by internally 
> using lightweight non-PB based instances.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6679) Reduce Resource instance overhead via non-PBImpl

2017-06-01 Thread Daryn Sharp (JIRA)
Daryn Sharp created YARN-6679:
-

 Summary: Reduce Resource instance overhead via non-PBImpl
 Key: YARN-6679
 URL: https://issues.apache.org/jira/browse/YARN-6679
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Creating and using transient PB-based Resource instances during scheduling is 
very expensive.  The overhead can be transparently reduced by internally using 
lightweight non-PB based instances.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6323) Rolling upgrade/config change is broken on timeline v2.

2017-06-01 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033183#comment-16033183
 ] 

Vrushali C commented on YARN-6323:
--

Hmm, using the same thing for the flow name at both the RM and the NM does sound 
good. But I think using the app name as the default on the RM side is good, 
because if the client does not set the tags, the app name is more meaningful than 
the app id. I wonder if, on the RM side, there is a way to distinguish between 
regular use of the default flow name and its use during a restart with ATSv2 on. 

> Rolling upgrade/config change is broken on timeline v2. 
> 
>
> Key: YARN-6323
> URL: https://issues.apache.org/jira/browse/YARN-6323
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Vrushali C
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-6323.001.patch
>
>
> Found this issue when deploying on real clusters. If there are apps running 
> when we enable timeline v2 (with work preserving restart enabled), node 
> managers will fail to start due to missing app context data. We should 
> probably assign some default names to these "left over" apps. I believe it's 
> suboptimal to let users clean up the whole cluster before enabling timeline 
> v2. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6245) Add FinalResource object to reduce overhead of Resource class instancing

2017-06-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033054#comment-16033054
 ] 

Daryn Sharp commented on YARN-6245:
---

I've been OOO.  I'll be posting my collection of patches today for review.

> Add FinalResource object to reduce overhead of Resource class instancing
> 
>
> Key: YARN-6245
> URL: https://issues.apache.org/jira/browse/YARN-6245
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
> Attachments: observable-resource.patch, 
> YARN-6245.preliminary-staled.1.patch
>
>
> There're lots of Resource object creation in YARN Scheduler, since Resource 
> object is backed by protobuf, creation of such objects is expensive and 
> becomes bottleneck.
> To address the problem, we can introduce a FinalResource (Is it better to 
> call it ImmutableResource?) object, which is not backed by PBImpl. We can use 
> this object in frequent invoke paths in the scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5648) [Security] Client side changes for authentication

2017-06-01 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-5648:
---
Attachment: YARN-5648-YARN-5355.03.patch

> [Security] Client side changes for authentication
> -
>
> Key: YARN-5648
> URL: https://issues.apache.org/jira/browse/YARN-5648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5648-YARN-5355.02.patch, 
> YARN-5648-YARN-5355.03.patch, YARN-5648-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6638) [Security] Timeline reader side changes for loading auth filters and principals

2017-06-01 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-6638:
---
Attachment: YARN-6638-YARN-5355.01.patch

> [Security] Timeline reader side changes for loading auth filters and 
> principals
> ---
>
> Key: YARN-6638
> URL: https://issues.apache.org/jira/browse/YARN-6638
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-6638-YARN-5355.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5647) [Security] Collector side changes for loading auth filters and principals

2017-06-01 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-5647:
---
Attachment: YARN-5647-YARN-5355.007.patch

> [Security] Collector side changes for loading auth filters and principals
> -
>
> Key: YARN-5647
> URL: https://issues.apache.org/jira/browse/YARN-5647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5647-YARN-5355.004.patch, 
> YARN-5647-YARN-5355.005.patch, YARN-5647-YARN-5355.006.patch, 
> YARN-5647-YARN-5355.007.patch, YARN-5647-YARN-5355.wip.002.patch, 
> YARN-5647-YARN-5355.wip.003.patch, YARN-5647-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-06-01 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6678:
---
Description: 
Error log:
{noformat}
java.lang.IllegalStateException: Trying to reserve container 
container_e10_1495599791406_7129_01_001453 for application 
appattempt_1495599791406_7129_01 when currently reserved container 
container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
#containers=40 available=... used=...
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
{noformat}

Reproduce this problem:
1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1
2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and 
allocated app-1/container-X2
3. nm1 reserved app-2/container-Y
4. proposal-1 was accepted but threw an IllegalStateException when it was applied

Currently the check code for a reserve proposal in FiCaSchedulerApp#accept is as 
follows:
{code}
  // Container reserved first time will be NEW, after the container
  // accepted & confirmed, it will become RESERVED state
  if (schedulerContainer.getRmContainer().getState()
  == RMContainerState.RESERVED) {
// Set reReservation == true
reReservation = true;
  } else {
// When reserve a resource (state == NEW is for new container,
// state == RUNNING is for increase container).
// Just check if the node is not already reserved by someone
if (schedulerContainer.getSchedulerNode().getReservedContainer()
!= null) {
  if (LOG.isDebugEnabled()) {
LOG.debug("Try to reserve a container, but the node is "
+ "already reserved by another container="
+ schedulerContainer.getSchedulerNode()
.getReservedContainer().getContainerId());
  }
  return false;
}
  }
{code}

The reserved container on the node of a reserve proposal is checked only for a 
first-reserve container, not for a re-reserve container.
I think FiCaSchedulerApp#accept should do this check for every reserve proposal, 
no matter whether the container is a re-reservation or not.
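For illustration, a sketch of the proposed check (not the attached patch; it only 
reuses the accessors shown in the snippet above):
{code}
  // Check the node's currently reserved container for every reserve proposal,
  // including re-reservations, and reject the proposal if a different
  // container holds the reservation.
  RMContainer nodeReserved =
      schedulerContainer.getSchedulerNode().getReservedContainer();
  if (nodeReserved != null && !nodeReserved.getContainerId().equals(
      schedulerContainer.getRmContainer().getContainerId())) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Try to reserve a container, but the node is already reserved"
          + " by another container=" + nodeReserved.getContainerId());
    }
    return false;
  }
  reReservation = schedulerContainer.getRmContainer().getState()
      == RMContainerState.RESERVED;
{code}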


  was:
Error log:
{noformat}
java.lang.IllegalStateException: Trying to reserve container 
container_e10_1495599791406_7129_01_001453 for application 
appattempt_1495599791406_7129_01 when currently reserved container 
container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
#containers=40 available=... used=...
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
{noformat}

Reproduce this problem:
1. nm1 re-reserved app-1/container-X1 and generated reserved proposal-1
2. nm2 has enough resource for app-1, un-reserved app-1/container-X1 and 
allocated app-1/container-X2
3. nm1 reserved app-2/container-Y
4. proposal-1 was accepted but throw IllegalStateException when applying

Currently the check code for reserve proposal in FiCaSchedulerApp#accept as 
follows:
{code}
  // Container reserved first time will be NEW, after the container
  // accepted & confirmed, it will become RESERVED state
  if (schedulerContainer.getRmContainer().getState()
  == RMContainerState.RESERVED) {
// Set reReservation == true
reReservation = true;
  } else {
// When reserve a resource (state == NEW is for new container,
// state == RUNNING is for increase container).
// Just check if the node is not already reserved by someone
if (schedulerContainer.getSchedulerNode().getReservedContainer()
   

[jira] [Assigned] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-06-01 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang reassigned YARN-6678:
--

Assignee: Tao Yang

> Committer thread crashes with IllegalStateException in async-scheduling mode 
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6678.001.patch
>
>
> Error log:
> {noformat}
> java.lang.IllegalStateException: Trying to reserve container 
> container_e10_1495599791406_7129_01_001453 for application 
> appattempt_1495599791406_7129_01 when currently reserved container 
> container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
> #containers=40 available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Reproduce this problem:
> 1. nm1 re-reserved app-1/container-X1 and generated reserved proposal-1
> 2. nm2 has enough resource for app-1, un-reserved app-1/container-X1 and 
> allocated app-1/container-X2
> 3. nm1 reserved app-2/container-Y
> 4. proposal-1 was accepted but threw an IllegalStateException when it was applied
> Currently the check code for a reserve proposal in FiCaSchedulerApp#accept is 
> as follows:
> {code}
>   // Container reserved first time will be NEW, after the container
>   // accepted & confirmed, it will become RESERVED state
>   if (schedulerContainer.getRmContainer().getState()
>   == RMContainerState.RESERVED) {
> // Set reReservation == true
> reReservation = true;
>   } else {
> // When reserve a resource (state == NEW is for new container,
> // state == RUNNING is for increase container).
> // Just check if the node is not already reserved by someone
> if (schedulerContainer.getSchedulerNode().getReservedContainer()
> != null) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Try to reserve a container, but the node is "
> + "already reserved by another container="
> + schedulerContainer.getSchedulerNode()
> .getReservedContainer().getContainerId());
>   }
>   return false;
> }
>   }
> {code}
> The reserved container on the node of a reserve proposal is checked only for 
> a first-reserve container, not for a re-reserve container.
> I think FiCaSchedulerApp#accept should do this check for every reserve 
> proposal, no matter whether the container is a re-reservation or not.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-06-01 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6678:
---
Attachment: YARN-6678.001.patch

Attach a patch with UT for review.

> Committer thread crashes with IllegalStateException in async-scheduling mode 
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
> Attachments: YARN-6678.001.patch
>
>
> Error log:
> {noformat}
> java.lang.IllegalStateException: Trying to reserve container 
> container_e10_1495599791406_7129_01_001453 for application 
> appattempt_1495599791406_7129_01 when currently reserved container 
> container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
> #containers=40 available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Reproduce this problem:
> 1. nm1 re-reserved app-1/container-X1 and generated reserved proposal-1
> 2. nm2 has enough resource for app-1, un-reserved app-1/container-X1 and 
> allocated app-1/container-X2
> 3. nm1 reserved app-2/container-Y
> 4. proposal-1 was accepted but threw an IllegalStateException when it was applied
> Currently the check code for a reserve proposal in FiCaSchedulerApp#accept is 
> as follows:
> {code}
>   // Container reserved first time will be NEW, after the container
>   // accepted & confirmed, it will become RESERVED state
>   if (schedulerContainer.getRmContainer().getState()
>   == RMContainerState.RESERVED) {
> // Set reReservation == true
> reReservation = true;
>   } else {
> // When reserve a resource (state == NEW is for new container,
> // state == RUNNING is for increase container).
> // Just check if the node is not already reserved by someone
> if (schedulerContainer.getSchedulerNode().getReservedContainer()
> != null) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Try to reserve a container, but the node is "
> + "already reserved by another container="
> + schedulerContainer.getSchedulerNode()
> .getReservedContainer().getContainerId());
>   }
>   return false;
> }
>   }
> {code}
> The reserved container on the node of a reserve proposal is checked only for 
> a first-reserve container, not for a re-reserve container.
> I think FiCaSchedulerApp#accept should do this check for every reserve 
> proposal, no matter whether the container is a re-reservation or not.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-06-01 Thread Tao Yang (JIRA)
Tao Yang created YARN-6678:
--

 Summary: Committer thread crashes with IllegalStateException in 
async-scheduling mode of CapacityScheduler
 Key: YARN-6678
 URL: https://issues.apache.org/jira/browse/YARN-6678
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0-alpha3, 2.9.0
Reporter: Tao Yang


Error log:
{noformat}
java.lang.IllegalStateException: Trying to reserve container 
container_e10_1495599791406_7129_01_001453 for application 
appattempt_1495599791406_7129_01 when currently reserved container 
container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
#containers=40 available=... used=...
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
{noformat}

Reproduce this problem:
1. nm1 re-reserved app-1/container-X1 and generated reserved proposal-1
2. nm2 has enough resource for app-1, un-reserved app-1/container-X1 and 
allocated app-1/container-X2
3. nm1 reserved app-2/container-Y
4. proposal-1 was accepted but threw an IllegalStateException when it was applied

Currently the check code for a reserve proposal in FiCaSchedulerApp#accept is as 
follows:
{code}
  // Container reserved first time will be NEW, after the container
  // accepted & confirmed, it will become RESERVED state
  if (schedulerContainer.getRmContainer().getState()
  == RMContainerState.RESERVED) {
// Set reReservation == true
reReservation = true;
  } else {
// When reserve a resource (state == NEW is for new container,
// state == RUNNING is for increase container).
// Just check if the node is not already reserved by someone
if (schedulerContainer.getSchedulerNode().getReservedContainer()
!= null) {
  if (LOG.isDebugEnabled()) {
LOG.debug("Try to reserve a container, but the node is "
+ "already reserved by another container="
+ schedulerContainer.getSchedulerNode()
.getReservedContainer().getContainerId());
  }
  return false;
}
  }
{code}

The reserved container on the node of a reserve proposal is checked only for a 
first-reserve container, not for a re-reserve container.
I think FiCaSchedulerApp#accept should do this check for every reserve proposal, 
no matter whether the container is a re-reservation or not.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5333) Some recovered apps are put into default queue when RM HA

2017-06-01 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-5333:

Fix Version/s: 2.7.4

> Some recovered apps are put into default queue when RM HA
> -
>
> Key: YARN-5333
> URL: https://issues.apache.org/jira/browse/YARN-5333
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: release-blocker
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: YARN-5333.01.patch, YARN-5333.02.patch, 
> YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch, 
> YARN-5333.06.patch, YARN-5333.07.patch, YARN-5333.08.patch, 
> YARN-5333.09.patch, YARN-5333.10.patch
>
>
> Enable RM HA and use FairScheduler, 
> {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, 
> {{yarn.scheduler.fair.user-as-default-queue}} is set to false.
> Reproduce steps:
> 1. Start two RMs.
> 2. After RMs are running, change both RM's file 
> {{etc/hadoop/fair-scheduler.xml}}, then add some queues.
> 3. Submit some apps to the new added queues.
> 4. Stop the active RM, then the standby RM will transit to active and recover 
> apps.
> However, the new active RM will put recovered apps into the default queue 
> because it might not have loaded the new {{fair-scheduler.xml}}. We need to 
> call {{initScheduler}} before starting active services, or move 
> {{refreshAll()}} in front of {{rm.transitionToActive()}}. *It seems this is 
> also important for other schedulers*.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org