[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161551#comment-14161551
 ] 

Hadoop QA commented on YARN-1879:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673293/YARN-1879.23.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5301//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5301//console

This message is automatically generated.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
> YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
> YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161550#comment-14161550
 ] 

Hadoop QA commented on YARN-2496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673291/YARN-2496.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5300//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5300//console

This message is automatically generated.

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" 
> will be used.
> - Check whether labels can be accessed by the queue when an app is submitted 
> to the queue with a label-expression or a ResourceRequest is updated with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on that NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161547#comment-14161547
 ] 

Zhijie Shen commented on YARN-2583:
---

1. Be more specific: should this say that we will find a more scalable method 
to write only a single log file per LRS?
{code}
  // we find a more scalable method.
{code}

2. Make 30 and 3600 constants of AppLogAggregatorImpl? (See the sketch after 
this list.)
{code}
int configuredRentionSize =
conf.getInt(NM_LOG_AGGREGATION_RETAIN_RETENTION_SIZE_PER_APP, 30);
{code}
{code}
if (configuredInterval > 0 && configuredInterval < 3600) {
{code}

3. Should this be ">"?
{code}
  if (status.size() >= this.retentionSize) {
{code}
And should this be "<"?
{code}
for (int i = 0 ; i <= statusList.size() - this.retentionSize; i++) {
{code}

4. Why not use yarnclient? Is it because of the packaging issue?
{code}
  private ApplicationClientProtocol rmClient;
{code}
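
As a rough illustration of points 2 and 3 above, a minimal Java sketch with 
hypothetical names (not the actual AppLogAggregatorImpl changes in the patch):

{code}
// Hypothetical sketch only: names and structure are illustrative, not the
// actual YARN-2583 patch.
import java.util.ArrayList;
import java.util.List;

public class LogRetentionSketch {
  // Point 2: lift the magic numbers into named constants.
  static final int DEFAULT_RETENTION_SIZE_PER_APP = 30;
  static final long MIN_ROLLING_INTERVAL_SECONDS = 3600;

  // Point 3: keep only the newest 'retentionSize' entries, delete the rest.
  static <T> List<T> filesToDelete(List<T> oldestFirst, int retentionSize) {
    List<T> toDelete = new ArrayList<>();
    if (oldestFirst.size() > retentionSize) {                       // strict ">"
      for (int i = 0; i < oldestFirst.size() - retentionSize; i++) { // strict "<"
        toDelete.add(oldestFirst.get(i));
      }
    }
    return toDelete;
  }
}
{code}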

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It 
> checks the cut-off time and, if all logs for an application are older than 
> that cut-off time, the app-log-dir is deleted from HDFS. This does not work 
> for LRS, because an LRS application is expected to keep running for a long 
> time. Two different scenarios: 
> 1) If we configure rollingIntervalSeconds, new log files are continuously 
> uploaded to HDFS. The number of log files for the application keeps growing, 
> and no log files are ever deleted.
> 2) If we do not configure rollingIntervalSeconds, the log files can only be 
> uploaded to HDFS after the application finishes. It is very possible that the 
> logs are uploaded after the cut-off time, which causes a problem because by 
> then the app-log-dir for the application in HDFS has already been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-06 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-
Attachment: YARN-1879.23.patch

Marked registerApplicationMaster and finishApplicationMaster with the 
Idempotent annotation.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
> YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
> YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-06 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161499#comment-14161499
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

Thanks for your comments, Jian and Karthik.

{quote}
from RM’s perspective, these are just new requests, as the new RM doesn’t have 
any cache for previous requests from client.
{quote}

I confirmed that this is true. Neither {{finishApplicationMaster}} nor 
{{registerApplicationMaster}} touches the data in ZK directly, so the RM can 
handle retried requests transparently in the following cases: 

1. When EmbeddedElector chooses a different RM as the leader before and after 
the failover, ZK doesn't have the RMAppAttempt/RMApp data, so the RM treats a 
retried request as a new request. E.g. there are an active RM (RM1) and a 
standby RM (RM2), and leadership fails over from RM1 to RM2.
2. Even when EmbeddedElector chooses the same RM as the leader before and after 
the failover, the RM goes into standby state, stops all its services before the 
failover, and reloads the RMAppAttempt/RMApp data. In this case too, the RM 
treats a retried request as a new request. E.g. there are an active RM (RM1) 
and a standby RM (RM2), and leadership fails over from RM1 back to RM1. 

I think there is no problem with marking these methods as Idempotent.
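
For illustration, a hedged sketch of what marking the two methods could look 
like, assuming Hadoop's {{org.apache.hadoop.io.retry.Idempotent}} annotation and 
abbreviating the real {{ApplicationMasterProtocol}} signatures to placeholder 
types (the actual patch may differ):

{code}
// Illustrative sketch, not the actual YARN-1879 patch: the real methods take
// protobuf-backed request/response records and throw YarnException/IOException.
import org.apache.hadoop.io.retry.Idempotent;

public interface ApplicationMasterProtocolSketch {
  // Safe to re-invoke on the other RM after a failover: as discussed above,
  // the new active RM simply treats the retried call as a fresh request.
  @Idempotent
  RegisterResponse registerApplicationMaster(RegisterRequest request);

  @Idempotent
  FinishResponse finishApplicationMaster(FinishRequest request);

  // Placeholder types for the sketch.
  interface RegisterRequest {}
  interface RegisterResponse {}
  interface FinishRequest {}
  interface FinishResponse {}
}
{code}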

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
> YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
> YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161496#comment-14161496
 ] 

Jian He commented on YARN-1857:
---

I found that, given {{queueUsedResources >= userConsumed}}, we can simplify the 
formula to {code}min(userLimit - userConsumed, queueMaxCap - queueUsedResources){code}. 
Does this make sense?
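
A minimal arithmetic sketch of that simplification, using plain numbers instead 
of the scheduler's {{Resource}} objects (hypothetical helper, not 
CapacityScheduler code):

{code}
// Hypothetical sketch of the simplified headroom formula above.
public class HeadroomSketch {
  // Assumes queueUsedResources >= userConsumed, as stated in the comment.
  static long headroom(long userLimit, long userConsumed,
                       long queueMaxCap, long queueUsedResources) {
    return Math.min(userLimit - userConsumed,
                    queueMaxCap - queueUsedResources);
  }

  public static void main(String[] args) {
    // e.g. userLimit=100, userConsumed=40, queueMaxCap=120, queueUsed=110
    System.out.println(headroom(100, 40, 120, 110));  // prints 10
  }
}
{code}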

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
> YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users. The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue. So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by application masters from other users.
> For instance, take a cluster with one queue, a user limit of 100%, and 
> multiple users submitting applications. One very large application from user 1 
> starts up, runs most of its maps, and starts running reducers; other users try 
> to start applications and get their application masters started but no tasks. 
> The very large application then reaches the point where it has consumed the 
> rest of the cluster resources with reduces, but it still needs to finish a few 
> maps. The headroom sent to this application is based only on the user limit 
> (which is 100% of the cluster capacity): it is using, say, 95% of the cluster 
> for reduces, and the other 5% is being used by other users' application 
> masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it 
> should kill a reduce in order to run a map.
> This can happen in other scenarios as well. In a large cluster with multiple 
> queues this generally shouldn't cause a permanent hang, but it could make the 
> application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2496:
-
Attachment: YARN-2496.patch

Attached new patch against latest trunk

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" 
> will be used.
> - Check whether labels can be accessed by the queue when an app is submitted 
> to the queue with a label-expression or a ResourceRequest is updated with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on that NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-10-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161487#comment-14161487
 ] 

Wangda Tan commented on YARN-2056:
--

Hi [~eepayne],
Sorry for responding so late.

I've carefully read and thought about what you mentioned, especially the 
algorithm. I think it can resolve most issues, but it cannot guarantee that all 
cases will be resolved. I think the following case will not be handled 
correctly by your algorithm.

{code}
total = 100
qA: used = 10, guaranteed = 10, pending = 100
qB: used = 25, guaranteed = 10, pending = 100 (non-preemptable)
qC: used = 0, guaranteed = 80, pending = 0

1. At the start, unassigned = 100. It will first exclude qB, unassigned = 80
2. It will try to fill qA: qA = qA + 80 * (10/(10 + 80)) = 18, unassigned = 72. 
qC will be removed in this turn
3. qB will not be added back here, because ideal_assign(qA) = 18, 
ideal_assign(qB) = 25.
4. All remaining resource will be used by qA. The result should be
   ideal(qA) = 75, ideal(qB) = 25
{code}

In addition, the remove-then-add-back algorithm doesn't seem very 
straightforward to me.

In my mind, this problem is like filling a water tank like the one below: parts 
of the tank contain stones, which make some sections higher than others. 
Because water flows, the result is the most equalized one (the water surface 
has the same height everywhere, and some stones can stand above the water 
surface).
{code}
   _
  | |   
__|  X  |  
|X  |__   
|X|
|  X X   X|
|X X X   X|
---
 1 2 3 4 5
{code}

The algorithm may look like this:
{code}
At the beginning, every queue sets ideal_assigned = its non-preemptable 
resource, and the non-preemptable resource is deducted from the total remaining 
resource (the stones are placed here). All queues are kept in qAlloc.

In each turn:
- Queues whose ideal_assigned has not yet reached min(maximum_capacity, 
used + pending) will NOT be removed, like what we have today (the water hasn't 
reached the ceiling of that part of the tank)
- Get the normalized weight of each queue
- Get the queue with the minimum {ideal_assigned % guarantee}, say Q_min
- target_height = Q_min.ideal_assigned + remained * Q_min.normalized_guarantee
- For each queue, do TempQueue.offer like today
- The TempQueue.offer method looks like:
  * If (q.ideal_assigned > target_height): skip
  * If (q.ideal_assigned <= target_height): accepted = min(q.maximum, q.used + 
q.pending, target_height * q.guaranteed) - q.ideal_assigned
- If accepted becomes zero, remove the queue from qAlloc, like today. 

The loop exits when total-remained becomes zero (resources are exhausted) or 
qAlloc becomes empty (all queues are satisfied). 
{code}

I think this algorithm can get a more balanced result (a rough sketch of the 
water-level idea follows below). Does this make sense to you?

Thanks,
Wangda
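
For illustration, a rough, hypothetical sketch of the water-level end state the 
balancing above aims for. It computes the common level directly by binary 
search rather than the round-by-round TempQueue.offer loop, assumes a single 
scalar resource, and takes floor = non-preemptable usage and 
ceiling = min(maximum_capacity, used + pending); it is not the proposed 
ProportionalCapacityPreemptionPolicy change.

{code}
// Hypothetical water-filling sketch; see the assumptions in the note above.
public class WaterFillSketch {
  static class Q {
    final String name;
    final double guaranteed, floor, ceiling;
    double ideal;
    Q(String name, double guaranteed, double floor, double ceiling) {
      this.name = name; this.guaranteed = guaranteed;
      this.floor = floor; this.ceiling = ceiling;
    }
  }

  // Find a common level t so that sum(clamp(t * guaranteed, floor, ceiling))
  // is as close to 'total' as possible, then assign each queue's ideal.
  static void fill(Q[] queues, double total) {
    double lo = 0, hi = 0;
    for (Q q : queues) {
      hi = Math.max(hi, q.guaranteed > 0 ? total / q.guaranteed : 0);
    }
    for (int iter = 0; iter < 100; iter++) {      // binary search on the level
      double level = (lo + hi) / 2, sum = 0;
      for (Q q : queues) {
        sum += clamp(level * q.guaranteed, q.floor, q.ceiling);
      }
      if (sum > total) { hi = level; } else { lo = level; }
    }
    for (Q q : queues) {
      q.ideal = clamp(lo * q.guaranteed, q.floor, q.ceiling);
    }
  }

  static double clamp(double v, double min, double max) {
    return Math.max(min, Math.min(max, v));
  }

  public static void main(String[] args) {
    // The qA/qB/qC example from the comment; qB's 25 is non-preemptable.
    Q[] qs = {
        new Q("qA", 10, 0, 110),   // used 10 + pending 100
        new Q("qB", 10, 25, 125),  // used 25 + pending 100, floor 25
        new Q("qC", 80, 0, 0)      // pending 0, so it cannot hold any "water"
    };
    fill(qs, 100);
    for (Q q : qs) {
      System.out.printf("%s ideal=%.1f%n", q.name, q.ideal);  // qA=50 qB=50 qC=0
    }
  }
}
{code}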

> Disable preemption at Queue level
> -
>
> Key: YARN-2056
> URL: https://issues.apache.org/jira/browse/YARN-2056
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Mayank Bansal
>Assignee: Eric Payne
> Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
> YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
> YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, 
> YARN-2056.201409232329.txt, YARN-2056.201409242210.txt
>
>
> We need to be able to disable preemption at the individual queue level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161465#comment-14161465
 ] 

Steve Loughran commented on YARN-913:
-

Sanjay, I can do most of these.

* w.r.t. the README, we have a javadoc {{package-info.java}}; that's enough.
* I propose restricting the custom values that a service record can have to 
string attributes. Supporting arbitrary JSON opens things up to people embedding 
entire custom JSON docs in there, which could kill the notion of having 
semi-standardised records that other apps can work with, plus published API 
endpoints for any extra stuff *outside the registry*.
* I'm going to rename the yarn fields back to {{yarn:id}} and 
{{yarn:persistence}} if Jersey+Jackson marshalls them reliably once they aren't 
introspection-driven. It makes the yarn-nature of them clearer.

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, yarnregistry.pdf, 
> yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: (was: YARN-796.node-label.consolidate.13.patch)

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.13.patch

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: (was: YARN-796.node-label.consolidate.13.patch)

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.13.patch

Updated to trunk

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161403#comment-14161403
 ] 

Hadoop QA commented on YARN-796:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12673284/YARN-796.node-label.consolidate.13.patch
  against trunk revision 519e5a7.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5298//console

This message is automatically generated.

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161402#comment-14161402
 ] 

Hadoop QA commented on YARN-2496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673283/YARN-2496.patch
  against trunk revision 519e5a7.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5297//console

This message is automatically generated.

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" 
> will be used.
> - Check whether labels can be accessed by the queue when an app is submitted 
> to the queue with a label-expression or a ResourceRequest is updated with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on that NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.13.patch

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2496:
-
Attachment: YARN-2496.patch

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" 
> will be used.
> - Check whether labels can be accessed by the queue when an app is submitted 
> to the queue with a label-expression or a ResourceRequest is updated with a 
> label-expression
> - Check labels on the NM when trying to allocate a ResourceRequest on that NM 
> with a label-expression
> - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161396#comment-14161396
 ] 

Jian He commented on YARN-1879:
---

bq. This is okay only if the RM handles these duplicate requests.
In RM failover, even if the request is a duplicate from the client's 
perspective, from the RM's perspective these are just new requests, as the new 
RM doesn't have any cache of previous requests from the client. Just to unblock 
this, I suggest marking the annotation now so that the operation can be retried 
on failover, and discussing the internal implementation separately.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
> YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
> YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2647) [YARN-796] Add yarn queue CLI to get queue info including labels of such queue

2014-10-06 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-2647:
-

Assignee: Sunil G

> [YARN-796] Add yarn queue CLI to get queue info including labels of such queue
> --
>
> Key: YARN-2647
> URL: https://issues.apache.org/jira/browse/YARN-2647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Wangda Tan
>Assignee: Sunil G
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2647) [YARN-796] Add yarn queue CLI to get queue info including labels of such queue

2014-10-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161395#comment-14161395
 ] 

Sunil G commented on YARN-2647:
---

Hi [~gp.leftnoteasy], I would like to take this up. 
Thank you.

> [YARN-796] Add yarn queue CLI to get queue info including labels of such queue
> --
>
> Key: YARN-2647
> URL: https://issues.apache.org/jira/browse/YARN-2647
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Reporter: Wangda Tan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161373#comment-14161373
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673238/YARN-1857.6.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5296//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5296//console

This message is automatically generated.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
> YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users. The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue. So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by application masters from other users.
> For instance, take a cluster with one queue, a user limit of 100%, and 
> multiple users submitting applications. One very large application from user 1 
> starts up, runs most of its maps, and starts running reducers; other users try 
> to start applications and get their application masters started but no tasks. 
> The very large application then reaches the point where it has consumed the 
> rest of the cluster resources with reduces, but it still needs to finish a few 
> maps. The headroom sent to this application is based only on the user limit 
> (which is 100% of the cluster capacity): it is using, say, 95% of the cluster 
> for reduces, and the other 5% is being used by other users' application 
> masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it 
> should kill a reduce in order to run a map.
> This can happen in other scenarios as well. In a large cluster with multiple 
> queues this generally shouldn't cause a permanent hang, but it could make the 
> application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161368#comment-14161368
 ] 

Karthik Kambatla commented on YARN-1879:


Thanks for looking at it closely, Jian and Xuan. I had missed some of these 
points and spent a little more time thinking them through.

bq. I think we are mixing two issues in this jira
When we mark an API Idempotent or AtMostOnce, the retry-policies will end up 
re-invoking the API on the other RM in case of a failover. This is okay only if 
the RM handles these duplicate requests. Further, my understanding is that the 
behavior of "Idempotent" APIs should be the same on each invocation; i.e., the 
client should receive the exact same response too. 

If we handle duplicate requests but return a different response to the client 
on duplicate calls, we can mark it AtMostOnce. If we return the same response, 
we can go ahead and mark it Idempotent. Needless to say, the RM should 
definitely handle duplicate requests gracefully. Does that sound reasonable? 

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
> YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
> YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-06 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161353#comment-14161353
 ] 

Sanjay Radia commented on YARN-913:
---

Some feedback:
# rename {{RegistryOperations.create()}} to {{bind()}}
# rename {{org/apache/hadoop/yarn/registry/client/services}} to
{{org/apache/hadoop/yarn/registry/client/impl}}
# move all ZK classes under
{{org/apache/hadoop/yarn/registry/client/impl/zk}}, i.e. the current
implementations of the registry client
# {{RegistryOperations}} implementations to remove declaration of
exceptions other than IOE.
# {{RegistryOperations.resolve()}} implementations should not mention
record headers in exception text: that's an implementation detail
# Add a README under {{org.apache.hadoop.yarn.registry.server}} to
emphasize this is server-side code
# Allow {{ServiceRecord}} to support arbitrary key-values
# remove the {{yarn_id}} & {{yarn_persistence}} fields from
{{ServiceRecord}}, moving them to the set of arbitrary key-values (see the
sketch below). This ensures that there isn't explicit hard-coding of the
assumption "these are YARN apps" into the records.
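
As a hypothetical sketch of points 7 and 8 (not the actual YARN-913 
{{ServiceRecord}} API), a record that carries only arbitrary string key/value 
attributes, with the YARN-specific fields stored as plain attributes:

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of points 7 and 8: only string key/value attributes,
// with the YARN-specific fields stored as ordinary attributes rather than
// dedicated fields. Not the actual YARN-913 ServiceRecord class.
public class ServiceRecordSketch {
  private final Map<String, String> attributes = new HashMap<>();

  public void set(String key, String value) {
    attributes.put(key, value);
  }

  public String get(String key, String defVal) {
    return attributes.getOrDefault(key, defVal);
  }

  public static void main(String[] args) {
    ServiceRecordSketch record = new ServiceRecordSketch();
    // The former dedicated fields become plain attributes (illustrative values).
    record.set("yarn:id", "application_1412550000000_0001");
    record.set("yarn:persistence", "application");
    System.out.println(record.get("yarn:id", "unknown"));
  }
}
{code}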

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, yarnregistry.pdf, 
> yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161299#comment-14161299
 ] 

Hadoop QA commented on YARN-2641:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673239/YARN-2641.000.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5293//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5293//console

This message is automatically generated.

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch
>
>
> Improve node decommission latency in the RM.
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the NodeManager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during the RM refresh 
> (NodesListManager) instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161300#comment-14161300
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673238/YARN-1857.6.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5292//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5292//console

This message is automatically generated.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
> YARN-1857.patch, YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users. The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue. So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by application masters from other users.
> For instance, take a cluster with one queue, a user limit of 100%, and 
> multiple users submitting applications. One very large application from user 1 
> starts up, runs most of its maps, and starts running reducers; other users try 
> to start applications and get their application masters started but no tasks. 
> The very large application then reaches the point where it has consumed the 
> rest of the cluster resources with reduces, but it still needs to finish a few 
> maps. The headroom sent to this application is based only on the user limit 
> (which is 100% of the cluster capacity): it is using, say, 95% of the cluster 
> for reduces, and the other 5% is being used by other users' application 
> masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it 
> should kill a reduce in order to run a map.
> This can happen in other scenarios as well. In a large cluster with multiple 
> queues this generally shouldn't cause a permanent hang, but it could make the 
> application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-06 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1879:
--
Summary: Mark Idempotent/AtMostOnce annotations to 
ApplicationMasterProtocol for RM fail over  (was: Mark Idempotent/AtMostOnce 
annotations to ApplicationMasterProtocol)

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
> fail over
> 
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
> YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
> YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161282#comment-14161282
 ] 

Xuan Gong commented on YARN-2583:
-

Those test cases fail because of a binding exception. I do not think they are 
related.

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks 
> the cut-off time and, if all logs for an application are older than the 
> cut-off time, deletes the app-log-dir from HDFS. This will not work for LRS: 
> we expect an LRS application to keep running for a long time. Two different 
> scenarios: 
> 1) If we configured rollingIntervalSeconds, new log files will keep being 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure rollingIntervalSeconds, the log file can only be 
> uploaded to HDFS after the application is finished. It is very possible that 
> the logs are uploaded after the cut-off time, which will cause problems 
> because by that time the app-log-dir for this application in HDFS has already 
> been deleted.
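
To make the cut-off check above concrete, here is a minimal sketch of that 
logic, assuming a hypothetical helper (allLogsOlderThanCutoff) and plain 
FileStatus modification times rather than the actual 
AggregatedLogDeletionService internals:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogCutoffSketch {
  // Returns true when every uploaded log under appLogDir is older than the
  // cut-off time. For an LRS app that keeps rolling logs, some file is always
  // newer, so nothing is ever cleaned; if uploads only happen at app finish,
  // the dir may already be gone by the time the logs arrive.
  static boolean allLogsOlderThanCutoff(FileSystem fs, Path appLogDir,
      long cutoffMillis) throws IOException {
    for (FileStatus log : fs.listStatus(appLogDir)) {
      if (log.getModificationTime() >= cutoffMillis) {
        return false;
      }
    }
    return true;
  }
}
{code}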



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161280#comment-14161280
 ] 

Jian He commented on YARN-1879:
---

I think we are mixing two issues in this jira:
1. Mark the annotations on the protocol for failover. (RM work-preserving 
failover won't work without proper protocol annotations. The RetryCache won't 
help in this scenario, as the cache simply gets cleaned up after 
failover/restart.)
2. Change the API to return the same response for duplicate requests.
I propose we do 1) first, which is what really affects work-preserving RM 
failover, and do 2) separately.
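
As a rough illustration of 1), marking protocol methods for the RPC retry 
layer might look like the sketch below; the interface, method names, and the 
particular annotation chosen for each method are assumptions for illustration, 
not the actual patch:

{code}
import java.io.IOException;

import org.apache.hadoop.io.retry.AtMostOnce;
import org.apache.hadoop.io.retry.Idempotent;

// Hypothetical protocol used only to show where the annotations would go; the
// method names and signatures are illustrative, not the real
// ApplicationMasterProtocol changes.
interface ExampleMasterProtocol {

  // Safe for the RPC retry layer to re-execute after an RM failover.
  @Idempotent
  String registerApplicationMaster(String attemptId) throws IOException;

  // Must not be blindly re-executed; duplicate requests need special handling.
  @AtMostOnce
  String allocate(String askSummary) throws IOException;
}
{code}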

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
> YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
> YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161279#comment-14161279
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673242/YARN-2583.3.1.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

org.apache.hadoop.yarn.server.nodemanager.securTests

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5295//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5295//console

This message is automatically generated.

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks 
> the cut-off time and, if all logs for an application are older than the 
> cut-off time, deletes the app-log-dir from HDFS. This will not work for LRS: 
> we expect an LRS application to keep running for a long time. Two different 
> scenarios: 
> 1) If we configured rollingIntervalSeconds, new log files will keep being 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure rollingIntervalSeconds, the log file can only be 
> uploaded to HDFS after the application is finished. It is very possible that 
> the logs are uploaded after the cut-off time, which will cause problems 
> because by that time the app-log-dir for this application in HDFS has already 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161278#comment-14161278
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673242/YARN-2583.3.1.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

org.apache.hadoop.yarn.server.nodemanager.securTests

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5294//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5294//console

This message is automatically generated.

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks 
> the cut-off time and, if all logs for an application are older than the 
> cut-off time, deletes the app-log-dir from HDFS. This will not work for LRS: 
> we expect an LRS application to keep running for a long time. Two different 
> scenarios: 
> 1) If we configured rollingIntervalSeconds, new log files will keep being 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure rollingIntervalSeconds, the log file can only be 
> uploaded to HDFS after the application is finished. It is very possible that 
> the logs are uploaded after the cut-off time, which will cause problems 
> because by that time the app-log-dir for this application in HDFS has already 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161262#comment-14161262
 ] 

Hadoop QA commented on YARN-2649:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673231/YARN-2649.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5288//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5288//console

This message is automatically generated.

> Flaky test TestAMRMRPCNodeUpdates
> -
>
> Key: YARN-2649
> URL: https://issues.apache.org/jira/browse/YARN-2649
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
> Attachments: YARN-2649.patch
>
>
> Sometimes the test fails with the following error:
> testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)
>   Time elapsed: 41.73 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: AppAttempt state is not correct 
> (timedout) expected: but was:
>   at junit.framework.Assert.fail(Assert.java:50)
>   at junit.framework.Assert.failNotEquals(Assert.java:287)
>   at junit.framework.Assert.assertEquals(Assert.java:67)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125)
> When this happens, SchedulerEventType.NODE_UPDATE was processed before 
> RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given that 
> the test only waits for RMAppState.ACCEPTED before having the NM send a 
> heartbeat. This can be reproduced using a custom AsyncDispatcher with a 
> CountDownLatch. Here is the log when this happens.
> {noformat}
> App State is : ACCEPTED
> 2014-10-05 21:25:07,305 INFO  [AsyncDispatcher event handler] 
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
> appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED
> 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
>  STATUS_UPDATE
> 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
> rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 
> of type STATUS_UPDATE
> AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED 
> Waiting for state : ALLOCATED
> 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType:
>  APP_ATTEMPT_ADDED
> 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
>  NODE_UPDATE
> 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType:
>  ATTEMPT_ADDED
> 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] 
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing 
> event for appattempt_1412569506932_0001_000
> 001 of type ATTEMPT_ADDED
> 2014-10-05 21:25:07,333 INFO  [AsyncDispatcher event handler] 
> attempt.RMAppAttemptIm

[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161246#comment-14161246
 ] 

Hadoop QA commented on YARN-2629:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673235/YARN-2629.3.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5291//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5291//console

This message is automatically generated.

> Make distributed shell use the domain-based timeline ACLs
> -
>
> Key: YARN-2629
> URL: https://issues.apache.org/jira/browse/YARN-2629
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch
>
>
> To demonstrate the usage of this feature (YARN-2102), it's good to make the 
> distributed shell create the domain and post its timeline entities into this 
> private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-06 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161241#comment-14161241
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

For now, I have no idea how to reconstruct the same response after failover. 
Currently the latest patch only returns an empty response. This is one 
discussion point of this design.

> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
> YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
> YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
> YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
> YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2583:

Attachment: YARN-2583.3.1.patch

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks 
> the cut-off time and, if all logs for an application are older than the 
> cut-off time, deletes the app-log-dir from HDFS. This will not work for LRS: 
> we expect an LRS application to keep running for a long time. Two different 
> scenarios: 
> 1) If we configured rollingIntervalSeconds, new log files will keep being 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure rollingIntervalSeconds, the log file can only be 
> uploaded to HDFS after the application is finished. It is very possible that 
> the logs are uploaded after the cut-off time, which will cause problems 
> because by that time the app-log-dir for this application in HDFS has already 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161238#comment-14161238
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673234/YARN-2583.3.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5290//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5290//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5290//console

This message is automatically generated.

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
> YARN-2583.3.1.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks 
> the cut-off time and, if all logs for an application are older than the 
> cut-off time, deletes the app-log-dir from HDFS. This will not work for LRS: 
> we expect an LRS application to keep running for a long time. Two different 
> scenarios: 
> 1) If we configured rollingIntervalSeconds, new log files will keep being 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure rollingIntervalSeconds, the log file can only be 
> uploaded to HDFS after the application is finished. It is very possible that 
> the logs are uploaded after the cut-off time, which will cause problems 
> because by that time the app-log-dir for this application in HDFS has already 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161234#comment-14161234
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673234/YARN-2583.3.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5289//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5289//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5289//console

This message is automatically generated.

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks 
> the cut-off time and, if all logs for an application are older than the 
> cut-off time, deletes the app-log-dir from HDFS. This will not work for LRS: 
> we expect an LRS application to keep running for a long time. Two different 
> scenarios: 
> 1) If we configured rollingIntervalSeconds, new log files will keep being 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure rollingIntervalSeconds, the log file can only be 
> uploaded to HDFS after the application is finished. It is very possible that 
> the logs are uploaded after the cut-off time, which will cause problems 
> because by that time the app-log-dir for this application in HDFS has already 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.000.patch

> improve node decommission latency in RM.
> 
>
> Key: YARN-2641
> URL: https://issues.apache.org/jira/browse/YARN-2641
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2641.000.patch
>
>
> Improve node decommission latency in the RM. 
> Currently, node decommission only happens after the RM receives a 
> nodeHeartbeat from the NodeManager. The node heartbeat interval is 
> configurable; the default value is 1 second.
> It would be better to do the decommission during the RM refresh 
> (NodesListManager) instead of in nodeHeartbeat (ResourceTrackerService).
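
For context on the current latency bound, here is a small sketch of reading 
the heartbeat interval; the property name is assumed to be the standard RM 
setting for the NM heartbeat interval:

{code}
import org.apache.hadoop.conf.Configuration;

public class HeartbeatIntervalSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed property name for the RM-side NM heartbeat interval; with the
    // 1000 ms default, a heartbeat-driven decommission can lag by up to ~1 s.
    long intervalMs = conf.getLong(
        "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms", 1000L);
    System.out.println("NM heartbeat interval (ms): " + intervalMs);
  }
}
{code}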



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161221#comment-14161221
 ] 

Hadoop QA commented on YARN-796:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12673185/YARN-796.node-label.consolidate.12.patch
  against trunk revision 3affad9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 41 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.pipes.TestPipeApplication
  
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5282//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5282//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5282//console

This message is automatically generated.

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161220#comment-14161220
 ] 

Craig Welch commented on YARN-1857:
---

[~john.jian.fang] - uploaded .6 on [YARN-2644], updated headroom calculation 
comment, fixed indentation

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
> YARN-1857.patch, YARN-1857.patch
>
>
> It's possible to get an application to hang forever (or for a long time) in a 
> cluster with multiple users. The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue. So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by application masters from other users. 
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications. One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers. Other users try to start applications and get their application 
> masters started, but no tasks. The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reducers, 
> but at that point it still needs to finish a few maps. The headroom being 
> sent to this application is only based on the user limit (which is 100% of 
> the cluster capacity); it is using, let's say, 95% of the cluster for 
> reducers, and the other 5% is being used by other users running application 
> masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it 
> should kill a reducer in order to run a map. 
> This can happen in other scenarios as well. Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could cause 
> the application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1857:
--
Attachment: YARN-1857.6.patch

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
> YARN-1857.patch, YARN-1857.patch
>
>
> It's possible to get an application to hang forever (or for a long time) in a 
> cluster with multiple users. The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue. So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by application masters from other users. 
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications. One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers. Other users try to start applications and get their application 
> masters started, but no tasks. The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reducers, 
> but at that point it still needs to finish a few maps. The headroom being 
> sent to this application is only based on the user limit (which is 100% of 
> the cluster capacity); it is using, let's say, 95% of the cluster for 
> reducers, and the other 5% is being used by other users running application 
> masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it 
> should kill a reducer in order to run a map. 
> This can happen in other scenarios as well. Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could cause 
> the application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161210#comment-14161210
 ] 

Jian He commented on YARN-2649:
---

[~mingma], thanks for working on this !
bq. Another way to fix it is to change MockRM.submitApp to waitForState on 
RMAppAttempt. That might address other test cases that use MockRM.submitApp.
I recently saw some other similar test failures, e.g. YARN-2483; maybe this is 
what we should do. Could you also run all the tests locally, to make sure we 
don't introduce any regressions? Thanks.

> Flaky test TestAMRMRPCNodeUpdates
> -
>
> Key: YARN-2649
> URL: https://issues.apache.org/jira/browse/YARN-2649
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
> Attachments: YARN-2649.patch
>
>
> Sometimes the test fails with the following error:
> testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)
>   Time elapsed: 41.73 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: AppAttempt state is not correct 
> (timedout) expected: but was:
>   at junit.framework.Assert.fail(Assert.java:50)
>   at junit.framework.Assert.failNotEquals(Assert.java:287)
>   at junit.framework.Assert.assertEquals(Assert.java:67)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125)
> When this happens, SchedulerEventType.NODE_UPDATE was processed before 
> RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given that 
> the test only waits for RMAppState.ACCEPTED before having the NM send a 
> heartbeat. This can be reproduced using a custom AsyncDispatcher with a 
> CountDownLatch. Here is the log when this happens.
> {noformat}
> App State is : ACCEPTED
> 2014-10-05 21:25:07,305 INFO  [AsyncDispatcher event handler] 
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
> appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED
> 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
>  STATUS_UPDATE
> 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
> rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 
> of type STATUS_UPDATE
> AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED 
> Waiting for state : ALLOCATED
> 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType:
>  APP_ATTEMPT_ADDED
> 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
>  NODE_UPDATE
> 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType:
>  ATTEMPT_ADDED
> 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] 
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing 
> event for appattempt_1412569506932_0001_000
> 001 of type ATTEMPT_ADDED
> 2014-10-05 21:25:07,333 INFO  [AsyncDispatcher event handler] 
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
> appattempt_1412569506932_0001_01 State change from SUBMITTED to SCHEDULED
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2629:
--
Attachment: YARN-2629.3.patch

Uploaded a new patch:

1. Fix the test failure
2. Remove two lines of unnecessary code in TimelineClientImpl
3. Improve the code of publishing entities in DS AM
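
For reference, creating and publishing a private domain from an AM might look 
roughly like the sketch below; the domain id and reader/writer lists are made 
up, and it assumes the TimelineClient#putDomain API introduced by YARN-2102:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timeline.TimelineDomain;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class TimelineDomainSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      TimelineDomain domain = new TimelineDomain();
      domain.setId("example-domain");  // hypothetical domain id
      domain.setReaders("alice,bob");  // users allowed to read the entities
      domain.setWriters("alice");      // users allowed to write the entities
      client.putDomain(domain);
      // Entities posted afterwards would set this domain id so that they
      // fall under the domain's ACLs.
    } finally {
      client.stop();
    }
  }
}
{code}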

> Make distributed shell use the domain-based timeline ACLs
> -
>
> Key: YARN-2629
> URL: https://issues.apache.org/jira/browse/YARN-2629
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch
>
>
> To demonstrate the usage of this feature (YARN-2102), it's good to make the 
> distributed shell create the domain and post its timeline entities into this 
> private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2583:

Attachment: YARN-2583.3.patch

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS. It checks 
> the cut-off time and, if all logs for an application are older than the 
> cut-off time, deletes the app-log-dir from HDFS. This will not work for LRS: 
> we expect an LRS application to keep running for a long time. Two different 
> scenarios: 
> 1) If we configured rollingIntervalSeconds, new log files will keep being 
> uploaded to HDFS. The number of log files for the application will grow 
> larger and larger, and no log files will ever be deleted.
> 2) If we did not configure rollingIntervalSeconds, the log file can only be 
> uploaded to HDFS after the application is finished. It is very possible that 
> the logs are uploaded after the cut-off time, which will cause problems 
> because by that time the app-log-dir for this application in HDFS has already 
> been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates

2014-10-06 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-2649:
--
Attachment: YARN-2649.patch

Fixed the test code to wait until RMAppAttemptImpl reaches the 
RMAppAttemptState.SCHEDULED state before sending the NM heartbeat.

Another way to fix it is to change MockRM.submitApp to waitForState on 
RMAppAttempt. That might address other test cases that use MockRM.submitApp.
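
A rough sketch of that wait, using a generic polling helper rather than the 
actual MockRM/MockAM utilities (which are assumed to provide the equivalent 
waitForState behavior):

{code}
import java.util.function.Supplier;

// Hedged sketch of the ordering fix: poll the attempt state until it reaches
// SCHEDULED before letting the NM heartbeat, so the scheduler cannot see
// NODE_UPDATE ahead of ATTEMPT_ADDED. The real test uses the MockRM/MockAM
// helpers referenced in the stack trace; this generic helper is illustrative.
public class WaitForStateSketch {
  static void waitForState(Supplier<String> currentState, String expected,
      long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!expected.equals(currentState.get())) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("Timed out waiting for state " + expected);
      }
      Thread.sleep(50);  // give the AsyncDispatcher time to process events
    }
  }
}
{code}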

> Flaky test TestAMRMRPCNodeUpdates
> -
>
> Key: YARN-2649
> URL: https://issues.apache.org/jira/browse/YARN-2649
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
> Attachments: YARN-2649.patch
>
>
> Sometimes the test fails with the following error:
> testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)
>   Time elapsed: 41.73 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: AppAttempt state is not correct 
> (timedout) expected: but was:
>   at junit.framework.Assert.fail(Assert.java:50)
>   at junit.framework.Assert.failNotEquals(Assert.java:287)
>   at junit.framework.Assert.assertEquals(Assert.java:67)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125)
> When this happens, SchedulerEventType.NODE_UPDATE was processed before 
> RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given that 
> the test only waits for RMAppState.ACCEPTED before having the NM send a 
> heartbeat. This can be reproduced using a custom AsyncDispatcher with a 
> CountDownLatch. Here is the log when this happens.
> {noformat}
> App State is : ACCEPTED
> 2014-10-05 21:25:07,305 INFO  [AsyncDispatcher event handler] 
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
> appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED
> 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
>  STATUS_UPDATE
> 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
> rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 
> of type STATUS_UPDATE
> AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED 
> Waiting for state : ALLOCATED
> 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType:
>  APP_ATTEMPT_ADDED
> 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
>  NODE_UPDATE
> 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
> event 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType:
>  ATTEMPT_ADDED
> 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] 
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing 
> event for appattempt_1412569506932_0001_000
> 001 of type ATTEMPT_ADDED
> 2014-10-05 21:25:07,333 INFO  [AsyncDispatcher event handler] 
> attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
> appattempt_1412569506932_0001_01 State change from SUBMITTED to SCHEDULED
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161174#comment-14161174
 ] 

Hudson commented on YARN-2644:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6202 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6202/])
YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM 
allocates. Contributed by Craig Welch (jianhe: rev 
519e5a7dd2bd540105434ec3c8939b68f6c024f8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


> Recalculate headroom more frequently to keep it accurate
> 
>
> Key: YARN-2644
> URL: https://issues.apache.org/jira/browse/YARN-2644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.6.0
>
> Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
> YARN-2644.15.patch, YARN-2644.15.patch
>
>
> See parent (1198) for more detail - this specifically covers calculating the 
> headroom more frequently, to cover the cases where changes have occurred 
> which impact headroom but which are not reflected due to an application not 
> being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates

2014-10-06 Thread Ming Ma (JIRA)
Ming Ma created YARN-2649:
-

 Summary: Flaky test TestAMRMRPCNodeUpdates
 Key: YARN-2649
 URL: https://issues.apache.org/jira/browse/YARN-2649
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma


Sometimes the test fails with the following error:

testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)
  Time elapsed: 41.73 sec  <<< FAILURE!
junit.framework.AssertionFailedError: AppAttempt state is not correct 
(timedout) expected: but was:
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382)
at 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125)



When this happens, SchedulerEventType.NODE_UPDATE was processed before 
RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given that the 
test only waits for RMAppState.ACCEPTED before having the NM send a heartbeat. 
This can be reproduced using a custom AsyncDispatcher with a CountDownLatch. 
Here is the log when this happens.

{noformat}
App State is : ACCEPTED
2014-10-05 21:25:07,305 INFO  [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED
2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
 STATUS_UPDATE
2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl 
(RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 of type STATUS_UPDATE
AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED Waiting 
for state : ALLOCATED
2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType:
 APP_ATTEMPT_ADDED

2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
 NODE_UPDATE

2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType:
 ATTEMPT_ADDED
2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing event 
for appattempt_1412569506932_0001_000
001 of type ATTEMPT_ADDED

2014-10-05 21:25:07,333 INFO  [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
appattempt_1412569506932_0001_01 State change from SUBMITTED to SCHEDULED

{noformat}






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161164#comment-14161164
 ] 

Jian He commented on YARN-1857:
---

Could you please update the patch on top of YARN-2644? Comments in the 
meanwhile: 
- update the code comments about the new calculation of headroom (a small 
numeric sketch of this formula follows below)
{code}
/** 
 * Headroom = min(userLimit, queue-max-cap) - consumed
 */
{code}
- fix the indentation of this line: {{Resources.subtract(queueMaxCap, usedResources));}}
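
A small, self-contained numeric sketch of the formula above; plain GB values 
stand in for the scheduler's Resource objects, and the 95%/5% split mirrors 
the example in the issue description:

{code}
// Hedged numeric sketch of headroom = min(userLimit, queue-max-cap) - consumed.
// With a 100 GB queue, a 100% user limit, and 95 GB consumed by the app, the
// formula reports 5 GB of headroom even if that 5 GB is actually held by other
// users' application masters, which is exactly the hang described here.
public class HeadroomSketch {
  public static void main(String[] args) {
    long userLimitGB = 100;
    long queueMaxCapGB = 100;
    long consumedGB = 95;
    long headroomGB = Math.min(userLimitGB, queueMaxCapGB) - consumedGB;
    System.out.println("Reported headroom (GB): " + headroomGB);  // prints 5
  }
}
{code}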

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.patch, YARN-1857.patch, 
> YARN-1857.patch
>
>
> It's possible to get an application to hang forever (or for a long time) in a 
> cluster with multiple users. The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> application masters using space in that queue. So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by application masters from other users. 
> For instance, suppose you have a cluster with one queue, the user limit is 
> 100%, and multiple users are submitting applications. One very large 
> application by user 1 starts up, runs most of its maps, and starts running 
> reducers. Other users try to start applications and get their application 
> masters started, but no tasks. The very large application then gets to the 
> point where it has consumed the rest of the cluster resources with reducers, 
> but at that point it still needs to finish a few maps. The headroom being 
> sent to this application is only based on the user limit (which is 100% of 
> the cluster capacity); it is using, let's say, 95% of the cluster for 
> reducers, and the other 5% is being used by other users running application 
> masters. The MRAppMaster thinks it still has 5%, so it doesn't know that it 
> should kill a reducer in order to run a map. 
> This can happen in other scenarios as well. Generally, in a large cluster 
> with multiple queues this shouldn't cause a hang forever, but it could cause 
> the application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.

2014-10-06 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161160#comment-14161160
 ] 

Wilfred Spiegelenburg commented on YARN-1061:
-

This is a dupe of YARN-2578. Writes do not time out, and they should.

> NodeManager is indefinitely waiting for nodeHeartBeat() response from 
> ResouceManager.
> -
>
> Key: YARN-1061
> URL: https://issues.apache.org/jira/browse/YARN-1061
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Rohith
>
> It is observed that, in one scenario, the NodeManager waits indefinitely for 
> the nodeHeartbeat response from the ResourceManager when the ResourceManager 
> is in a hung state.
> The NodeManager should get a timeout exception instead of waiting indefinitely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161155#comment-14161155
 ] 

Hadoop QA commented on YARN-2496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673224/YARN-2496.patch
  against trunk revision 8dc6abf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5287//console

This message is automatically generated.

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch, YARN-2496.patch, YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" 
> is used (a simplified sketch of this fallback follows below).
> - Check whether the labels can be accessed by the queue when submitting an 
> app with a labels-expression to the queue or updating a ResourceRequest with 
> a label-expression.
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression.
> - Respect labels when calculating headroom/user-limit.
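
A simplified sketch of the label-expression fallback and queue-access check from the list above. The method and parameter names are illustrative, not the actual CapacityScheduler API, and the expression is treated as a single label name for simplicity.

{code}
import java.util.Set;

/** Illustrative only: not the actual scheduler code or API. */
public class LabelExpressionSketch {

  /** The request's expression wins; otherwise fall back to the queue default. */
  static String resolveLabelExpression(String requestExpression,
                                       String queueDefaultExpression) {
    if (requestExpression != null && !requestExpression.trim().isEmpty()) {
      return requestExpression.trim();
    }
    return queueDefaultExpression == null ? "" : queueDefaultExpression.trim();
  }

  /** An empty (or all-whitespace) expression is accepted by any queue. */
  static boolean queueCanAccess(String labelExpression,
                                Set<String> queueAccessibleLabels) {
    String expr = labelExpression == null ? "" : labelExpression.trim();
    if (expr.isEmpty()) {
      return true;
    }
    // For illustration the expression is treated as a single label name.
    return queueAccessibleLabels.contains(expr);
  }
}
{code}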



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161152#comment-14161152
 ] 

Jian He commented on YARN-2644:
---

looks good, committing 

> Recalculate headroom more frequently to keep it accurate
> 
>
> Key: YARN-2644
> URL: https://issues.apache.org/jira/browse/YARN-2644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
> YARN-2644.15.patch, YARN-2644.15.patch
>
>
> See parent (1198) for more detail - this specifically covers calculating the 
> headroom more frequently, to cover the cases where changes have occurred 
> which impact headroom but which are not reflected due to an application not 
> being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161148#comment-14161148
 ] 

Hadoop QA commented on YARN-2644:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673200/YARN-2644.15.patch
  against trunk revision 3affad9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5285//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5285//console

This message is automatically generated.

> Recalculate headroom more frequently to keep it accurate
> 
>
> Key: YARN-2644
> URL: https://issues.apache.org/jira/browse/YARN-2644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
> YARN-2644.15.patch, YARN-2644.15.patch
>
>
> See parent (1198) for more detail - this specifically covers calculating the 
> headroom more frequently, to cover the cases where changes have occurred 
> which impact headroom but which are not reflected due to an application not 
> being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2377) Localization exception stack traces are not passed as diagnostic info

2014-10-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161127#comment-14161127
 ] 

Jason Lowe commented on YARN-2377:
--

Thanks for the patch, Gera.  I think the toString should be on 
SerializedException rather than SerializedExceptionPBImpl, since there's 
nothing implementation-specific about the way it converts to a string -- it 
always goes through the interfaces to get what it needs.  If a specific 
implementation really needs a different toString method, it can always 
override.

Nit: sringify should be stringify.  I'm also curious why it isn't static, or 
why it doesn't assume e == this and drop the additional parameter, since we 
can delegate to cause.stringify when processing the cause portion of the 
traceback.
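
A rough sketch of the shape being suggested, with stringify living on the abstract type and recursing into the cause through the same interface. The accessor names (getMessage, getRemoteTrace, getCause) are illustrative, not the exact SerializedException API.

{code}
/** Illustrative sketch only; not the real SerializedException class. */
public abstract class SerializedExceptionSketch {

  public abstract String getMessage();
  public abstract String getRemoteTrace();
  public abstract SerializedExceptionSketch getCause();

  /** Convert this exception (and its cause chain) to a readable string. */
  public String stringify() {
    StringBuilder sb = new StringBuilder();
    sb.append(getMessage()).append('\n').append(getRemoteTrace());
    SerializedExceptionSketch cause = getCause();
    if (cause != null) {
      // Delegate to the cause's own stringify instead of taking a parameter.
      sb.append("Caused by: ").append(cause.stringify());
    }
    return sb.toString();
  }

  @Override
  public String toString() {
    return stringify();
  }
}
{code}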

> Localization exception stack traces are not passed as diagnostic info
> -
>
> Key: YARN-2377
> URL: https://issues.apache.org/jira/browse/YARN-2377
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-2377.v01.patch
>
>
> In the Localizer log one can only see this kind of message:
> {code}
> 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { 
> hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar,
>  1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHostException: ha-nn-uri-0
> {code}
> Only the {{java.net.UnknownHostException: ha-nn-uri-0}} message is then 
> propagated as diagnostics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2496:
-
Attachment: YARN-2496.patch

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch, YARN-2496.patch, YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" 
> is used.
> - Check whether the labels can be accessed by the queue when submitting an 
> app with a labels-expression to the queue or updating a ResourceRequest with 
> a label-expression.
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression.
> - Respect labels when calculating headroom/user-limit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2633) TestContainerLauncherImpl sometimes fails

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161038#comment-14161038
 ] 

Hadoop QA commented on YARN-2633:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673174/YARN-2633.patch
  against trunk revision 687d83c.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5286//console

This message is automatically generated.

> TestContainerLauncherImpl sometimes fails
> -
>
> Key: YARN-2633
> URL: https://issues.apache.org/jira/browse/YARN-2633
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2633.patch
>
>
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close()
>   at java.lang.Class.getMethod(Class.java:1665)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2644:
--
Attachment: YARN-2644.15.patch

Reupload to see if jenkins can apply the patch now

> Recalculate headroom more frequently to keep it accurate
> 
>
> Key: YARN-2644
> URL: https://issues.apache.org/jira/browse/YARN-2644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
> YARN-2644.15.patch, YARN-2644.15.patch
>
>
> See parent (1198) for more detail - this specifically covers calculating the 
> headroom more frequently, to cover the cases where changes have occurred 
> which impact headroom but which are not reflected due to an application not 
> being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160989#comment-14160989
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673189/YARN-2583.2.patch
  against trunk revision 3affad9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.logaggregation.TestAggregatedLogDeletionService
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5283//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5283//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5283//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5283//console

This message is automatically generated.

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS.  It 
> checks the cut-off time, and if all logs for an application are older than 
> that cut-off time, the application's app-log-dir is deleted from HDFS.  This 
> does not work for LRS: we expect an LRS application to keep running for a 
> long time.  Two different scenarios: 
> 1) If we configured the rollingIntervalSeconds, new log files are continually 
> uploaded to HDFS.  The number of log files for the application keeps growing, 
> and no log files are ever deleted.
> 2) If we did not configure the rollingIntervalSeconds, the log file can only 
> be uploaded to HDFS after the application has finished.  It is very possible 
> that the logs are uploaded after the cut-off time, which causes a problem 
> because by then the app-log-dir for this application in HDFS has already been 
> deleted.
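
One way to picture the change the description calls for: for a long-running service the cut-off has to apply to individual rolled log files rather than to the whole app-log-dir. A simplified sketch follows; the FileSystem/FileStatus calls are real Hadoop APIs, but the deletion policy shown is only an illustration, not the actual AggregatedLogDeletionService logic.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative per-file cut-off check for LRS-style log aggregation. */
public class LrsLogDeletionSketch {

  static void deleteOldLogs(FileSystem fs, Path appLogDir, long cutoffMillis)
      throws IOException {
    for (FileStatus log : fs.listStatus(appLogDir)) {
      if (log.isFile() && log.getModificationTime() < cutoffMillis) {
        fs.delete(log.getPath(), false);   // remove only this rolled log file
      }
    }
    // The app-log-dir itself is kept while the LRS application is running.
  }
}
{code}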



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160963#comment-14160963
 ] 

Hadoop QA commented on YARN-2629:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673184/YARN-2629.2.patch
  against trunk revision 3affad9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell:

  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5281//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5281//console

This message is automatically generated.

> Make distributed shell use the domain-based timeline ACLs
> -
>
> Key: YARN-2629
> URL: https://issues.apache.org/jira/browse/YARN-2629
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2629.1.patch, YARN-2629.2.patch
>
>
> For demonstration the usage of this feature (YARN-2102), it's good to make 
> the distributed shell create the domain, and post its timeline entities into 
> this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2583:

Attachment: YARN-2583.2.patch

> Modify the LogDeletionService to support Log aggregation for LRS
> 
>
> Key: YARN-2583
> URL: https://issues.apache.org/jira/browse/YARN-2583
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2583.1.patch, YARN-2583.2.patch
>
>
> Currently, AggregatedLogDeletionService deletes old logs from HDFS.  It 
> checks the cut-off time, and if all logs for an application are older than 
> that cut-off time, the application's app-log-dir is deleted from HDFS.  This 
> does not work for LRS: we expect an LRS application to keep running for a 
> long time.  Two different scenarios: 
> 1) If we configured the rollingIntervalSeconds, new log files are continually 
> uploaded to HDFS.  The number of log files for the application keeps growing, 
> and no log files are ever deleted.
> 2) If we did not configure the rollingIntervalSeconds, the log file can only 
> be uploaded to HDFS after the application has finished.  It is very possible 
> that the logs are uploaded after the cut-off time, which causes a problem 
> because by then the app-log-dir for this application in HDFS has already been 
> deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.12.patch

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2629:
--
Attachment: YARN-2629.2.patch

Add one more option, "-create", which makes the client try to create a new 
domain only when this flag is set.

In addition, fix an existing problem in the DS AM: the AM should use the 
submitter's UGI to put the entities.

Created a patch with these changes.
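
The "submitter UGI" fix can be sketched roughly as follows. UserGroupInformation.doAs and TimelineClient.putEntities are real APIs, but the way the submitter UGI is obtained here (createRemoteUser from a user name) is a simplification, and this is not the distributed shell AM code.

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;

/** Illustrative only: post timeline entities as the submitting user. */
public class TimelinePutAsSubmitterSketch {

  static TimelinePutResponse putAsSubmitter(Configuration conf,
      String submitterName, final TimelineEntity entity) throws Exception {
    final TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      UserGroupInformation submitter =
          UserGroupInformation.createRemoteUser(submitterName);
      // Run the put inside the submitter's UGI so ownership/ACLs are applied
      // to that user rather than to the user running the AM process.
      return submitter.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() {
        @Override
        public TimelinePutResponse run() throws Exception {
          return client.putEntities(entity);
        }
      });
    } finally {
      client.stop();
    }
  }
}
{code}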

> Make distributed shell use the domain-based timeline ACLs
> -
>
> Key: YARN-2629
> URL: https://issues.apache.org/jira/browse/YARN-2629
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2629.1.patch, YARN-2629.2.patch
>
>
> For demonstration the usage of this feature (YARN-2102), it's good to make 
> the distributed shell create the domain, and post its timeline entities into 
> this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels

2014-10-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160858#comment-14160858
 ] 

Wangda Tan commented on YARN-2500:
--

Regarding comments from [~vinodkv],
bq. As with other patches, Labels -> NodeLabels. You'll need to change all of 
the following:...
Addressed
bq. ApplicationMasterService: There are multiple 
this.rmContext.getRMApps().get(appAttemptId.getApplicationId() calls in the 
allocate method. Refactor to avoid dup calls.
Addressed
bq. TestSchedulerUtils: testValidateResourceRequestWithErrorLabelsPermission: 
Why are "" and " " accepted when only x and y are recognized labels?
An empty label expression "" should be accepted by any queue, and " " will be 
trimmed to empty.
bq. Given we don't support yet other features in ResourceRequest for the AM 
container like priority, locality, shall we also hard-code them to 
AM_CONTAINER_PRIORITY, ResourceRequest.ANY respectively too?
Agreed; the values for priority/#containers/resource-name/relax-locality are 
now set to their defaults.
bq. Can we add test-case for num-containers, priority, locality for AM 
container?
Added test "testScheduleTransitionReplaceAMContainerRequestWithDefaults" in 
RMAppAttemptImpl.
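
For reference, a minimal sketch of what "set to defaults" looks like for the AM container request, with Priority 0 standing in for AM_CONTAINER_PRIORITY. This is illustrative only, not the patch itself.

{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

/** Illustrative sketch: force the AM container request to its defaults. */
public class AmRequestDefaultsSketch {

  static ResourceRequest normalizeAmRequest(Resource capability) {
    ResourceRequest amRequest = ResourceRequest.newInstance(
        Priority.newInstance(0),   // AM container priority
        ResourceRequest.ANY,       // no locality preference for the AM
        capability,                // capability taken from the submission
        1);                        // exactly one AM container
    amRequest.setRelaxLocality(true);
    return amRequest;
  }
}
{code}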

Please kindly review,

Thanks,
Wangda

> [YARN-796] Miscellaneous changes in ResourceManager to support labels
> -
>
> Key: YARN-2500
> URL: https://issues.apache.org/jira/browse/YARN-2500
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2500.patch, YARN-2500.patch, YARN-2500.patch, 
> YARN-2500.patch, YARN-2500.patch
>
>
> This patch contains changes in the ResourceManager to support labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2500:
-
Attachment: YARN-2500.patch

> [YARN-796] Miscellaneous changes in ResourceManager to support labels
> -
>
> Key: YARN-2500
> URL: https://issues.apache.org/jira/browse/YARN-2500
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2500.patch, YARN-2500.patch, YARN-2500.patch, 
> YARN-2500.patch, YARN-2500.patch
>
>
> This patch contains changes in the ResourceManager to support labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160819#comment-14160819
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673170/YARN-1857.5.patch
  against trunk revision ea26cc0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5279//console

This message is automatically generated.

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.patch, YARN-1857.patch, 
> YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> ApplicationMasters using space in that queue.  So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by ApplicationMasters from other users.
> For instance, take a cluster with one queue, a user limit of 100%, and 
> multiple users submitting applications.  One very large application from 
> user 1 starts up, runs most of its maps, and starts running reducers.  Other 
> users try to start applications and get their ApplicationMasters started, 
> but no tasks.  The very large application then gets to the point where it 
> has consumed the rest of the cluster resources with reducers, but it still 
> needs to finish a few maps.  The headroom being sent to this application is 
> based only on the user limit (which is 100% of the cluster capacity): it is 
> using, say, 95% of the cluster for reducers, and the other 5% is being used 
> by other users' ApplicationMasters.  The MRAppMaster thinks it still has 5% 
> headroom, so it doesn't know that it should kill a reducer in order to run a 
> map.
> This can happen in other scenarios as well.  In a large cluster with 
> multiple queues this generally shouldn't cause a permanent hang, but it can 
> make the application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2633) TestContainerLauncherImpl sometimes fails

2014-10-06 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2633:

Attachment: YARN-2633.patch

Attaching the patch.
The caught exception was meant to be ignored, so still throwing a 
YarnRuntimeException from the catch clause did not make sense.  Deleting the 
line that throws the exception.
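
For reference, the failure mode is the reflective close() lookup in the stack trace above: a mocked proxy class has no close() method. The sketch below shows the "ignore instead of rethrow" behaviour the comment describes; it is not the actual RpcClientFactoryPBImpl code.

{code}
/** Illustrative sketch of closing a proxy reflectively and tolerating mocks. */
public class ReflectiveCloseSketch {

  static void closeQuietly(Object proxy) {
    try {
      proxy.getClass().getMethod("close").invoke(proxy);
    } catch (NoSuchMethodException e) {
      // e.g. a Mockito-generated proxy: nothing to close, safe to ignore.
    } catch (ReflectiveOperationException e) {
      // Invocation problems are swallowed here for brevity.
    }
  }
}
{code}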

> TestContainerLauncherImpl sometimes fails
> -
>
> Key: YARN-2633
> URL: https://issues.apache.org/jira/browse/YARN-2633
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2633.patch
>
>
> {noformat}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close()
>   at java.lang.Class.getMethod(Class.java:1665)
>   at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90)
>   at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54)
>   at 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1857:
--
Attachment: YARN-1857.5.patch

Updating to current trunk on new(er) repo

> CapacityScheduler headroom doesn't account for other AM's running
> -
>
> Key: YARN-1857
> URL: https://issues.apache.org/jira/browse/YARN-1857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.3.0
>Reporter: Thomas Graves
>Assignee: Chen He
>Priority: Critical
> Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
> YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.patch, YARN-1857.patch, 
> YARN-1857.patch
>
>
> It's possible for an application to hang forever (or for a long time) in a 
> cluster with multiple users.  The reason is that the headroom sent to the 
> application is based on the user limit, but it doesn't account for other 
> ApplicationMasters using space in that queue.  So the headroom (user limit - 
> user consumed) can be > 0 even though the cluster is 100% full, because the 
> remaining space is being used by ApplicationMasters from other users.
> For instance, take a cluster with one queue, a user limit of 100%, and 
> multiple users submitting applications.  One very large application from 
> user 1 starts up, runs most of its maps, and starts running reducers.  Other 
> users try to start applications and get their ApplicationMasters started, 
> but no tasks.  The very large application then gets to the point where it 
> has consumed the rest of the cluster resources with reducers, but it still 
> needs to finish a few maps.  The headroom being sent to this application is 
> based only on the user limit (which is 100% of the cluster capacity): it is 
> using, say, 95% of the cluster for reducers, and the other 5% is being used 
> by other users' ApplicationMasters.  The MRAppMaster thinks it still has 5% 
> headroom, so it doesn't know that it should kill a reducer in order to run a 
> map.
> This can happen in other scenarios as well.  In a large cluster with 
> multiple queues this generally shouldn't cause a permanent hang, but it can 
> make the application take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2644:
--
Attachment: YARN-2644.15.patch

Update patch against latest trunk in new(er) git repo

> Recalculate headroom more frequently to keep it accurate
> 
>
> Key: YARN-2644
> URL: https://issues.apache.org/jira/browse/YARN-2644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
> YARN-2644.15.patch
>
>
> See parent (1198) for more detail - this specifically covers calculating the 
> headroom more frequently, to cover the cases where changes have occurred 
> which impact headroom but which are not reflected due to an application not 
> being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2648) need mechanism for updating HDFS delegation tokens associated with container launch contexts

2014-10-06 Thread Jonathan Maron (JIRA)
Jonathan Maron created YARN-2648:


 Summary: need mechanism for updating HDFS delegation tokens 
associated with container launch contexts
 Key: YARN-2648
 URL: https://issues.apache.org/jira/browse/YARN-2648
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Reporter: Jonathan Maron


During the launch of a container, the required delegation tokens (e.g. HDFS) 
are passed to the launch context.  If those tokens expire and the container 
requires a restart, the restart attempt will fail.  Sample log output:

2014-10-06 18:37:28,609 WARN  ipc.Client (Client.java:run(675)) - Exception 
encountered while connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (HDFS_DELEGATION_TOKEN token 124 for hbase) can't be found in cache
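
For context, this is roughly how HDFS delegation tokens get attached to a launch context in the first place (addDelegationTokens, writeTokenStorageToStream and setTokens are real APIs). What this JIRA asks for, and what the sketch below does not provide, is a mechanism to push refreshed tokens to a container that has already been launched.

{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

/** Illustrative sketch: initial attachment of HDFS tokens to a launch context. */
public class LaunchContextTokensSketch {

  static void attachHdfsTokens(Configuration conf, String renewer,
      ContainerLaunchContext clc) throws Exception {
    Credentials credentials = new Credentials();
    FileSystem fs = FileSystem.get(conf);
    // Obtain HDFS delegation tokens for this user.
    fs.addDelegationTokens(renewer, credentials);

    // Serialize the credentials into the form the launch context expects.
    DataOutputBuffer dob = new DataOutputBuffer();
    credentials.writeTokenStorageToStream(dob);
    clc.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
  }
}
{code}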

 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2566) IOException happens in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160687#comment-14160687
 ] 

Hadoop QA commented on YARN-2566:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673150/YARN-2566.003.patch
  against trunk revision ea26cc0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5276//console

This message is automatically generated.

> IOException happens in startLocalizer of DefaultContainerExecutor due to not 
> enough disk space for the first localDir.
> -
>
> Key: YARN-2566
> URL: https://issues.apache.org/jira/browse/YARN-2566
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
> YARN-2566.002.patch, YARN-2566.003.patch
>
>
> startLocalizer in DefaultContainerExecutor only uses the first localDir to 
> copy the token file.  If that copy fails because the first localDir does not 
> have enough disk space, localization fails even though there is plenty of 
> disk space in the other localDirs.  We see the following error for this 
> case:
> {code}
> 2014-09-13 23:33:25,171 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
> create app directory 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
> java.io.IOException: mkdir of 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,185 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
> java.io.FileNotFoundException: File 
> file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:344)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizati
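
A simplified sketch of the fallback behaviour the report argues for: try each configured local dir in turn and fail only if none works. This is illustrative only, not the actual DefaultContainerExecutor change; the method and path names are made up.

{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

/** Illustrative sketch of falling back across local dirs during localization. */
public class LocalDirFallbackSketch {

  static Path createAppDirOnAnyDisk(FileContext lfs, List<String> localDirs,
      String appCachePath) throws IOException {
    IOException lastError = null;
    for (String dir : localDirs) {
      Path candidate = new Path(dir, appCachePath);
      try {
        lfs.mkdir(candidate, new FsPermission((short) 0710), true);
        return candidate;            // first dir with enough space wins
      } catch (IOException e) {
        lastError = e;               // e.g. disk full: try the next local dir
      }
    }
    throw new IOException("Could not create " + appCachePath
        + " on any local dir", lastError);
  }
}
{code}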

[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160684#comment-14160684
 ] 

Hadoop QA commented on YARN-2496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673145/YARN-2496.patch
  against trunk revision ea26cc0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5277//console

This message is automatically generated.

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch, YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" 
> is used.
> - Check whether the labels can be accessed by the queue when submitting an 
> app with a labels-expression to the queue or updating a ResourceRequest with 
> a label-expression.
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression.
> - Respect labels when calculating headroom/user-limit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160673#comment-14160673
 ] 

Hadoop QA commented on YARN-2544:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673138/YARN-2544.patch
  against trunk revision ea26cc0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5275//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5275//console

This message is automatically generated.

> [YARN-796] Common server side PB changes (not include user API PB changes)
> --
>
> Key: YARN-2544
> URL: https://issues.apache.org/jira/browse/YARN-2544
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2544.patch, YARN-2544.patch, YARN-2544.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2576) Prepare yarn-1051 branch for merging with trunk

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2576:
-
Fix Version/s: 2.6.0

> Prepare yarn-1051 branch for merging with trunk
> ---
>
> Key: YARN-2576
> URL: https://issues.apache.org/jira/browse/YARN-2576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-2576.patch, YARN-2576.patch
>
>
> This JIRA is to track the changes required to ensure branch yarn-1051 is 
> ready to be merged with trunk.  This includes fixing any compilation issues, 
> findbugs and/or javadoc warnings, test case failures, etc., if any.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2576) Prepare yarn-1051 branch for merging with trunk

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2576:
-
Target Version/s:   (was: 3.0.0)

> Prepare yarn-1051 branch for merging with trunk
> ---
>
> Key: YARN-2576
> URL: https://issues.apache.org/jira/browse/YARN-2576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-2576.patch, YARN-2576.patch
>
>
> This JIRA is to track the changes required to ensure branch yarn-1051 is 
> ready to be merged with trunk.  This includes fixing any compilation issues, 
> findbugs and/or javadoc warnings, test case failures, etc., if any.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2475:
-
Target Version/s:   (was: 3.0.0)

> ReservationSystem: replan upon capacity reduction
> -
>
> Key: YARN-2475
> URL: https://issues.apache.org/jira/browse/YARN-2475
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-2475.patch, YARN-2475.patch, YARN-2475.patch
>
>
> In the context of YARN-1051, if the capacity of the cluster drops 
> significantly upon machine failures, we need to trigger a reorganization of 
> the planned reservations.  As reservations are "absolute", it is possible 
> that they will not all fit, and some need to be rejected a posteriori.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2611) Fix jenkins findbugs warning and test case failures for trunk merge patch

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2611:
-
Target Version/s:   (was: 3.0.0)

> Fix jenkins findbugs warning and test case failures for trunk merge patch
> -
>
> Key: YARN-2611
> URL: https://issues.apache.org/jira/browse/YARN-2611
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-2611.patch
>
>
> This JIRA is to fix the jenkins findbugs warnings and test case failures for 
> the trunk merge patch, as [reported | 
> https://issues.apache.org/jira/browse/YARN-1051?focusedCommentId=14148506] in 
> YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2475:
-
Fix Version/s: 2.6.0

> ReservationSystem: replan upon capacity reduction
> -
>
> Key: YARN-2475
> URL: https://issues.apache.org/jira/browse/YARN-2475
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-2475.patch, YARN-2475.patch, YARN-2475.patch
>
>
> In the context of YARN-1051, if the capacity of the cluster drops 
> significantly upon machine failures, we need to trigger a reorganization of 
> the planned reservations.  As reservations are "absolute", it is possible 
> that they will not all fit, and some need to be rejected a posteriori.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2611) Fix jenkins findbugs warning and test case failures for trunk merge patch

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2611:
-
Fix Version/s: 2.6.0

> Fix jenkins findbugs warning and test case failures for trunk merge patch
> -
>
> Key: YARN-2611
> URL: https://issues.apache.org/jira/browse/YARN-2611
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-2611.patch
>
>
> This JIRA is to fix the jenkins findbugs warnings and test case failures for 
> the trunk merge patch, as [reported | 
> https://issues.apache.org/jira/browse/YARN-1051?focusedCommentId=14148506] in 
> YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2385:
-
Fix Version/s: (was: 2.6.0)

> Consider splitting getAppsinQueue to getRunningAppsInQueue + 
> getPendingAppsInQueue
> --
>
> Key: YARN-2385
> URL: https://issues.apache.org/jira/browse/YARN-2385
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>  Labels: abstractyarnscheduler
>
> Currently getAppsinQueue returns both pending & running apps.  The purpose 
> of this JIRA is to explore splitting it into getRunningAppsInQueue + 
> getPendingAppsInQueue, which would give callers more flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2389) Adding support for draining a queue, i.e. killing all apps in the queue

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2389:
-
Target Version/s:   (was: 2.6.0)

> Adding support for draining a queue, i.e. killing all apps in the queue
> 
>
> Key: YARN-2389
> URL: https://issues.apache.org/jira/browse/YARN-2389
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>  Labels: capacity-scheduler, fairscheduler
> Fix For: 2.6.0
>
> Attachments: YARN-2389-1.patch, YARN-2389.patch
>
>
> This is a parallel JIRA to YARN-2378.  The Fair Scheduler already supports 
> moving a single application from one queue to another.  This JIRA will add 
> support for moving all applications from a specified source queue to a 
> target queue.  It will use YARN-2385, so it will work for both the Capacity 
> and Fair schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2385:
-
Fix Version/s: 2.6.0

> Consider splitting getAppsinQueue to getRunningAppsInQueue + 
> getPendingAppsInQueue
> --
>
> Key: YARN-2385
> URL: https://issues.apache.org/jira/browse/YARN-2385
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>  Labels: abstractyarnscheduler
> Fix For: 2.6.0
>
>
> Currently getAppsinQueue returns both pending & running apps.  The purpose 
> of this JIRA is to explore splitting it into getRunningAppsInQueue + 
> getPendingAppsInQueue, which would give callers more flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2378:
-
Target Version/s:   (was: 2.6.0)

> Adding support for moving apps between queues in Capacity Scheduler
> ---
>
> Key: YARN-2378
> URL: https://issues.apache.org/jira/browse/YARN-2378
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>  Labels: capacity-scheduler
> Fix For: 2.6.0
>
> Attachments: YARN-2378-1.patch, YARN-2378.patch, YARN-2378.patch, 
> YARN-2378.patch, YARN-2378.patch
>
>
> As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
> into smaller patches for manageability.  This JIRA will address adding 
> support for moving apps between queues in the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2080:
-
Target Version/s:   (was: 3.0.0)

> Admission Control: Integrate Reservation subsystem with ResourceManager
> ---
>
> Key: YARN-2080
> URL: https://issues.apache.org/jira/browse/YARN-2080
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, 
> YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch
>
>
> This JIRA tracks the integration of Reservation subsystem data structures 
> introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
> of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1712) Admission Control: plan follower

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1712:
-
Target Version/s:   (was: 3.0.0)

> Admission Control: plan follower
> 
>
> Key: YARN-1712
> URL: https://issues.apache.org/jira/browse/YARN-1712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>  Labels: reservations, scheduler
> Fix For: 2.6.0
>
> Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, 
> YARN-1712.4.patch, YARN-1712.5.patch, YARN-1712.patch
>
>
> This JIRA tracks a thread that continuously propagates the current state of 
> the reservation subsystem to the scheduler.  As the inventory subsystem 
> stores the "plan" of how the resources should be subdivided, the work we 
> propose in this JIRA realizes that plan by dynamically instructing the 
> CapacityScheduler to add/remove/resize queues to follow it.
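
As a rough illustration of the "plan follower" idea, a background task that periodically re-synchronizes the scheduler with the plan could look like the sketch below. The Plan and Scheduler interfaces and the synchronizePlan name are placeholders, not the YARN-1712 classes.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch of a periodic plan-follower loop. */
public class PlanFollowerSketch {

  interface Plan { /* snapshot of the reservation plan */ }
  interface Scheduler { void synchronizePlan(Plan plan); }

  static ScheduledExecutorService start(final Plan plan,
      final Scheduler scheduler, long stepMillis) {
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    timer.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        // Realize the current plan by adding/removing/resizing queues.
        scheduler.synchronizePlan(plan);
      }
    }, 0, stepMillis, TimeUnit.MILLISECONDS);
    return timer;
  }
}
{code}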



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2080:
-
Fix Version/s: 2.6.0

> Admission Control: Integrate Reservation subsystem with ResourceManager
> ---
>
> Key: YARN-2080
> URL: https://issues.apache.org/jira/browse/YARN-2080
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, 
> YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch
>
>
> This JIRA tracks the integration of Reservation subsystem data structures 
> introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
> of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1712) Admission Control: plan follower

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1712:
-
Fix Version/s: 2.6.0

> Admission Control: plan follower
> 
>
> Key: YARN-1712
> URL: https://issues.apache.org/jira/browse/YARN-1712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>  Labels: reservations, scheduler
> Fix For: 2.6.0
>
> Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, 
> YARN-1712.4.patch, YARN-1712.5.patch, YARN-1712.patch
>
>
> This JIRA tracks a thread that continuously propagates the current state of 
> the reservation subsystem to the scheduler.  As the inventory subsystem 
> stores the "plan" of how the resources should be subdivided, the work we 
> propose in this JIRA realizes that plan by dynamically instructing the 
> CapacityScheduler to add/remove/resize queues to follow it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1711:
-
Target Version/s:   (was: 3.0.0)

> CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
> --
>
> Key: YARN-1711
> URL: https://issues.apache.org/jira/browse/YARN-1711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>  Labels: reservations
> Fix For: 2.6.0
>
> Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, 
> YARN-1711.4.patch, YARN-1711.5.patch, YARN-1711.patch
>
>
> This JIRA tracks the development of a policy that enforces user quotas (a 
> time-extension of the notion of capacity) in the inventory subsystem 
> discussed in YARN-1709.
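
One plausible reading of "quotas over time", shown as a small self-contained sketch: a reservation is admitted only if the user's share stays below an instantaneous cap and its average over a sliding window stays below an average cap. The names and the exact checks are illustrative assumptions; the real policy's validation differs in detail.

{code}
// Illustrative only: accept a reservation if the user's share never exceeds an
// instantaneous cap and its average over a sliding window never exceeds an
// average cap.
public class OverTimeQuotaSketch {
  private final double instantaneousCap;  // max fraction of the plan at any instant
  private final double averageCap;        // max average fraction over the window
  private final long windowMillis;        // length of the sliding window

  public OverTimeQuotaSketch(double instantaneousCap, double averageCap, long windowMillis) {
    this.instantaneousCap = instantaneousCap;
    this.averageCap = averageCap;
    this.windowMillis = windowMillis;
  }

  /**
   * @param userShare  userShare[t] is the fraction of plan capacity the user
   *                   would hold in step t after accepting the new reservation
   * @param stepMillis duration of one time step
   */
  public boolean accept(double[] userShare, long stepMillis) {
    int window = (int) Math.max(1, windowMillis / stepMillis);
    double windowSum = 0;
    for (int t = 0; t < userShare.length; t++) {
      if (userShare[t] > instantaneousCap) {
        return false;                        // spike above the instantaneous cap
      }
      windowSum += userShare[t];
      if (t >= window) {
        windowSum -= userShare[t - window];  // slide the window forward
      }
      int len = Math.min(t + 1, window);
      if (windowSum / len > averageCap) {
        return false;                        // windowed average exceeds the quota
      }
    }
    return true;
  }
}
{code}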



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1711:
-
Fix Version/s: 2.6.0

> CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
> --
>
> Key: YARN-1711
> URL: https://issues.apache.org/jira/browse/YARN-1711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>  Labels: reservations
> Fix For: 2.6.0
>
> Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, 
> YARN-1711.4.patch, YARN-1711.5.patch, YARN-1711.patch
>
>
> This JIRA tracks the development of a policy that enforces user quotas (a 
> time-extension of the notion of capacity) in the inventory subsystem 
> discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1710) Admission Control: agents to allocate reservation

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1710:
-
Fix Version/s: 2.6.0

> Admission Control: agents to allocate reservation
> -
>
> Key: YARN-1710
> URL: https://issues.apache.org/jira/browse/YARN-1710
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.3.patch, 
> YARN-1710.4.patch, YARN-1710.patch
>
>
> This JIRA tracks the algorithms used to allocate a user ReservationRequest, 
> coming in from the new reservation API (YARN-1708), in the inventory 
> subsystem (YARN-1709) that maintains the current plan for the cluster. The 
> focus of these "agents" is to quickly find a solution that satisfies both the 
> constraints provided by the user and the physical constraints of the plan.
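
As an illustration of what such an agent does, here is a deliberately simple, self-contained sketch that scans the plan's free capacity and returns the earliest window satisfying the user's demand, duration, and deadline. The flat-array encoding of the plan is an assumption made only for the example.

{code}
// Illustrative only: scan the plan's free capacity and return the earliest
// start step where the request fits entirely before its deadline.
public class GreedyAgentSketch {

  /**
   * @param freeCapacity   free capacity per time step (e.g. MB of memory)
   * @param demandPerStep  capacity the reservation needs in every step it runs
   * @param durationSteps  how many consecutive steps the reservation runs
   * @param earliestStart  first step the user allows (arrival)
   * @param deadline       step by which the reservation must have finished (exclusive)
   * @return the earliest feasible start step, or -1 if nothing fits
   */
  public static int findStart(long[] freeCapacity, long demandPerStep, int durationSteps,
                              int earliestStart, int deadline) {
    for (int start = earliestStart; start + durationSteps <= deadline; start++) {
      boolean fits = true;
      for (int t = start; t < start + durationSteps; t++) {
        if (freeCapacity[t] < demandPerStep) {
          fits = false;                      // not enough room in this step, try later
          break;
        }
      }
      if (fits) {
        return start;
      }
    }
    return -1;
  }
}
{code}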



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1710) Admission Control: agents to allocate reservation

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1710:
-
Target Version/s:   (was: 3.0.0)

> Admission Control: agents to allocate reservation
> -
>
> Key: YARN-1710
> URL: https://issues.apache.org/jira/browse/YARN-1710
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.6.0
>
> Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.3.patch, 
> YARN-1710.4.patch, YARN-1710.patch
>
>
> This JIRA tracks the algorithms used to allocate a user ReservationRequest, 
> coming in from the new reservation API (YARN-1708), in the inventory 
> subsystem (YARN-1709) that maintains the current plan for the cluster. The 
> focus of these "agents" is to quickly find a solution that satisfies both the 
> constraints provided by the user and the physical constraints of the plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1709:
-
Target Version/s:   (was: 3.0.0)

> Admission Control: Reservation subsystem
> 
>
> Key: YARN-1709
> URL: https://issues.apache.org/jira/browse/YARN-1709
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, 
> YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch
>
>
> This JIRA is about the key data structure used to track resources over time 
> to enable YARN-1051. The Reservation subsystem is conceptually a "plan" of 
> how the scheduler will allocate resources over time.
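
For intuition, a minimal sketch of such a time-indexed structure: committed capacity kept as a step function in a sorted map, so reservations over intervals and point-in-time lookups stay cheap. The data structures in the attached patches are richer (full Resource objects, per-reservation tracking); this shows only the core idea.

{code}
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: committed capacity kept as a step function over time in a
// sorted map. The data structures in the attached patches are richer.
public class PlanSketch {
  // time (ms, non-negative) -> total capacity committed from that instant onward
  private final TreeMap<Long, Long> committed = new TreeMap<>();

  public PlanSketch() {
    committed.put(0L, 0L);                   // nothing committed initially
  }

  /** Commit 'amount' of capacity for the half-open interval [start, end). */
  public void addReservation(long start, long end, long amount) {
    // Make sure both interval boundaries exist as explicit steps.
    committed.putIfAbsent(start, committed.floorEntry(start).getValue());
    committed.putIfAbsent(end, committed.floorEntry(end).getValue());
    // Raise every step inside the interval.
    for (Map.Entry<Long, Long> e : committed.subMap(start, true, end, false).entrySet()) {
      e.setValue(e.getValue() + amount);
    }
  }

  /** Capacity committed at a given instant. */
  public long committedAt(long time) {
    return committed.floorEntry(time).getValue();
  }
}
{code}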



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1709:
-
Fix Version/s: 2.6.0

> Admission Control: Reservation subsystem
> 
>
> Key: YARN-1709
> URL: https://issues.apache.org/jira/browse/YARN-1709
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, 
> YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch
>
>
> This JIRA is about the key data structure used to track resources over time 
> to enable YARN-1051. The Reservation subsystem is conceptually a "plan" of 
> how the scheduler will allocate resources over time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1708:
-
Target Version/s:   (was: 3.0.0)

> Add a public API to reserve resources (part of YARN-1051)
> -
>
> Key: YARN-1708
> URL: https://issues.apache.org/jira/browse/YARN-1708
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch, 
> YARN-1708.patch
>
>
> This JIRA tracks the definition of a new public API for YARN, which allows 
> users to reserve resources (think of time-bounded queues). This is part of 
> the admission control enhancement proposed in YARN-1051.
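
To illustrate the shape of such an API, here is a hypothetical client-side sketch: the caller describes a time-bounded capacity ask and gets back a reservation id that later job submissions can target. All type and method names are illustrative stand-ins, not the API actually introduced by this JIRA.

{code}
// Illustrative only: all type and method names below are hypothetical
// stand-ins for the API this JIRA introduces.
public class ReservationApiSketch {

  /** What the user asks for: capacity, duration, and a window to place it in. */
  public static final class ReservationAsk {
    final long arrivalMs;    // earliest start of the window
    final long deadlineMs;   // latest finish of the window
    final int containers;    // containers needed concurrently
    final long durationMs;   // for how long they are needed

    public ReservationAsk(long arrivalMs, long deadlineMs, int containers, long durationMs) {
      this.arrivalMs = arrivalMs;
      this.deadlineMs = deadlineMs;
      this.containers = containers;
      this.durationMs = durationMs;
    }
  }

  /** Client-side facade; in YARN the call would go through the client protocol. */
  public interface ReservationClient {
    String submitReservation(String queue, ReservationAsk ask); // returns a reservation id
  }

  public static String reserve(ReservationClient client, long now) {
    // Ask for 100 containers for one hour, any time in the next six hours.
    ReservationAsk ask = new ReservationAsk(now, now + 6 * 3600_000L, 100, 3600_000L);
    String reservationId = client.submitReservation("prod-reservations", ask);
    // Jobs submitted later against this id run inside the reserved capacity.
    return reservationId;
  }
}
{code}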



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-06 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160654#comment-14160654
 ] 

zhihai xu commented on YARN-2566:
-

I don't see the problem ("-1 javac. The patch appears to cause the build to 
fail.") in my local build. Restarting the Jenkins test.

> IOException happen in startLocalizer of DefaultContainerExecutor due to not 
> enough disk space for the first localDir.
> -
>
> Key: YARN-2566
> URL: https://issues.apache.org/jira/browse/YARN-2566
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
> YARN-2566.002.patch, YARN-2566.003.patch
>
>
> startLocalizer in DefaultContainerExecutor only uses the first localDir to 
> copy the token file. If that copy fails because the first localDir does not 
> have enough disk space, localization fails even when there is plenty of disk 
> space in the other localDirs. We see the following error in this case:
> {code}
> 2014-09-13 23:33:25,171 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
> create app directory 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
> java.io.IOException: mkdir of 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,185 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
> java.io.FileNotFoundException: File 
> file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:344)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,186 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1410663092546_0004_01_01 transitioned from 
> LOCALIZING to LOCALIZATION_FAILED
> 2014-09-13 23:33:25,187 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESU

[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1708:
-
Fix Version/s: 2.6.0

> Add a public API to reserve resources (part of YARN-1051)
> -
>
> Key: YARN-1708
> URL: https://issues.apache.org/jira/browse/YARN-1708
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Subru Krishnan
> Fix For: 2.6.0
>
> Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch, 
> YARN-1708.patch
>
>
> This JIRA tracks the definition of a new public API for YARN, which allows 
> users to reserve resources (think of time-bounded queues). This is part of 
> the admission control enhancement proposed in YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160653#comment-14160653
 ] 

Jian He commented on YARN-2312:
---

The patch looks good to me too; thanks, Jason, for reviewing it.
bq. Wondering if there should be a utility method on ContainerId to provide 
this value or if the masking constant should be obtainable from ContainerId.
I prefer exposing the constant.

> Marking ContainerId#getId as deprecated
> ---
>
> Key: YARN-2312
> URL: https://issues.apache.org/jira/browse/YARN-2312
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
> YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, 
> YARN-2312.4.patch, YARN-2312.5.patch
>
>
> After YARN-2229, {{ContainerId#getId}} returns only a partial value of the 
> container id: the sequence number without the epoch. We should mark 
> {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} 
> instead.
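
To show what "exposing the constant" would buy callers, a small sketch of splitting the 64-bit container id into an epoch part and a sequence part. The 40-bit split is an assumption made for the example; the authoritative mask lives in ContainerId itself.

{code}
// Illustrative only: the 40-bit split is an assumption for this example; the
// authoritative masking constant lives in ContainerId itself.
public class ContainerIdMaskSketch {
  // Assumed layout: low 40 bits = per-application sequence, high bits = RM epoch.
  static final long SEQUENCE_MASK = (1L << 40) - 1;

  static long sequenceOf(long containerId) {
    return containerId & SEQUENCE_MASK;        // roughly what the deprecated getId() returned
  }

  static long epochOf(long containerId) {
    return containerId >>> 40;                 // bumped when the RM restarts
  }

  public static void main(String[] args) {
    long withEpoch = (2L << 40) | 7L;          // epoch 2, seventh container
    System.out.println(sequenceOf(withEpoch)); // 7
    System.out.println(epochOf(withEpoch));    // 2
  }
}
{code}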



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1707:
-
Target Version/s:   (was: 3.0.0)

> Making the CapacityScheduler more dynamic
> -
>
> Key: YARN-1707
> URL: https://issues.apache.org/jira/browse/YARN-1707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>  Labels: capacity-scheduler
> Fix For: 2.6.0
>
> Attachments: YARN-1707.10.patch, YARN-1707.2.patch, 
> YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, 
> YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch
>
>
> The CapacityScheduler is rather static at the moment, and refreshqueue 
> provides a rather heavy-handed way to reconfigure it. To move towards 
> long-running services (tracked in YARN-896) and to enable more advanced 
> admission control and resource parcelling, we need to make the 
> CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA 
> YARN-1051.
> Concretely this requires the following changes:
> * create queues dynamically
> * destroy queues dynamically
> * dynamically change queue parameters (e.g., capacity)
> * relax the refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
> instead of == 100%
> We limit this to LeafQueues.
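
The last bullet is the key enabler for dynamically added queues. A tiny illustrative sketch of the relaxed check follows; names are hypothetical, not the CapacityScheduler's actual validation code.

{code}
import java.util.Map;

// Illustrative only: children of a parent queue may now sum to at most 100%
// (leaving headroom for dynamically created queues) instead of exactly 100%.
public class QueueCapacityValidationSketch {
  private static final float EPSILON = 1e-4f;

  /** @param childCapacities child queue name -> configured capacity in percent */
  static void validate(String parent, Map<String, Float> childCapacities) {
    float sum = 0f;
    for (float c : childCapacities.values()) {
      sum += c;
    }
    // Old rule: reject unless the sum is exactly 100 (within EPSILON).
    // New rule: only reject if the children overcommit the parent.
    if (sum > 100f + EPSILON) {
      throw new IllegalArgumentException(
          "Children of " + parent + " sum to " + sum + "%, which exceeds 100%");
    }
  }
}
{code}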



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2566:

Attachment: (was: YARN-2566.003.patch)

> IOException happen in startLocalizer of DefaultContainerExecutor due to not 
> enough disk space for the first localDir.
> -
>
> Key: YARN-2566
> URL: https://issues.apache.org/jira/browse/YARN-2566
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
> YARN-2566.002.patch, YARN-2566.003.patch
>
>
> startLocalizer in DefaultContainerExecutor only uses the first localDir to 
> copy the token file. If that copy fails because the first localDir does not 
> have enough disk space, localization fails even when there is plenty of disk 
> space in the other localDirs. We see the following error in this case:
> {code}
> 2014-09-13 23:33:25,171 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
> create app directory 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
> java.io.IOException: mkdir of 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,185 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
> java.io.FileNotFoundException: File 
> file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:344)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,186 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1410663092546_0004_01_01 transitioned from 
> LOCALIZING to LOCALIZATION_FAILED
> 2014-09-13 23:33:25,187 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: LOCALIZATION_FAILED  
>   APPID=application_1410663092546_0004
> CONTAINERID=conta

[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2566:

Attachment: YARN-2566.003.patch

> IOException happen in startLocalizer of DefaultContainerExecutor due to not 
> enough disk space for the first localDir.
> -
>
> Key: YARN-2566
> URL: https://issues.apache.org/jira/browse/YARN-2566
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
> YARN-2566.002.patch, YARN-2566.003.patch
>
>
> startLocalizer in DefaultContainerExecutor only uses the first localDir to 
> copy the token file. If that copy fails because the first localDir does not 
> have enough disk space, localization fails even when there is plenty of disk 
> space in the other localDirs. We see the following error in this case:
> {code}
> 2014-09-13 23:33:25,171 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
> create app directory 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
> java.io.IOException: mkdir of 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
>   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
>   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,185 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
> java.io.FileNotFoundException: File 
> file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
>   at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:344)
>   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
>   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
>   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,186 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1410663092546_0004_01_01 transitioned from 
> LOCALIZING to LOCALIZATION_FAILED
> 2014-09-13 23:33:25,187 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: LOCALIZATION_FAILED  
>   APPID=application_1410663092546_0004
> CONTAINERID=container_141066

[jira] [Updated] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2500:
-
Attachment: YARN-2500.patch

Attached an updated patch.

> [YARN-796] Miscellaneous changes in ResourceManager to support labels
> -
>
> Key: YARN-2500
> URL: https://issues.apache.org/jira/browse/YARN-2500
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2500.patch, YARN-2500.patch, YARN-2500.patch, 
> YARN-2500.patch
>
>
> This patch contains the changes in the ResourceManager needed to support labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2496:
-
Attachment: YARN-2496.patch

Attached an updated patch.

> [YARN-796] Changes for capacity scheduler to support allocate resource 
> respect labels
> -
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
> YARN-2496.patch, YARN-2496.patch
>
>
> This JIRA includes:
> - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
> queue options such as capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in the queue config; if an app 
> doesn't specify a label-expression, the queue's "default-label-expression" 
> will be used.
> - Check whether the labels can be accessed by the queue when an app is 
> submitted to the queue with a label-expression, or when a ResourceRequest is 
> updated with a label-expression.
> - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
> with a label-expression.
> - Respect labels when calculating headroom/user-limit.
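
The queue-access check in the list above essentially boils down to a set-membership test; a small illustrative sketch follows, with the "*" wildcard treated as an assumption for the example rather than the scheduler's actual convention.

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: the "*" wildcard is an assumption for this example, not
// necessarily the scheduler's actual convention.
public class QueueLabelCheckSketch {
  static final String ANY_LABEL = "*";

  static boolean canAccess(Set<String> queueAccessibleLabels, Set<String> requestedLabels) {
    if (requestedLabels == null || requestedLabels.isEmpty()) {
      return true;                                   // no label expression: always allowed
    }
    if (queueAccessibleLabels.contains(ANY_LABEL)) {
      return true;                                   // queue accepts every label
    }
    return queueAccessibleLabels.containsAll(requestedLabels);
  }

  public static void main(String[] args) {
    Set<String> queueLabels = new HashSet<>(Arrays.asList("gpu", "large-mem"));
    System.out.println(canAccess(queueLabels, new HashSet<>(Arrays.asList("gpu")))); // true
    System.out.println(canAccess(queueLabels, new HashSet<>(Arrays.asList("ssd")))); // false
    System.out.println(canAccess(queueLabels, Collections.<String>emptySet()));      // true
  }
}
{code}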



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1707:
-
Fix Version/s: 2.6.0

> Making the CapacityScheduler more dynamic
> -
>
> Key: YARN-1707
> URL: https://issues.apache.org/jira/browse/YARN-1707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>  Labels: capacity-scheduler
> Fix For: 2.6.0
>
> Attachments: YARN-1707.10.patch, YARN-1707.2.patch, 
> YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, 
> YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch
>
>
> The CapacityScheduler is rather static at the moment, and refreshqueue 
> provides a rather heavy-handed way to reconfigure it. To move towards 
> long-running services (tracked in YARN-896) and to enable more advanced 
> admission control and resource parcelling, we need to make the 
> CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA 
> YARN-1051.
> Concretely this requires the following changes:
> * create queues dynamically
> * destroy queues dynamically
> * dynamically change queue parameters (e.g., capacity)
> * relax the refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
> instead of == 100%
> We limit this to LeafQueues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

