[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060379#comment-14060379
 ] 

Jian He commented on YARN-1408:
---

More comments after looking at the latest patch:
- Is it possible that schedulerAttempt is null here, e.g. if preemption happens 
after the attempt has completed? (A null guard is sketched after these comments.)
{code}
SchedulerApplicationAttempt schedulerAttempt =
    getCurrentAttemptForContainer(rmContainer.getContainerId());
schedulerAttempt.recoverResourceRequests(requests);
{code}
- AbstractYarnScheduler#recoverResourceRequest: how about renaming it to 
recoverResourceRequestForContainer?
- Assert the size of the requests list; it can be empty, in which case the 
assertion in the loop below is never exercised. The same applies to the 
CapacityScheduler test. (See the sketch after these comments.)
{code}
List<ResourceRequest> requests = rmContainer.getResourceRequests();
// Once recovered, the resource requests will be present again in the app
for (ResourceRequest request : requests) {
  Assert.assertEquals(1,
      app.getResourceRequest(priority, request.getResourceName())
          .getNumContainers());
}
{code}
- Alternatively, calling warnOrKillContainer twice and setting 
WAIT_TIME_BEFORE_KILL to a small value may achieve the same effect.
{code}
// Create a preempt event by sending a KILL event. In real cases,
// FairScheduler#warnOrKillContainer performs the steps below.
ContainerStatus status = SchedulerUtils.createPreemptedContainerStatus(
    rmContainer.getContainerId(), SchedulerUtils.PREEMPTED_CONTAINER);
scheduler.recoverResourceRequest(rmContainer);
app.containerCompleted(rmContainer, status, RMContainerEventType.KILL);
{code}
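
A rough sketch of the null guard and the size assertion, reusing the names from 
the snippets above (illustrative only, not taken from the attached patch):
{code}
// Guard against preemption racing with attempt completion (hypothetical sketch).
SchedulerApplicationAttempt schedulerAttempt =
    getCurrentAttemptForContainer(rmContainer.getContainerId());
if (schedulerAttempt != null) {
  schedulerAttempt.recoverResourceRequests(requests);
}

// In the test, fail fast if no requests were recovered, so the loop below
// cannot be skipped silently.
List<ResourceRequest> requests = rmContainer.getResourceRequests();
Assert.assertFalse(requests.isEmpty());
for (ResourceRequest request : requests) {
  Assert.assertEquals(1,
      app.getResourceRequest(priority, request.getResourceName())
          .getNumContainers());
}
{code}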


> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, 
> Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, 
> Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, 
> Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Submit a big jobA to queue a which uses the full cluster capacity.
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity.
> A jobA task which uses queue b's capacity is preempted and killed.
> This caused the problem below:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was then immediately preempted.
> Here the ACQUIRED at KILLED invalid state exception occurred when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out after 30 minutes, as the container 
> had already been killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs
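
For reference, a minimal sketch of turning on the preemption monitor described 
above from client or test code (property names are taken verbatim from the 
description; the Configuration usage here is only illustrative):
{code}
// org.apache.hadoop.conf.Configuration; values mirror the settings quoted above.
Configuration conf = new Configuration();
conf.setBoolean("yarn.resourcemanager.scheduler.monitor.enable", true);
conf.set("yarn.resourcemanager.scheduler.monitor.policies",
    "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
        + "ProportionalCapacityPreemptionPolicy");
{code}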



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060373#comment-14060373
 ] 

Hadoop QA commented on YARN-2130:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655497/YARN-2130.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4290//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4290//console

This message is automatically generated.

> Cleanup: Adding getRMAppManager, getQueueACLsManager, 
> getApplicationACLsManager to RMContext
> 
>
> Key: YARN-2130
> URL: https://issues.apache.org/jira/browse/YARN-2130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
> YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL

2014-07-13 Thread Kenji Kikushima (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060366#comment-14060366
 ] 

Kenji Kikushima commented on YARN-2234:
---

This patch contains only a log message modification, so I think no additional 
test is needed.
I also ran the failed tests locally, and no errors occurred.

{noformat}
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.42 sec - in 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
Running 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
Tests run: 40, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 168.308 sec - 
in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
Running 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.725 sec - in 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
Running 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.733 sec - 
in 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
Running 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.253 sec - in 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
Running 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.783 sec - in 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched

Results :

Tests run: 105, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 4:51.743s
[INFO] Finished at: Mon Jul 14 14:56:46 UTC 2014
[INFO] Final Memory: 31M/375M
[INFO] 
{noformat}

> Incorrect description in RM audit logs while refreshing Admin ACL
> -
>
> Key: YARN-2234
> URL: https://issues.apache.org/jira/browse/YARN-2234
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Kenji Kikushima
> Attachments: YARN-2234.patch
>
>
> In the method AdminService#refreshAdminAcls (AdminService.java:446), the 
> failure RM audit log, which is generated when the RM is not active, has the 
> following description:
>   "ResourceManager is not active. Can not refresh user-groups."
> This should instead be changed to "ResourceManager is not active. Can not 
> refresh admin ACLs."



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-07-13 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2130:
-

Attachment: YARN-2130.7.patch

Rebased on trunk.

> Cleanup: Adding getRMAppManager, getQueueACLsManager, 
> getApplicationACLsManager to RMContext
> 
>
> Key: YARN-2130
> URL: https://issues.apache.org/jira/browse/YARN-2130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
> YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL

2014-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060341#comment-14060341
 ] 

Hadoop QA commented on YARN-2234:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655491/YARN-2234.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4289//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4289//console

This message is automatically generated.

> Incorrect description in RM audit logs while refreshing Admin ACL
> -
>
> Key: YARN-2234
> URL: https://issues.apache.org/jira/browse/YARN-2234
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Kenji Kikushima
> Attachments: YARN-2234.patch
>
>
> In the method AdminService#refreshAdminAcls (AdminService.java:446), the 
> failure RM audit log, which is generated when the RM is not active, has the 
> following description:
>   "ResourceManager is not active. Can not refresh user-groups."
> This should instead be changed to "ResourceManager is not active. Can not 
> refresh admin ACLs."



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL

2014-07-13 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima reassigned YARN-2234:
-

Assignee: Kenji Kikushima

> Incorrect description in RM audit logs while refreshing Admin ACL
> -
>
> Key: YARN-2234
> URL: https://issues.apache.org/jira/browse/YARN-2234
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Kenji Kikushima
> Attachments: YARN-2234.patch
>
>
> In the method AdminService#refreshAdminAcls (AdminService.java:446), the 
> failure RM audit log, which is generated when the RM is not active, has the 
> following description:
>   "ResourceManager is not active. Can not refresh user-groups."
> This should instead be changed to "ResourceManager is not active. Can not 
> refresh admin ACLs."



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL

2014-07-13 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-2234:
--

Attachment: YARN-2234.patch

Attached a patch.
Changed audit log message to "ResourceManager is not active. Can not refresh 
admin ACLs."

> Incorrect description in RM audit logs while refreshing Admin ACL
> -
>
> Key: YARN-2234
> URL: https://issues.apache.org/jira/browse/YARN-2234
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
> Attachments: YARN-2234.patch
>
>
> In the method AdminService#refreshAdminAcls (AdminService.java:446), the 
> failure RM audit log, which is generated when the RM is not active, has the 
> following description:
>   "ResourceManager is not active. Can not refresh user-groups."
> This should instead be changed to "ResourceManager is not active. Can not 
> refresh admin ACLs."



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-13 Thread Yuliya Feldman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060258#comment-14060258
 ] 

Yuliya Feldman commented on YARN-796:
-

1)
{quote}
Agreed. What I meant is that we need to consider the performance of two things:
- The time to evaluate a label expression; IMO we need to add labels at the 
per-container level.
- Whether it is important to get the headroom, or how many nodes can be used 
for an expression. The simpler the expression, the easier it is for us to get 
the results mentioned previously.
{quote}
Regarding the time to evaluate a label expression: we need some performance 
stats on how many ops we can process. I will try to get those performance 
numbers based on different levels of expression complexity.
I did not do anything to include label evaluation in the headroom calculation, 
so I have no comments there.

2)
bq. Do you have any idea of what the API will look like?
It can be as simple as "yarn rmadmin -loadlabels  
"
I am not sure if you mean anything else.

3)
bq. I think for different schedulers, we should specify queue-related 
parameters in different configurations. Let’s get more ideas from the community 
about how to specify queue parameters before moving ahead.
I have some examples in the document for the Fair and Capacity Schedulers.

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060138#comment-14060138
 ] 

Wangda Tan commented on YARN-1408:
--

LGTM, +1
Thanks,

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, 
> Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, 
> Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, 
> Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Submit a big jobA to queue a which uses the full cluster capacity.
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity.
> A jobA task which uses queue b's capacity is preempted and killed.
> This caused the problem below:
> 1. A new container got allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was then immediately preempted.
> Here the ACQUIRED at KILLED invalid state exception occurred when the next AM 
> heartbeat reached the RM.
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out after 30 minutes, as the container 
> had already been killed by preemption.
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060133#comment-14060133
 ] 

Wangda Tan commented on YARN-796:
-


Reply:
Hi Yuliya,
Thanks for your reply. It’s great to read your doc and discuss with you too. :)
Please see my reply below.

1) 
bq. What probably needs to be evaluated is what nodes satisfy a final/effective 
LabelExpression, as nodes can come and go and labels on them can change
Agreed. What I meant is that we need to consider the performance of two things:
* The time to evaluate a label expression; IMO we need to add labels at the 
per-container level.
* Whether it is important to get the headroom, or how many nodes can be used 
for an expression. The simpler the expression, the easier it is for us to get 
the results mentioned previously.

2) 
bq. Let me understand it better: If an application provides multiple labels, 
they are "AND"ed, and so only nodes that have the same set of labels or a 
superset will be used?
Yes.
The reason I think this is important is that a label is treated as a tangible 
resource here. Imagine you are running an HBase master: you may want a node 
that is “stable”, “large_memory”, “for_long_running_service”. Or if you run a 
scientific computing program, you want a node that has “GPU”, “large_memory”, 
“strong_cpu”. It does not make sense to use “OR” in these cases. (A tiny sketch 
of this AND matching is at the end of this comment.)

To Sandy/Amit: do you have any specific use case for OR?
My basic feeling about supporting different operators like “OR”/“NOT” here is 
that we may support them if they have clear use cases and are highly demanded. 
But we’d better not use combined expressions; if we do, we need to add 
parentheses, which will increase the complexity of evaluating them.
Let's hear more thoughts from the community about this.


3) 
bq. Yes - so far this is a procedure. Not sure what is "hard" here, but we can 
have some API to do it.
Do you have any idea of what the API will look like?


4)
bq. Agree - today this file may only be relevant to the RM. If it is stored as 
a local file or by other means, there is a greater chance for it to be 
overwritten or lost in the upgrade process.
Agreed.

5)
bq. And if we support this, it will not be sufficient to change isBlackListed 
in AppSchedulingInfo only in the scheduler to make the fair/capacity schedulers 
work. We may need to modify the implementations of the different schedulers.
Agreed.


6)
bq. Sure, we can make them consistent. Our thought process was that if you have 
multiple leaf queues that should share the same label/policy, you can specify 
it at the parent level, so you don't need to "type" more than necessary.
I think for different schedulers, we should specify queue-related parameters in 
different configurations. Let’s get more ideas from the community about how to 
specify queue parameters before moving ahead. :)

Thanks,
Wangda
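
A minimal sketch of the AND semantics discussed in point 2, assuming labels are 
plain strings attached to nodes and requests (the class and method names here 
are hypothetical, not from any patch):
{code}
import java.util.Set;

// Hypothetical helper: a node matches only if it carries every requested label,
// i.e. the node's label set is a superset of the request's label set.
final class LabelMatcher {
  static boolean matches(Set<String> nodeLabels, Set<String> requestedLabels) {
    return nodeLabels.containsAll(requestedLabels);
  }
}

// e.g. a node labeled {stable, large_memory, for_long_running_service} satisfies
// a request for {stable, large_memory}, but not one that also asks for GPU.
{code}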

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060129#comment-14060129
 ] 

Hudson commented on YARN-2274:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1830 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1830/])
YARN-2274. FairScheduler: Add debug information about cluster capacity, 
availability and reservations. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609942)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> FairScheduler: Add debug information about cluster capacity, availability and 
> reservations
> --
>
> Key: YARN-2274
> URL: https://issues.apache.org/jira/browse/YARN-2274
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.4.1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Trivial
> Fix For: 2.6.0
>
> Attachments: yarn-2274-1.patch, yarn-2274-2.patch, yarn-2274-3.patch
>
>
> FairScheduler logs have little information on cluster capacity and 
> availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060112#comment-14060112
 ] 

Hudson commented on YARN-2274:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1803 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1803/])
YARN-2274. FairScheduler: Add debug information about cluster capacity, 
availability and reservations. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609942)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> FairScheduler: Add debug information about cluster capacity, availability and 
> reservations
> --
>
> Key: YARN-2274
> URL: https://issues.apache.org/jira/browse/YARN-2274
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.4.1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Trivial
> Fix For: 2.6.0
>
> Attachments: yarn-2274-1.patch, yarn-2274-2.patch, yarn-2274-3.patch
>
>
> FairScheduler logs have little information on cluster capacity and 
> availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060074#comment-14060074
 ] 

Hudson commented on YARN-2274:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #611 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/611/])
YARN-2274. FairScheduler: Add debug information about cluster capacity, 
availability and reservations. (kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609942)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> FairScheduler: Add debug information about cluster capacity, availability and 
> reservations
> --
>
> Key: YARN-2274
> URL: https://issues.apache.org/jira/browse/YARN-2274
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.4.1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Trivial
> Fix For: 2.6.0
>
> Attachments: yarn-2274-1.patch, yarn-2274-2.patch, yarn-2274-3.patch
>
>
> FairScheduler logs have little information on cluster capacity and 
> availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-07-13 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060041#comment-14060041
 ] 

Tsuyoshi OZAWA commented on YARN-2229:
--

The test failure is not related, and the javac warning is caused by the use of {{getId}}.

> ContainerId can overflow with RM restart
> 
>
> Key: YARN-2229
> URL: https://issues.apache.org/jira/browse/YARN-2229
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
> YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, 
> YARN-2229.7.patch, YARN-2229.8.patch
>
>
> In YARN-2052, we changed the containerId format: the upper 10 bits are for 
> the epoch and the lower 22 bits are for the sequence number of ids. This 
> preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
> {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
> {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow 
> after the RM restarts 1024 times.
> To avoid the problem, it is better to make containerId a long. We need to 
> define the new container id format while preserving backward compatibility in 
> this JIRA.
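
To illustrate the layout described above, here is a sketch based only on this 
description (the packing helper is hypothetical, not the actual ContainerId code):
{code}
final class ContainerIdLayout {
  // Upper 10 bits = epoch, lower 22 bits = per-epoch sequence number, packed
  // into the 32-bit value returned by ContainerId#getId() (as described above).
  static int packId(int epoch, int sequence) {
    return ((epoch & 0x3FF) << 22) | (sequence & 0x3FFFFF); // 0x3FF = 2^10 - 1
  }
}
// With only 10 bits, the epoch wraps after 2^10 = 1024 RM restarts -- the
// overflow this JIRA proposes to avoid by widening containerId to a long.
{code}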



--
This message was sent by Atlassian JIRA
(v6.2#6252)