[jira] [Updated] (YARN-117) Enhance YARN service model

2013-06-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-117:
-

Attachment: YARN-117-025.patch

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-007.patch, YARN-117-008.patch, 
 YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, 
 YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, 
 YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, 
 YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, 
 YARN-117-022.patch, YARN-117-023.patch, YARN-117-024.patch, 
 YARN-117-025.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, 
 YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch


 Having played with the YARN service model, there are some issues
 I've identified based on past work and initial use.
 This JIRA is an umbrella issue to cover them, with solutions pushed
 out to separate JIRAs.
 h2. State model prevents the stopped state being entered if the service could
 not be started successfully.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require
 implementations to be able to stop safely without requiring all fields to be
 non-null.
 Before anyone points out that the {{stop()}} operations assume all fields are
 valid and will NPE if called before {{start()}}: MAPREDUCE-3431 shows that this
 problem arises today, and MAPREDUCE-3502 is a fix for it. That fix is
 independent of the rest of the issues in this doc, but it will help make
 {{stop()}} executable from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review
 and uptake; this can be done with issues linked to this one.
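 As an illustration only (hypothetical {{MyService}}, not code from any of the attached patches; package names as in the 2.0.x {{org.apache.hadoop.yarn.service}} API), a {{stop()}} written so it can be called from any state, with every field guarded against null:
{code}
import java.util.concurrent.ExecutorService;

import org.apache.hadoop.ipc.Server;
import org.apache.hadoop.yarn.service.AbstractService;

public class MyService extends AbstractService {
  private Server rpcServer;        // still null if start() failed early
  private ExecutorService workers; // still null if init() failed

  public MyService() {
    super("MyService");
  }

  @Override
  public synchronized void stop() {
    // Release only what was actually created, so stop() is safe after a
    // failed init() or a partial start().
    if (workers != null) {
      workers.shutdownNow();
      workers = null;
    }
    if (rpcServer != null) {
      rpcServer.stop();
      rpcServer = null;
    }
    super.stop();
  }
}
{code}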
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks that verify whether a state transition is allowed
 from the current state are performed in the base {{AbstractService}} class, yet
 subclasses tend to call this *after* their own {{init()}}, {{start()}} and
 {{stop()}} operations. This means that these operations can be performed out of
 order, and even if the call ends in an exception, all actions performed by the
 subclass will already have taken place. MAPREDUCE-3877 demonstrates this.
 This is a tricky one to address. In HADOOP-3128 I used a base class instead of
 an interface and made the {{init()}}, {{start()}} and {{stop()}} methods
 {{final}}. These methods would do the checks and then invoke protected inner
 methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to
 retrofit the same behaviour to everything that extends {{AbstractService}},
 something that must be done before the class is considered stable (because once
 the lifecycle methods are declared final, all subclasses outside the source
 tree will need fixing by their respective developers).
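 A rough sketch of that retrofit, using the {{innerStart()}}/{{innerStop()}} names proposed above (class name and exact checks are illustrative, not the final API):
{code}
import org.apache.hadoop.yarn.service.AbstractService;
import org.apache.hadoop.yarn.service.Service;

public abstract class GuardedService extends AbstractService {

  protected GuardedService(String name) {
    super(name);
  }

  @Override
  public final synchronized void start() {
    // Check first, so subclass work never runs from the wrong state.
    if (getServiceState() != Service.STATE.INITED) {
      throw new IllegalStateException(
          "Cannot start " + getName() + " from state " + getServiceState());
    }
    innerStart();
    super.start();   // performs the state transition and notifies listeners
  }

  @Override
  public final synchronized void stop() {
    if (getServiceState() == Service.STATE.STOPPED) {
      return;        // a duplicate stop() becomes a no-op
    }
    innerStop();
    // In the current base class this only records STOPPED when the service
    // was STARTED; the proposal above would widen that.
    super.stop();
  }

  /** Subclasses override these instead of start()/stop(). */
  protected void innerStart() {}

  protected void innerStop() {}
}
{code}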
 h2. AbstractService state change doesn't defend against race conditions.
 There are no concurrency locks on the state transitions. Whatever fix is added
 for wrong-state calls should also prevent re-entrancy, such as {{stop()}} being
 called from two threads.
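 For the re-entrancy case specifically, something as small as an {{AtomicBoolean}} guard would do (again an illustration only; the class name is hypothetical):
{code}
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.hadoop.yarn.service.AbstractService;

public class StopOnceService extends AbstractService {

  // compareAndSet lets exactly one caller through, so a second thread
  // calling stop() concurrently returns immediately instead of re-running
  // the shutdown logic.
  private final AtomicBoolean stopped = new AtomicBoolean(false);

  public StopOnceService() {
    super("StopOnceService");
  }

  @Override
  public void stop() {
    if (!stopped.compareAndSet(false, true)) {
      return;
    }
    // ... release whatever this service owns ...
    super.stop();
  }
}
{code}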
 h2. Static methods to choreograph lifecycle operations
 Helper methods to move things through their lifecycles: init-then-start is
 common, stop-if-service-is-not-null another. Static methods could execute these
 sequences, and even call {{stop()}} if {{init()}} raises an exception. They
 could go into a class {{ServiceOps}} in the same package, be used by services
 that wrap other services, and help manage more robust shutdowns.
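 A sketch of what such a {{ServiceOps}} could contain (the method names {{deploy}} and {{stopIfPresent}} are placeholders, not settled API):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.service.Service;

public final class ServiceOps {

  private ServiceOps() {
  }

  /** The common init-then-start sequence; on failure, stop the service so
   *  any resources acquired by init() are released. */
  public static void deploy(Service service, Configuration conf) {
    try {
      service.init(conf);
      service.start();
    } catch (RuntimeException e) {
      stopIfPresent(service);
      throw e;
    }
  }

  /** The common stop-if-not-null pattern. */
  public static void stopIfPresent(Service service) {
    if (service != null) {
      service.stop();
    }
  }
}
{code}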
 h2. State transition failures are something that registered service listeners
 may wish to be informed of.
 When a state transition fails, a {{RuntimeException}} can be thrown, and the
 service listeners are not informed because the notification point isn't
 reached. They may wish to know this, especially for management and diagnostics.
 *Fix:* extend {{ServiceStateChangeListener}} with a callback such as
 {{stateChangeFailed(Service service, Service.State targetedState,
 RuntimeException e)}} that is invoked from the (final) state change methods in
 the {{AbstractService}} class (once they delegate to their inner
 {{innerStart()}}, {{innerStop()}} methods); make it a no-op in the existing
 implementations of the interface.
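 Sketched as an extension interface (the name is hypothetical; it could equally be a new method on {{ServiceStateChangeListener}} with a no-op adapter for existing implementations):
{code}
import org.apache.hadoop.yarn.service.Service;
import org.apache.hadoop.yarn.service.ServiceStateChangeListener;

public interface FailureAwareStateChangeListener
    extends ServiceStateChangeListener {

  /**
   * Invoked by the (final) lifecycle methods in AbstractService when the
   * transition to targetedState failed with the given exception.
   */
  void stateChangeFailed(Service service,
                         Service.STATE targetedState,
                         RuntimeException e);
}
{code}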
 h2. 

[jira] [Commented] (YARN-801) Expose container locations and capabilities in the RM REST APIs

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681944#comment-13681944
 ] 

Sandy Ryza commented on YARN-801:
-

[~djp], you're right, we should include ContainerState as well.  What do you 
mean by running task info?

 Expose container locations and capabilities in the RM REST APIs
 ---

 Key: YARN-801
 URL: https://issues.apache.org/jira/browse/YARN-801
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 It would be useful to be able to query container allocation info via the RM 
 REST APIs.  We should be able to query per application, and for each 
 container we should provide (at least):
 * location
 * resource capability

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-588) Two identical utility methods in different classes to build Resource

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681947#comment-13681947
 ] 

Sandy Ryza commented on YARN-588:
-

Thanks for the patch, Niranjan!  Since I filed this JIRA, a newInstance method 
was added to Resource itself, so I think it would be preferable to use that one 
over the one in BuilderUtils, which hopefully will be phased out.

Also, would it be possible to remove the single-argument 
Resources#createResource as well?  It will require explicitly putting 0 or 1 in 
the places it's used, but I think this is probably advantageous for preventing 
unexpected behavior as YARN becomes more multi-resource centric. 
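For reference, the kind of substitution being suggested (a sketch only; actual call sites vary):
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class ResourceFactoryExample {

  public static Resource oneGigOneCore() {
    // Before: BuilderUtils.newResource(1024, 1) -- the helper to phase out.
    // After: the factory method now available on Resource itself.
    return Resource.newInstance(1024, 1);
  }
}
{code}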

 Two identical utility methods in different classes to build Resource
 

 Key: YARN-588
 URL: https://issues.apache.org/jira/browse/YARN-588
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Niranjan Singh
Priority: Minor
  Labels: newbie
 Attachments: YARN-588.patch


 Both Resources and BuilderUtils have a method that takes the same arguments 
 to build a Resource object.  The code doesn't need both of these methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-800) Clicking on an AM link for a running app leads to a HTTP 500

2013-06-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681954#comment-13681954
 ] 

Jian He commented on YARN-800:
--

The link is not broken in a local cluster because, by default, a local cluster
uses localhost (0.0.0.0) as the RM IP address.
The problem occurs on a real cluster where the RM uses a real IP address while
RM_WEBAPP_ADDRESS still defaults to '0.0.0.0'; we should default
RM_WEBAPP_ADDRESS to RM_ADDRESS.
If RM_WEBAPP_ADDRESS is specified explicitly, the problem does not occur.
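As a work-around until the default changes, the webapp address can be set explicitly; roughly (the hostname below is a placeholder):
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ExplicitWebappAddress {

  public static YarnConfiguration withWebappOn(String rmHostname) {
    YarnConfiguration conf = new YarnConfiguration();
    // Pin the webapp address to the RM's real hostname instead of the
    // 0.0.0.0 default, so proxied AM links resolve correctly.
    conf.set(YarnConfiguration.RM_WEBAPP_ADDRESS, rmHostname + ":8088");
    return conf;
  }
}
{code}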

 Clicking on an AM link for a running app leads to a HTTP 500
 

 Key: YARN-800
 URL: https://issues.apache.org/jira/browse/YARN-800
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Arpit Gupta
Priority: Critical

 Clicking the AM link tries to open up a page with url like
 http://hostname:8088/proxy/application_1370886527995_0645/
 and this leads to an HTTP 500

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-588) Two identical utility methods in different classes to build Resource

2013-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681973#comment-13681973
 ] 

Hadoop QA commented on YARN-588:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587207/YARN-588.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1215//console

This message is automatically generated.

 Two identical utility methods in different classes to build Resource
 

 Key: YARN-588
 URL: https://issues.apache.org/jira/browse/YARN-588
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Niranjan Singh
Priority: Minor
  Labels: newbie
 Attachments: YARN-588.patch


 Both Resources and BuilderUtils have a method that takes the same arguments 
 to build a Resource object.  The code doesn't need both of these methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2013-06-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13681972#comment-13681972
 ] 

Arun C Murthy commented on YARN-796:


Yes, we'll need to add an admin API (via rmadmin) to add/remove labels -
obviously we can also allow configs so nodes start up with a set of labels that
they report during registration.

Initially, I'm thinking AND is simplest for multiple labels. Consecutive
requests, as today, override previous requests.

The use cases, as I mentioned, are the ability to segregate clusters based on
OS, processor architecture etc., and hence these aren't resource capabilities;
rather they are *constraints*, which is what labels try to model explicitly.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-117) Enhance YARN service model

2013-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682007#comment-13682007
 ] 

Hadoop QA commented on YARN-117:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587566/YARN-117-025.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 38 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1214//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1214//console

This message is automatically generated.

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-007.patch, YARN-117-008.patch, 
 YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, 
 YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, 
 YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, 
 YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, 
 YARN-117-022.patch, YARN-117-023.patch, YARN-117-024.patch, 
 YARN-117-025.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, 
 YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch


 Having played with the YARN service model, there are some issues
 I've identified based on past work and initial use.
 This JIRA is an umbrella issue to cover them, with solutions pushed
 out to separate JIRAs.
 h2. State model prevents the stopped state being entered if the service could
 not be started successfully.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require
 implementations to be able to stop safely without requiring all fields to be
 non-null.
 Before anyone points out that the {{stop()}} operations assume all fields are
 valid and will NPE if called before {{start()}}: MAPREDUCE-3431 shows that this
 problem arises today, and MAPREDUCE-3502 is a fix for it. That fix is
 independent of the rest of the issues in this doc, but it will help make
 {{stop()}} executable from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review
 and uptake; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks that verify whether a state transition is allowed
 from the current state are performed in the base {{AbstractService}} class, yet
 subclasses tend to call this *after* their own {{init()}}, {{start()}} and
 {{stop()}} operations. This means that these 

[jira] [Commented] (YARN-117) Enhance YARN service model

2013-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682014#comment-13682014
 ] 

Hadoop QA commented on YARN-117:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587566/YARN-117-025.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 38 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1216//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1216//console

This message is automatically generated.

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-007.patch, YARN-117-008.patch, 
 YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, 
 YARN-117-012.patch, YARN-117-013.patch, YARN-117-014.patch, 
 YARN-117-015.patch, YARN-117-016.patch, YARN-117-018.patch, 
 YARN-117-019.patch, YARN-117-020.patch, YARN-117-021.patch, 
 YARN-117-022.patch, YARN-117-023.patch, YARN-117-024.patch, 
 YARN-117-025.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, 
 YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch


 Having played with the YARN service model, there are some issues
 I've identified based on past work and initial use.
 This JIRA is an umbrella issue to cover them, with solutions pushed
 out to separate JIRAs.
 h2. State model prevents the stopped state being entered if the service could
 not be started successfully.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require
 implementations to be able to stop safely without requiring all fields to be
 non-null.
 Before anyone points out that the {{stop()}} operations assume all fields are
 valid and will NPE if called before {{start()}}: MAPREDUCE-3431 shows that this
 problem arises today, and MAPREDUCE-3502 is a fix for it. That fix is
 independent of the rest of the issues in this doc, but it will help make
 {{stop()}} executable from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review
 and uptake; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks that verify whether a state transition is allowed
 from the current state are performed in the base {{AbstractService}} class, yet
 subclasses tend to call this *after* their own {{init()}}, {{start()}} and
 {{stop()}} operations. This means that these 

[jira] [Commented] (YARN-801) Expose container locations and capabilities in the RM REST APIs

2013-06-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682033#comment-13682033
 ] 

Junping Du commented on YARN-801:
-

I mean that if a container is in the running state, do we want to expose some
information related to the task running on it, like task ID, running time, etc.?

 Expose container locations and capabilities in the RM REST APIs
 ---

 Key: YARN-801
 URL: https://issues.apache.org/jira/browse/YARN-801
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 It would be useful to be able to query container allocation info via the RM 
 REST APIs.  We should be able to query per application, and for each 
 container we should provide (at least):
 * location
 * resource capability

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-291) Dynamic resource configuration

2013-06-13 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-291:


Description: The current Hadoop YARN resource management logic assumes per 
node resource is static during the lifetime of the NM process. Allowing 
run-time configuration on per node resource will give us finer granularity of 
resource elasticity. This allows Hadoop workloads to coexist with other 
workloads on the same hardware efficiently, whether or not the environment is 
virtualized. More background and design details can be found in attached 
proposal.  (was: The current Hadoop YARN resource management logic assumes per 
node resource is static during the lifetime of the NM process. Allowing 
run-time configuration on per node resource will give us finer granularity of 
resource elasticity. This allows Hadoop workloads to coexist with other 
workloads on the same hardware efficiently, whether or not the environment is 
virtualized. About more background and design details, please refer: 
HADOOP-9165.)

 Dynamic resource configuration
 --

 Key: YARN-291
 URL: https://issues.apache.org/jira/browse/YARN-291
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: Elastic Resources for YARN-v0.2.pdf, 
 YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, 
 YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch, 
 YARN-291-JMXInterfaceOnNM-02.patch, 
 YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, 
 YARN-291-YARNClientCommandline-04.patch


 The current Hadoop YARN resource management logic assumes per node resource 
 is static during the lifetime of the NM process. Allowing run-time 
 configuration on per node resource will give us finer granularity of resource 
 elasticity. This allows Hadoop workloads to coexist with other workloads on 
 the same hardware efficiently, whether or not the environment is virtualized. 
 More background and design details can be found in attached proposal.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682115#comment-13682115
 ] 

Hudson commented on YARN-427:
-

Integrated in Hadoop-Yarn-trunk #239 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/239/])
YARN-427. Coverage fix for org.apache.hadoop.yarn.server.api.* (Aleksey 
Gorshkov via jeagles) (Revision 1492282)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492282
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Fix For: 3.0.0, 0.23.9, 2.3.0

 Attachments: YARN-427-branch-0.23-b.patch, 
 YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682114#comment-13682114
 ] 

Hudson commented on YARN-600:
-

Integrated in Hadoop-Yarn-trunk #239 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/239/])
YARN-600. Hook up cgroups CPU settings to the number of virtual cores 
allocated. (sandyr via tucu) (Revision 1492365)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492365
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java


 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.0-beta

 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 and 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-648) FS: Add documentation for pluggable policy

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682118#comment-13682118
 ] 

Hudson commented on YARN-648:
-

Integrated in Hadoop-Yarn-trunk #239 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/239/])
YARN-648. FS: Add documentation for pluggable policy. (kkambatl via tucu) 
(Revision 1492388)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492388
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm


 FS: Add documentation for pluggable policy
 --

 Key: YARN-648
 URL: https://issues.apache.org/jira/browse/YARN-648
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: documentaion
 Fix For: 2.1.0-beta

 Attachments: yarn-648-1.patch, yarn-648-2.patch


 YARN-469 and YARN-482 make the scheduling policy in FS pluggable. Need to add 
 documentation on how to use this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-700) TestInfoBlock fails on Windows because of line ending missmatch

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682117#comment-13682117
 ] 

Hudson commented on YARN-700:
-

Integrated in Hadoop-Yarn-trunk #239 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/239/])
YARN-700. TestInfoBlock fails on Windows because of line ending missmatch. 
Contributed by Ivan Mitic. (Revision 1492383)

 Result = SUCCESS
cnauroth : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492383
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java


 TestInfoBlock fails on Windows because of line ending missmatch
 ---

 Key: YARN-700
 URL: https://issues.apache.org/jira/browse/YARN-700
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Fix For: 3.0.0, 2.1.0-beta

 Attachments: YARN-700.patch


 Exception:
 {noformat}
 Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec  
 FAILURE!
 testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  
 Time elapsed: 873 sec   FAILURE!
 java.lang.AssertionError: 
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at 
 org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-291) Dynamic resource configuration

2013-06-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682124#comment-13682124
 ] 

Junping Du commented on YARN-291:
-

Hi [~tucu00], thanks for the comments above. Actually, just as in Luke's
comments, in this JIRA we tried to address the case where a YARN node's resource
is changed deliberately, e.g. when mixing resources with non-YARN applications
(like HBase, Drill, etc.), rather than as a reaction to an app's short-term
thrashing behavior. Thus we don't have to change anything related to the
application API; we just provide a way to change a node's resource through admin
APIs (admin protocol, CLI, REST and JMX). The only special case is that we need
to handle running containers whose resource total is larger than the NM's total
resource (after the change), but this can be handled well by allowing a negative
value for available resource (we can call it a debt resource model), which
shows the flexibility of the YARN framework. The YARN scheduler simply stops
assigning containers to an overloaded NM (one in debt) until its resources are
balanced again, which worked well in our previous tests (100 virtual nodes on 10
physical servers). In the long term, the preemption mechanism could also be
involved to release containers/resources in some cases. Thoughts?
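A toy illustration of the debt bookkeeping described above (plain arithmetic, not the actual scheduler code):
{code}
public class DebtResourceExample {

  public static void main(String[] args) {
    int allocatedMB = 6 * 1024;  // containers already running on the NM
    int newTotalMB  = 4 * 1024;  // admin shrinks the node's resource at run time

    // Available resource is allowed to go negative: the node is "in debt".
    int availableMB = newTotalMB - allocatedMB;   // -2048

    // The scheduler simply skips nodes with no positive available resource,
    // so no new containers land there until running ones complete.
    boolean schedulable = availableMB > 0;
    System.out.println("available=" + availableMB + "MB schedulable=" + schedulable);
  }
}
{code}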

 Dynamic resource configuration
 --

 Key: YARN-291
 URL: https://issues.apache.org/jira/browse/YARN-291
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: Elastic Resources for YARN-v0.2.pdf, 
 YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, 
 YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch, 
 YARN-291-JMXInterfaceOnNM-02.patch, 
 YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, 
 YARN-291-YARNClientCommandline-04.patch


 The current Hadoop YARN resource management logic assumes per node resource 
 is static during the lifetime of the NM process. Allowing run-time 
 configuration on per node resource will give us finer granularity of resource 
 elasticity. This allows Hadoop workloads to coexist with other workloads on 
 the same hardware efficiently, whether or not the environment is virtualized. 
 More background and design details can be found in attached proposal.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-588) Two identical utility methods in different classes to build Resource

2013-06-13 Thread Niranjan Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682126#comment-13682126
 ] 

Niranjan Singh commented on YARN-588:
-

Hi Sandy,
Thanks for the comments. I agree Resources#createResource should be used, so I
have changed all BuilderUtils#newResource calls to Resources#createResource. The
problem is that I am getting a compilation error: the NodeManager module is
compiled before the ResourceManager, and since some of the tests in the
NodeManager module use BuilderUtils#newResource, changing them to
Resources#createResource means the module cannot find the resourcemanager
package.

Below is just one line of the compilation error:

hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java:[75,61]
 error: package org.apache.hadoop.yarn.server.resourcemanager.resource does not
exist

I am stuck here; if you do it the other way around, there's no compilation
error, as BuilderUtils is available to both the nodemanager and resourcemanager
modules.
Any suggestions?


 Two identical utility methods in different classes to build Resource
 

 Key: YARN-588
 URL: https://issues.apache.org/jira/browse/YARN-588
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Niranjan Singh
Priority: Minor
  Labels: newbie
 Attachments: YARN-588.patch


 Both Resources and BuilderUtils have a method that takes the same arguments 
 to build a Resource object.  The code doesn't need both of these methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682203#comment-13682203
 ] 

Hudson commented on YARN-427:
-

Integrated in Hadoop-Hdfs-0.23-Build #637 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/637/])
YARN-427. Coverage fix for org.apache.hadoop.yarn.server.api.* (Aleksey 
Gorshkov via jeagles) (Revision 1492272)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492272
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Fix For: 3.0.0, 0.23.9, 2.3.0

 Attachments: YARN-427-branch-0.23-b.patch, 
 YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682218#comment-13682218
 ] 

Hudson commented on YARN-600:
-

Integrated in Hadoop-Hdfs-trunk #1429 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1429/])
YARN-600. Hook up cgroups CPU settings to the number of virtual cores 
allocated. (sandyr via tucu) (Revision 1492365)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492365
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java


 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.0-beta

 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 and 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-648) FS: Add documentation for pluggable policy

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682223#comment-13682223
 ] 

Hudson commented on YARN-648:
-

Integrated in Hadoop-Hdfs-trunk #1429 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1429/])
YARN-648. FS: Add documentation for pluggable policy. (kkambatl via tucu) 
(Revision 1492388)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492388
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm


 FS: Add documentation for pluggable policy
 --

 Key: YARN-648
 URL: https://issues.apache.org/jira/browse/YARN-648
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: documentaion
 Fix For: 2.1.0-beta

 Attachments: yarn-648-1.patch, yarn-648-2.patch


 YARN-469 and YARN-482 make the scheduling policy in FS pluggable. Need to add 
 documentation on how to use this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-700) TestInfoBlock fails on Windows because of line ending missmatch

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1368#comment-1368
 ] 

Hudson commented on YARN-700:
-

Integrated in Hadoop-Hdfs-trunk #1429 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1429/])
YARN-700. TestInfoBlock fails on Windows because of line ending missmatch. 
Contributed by Ivan Mitic. (Revision 1492383)

 Result = FAILURE
cnauroth : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492383
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java


 TestInfoBlock fails on Windows because of line ending missmatch
 ---

 Key: YARN-700
 URL: https://issues.apache.org/jira/browse/YARN-700
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Fix For: 3.0.0, 2.1.0-beta

 Attachments: YARN-700.patch


 Exception:
 {noformat}
 Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec  
 FAILURE!
 testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  
 Time elapsed: 873 sec   FAILURE!
 java.lang.AssertionError: 
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at 
 org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682220#comment-13682220
 ] 

Hudson commented on YARN-427:
-

Integrated in Hadoop-Hdfs-trunk #1429 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1429/])
YARN-427. Coverage fix for org.apache.hadoop.yarn.server.api.* (Aleksey 
Gorshkov via jeagles) (Revision 1492282)

 Result = FAILURE
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492282
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Fix For: 3.0.0, 0.23.9, 2.3.0

 Attachments: YARN-427-branch-0.23-b.patch, 
 YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-789) Enable zero capabilities resource requests in fair scheduler

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682230#comment-13682230
 ] 

Alejandro Abdelnur commented on YARN-789:
-

[~acmurthy], I assume you refer to the ResourceManager.validateConfigs() 
changes that validate min/max resources.

The validation of scheduler config values should be done by the scheduler
implementation itself.

It seems the simplest way of doing this would be to make scheduler
implementations implement Configurable, so each one receives the YARN config on
instantiation via ReflectionUtils and does the validation check during
setConf().

Sound good? If so I'll do this in a separate JIRA before this one.
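Roughly what that would look like (a hypothetical base class for illustration; the real check would live in each scheduler implementation):
{code}
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public abstract class SelfValidatingScheduler implements Configurable {

  private Configuration conf;

  @Override
  public void setConf(Configuration conf) {
    // ReflectionUtils calls setConf() right after instantiation, so the
    // scheduler can reject a bad min/max allocation range up front.
    int minMB = conf.getInt(
        YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
    int maxMB = conf.getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);
    if (minMB < 0 || maxMB < minMB) {
      throw new IllegalArgumentException(
          "Invalid scheduler memory allocation range: " + minMB + " .. " + maxMB);
    }
    this.conf = conf;
  }

  @Override
  public Configuration getConf() {
    return conf;
  }
}
{code}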


 Enable zero capabilities resource requests in fair scheduler
 

 Key: YARN-789
 URL: https://issues.apache.org/jira/browse/YARN-789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-789.patch, YARN-789.patch


 Per discussion in YARN-689, reposting updated use case:
 1. I have a set of services co-existing with a Yarn cluster.
 2. These services run out of band from Yarn. They are not started as yarn 
 containers and they don't use Yarn containers for processing.
 3. These services use, dynamically, different amounts of CPU and memory based 
 on their load. They manage their CPU and memory requirements independently. 
 In other words, depending on their load, they may require more CPU but not 
 memory or vice-versa.
 By using YARN as the RM for these services I'm able to share and utilize the
 resources of the cluster appropriately and in a dynamic way. Yarn keeps tabs
 on all the resources.
 These services run an AM that reserves resources on their behalf. When this
 AM gets the requested resources, the services bump up their CPU/memory
 utilization out of band from Yarn. If the Yarn allocations are
 released/preempted, the services back off on their resource utilization. By
 doing this, Yarn and these services correctly share the cluster resources,
 with the Yarn RM being the only one that does the overall resource bookkeeping.
 The services' AM, so as not to break the lifecycle of containers, starts
 containers in the corresponding NMs. These container processes basically
 sleep forever (i.e. sleep 1d). They use almost no CPU or memory
 (less than 1MB). Thus it is reasonable to assume their required CPU and
 memory utilization is NIL (more on hard enforcement later). Because of this
 almost-NIL utilization of CPU and memory, it should be possible to specify
 zero for one of the dimensions (CPU or memory) when making a request.
 The current limitation is that the increment is also the minimum. 
 If we set the memory increment to 1MB, then when doing a pure CPU request we
 would have to specify 1MB of memory. That would work; however, it would allow
 discretionary memory requests without the desired normalization (increments of
 256, 512, etc.).
 If we set the CPU increment to 1 CPU, then when doing a pure memory request we
 would have to specify 1 CPU. CPU amounts are much smaller than memory amounts,
 and because we don't have fractional CPUs, all my pure memory requests would
 waste 1 CPU, thus reducing the overall utilization of the cluster.
 Finally, on hard enforcement:
 * For CPU, hard enforcement can be done via a cgroup cpu controller. Using an
 absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor we
 ensure there are enough CPU cycles to run the sleep process. This absolute
 minimum would only kick in if zero is allowed; otherwise it will never kick in,
 as the shares for 1 CPU are 1024.
 * For memory, hard enforcement is currently done by ProcfsBasedProcessTree.java;
 using an absolute minimum of 1 or 2 MB would take care of zero-memory
 resources. Again, this absolute minimum would only kick in if zero is allowed;
 otherwise it will never kick in, as the memory increment is several MB if not
 1GB.
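 The kind of request this would permit, sketched with the current {{Resource}} factory (today the scheduler normalizes these up to its minimums):
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class ZeroCapabilityRequests {

  /** A pure-CPU ask: zero memory, one virtual core. */
  public static Resource pureCpu() {
    return Resource.newInstance(0, 1);
  }

  /** A pure-memory ask: 512 MB, zero virtual cores. */
  public static Resource pureMemory() {
    return Resource.newInstance(512, 0);
  }
}
{code}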

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682247#comment-13682247
 ] 

Alejandro Abdelnur commented on YARN-796:
-

bq. Yes, we'll need to add an admin API (via rmadmin) to add/remove labels -
obviously we can also allow configs so nodes start up with a set of labels that
they report during registration.

Are labels set via the API persisted in the RM? Where? 

When a node registers, how are labels synced between the ones it had in its
config and the ones added/removed via rmadmin?

Given the use case you are mentioning, it seems these labels are rather static
and determined by node characteristics/features. Wouldn't it be simpler to start
without an rmadmin API and just get them from the nodes on registration?

bq. Initially, I'm thinking AND is simplest for multiple labels. Consecutive 
requests, as today, override previous requests.

So the labels would not be part of the resource-request key, right?

bq. The use cases, as I mentioned, are the ability to segregate clusters based
on OS, processor architecture etc., and hence these aren't resource
capabilities; rather they are constraints, which is what labels try to model
explicitly.

They are expressing capabilities of a node, just capabilities without a
quantity that drains. My suggestion to model these labels as resource
capabilities is that you could then use them as a dimension in DRF.


 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-427) Coverage fix for org.apache.hadoop.yarn.server.api.*

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682268#comment-13682268
 ] 

Hudson commented on YARN-427:
-

Integrated in Hadoop-Mapreduce-trunk #1456 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1456/])
YARN-427. Coverage fix for org.apache.hadoop.yarn.server.api.* (Aleksey 
Gorshkov via jeagles) (Revision 1492282)

 Result = SUCCESS
jeagles : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492282
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 

 Key: YARN-427
 URL: https://issues.apache.org/jira/browse/YARN-427
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Aleksey Gorshkov
Assignee: Aleksey Gorshkov
 Fix For: 3.0.0, 0.23.9, 2.3.0

 Attachments: YARN-427-branch-0.23-b.patch, 
 YARN-427-branch-0.23-c.patch, YARN-427-branch-2-a.patch, 
 YARN-427-branch-2-b.patch, YARN-427-branch-2.patch, YARN-427-trunk-a.patch, 
 YARN-427-trunk-b.patch, YARN-427-trunk-c.patch, YARN-427-trunk.patch


 Coverage fix for org.apache.hadoop.yarn.server.api.*
 patch YARN-427-trunk.patch for trunk
 patch YARN-427-branch-2.patch for branch-2 and branch-0.23

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-600) Hook up cgroups CPU settings to the number of virtual cores allocated

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682267#comment-13682267
 ] 

Hudson commented on YARN-600:
-

Integrated in Hadoop-Mapreduce-trunk #1456 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1456/])
YARN-600. Hook up cgroups CPU settings to the number of virtual cores 
allocated. (sandyr via tucu) (Revision 1492365)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492365
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java


 Hook up cgroups CPU settings to the number of virtual cores allocated
 -

 Key: YARN-600
 URL: https://issues.apache.org/jira/browse/YARN-600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.1.0-beta

 Attachments: YARN-600.patch


 YARN-3 introduced CPU isolation and monitoring through cgroups.  YARN-2 
 introduced CPU scheduling in the capacity scheduler, and YARN-326 will 
 introduce it in the fair scheduler.  The number of virtual cores allocated to 
 a container should be used to weight the number of cgroups CPU shares given 
 to it.
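
A minimal sketch of that weighting, assuming the cgroups convention of 1024 
shares per CPU; the class and constant names are illustrative, not necessarily 
what the attached patch does:

{code:java}
// Illustrative only: map a container's virtual cores to cgroup cpu.shares.
// 1024 is the cgroups default weight for one CPU; the constant name is made up here.
public final class CpuShares {
  private static final int SHARES_PER_VCORE = 1024;

  public static int sharesForContainer(int virtualCores) {
    // More vcores => a proportionally larger share of CPU under contention.
    return SHARES_PER_VCORE * Math.max(virtualCores, 1);
  }

  public static void main(String[] args) {
    System.out.println(sharesForContainer(4));  // 4096
  }
}
{code}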

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-648) FS: Add documentation for pluggable policy

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682271#comment-13682271
 ] 

Hudson commented on YARN-648:
-

Integrated in Hadoop-Mapreduce-trunk #1456 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1456/])
YARN-648. FS: Add documentation for pluggable policy. (kkambatl via tucu) 
(Revision 1492388)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492388
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm


 FS: Add documentation for pluggable policy
 --

 Key: YARN-648
 URL: https://issues.apache.org/jira/browse/YARN-648
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
  Labels: documentaion
 Fix For: 2.1.0-beta

 Attachments: yarn-648-1.patch, yarn-648-2.patch


 YARN-469 and YARN-482 make the scheduling policy in FS pluggable. Need to add 
 documentation on how to use this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-700) TestInfoBlock fails on Windows because of line ending missmatch

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682270#comment-13682270
 ] 

Hudson commented on YARN-700:
-

Integrated in Hadoop-Mapreduce-trunk #1456 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1456/])
YARN-700. TestInfoBlock fails on Windows because of line ending missmatch. 
Contributed by Ivan Mitic. (Revision 1492383)

 Result = SUCCESS
cnauroth : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492383
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java


 TestInfoBlock fails on Windows because of line ending missmatch
 ---

 Key: YARN-700
 URL: https://issues.apache.org/jira/browse/YARN-700
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Fix For: 3.0.0, 2.1.0-beta

 Attachments: YARN-700.patch


 Exception:
 {noformat}
 Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec  
 FAILURE!
 testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock)  
 Time elapsed: 873 sec   FAILURE!
 java.lang.AssertionError: 
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertTrue(Assert.java:54)
   at 
 org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Avner BenHanoch (JIRA)
Avner BenHanoch created YARN-802:


 Summary: APPLICATION_INIT is never sent to AuxServices other than 
the builtin ShuffleHandler
 Key: YARN-802
 URL: https://issues.apache.org/jira/browse/YARN-802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch


APPLICATION_INIT is never sent to AuxServices other than the builtin 
ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be able 
to function, because APPLICATION_INIT enables the AuxiliaryService to map 
jobId-userId. This is needed for properly finding the MOFs of a job per 
reducers' requests.

NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to 
hard-coded expression in hadoop code. The current TaskAttemptImpl.java code 
explicitly call: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
...) and ignores any additional AuxiliaryService. As a result, only the 
built-in ShuffleHandler will get APPLICATION_INIT events.  Any 3rd party 
AuxillaryService will never ger APPLICATION_INIT events.


I think a solution can be in one of two ways:
1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register 
each of them, by calling serviceData.put (…) in loop.
2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668  
APPLICATION_STOP is never sent to AuxServices.  This means that in case the 
'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux 
Services regardless of the value in event.getServiceID().

I prefer the 2nd solution.  I am welcoming any ideas.  I can provide the needed 
patch for any option that people like.
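
A rough sketch of option 1 (illustrative only, not the actual TaskAttemptImpl 
change; it assumes the aux-service names are readable from the configuration key 
yarn.nodemanager.aux-services):

{code:java}
// Sketch of option 1: register the shuffle payload under every configured aux
// service name instead of only the built-in ShuffleHandler service id.
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AuxServiceRegistration {
  public static Map<String, ByteBuffer> buildServiceData(Configuration conf,
                                                         ByteBuffer shuffleToken) {
    Map<String, ByteBuffer> serviceData = new HashMap<String, ByteBuffer>();
    // "yarn.nodemanager.aux-services" lists the services the NM runs.
    String[] auxServices = conf.getStrings(YarnConfiguration.NM_AUX_SERVICES);
    if (auxServices != null) {
      for (String serviceId : auxServices) {
        // Every aux service gets the APPLICATION_INIT payload, not just
        // ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID.
        serviceData.put(serviceId, shuffleToken.duplicate());
      }
    }
    return serviceData;
  }
}
{code}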

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Avner BenHanoch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avner BenHanoch updated YARN-802:
-

Description: 
APPLICATION_INIT is never sent to AuxServices other than the built-in 
ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be able 
to function, because APPLICATION_INIT enables the AuxiliaryService to map 
jobId-userId. This is needed for properly finding the MOFs of a job per 
reducers' requests.

NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to 
hard-coded expression in hadoop code. The current TaskAttemptImpl.java code 
explicitly call: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
...) and ignores any additional AuxiliaryService. As a result, only the 
built-in ShuffleHandler will get APPLICATION_INIT events.  Any 3rd party 
AuxillaryService will never get APPLICATION_INIT events.


I think a solution can be in one of two ways:
1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register 
each of them, by calling serviceData.put (…) in loop.
2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668  
APPLICATION_STOP is never sent to AuxServices.  This means that in case the 
'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux 
Services regardless of the value in event.getServiceID().

I prefer the 2nd solution.  I am welcoming any ideas.  I can provide the needed 
patch for any option that people like.

  was:
APPLICATION_INIT is never sent to AuxServices other than the builtin 
ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be able 
to function, because APPLICATION_INIT enables the AuxiliaryService to map 
jobId-userId. This is needed for properly finding the MOFs of a job per 
reducers' requests.

NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to 
hard-coded expression in hadoop code. The current TaskAttemptImpl.java code 
explicitly call: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
...) and ignores any additional AuxiliaryService. As a result, only the 
built-in ShuffleHandler will get APPLICATION_INIT events.  Any 3rd party 
AuxillaryService will never ger APPLICATION_INIT events.


I think a solution can be in one of two ways:
1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register 
each of them, by calling serviceData.put (…) in loop.
2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668  
APPLICATION_STOP is never sent to AuxServices.  This means that in case the 
'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux 
Services regardless of the value in event.getServiceID().

I prefer the 2nd solution.  I am welcoming any ideas.  I can provide the needed 
patch for any option that people like.


 APPLICATION_INIT is never sent to AuxServices other than the builtin 
 ShuffleHandler
 ---

 Key: YARN-802
 URL: https://issues.apache.org/jira/browse/YARN-802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 APPLICATION_INIT is never sent to AuxServices other than the built-in 
 ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be 
 able to function, because APPLICATION_INIT enables the AuxiliaryService to 
 map jobId-userId. This is needed for properly finding the MOFs of a job per 
 reducers' requests.
 NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to 
 hard-coded expression in hadoop code. The current TaskAttemptImpl.java code 
 explicitly call: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
 ...) and ignores any additional AuxiliaryService. As a result, only the 
 built-in ShuffleHandler will get APPLICATION_INIT events.  Any 3rd party 
 AuxillaryService will never get APPLICATION_INIT events.
 I think a solution can be in one of two ways:
 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register 
 each of them, by calling serviceData.put (…) in loop.
 2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668  
 APPLICATION_STOP is never sent to AuxServices.  This means that in case the 
 'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux 
 Services regardless of the value in event.getServiceID().
 I prefer the 2nd solution.  I am welcoming any ideas.  I can provide the 
 needed patch for any option that people like.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682323#comment-13682323
 ] 

Alejandro Abdelnur commented on YARN-796:
-

One thing I forgot to mention before is that labels seem to make sense if the 
resource-request's location is ANY or a rack; for resource-requests that are 
host-specific they do not make sense. We would then have to verify that aspect 
of a resource-request on arrival at the RM.

I guess something this would enable is resource-requests like 
[(location=node1),(location=rack1),(location=ANY,label=wired-to-switch1)]. The 
topology would be: node1 is in rack1, and rack1 is connected to switch-1. The 
request says: I prefer node1, then rack1, and then a node in another rack if 
that rack is connected to switch-1.
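
As a rough illustration of that request shape, using the existing 
ResourceRequest factory methods plus a purely hypothetical label setter (no 
such API exists today):

{code:java}
// Hypothetical sketch only: labels are not part of ResourceRequest today.
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class LabeledRequestExample {
  public static List<ResourceRequest> preferNode1ThenRack1ThenSwitch1(Resource capability) {
    Priority pri = Priority.newInstance(1);
    ResourceRequest onNode1 = ResourceRequest.newInstance(pri, "node1", capability, 1);
    ResourceRequest onRack1 = ResourceRequest.newInstance(pri, "rack1", capability, 1);
    ResourceRequest anywhere = ResourceRequest.newInstance(pri, ResourceRequest.ANY, capability, 1);
    // Hypothetical: constrain the ANY request to nodes carrying this admin label.
    // anywhere.setLabel("wired-to-switch1");
    return Arrays.asList(onNode1, onRack1, anywhere);
  }
}
{code}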

A bit more on modeling them as a resource (I have not thought it through in 
full, so this is just an idea to consider).

Labels are resources with max=1 and a total equal to the maximum number of 
containers a node may have (driven by memory or CPU, whichever allows fewer 
containers). Then a node that has a given label-resource always has capability 
for it, as long as there is enough memory and/or CPU capability. A DRF 
scheduler could then use label-resources for 'fair' allocation decisions.

Another option, away from the resources modeling, would be to model labels 
similar to location.

Having said all this, I think this is a great idea that opens up a new set of 
allocation possibilities. We should spend some time defining exactly what 
functionality we want to achieve, and then decide whether to deliver it in 
incremental phases.


 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-801) Expose container locations and capabilities in the RM REST APIs

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682342#comment-13682342
 ] 

Alejandro Abdelnur commented on YARN-801:
-

Traversing the scheduler data structures and preparing all this info in a large 
cluster under load can be expensive, and if we take a lock to do this data 
collection/generation we'll be adding contention to the scheduler. I think we 
should model this as a service independent of the scheduler which receives 
async events on container allocations/terminations and keeps its own data 
structures. This would decouple calls to this REST API from any contention on 
the scheduler.
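
Very roughly, such a decoupled store could look like the sketch below; the 
class and method names are invented here to show the shape, they are not 
existing YARN types:

{code:java}
// Sketch only: a store fed by async allocation/release events so the REST layer
// never has to walk (or lock) the scheduler's own data structures.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.Resource;

public class ContainerInfoStore {

  public static class ContainerInfo {
    public final String node;          // location
    public final Resource capability;  // resource capability
    ContainerInfo(String node, Resource capability) {
      this.node = node;
      this.capability = capability;
    }
  }

  private final Map<ContainerId, ContainerInfo> containers =
      new ConcurrentHashMap<ContainerId, ContainerInfo>();

  // Called from an async dispatcher on allocation; names are illustrative.
  public void containerAllocated(ContainerId id, String node, Resource capability) {
    containers.put(id, new ContainerInfo(node, capability));
  }

  // Called from an async dispatcher on completion/release.
  public void containerFinished(ContainerId id) {
    containers.remove(id);
  }

  public ContainerInfo get(ContainerId id) {
    return containers.get(id);
  }
}
{code}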


 Expose container locations and capabilities in the RM REST APIs
 ---

 Key: YARN-801
 URL: https://issues.apache.org/jira/browse/YARN-801
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 It would be useful to be able to query container allocation info via the RM 
 REST APIs.  We should be able to query per application, and for each 
 container we should provide (at least):
 * location
 * resource capabilty

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-291) Dynamic resource configuration

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682361#comment-13682361
 ] 

Alejandro Abdelnur commented on YARN-291:
-

[~djp], yes that should do it, at least in theory. We should make sure we have 
tests that exercise NM capabilities and that things don't break.

One more thing (along the lines of a comment I made in YARN-796): NMs 
get/report their total capacity to the RM from the NM configuration. If you 
change the total capacity through the rmadmin API, are these changes 
transient? If so, what happens when the NM restarts? If not, where do you 
persist these changes and how do you merge them back when the NM restarts?

 Dynamic resource configuration
 --

 Key: YARN-291
 URL: https://issues.apache.org/jira/browse/YARN-291
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: Elastic Resources for YARN-v0.2.pdf, 
 YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, 
 YARN-291-all-v1.patch, YARN-291-core-HeartBeatAndScheduler-01.patch, 
 YARN-291-JMXInterfaceOnNM-02.patch, 
 YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, 
 YARN-291-YARNClientCommandline-04.patch


 The current Hadoop YARN resource management logic assumes per node resource 
 is static during the lifetime of the NM process. Allowing run-time 
 configuration on per node resource will give us finer granularity of resource 
 elasticity. This allows Hadoop workloads to coexist with other workloads on 
 the same hardware efficiently, whether or not the environment is virtualized. 
 More background and design details can be found in attached proposal.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682375#comment-13682375
 ] 

Hudson commented on YARN-530:
-

Integrated in Hadoop-trunk-Commit #3911 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3911/])
YARN-530. Defined Service model strictly, implemented AbstractService for 
robust subclassing and migrated yarn-common services. Contributed by Steve 
Loughran.
YARN-117. Migrated rest of YARN to the new service model. Contributed by Steve 
Louhran.
MAPREDUCE-5298. Moved MapReduce services to YARN-530 stricter lifecycle. 
Contributed by Steve Loughran. (Revision 1492718)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492718
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/LocalContainerLauncher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryCopyService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/client/MRClientService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/local/LocalContainerAllocator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerRequestor.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/speculate/DefaultSpeculator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRAppBenchmark.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFail.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CachedHistoryStorage.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryClientService.java
* 

[jira] [Commented] (YARN-117) Enhance YARN service model

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682373#comment-13682373
 ] 

Hudson commented on YARN-117:
-

Integrated in Hadoop-trunk-Commit #3911 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3911/])
YARN-530. Defined Service model strictly, implemented AbstractService for 
robust subclassing and migrated yarn-common services. Contributed by Steve 
Loughran.
YARN-117. Migrated rest of YARN to the new service model. Contributed by Steve 
Louhran.
MAPREDUCE-5298. Moved MapReduce services to YARN-530 stricter lifecycle. 
Contributed by Steve Loughran. (Revision 1492718)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492718
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/LocalContainerLauncher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryCopyService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/TaskHeartbeatHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/client/MRClientService.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/local/LocalContainerAllocator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerRequestor.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/speculate/DefaultSpeculator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRAppBenchmark.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFail.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestMRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CachedHistoryStorage.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryClientService.java
* 

[jira] [Commented] (YARN-588) Two identical utility methods in different classes to build Resource

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682395#comment-13682395
 ] 

Sandy Ryza commented on YARN-588:
-

I meant that neither Resources#createResource nor BuilderUtils#newResource 
should be used, and instead use the recently added Resource#newInstance.  Sorry 
that there are so many times that sound so similar. As Resource is in the YARN 
API package, you should no longer face the problem you're encountering.
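
For reference, a minimal example of the suggested call (assuming the 
memory-in-MB and virtual-cores signature):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

public class NewInstanceExample {
  public static Resource oneGigOneCore() {
    // Replaces Resources.createResource(...) / BuilderUtils.newResource(...).
    return Resource.newInstance(1024, 1);
  }
}
{code}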

 Two identical utility methods in different classes to build Resource
 

 Key: YARN-588
 URL: https://issues.apache.org/jira/browse/YARN-588
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Niranjan Singh
Priority: Minor
  Labels: newbie
 Attachments: YARN-588.patch


 Both Resources and BuilderUtils have a method that takes the same arguments 
 to build a Resource object.  The code doesn't need both of these methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-588) Two identical utility methods in different classes to build Resource

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682398#comment-13682398
 ] 

Sandy Ryza commented on YARN-588:
-

*so many *names* that sound so similar

 Two identical utility methods in different classes to build Resource
 

 Key: YARN-588
 URL: https://issues.apache.org/jira/browse/YARN-588
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Niranjan Singh
Priority: Minor
  Labels: newbie
 Attachments: YARN-588.patch


 Both Resources and BuilderUtils have a method that takes the same arguments 
 to build a Resource object.  The code doesn't need both of these methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created YARN-803:
---

 Summary: factor out scheduler config validation from the 
ResourceManager to each scheduler implementation
 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


Per discussion in YARN-789 we should factor out from the ResourceManager class 
the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-803:


Attachment: YARN-803.patch

Patch moving conf validation to the setConf() method of the scheduler impls, 
and making FS implement Configurable.
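
A minimal sketch of what that can look like for a scheduler; the specific 
checks and the exception type are illustrative, not the contents of the 
attached patch:

{code:java}
// Sketch: the scheduler validates its own configuration when it is injected,
// instead of the ResourceManager doing it centrally.
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ValidatingScheduler implements Configurable {
  private Configuration conf;

  @Override
  public void setConf(Configuration conf) {
    int minMem = conf.getInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
    int maxMem = conf.getInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);
    if (minMem < 0 || minMem > maxMem) {
      throw new IllegalArgumentException(
          "Invalid scheduler memory allocation range: " + minMem + " - " + maxMem);
    }
    this.conf = conf;
  }

  @Override
  public Configuration getConf() {
    return conf;
  }
}
{code}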

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-789) Enable zero capabilities resource requests in fair scheduler

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682409#comment-13682409
 ] 

Alejandro Abdelnur commented on YARN-789:
-

Created YARN-803 to factor out conf validations. I'll rebase this one on top of 
that one.

 Enable zero capabilities resource requests in fair scheduler
 

 Key: YARN-789
 URL: https://issues.apache.org/jira/browse/YARN-789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-789.patch, YARN-789.patch


 Per discussion in YARN-689, reposting updated use case:
 1. I have a set of services co-existing with a Yarn cluster.
 2. These services run out of band from Yarn. They are not started as yarn 
 containers and they don't use Yarn containers for processing.
 3. These services use, dynamically, different amounts of CPU and memory based 
 on their load. They manage their CPU and memory requirements independently. 
 In other words, depending on their load, they may require more CPU but not 
 memory or vice-versa.
 By using YARN as RM for these services I'm able share and utilize the 
 resources of the cluster appropriately and in a dynamic way. Yarn keeps tab 
 of all the resources.
 These services run an AM that reserves resources on their behalf. When this 
 AM gets the requested resources, the services bump up their CPU/memory 
 utilization out of band from Yarn. If the Yarn allocations are 
 released/preempted, the services back off on their resources utilization. By 
 doing this, Yarn and these service correctly share the cluster resources, 
 being Yarn RM the only one that does the overall resource bookkeeping.
 The services AM, not to break the lifecycle of containers, start containers 
 in the corresponding NMs. These container processes do basically a sleep 
 forever (i.e. sleep 1d). They are almost not using any CPU nor memory 
 (less than 1MB). Thus it is reasonable to assume their required CPU and 
 memory utilization is NIL (more on hard enforcement later). Because of this 
 almost NIL utilization of CPU and memory, it is possible to specify, when 
 doing a request, zero as one of the dimensions (CPU or memory).
 The current limitation is that the increment is also the minimum. 
 If we set the memory increment to 1MB. When doing a pure CPU request, we 
 would have to specify 1MB of memory. That would work. However it would allow 
 discretionary memory requests without a desired normalization (increments of 
 256, 512, etc).
 If we set the CPU increment to 1CPU. When doing a pure memory request, we 
 would have to specify 1CPU. CPU amounts a much smaller than memory amounts, 
 and because we don't have fractional CPUs, it would mean that all my pure 
 memory requests will be wasting 1 CPU thus reducing the overall utilization 
 of the cluster.
 Finally, on hard enforcement. 
 * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an 
 absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we 
 ensure there is enough CPU cycles to run the sleep process. This absolute 
 minimum would only kick-in if zero is allowed, otherwise will never kick in 
 as the shares for 1 CPU are 1024.
 * For Memory. Hard enforcement is currently done by the 
 ProcfsBasedProcessTree.java, using a minimum absolute of 1 or 2 MBs would 
 take care of zero memory resources. And again,  this absolute minimum would 
 only kick-in if zero is allowed, otherwise will never kick in as the 
 increment memory is in several MBs if not 1GB.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-801) Expose container locations and capabilities in the RM REST APIs

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682403#comment-13682403
 ] 

Sandy Ryza commented on YARN-801:
-

As YARN is agnostic to what its containers are used for, the RM won't have any 
information about what tasks are running inside them.  This could possibly be 
added to the AM REST APIs, if it's not already there.  

 Expose container locations and capabilities in the RM REST APIs
 ---

 Key: YARN-801
 URL: https://issues.apache.org/jira/browse/YARN-801
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, resourcemanager
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 It would be useful to be able to query container allocation info via the RM 
 REST APIs.  We should be able to query per application, and for each 
 container we should provide (at least):
 * location
 * resource capabilty

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-125) Make Yarn Client service shutdown operations robust

2013-06-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-125.
-

   Resolution: Fixed
Fix Version/s: 2.1.0-beta

 Make Yarn Client service shutdown operations robust
 ---

 Key: YARN-125
 URL: https://issues.apache.org/jira/browse/YARN-125
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: MAPREDUCE-4023.patch


 Make the yarn client services more robust against being shut down while not 
 started, or shutdown more than once, by null-checking fields before closing 
 them, setting to null afterwards to prevent double-invocation. This is a 
 subset of MAPREDUCE-3502
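
A tiny sketch of the null-checking idiom being described (the field and class 
names are invented):

{code:java}
// Sketch of the defensive shutdown idiom: null-check before closing,
// null the field afterwards so a repeated stop() is a no-op.
import java.io.Closeable;
import java.io.IOException;

public class RobustClientService {
  private Closeable rpcClient;   // illustrative; created in start()

  public synchronized void stop() throws IOException {
    if (rpcClient != null) {     // never started, or already stopped
      rpcClient.close();
      rpcClient = null;          // a second stop() becomes harmless
    }
  }
}
{code}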

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-123) Make yarn Resource Manager services robust against shutdown

2013-06-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-123.
-

   Resolution: Fixed
Fix Version/s: 2.1.0-beta

 Make yarn Resource Manager services robust against shutdown
 ---

 Key: YARN-123
 URL: https://issues.apache.org/jira/browse/YARN-123
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: MAPREDUCE-4021.patch


 Split MAPREDUCE-3502 patches to make the RM code more resilient to being 
 stopped more than once, or before started.
 This depends on MAPREDUCE-4014.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-124) Make Yarn Node Manager services robust against shutdown

2013-06-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-124.
-

   Resolution: Fixed
Fix Version/s: 2.1.0-beta

 Make Yarn Node Manager services robust against shutdown
 ---

 Key: YARN-124
 URL: https://issues.apache.org/jira/browse/YARN-124
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: MAPREDUCE-4022.patch


 Add the nodemanager bits of MAPREDUCE-3502 to shut down the Nodemanager 
 services. This is done by checking for fields being non-null before shutting 
 down/closing etc, and setting the fields to null afterwards -to be resilient 
 against re-entrancy.
 No tests other than manual review.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-789) Enable zero capabilities resource requests in fair scheduler

2013-06-13 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682424#comment-13682424
 ] 

Bikas Saha commented on YARN-789:
-

Do the resource calculator changes need to be considered also?

 Enable zero capabilities resource requests in fair scheduler
 

 Key: YARN-789
 URL: https://issues.apache.org/jira/browse/YARN-789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-789.patch, YARN-789.patch


 Per discussion in YARN-689, reposting updated use case:
 1. I have a set of services co-existing with a Yarn cluster.
 2. These services run out of band from Yarn. They are not started as yarn 
 containers and they don't use Yarn containers for processing.
 3. These services use, dynamically, different amounts of CPU and memory based 
 on their load. They manage their CPU and memory requirements independently. 
 In other words, depending on their load, they may require more CPU but not 
 memory or vice-versa.
 By using YARN as RM for these services I'm able share and utilize the 
 resources of the cluster appropriately and in a dynamic way. Yarn keeps tab 
 of all the resources.
 These services run an AM that reserves resources on their behalf. When this 
 AM gets the requested resources, the services bump up their CPU/memory 
 utilization out of band from Yarn. If the Yarn allocations are 
 released/preempted, the services back off on their resources utilization. By 
 doing this, Yarn and these service correctly share the cluster resources, 
 being Yarn RM the only one that does the overall resource bookkeeping.
 The services AM, not to break the lifecycle of containers, start containers 
 in the corresponding NMs. These container processes do basically a sleep 
 forever (i.e. sleep 1d). They are almost not using any CPU nor memory 
 (less than 1MB). Thus it is reasonable to assume their required CPU and 
 memory utilization is NIL (more on hard enforcement later). Because of this 
 almost NIL utilization of CPU and memory, it is possible to specify, when 
 doing a request, zero as one of the dimensions (CPU or memory).
 The current limitation is that the increment is also the minimum. 
 If we set the memory increment to 1MB. When doing a pure CPU request, we 
 would have to specify 1MB of memory. That would work. However it would allow 
 discretionary memory requests without a desired normalization (increments of 
 256, 512, etc).
 If we set the CPU increment to 1CPU. When doing a pure memory request, we 
 would have to specify 1CPU. CPU amounts a much smaller than memory amounts, 
 and because we don't have fractional CPUs, it would mean that all my pure 
 memory requests will be wasting 1 CPU thus reducing the overall utilization 
 of the cluster.
 Finally, on hard enforcement. 
 * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an 
 absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we 
 ensure there is enough CPU cycles to run the sleep process. This absolute 
 minimum would only kick-in if zero is allowed, otherwise will never kick in 
 as the shares for 1 CPU are 1024.
 * For Memory. Hard enforcement is currently done by the 
 ProcfsBasedProcessTree.java, using a minimum absolute of 1 or 2 MBs would 
 take care of zero memory resources. And again,  this absolute minimum would 
 only kick-in if zero is allowed, otherwise will never kick in as the 
 increment memory is in several MBs if not 1GB.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-804) mark AbstractService init/start/stop methods as final

2013-06-13 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-804:
---

 Summary: mark AbstractService init/start/stop methods as final
 Key: YARN-804
 URL: https://issues.apache.org/jira/browse/YARN-804
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.1.0-beta
Reporter: Steve Loughran


Now that YARN-117 and MAPREDUCE-5298 are checked in, we can mark the public 
AbstractService init/start/stop methods as final.

Why? It puts the lifecycle check and error handling around the subclass code, 
ensuring no lifecycle method gets called in the wrong state or more than once. 
When a {{serviceInit(), serviceStart() & serviceStop()}} method throws an 
exception, it's caught and auto-triggers stop. 

Marking the methods as final forces service implementations to move to the 
stricter lifecycle. It has one side effect: some of the mocking tests play up 
-I'll need some assistance here
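
A simplified sketch of the wrapper pattern being described; the real 
AbstractService also manages the state machine before invoking the hooks and 
notifies registered listeners:

{code:java}
// Simplified sketch only: state checks and listener notification omitted.
public abstract class SketchService {

  public final void start() {
    try {
      serviceStart();        // subclass hook
    } catch (Exception e) {
      stop();                // a failed start auto-triggers stop
      throw new RuntimeException("Service failed to start", e);
    }
  }

  public final void stop() {
    try {
      serviceStop();         // subclass hook
    } catch (Exception e) {
      throw new RuntimeException("Service failed to stop", e);
    }
  }

  protected void serviceStart() throws Exception { }

  protected void serviceStop() throws Exception { }
}
{code}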

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682434#comment-13682434
 ] 

Hadoop QA commented on YARN-803:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587663/YARN-803.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1217//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1217//console

This message is automatically generated.

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682444#comment-13682444
 ] 

Zhijie Shen commented on YARN-803:
--

It sounds like a good idea to support polymorphism of config validation. The 
patch looks almost fine. Here are two minor suggestions:

1. As all schedulers have implemented setConf, how about defining it in 
YarnScheduler as well? That way, any newly added scheduler in the future will 
also be forced to implement the method and validate its config (it probably 
has to do so anyway).

2. setConf sounds a bit confusing, because the method doesn't just set the 
config, it validates it. How about renaming it validateConf?

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-805) Fix yarn-api javadoc annotations

2013-06-13 Thread Jian He (JIRA)
Jian He created YARN-805:


 Summary: Fix yarn-api javadoc annotations
 Key: YARN-805
 URL: https://issues.apache.org/jira/browse/YARN-805
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-746) rename Service.register() and Service.unregister() to registerServiceListener() unregisterServiceListener() respectively

2013-06-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682459#comment-13682459
 ] 

Vinod Kumar Vavilapalli commented on YARN-746:
--

Looks good. +1. Checking this in.

 rename Service.register() and Service.unregister() to 
 registerServiceListener()  unregisterServiceListener() respectively
 --

 Key: YARN-746
 URL: https://issues.apache.org/jira/browse/YARN-746
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-746-001.patch


 make it clear what you are registering on a {{Service}} by naming the methods 
 {{registerServiceListener()}} & {{unregisterServiceListener()}} respectively.
 This only affects a couple of production classes; {{Service.register()}} is 
 used in some of the lifecycle tests of YARN-530. There are no tests of 
 {{Service.unregister()}}, which is something that could be corrected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-692) Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat.

2013-06-13 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682458#comment-13682458
 ] 

Omkar Vinit Joshi commented on YARN-692:


rebasing after YARN-117

 Creating NMToken master key on RM and sharing it with NM as a part of RM-NM 
 heartbeat.
 --

 Key: YARN-692
 URL: https://issues.apache.org/jira/browse/YARN-692
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-692.20130530.1.patch, YARN-692.20130530.2.patch, 
 YARN-692.20130531.1.patch, YARN-692.20130531.3.patch, 
 YARN-692.20130531.patch, YARN-692-20130611.patch


 This is related to YARN-613. Here we will be implementing NMToken generation 
 on the RM side and sharing it with the NM during the RM-NM heartbeat. As a 
 part of this JIRA the master key will only be made available to the NM, but 
 there will be no validation done until AM-NM communication is fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-804) mark AbstractService init/start/stop methods as final

2013-06-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-804:
-

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-117

 mark AbstractService init/start/stop methods as final
 -

 Key: YARN-804
 URL: https://issues.apache.org/jira/browse/YARN-804
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.1.0-beta
Reporter: Steve Loughran

 Now that YARN-117 and MAPREDUCE-5298 are checked in, we can mark the public 
 AbstractService init/start/stop methods as final.
 Why? It puts the lifecycle check and error handling around the subclass code, 
 ensuring no lifecycle method gets called in the wrong state or more than once. 
 When a {{serviceInit(), serviceStart() & serviceStop()}} method throws an 
 exception, it's caught and auto-triggers stop. 
 Marking the methods as final forces service implementations to move to the 
 stricter lifecycle. It has one side effect: some of the mocking tests play up 
 -I'll need some assistance here

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-746) rename Service.register() and Service.unregister() to registerServiceListener() unregisterServiceListener() respectively

2013-06-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682477#comment-13682477
 ] 

Hudson commented on YARN-746:
-

Integrated in Hadoop-trunk-Commit #3912 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3912/])
YARN-746. Renamed Service.register() and Service.unregister() to 
registerServiceListener() & unregisterServiceListener() respectively. 
Contributed by Steve Loughran. (Revision 1492780)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1492780
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/service/AbstractService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/service/FilterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/service/Service.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/service/TestServiceLifecycle.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java


 rename Service.register() and Service.unregister() to 
 registerServiceListener() & unregisterServiceListener() respectively
 --

 Key: YARN-746
 URL: https://issues.apache.org/jira/browse/YARN-746
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
 Fix For: 2.1.0-beta

 Attachments: YARN-746-001.patch


 make it clear what you are registering on a {{Service}} by naming the methods 
 {{registerServiceListener()}} & {{unregisterServiceListener()}} respectively.
 This only affects a couple of production classes; {{Service.register()}} is also 
 used in some of the lifecycle tests of YARN-530. There are no tests of 
 {{Service.unregister()}}, which is something that could be corrected.
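 For reference, a usage sketch under the new names; the listener implementation below is illustrative, and the exact package and interface details should be checked against the committed Service and ServiceStateChangeListener sources listed above:

{noformat}
// Illustrative listener; assumes the yarn.service interfaces shown in the commit above.
import org.apache.hadoop.yarn.service.Service;
import org.apache.hadoop.yarn.service.ServiceStateChangeListener;

public class LoggingStateListener implements ServiceStateChangeListener {
  @Override
  public void stateChanged(Service service) {
    // called whenever the observed service changes lifecycle state
    System.out.println(service.getName() + " is now in state " + service.getServiceState());
  }
}

// usage, given some Service instance "service":
//   ServiceStateChangeListener listener = new LoggingStateListener();
//   service.registerServiceListener(listener);
//   ...
//   service.unregisterServiceListener(listener);
{noformat}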

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-805) Fix yarn-api javadoc annotations

2013-06-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682478#comment-13682478
 ] 

Jian He commented on YARN-805:
--

Proposal to fix the annotations for the yarn api:
1. for all protocols in package yarn.api, annotate all methods of those 
protocols as public/stable.
2. for all protocol-record-requests, annotate all methods as public/stable;
   for all protocol-record-responses, annotate the getters as public/stable, and the 
setters and factory methods as private/unstable.
3. for all user-facing api records, annotate all methods of those records as 
public/stable;
   for all non-user-facing api records, annotate the getters as public/stable, and the 
setters and factory methods as private/unstable.
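As a concrete illustration of point 2, a hypothetical protocol-record-response might be annotated like this; the record class itself is made up, and only the annotation classes are the real Hadoop ones:

{noformat}
// Hypothetical response record, shown only to illustrate the proposed annotation split.
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Stable;
import org.apache.hadoop.classification.InterfaceStability.Unstable;

@Public
@Stable
public class ExampleGetFooResponse {
  private String foo;

  @Private
  @Unstable
  public static ExampleGetFooResponse newInstance(String foo) {  // factory: private/unstable
    ExampleGetFooResponse response = new ExampleGetFooResponse();
    response.setFoo(foo);
    return response;
  }

  @Public
  @Stable
  public String getFoo() {                                       // getter: public/stable
    return foo;
  }

  @Private
  @Unstable
  public void setFoo(String foo) {                               // setter: private/unstable
    this.foo = foo;
  }
}
{noformat}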


 Fix yarn-api javadoc annotations
 

 Key: YARN-805
 URL: https://issues.apache.org/jira/browse/YARN-805
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-804) mark AbstractService init/start/stop methods as final

2013-06-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-804:


Assignee: Vinod Kumar Vavilapalli

Let me take a quick stab at it to get it out of the way..

 mark AbstractService init/start/stop methods as final
 -

 Key: YARN-804
 URL: https://issues.apache.org/jira/browse/YARN-804
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.1.0-beta
Reporter: Steve Loughran
Assignee: Vinod Kumar Vavilapalli

 Now that YARN-117 and MAPREDUCE-5298 are checked in, we can mark the public 
 AbstractService init/start/stop methods as final.
 Why? It puts the lifecycle check and error handling around the subclass code, 
 ensuring no lifecycle method gets called in the wrong state or gets called 
 more than once. When a {{serviceInit(), serviceStart() & serviceStop()}} 
 method throws an exception, it's caught and auto-triggers stop. 
 Marking the methods as final forces service implementations to move to the 
 stricter lifecycle. It has one side effect: some of the mocking tests play up 
 - I'll need some assistance here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-805) Fix yarn-api javadoc annotations

2013-06-13 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-805:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-386

 Fix yarn-api javadoc annotations
 

 Key: YARN-805
 URL: https://issues.apache.org/jira/browse/YARN-805
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682488#comment-13682488
 ] 

Siddharth Seth commented on YARN-802:
-

Can the MR AM specify the service to be used via configuration, and set the 
service data accordingly?

 APPLICATION_INIT is never sent to AuxServices other than the builtin 
 ShuffleHandler
 ---

 Key: YARN-802
 URL: https://issues.apache.org/jira/browse/YARN-802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 APPLICATION_INIT is never sent to AuxServices other than the built-in 
 ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be 
 able to function, because APPLICATION_INIT enables the AuxiliaryService to 
 map jobId to userId. This is needed for properly finding the MOFs of a job per 
 reducers' requests.
 NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events, due to a 
 hard-coded expression in the hadoop code. The current TaskAttemptImpl.java code 
 explicitly calls serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
 ...) and ignores any additional AuxiliaryService. As a result, only the 
 built-in ShuffleHandler will get APPLICATION_INIT events.  Any 3rd party 
 AuxiliaryService will never get APPLICATION_INIT events.
 I think a solution can go one of two ways:
 1. Change TaskAttemptImpl.java to loop over all Auxiliary Services and register 
 each of them, by calling serviceData.put(…) in a loop.
 2. Change AuxServices.java similarly to the fix in MAPREDUCE-2668 
 (APPLICATION_STOP is never sent to AuxServices).  This means that when the 
 'handle' method gets an APPLICATION_INIT event it will demultiplex it to all Aux 
 Services regardless of the value in event.getServiceID().
 I prefer the 2nd solution.  I welcome any ideas, and I can provide the 
 needed patch for whichever option people like.
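 As a rough sketch of option 1 above, the service-data map could be built over a configured list of auxiliary services rather than a single hard-coded id; the configuration key and surrounding variable names here are assumptions, not existing MapReduce code:

{noformat}
// Rough sketch of option 1: put the same service data for every configured
// auxiliary service instead of only the built-in shuffle handler.
// The configuration key below is a hypothetical example.
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class AuxServiceDataSketch {
  public static Map<String, ByteBuffer> buildServiceData(Configuration conf,
                                                         ByteBuffer shuffleSecret) {
    Map<String, ByteBuffer> serviceData = new HashMap<String, ByteBuffer>();
    String[] services = conf.getStrings("mapreduce.job.shuffle.provider.services",
                                        "mapreduce_shuffle");
    for (String serviceId : services) {
      serviceData.put(serviceId, shuffleSecret);   // register each provider, not just one
    }
    return serviceData;
  }
}
{noformat}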

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-806) Move ContainerExitStatus from yarn.api to yarn.api.records

2013-06-13 Thread Jian He (JIRA)
Jian He created YARN-806:


 Summary: Move ContainerExitStatus from yarn.api to yarn.api.records
 Key: YARN-806
 URL: https://issues.apache.org/jira/browse/YARN-806
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-692) Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat.

2013-06-13 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-692:
---

Attachment: YARN-692-20130613.patch

 Creating NMToken master key on RM and sharing it with NM as a part of RM-NM 
 heartbeat.
 --

 Key: YARN-692
 URL: https://issues.apache.org/jira/browse/YARN-692
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-692.20130530.1.patch, YARN-692.20130530.2.patch, 
 YARN-692.20130531.1.patch, YARN-692.20130531.3.patch, 
 YARN-692.20130531.patch, YARN-692-20130611.patch, YARN-692-20130613.patch


 This is related to YARN-613. Here we will be implementing NMToken generation 
 on the RM side and sharing it with the NM during the RM-NM heartbeat. As a part of this 
 JIRA the master key will only be made available to the NM, but there will be no 
 validation done until AM-NM communication is fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682495#comment-13682495
 ] 

Alejandro Abdelnur commented on YARN-803:
-

Zhijie,

On adding the setConf() method to the scheduler API: it is up to the scheduler 
impl whether or not to use the Yarn configuration. If a scheduler impl decides to do 
so, it will get the configuration by implementing Configurable.

I initially thought about adding a validateConf() method to the scheduler 
API, but decided to go the Configurable route for the reason above.

Because of that, I think we should do things as in the current patch.

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2013-06-13 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-807:
---

 Summary: When querying apps by queue, iterating over all apps is 
inefficient and limiting 
 Key: YARN-807
 URL: https://issues.apache.org/jira/browse/YARN-807
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza


The question "which apps are in queue x" can be asked via the RM REST APIs, 
through the ClientRMService, and through the command line.  In all these cases, 
the question is answered by scanning through every RMApp and filtering by the 
app's queue name.

All schedulers maintain a mapping of queues to applications.  I think it would 
make more sense to ask the schedulers which applications are in a given queue. 
This is what was done in MR1. This would also have the advantage of allowing a 
parent queue to return all the applications in the leaf queues under it, and of 
allowing queue name aliases, as in the way that "root.default" and "default" refer to 
the same queue in the fair scheduler.
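A minimal sketch of this direction, assuming a hypothetical scheduler-side lookup method; the interface and method names below are illustrative, not the existing scheduler API:

{noformat}
// Hypothetical scheduler-side lookup, illustrating "ask the scheduler" instead of
// scanning every RMApp. Names are illustrative only.
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public interface QueueApplicationLookup {
  /**
   * @param queueName a leaf queue, or a parent queue (which would return the
   *                  applications of all leaf queues under it); aliases such as
   *                  "default" vs "root.default" would resolve to the same queue
   * @return the applications the scheduler currently tracks in that queue
   */
  List<ApplicationId> getAppsInQueue(String queueName);
}
{noformat}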



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682503#comment-13682503
 ] 

Sandy Ryza commented on YARN-803:
-

A Configuration already gets passed in when a scheduler's reinitialize method 
is called.  Would it be possible to do it there?

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-736) Add a multi-resource fair sharing metric

2013-06-13 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-736:


Attachment: YARN-736.patch

 Add a multi-resource fair sharing metric
 

 Key: YARN-736
 URL: https://issues.apache.org/jira/browse/YARN-736
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-736.patch


 Currently, at a regular interval, the fair scheduler computes a fair memory 
 share for each queue and application inside it.  This fair share is not used 
 for scheduling decisions, but is displayed in the web UI, exposed as a 
 metric, and used for preemption decisions.
 With DRF and multi-resource scheduling, assigning a memory share as the fair 
 share metric to every queue no longer makes sense.  It's not obvious what the 
 replacement should be, but probably something like fractional fairness within 
 a queue, or distance from an ideal cluster state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682539#comment-13682539
 ] 

Avner BenHanoch commented on YARN-802:
--

Hi Siddharth,

I am not sure I understand the question.  Do you suggest that we'll have a 
configuration with a sub-list of the AuxServices, and that only members of that list 
will get the APPLICATION_INIT event?
- This is possible; however, it will not match the current behavior of the 
APPLICATION_STOP event, since that event is sent to ALL AuxServices.

Please elaborate.

 APPLICATION_INIT is never sent to AuxServices other than the builtin 
 ShuffleHandler
 ---

 Key: YARN-802
 URL: https://issues.apache.org/jira/browse/YARN-802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 APPLICATION_INIT is never sent to AuxServices other than the built-in 
 ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be 
 able to function, because APPLICATION_INIT enables the AuxiliaryService to 
 map jobId to userId. This is needed for properly finding the MOFs of a job per 
 reducers' requests.
 NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events, due to a 
 hard-coded expression in the hadoop code. The current TaskAttemptImpl.java code 
 explicitly calls serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
 ...) and ignores any additional AuxiliaryService. As a result, only the 
 built-in ShuffleHandler will get APPLICATION_INIT events.  Any 3rd party 
 AuxiliaryService will never get APPLICATION_INIT events.
 I think a solution can go one of two ways:
 1. Change TaskAttemptImpl.java to loop over all Auxiliary Services and register 
 each of them, by calling serviceData.put(…) in a loop.
 2. Change AuxServices.java similarly to the fix in MAPREDUCE-2668 
 (APPLICATION_STOP is never sent to AuxServices).  This means that when the 
 'handle' method gets an APPLICATION_INIT event it will demultiplex it to all Aux 
 Services regardless of the value in event.getServiceID().
 I prefer the 2nd solution.  I welcome any ideas, and I can provide the 
 needed patch for whichever option people like.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-692) Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat.

2013-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682530#comment-13682530
 ] 

Hadoop QA commented on YARN-692:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12587673/YARN-692-20130613.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 32 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1219//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1219//console

This message is automatically generated.

 Creating NMToken master key on RM and sharing it with NM as a part of RM-NM 
 heartbeat.
 --

 Key: YARN-692
 URL: https://issues.apache.org/jira/browse/YARN-692
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-692.20130530.1.patch, YARN-692.20130530.2.patch, 
 YARN-692.20130531.1.patch, YARN-692.20130531.3.patch, 
 YARN-692.20130531.patch, YARN-692-20130611.patch, YARN-692-20130613.patch


 This is related to YARN-613. Here we will be implementing NMToken generation 
 on the RM side and sharing it with the NM during the RM-NM heartbeat. As a part of this 
 JIRA the master key will only be made available to the NM, but there will be no 
 validation done until AM-NM communication is fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-693) Sending NMToken to AM on allocate call

2013-06-13 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-693:
---

Attachment: YARN-693-20130613.patch

rebasing patch after YARN-117

 Sending NMToken to AM on allocate call
 --

 Key: YARN-693
 URL: https://issues.apache.org/jira/browse/YARN-693
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-693-20130610.patch, YARN-693-20130613.patch


 This is part of YARN-613.
 As per the updated design, the AM will receive, per NM, an NMToken in the following 
 scenarios:
 * AM is receiving its first container on the underlying NM.
 * AM is receiving a container on the underlying NM after either the NM or the RM rebooted.
 ** After an RM reboot, as the RM doesn't remember (persist) the information about 
 keys issued per AM per NM, it will reissue tokens when the AM gets a new 
 container on the underlying NM. However, on the NM side, the NM will still retain the 
 older token until it receives a new one, to support long running jobs (in a work 
 preserving environment).
 ** After an NM reboot, the RM will delete the token information corresponding to 
 that NM for all AMs.
 * AM is receiving a container on the underlying NM after the NMToken master key is 
 rolled over on the RM side.
 In all the cases, if the AM receives a new NMToken it is supposed to store it 
 for future NM communication until it receives a newer one.
 AMRMClient should expose these NMTokens to the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric

2013-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682552#comment-13682552
 ] 

Hadoop QA commented on YARN-736:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587678/YARN-736.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1220//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1220//console

This message is automatically generated.

 Add a multi-resource fair sharing metric
 

 Key: YARN-736
 URL: https://issues.apache.org/jira/browse/YARN-736
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-736.patch


 Currently, at a regular interval, the fair scheduler computes a fair memory 
 share for each queue and application inside it.  This fair share is not used 
 for scheduling decisions, but is displayed in the web UI, exposed as a 
 metric, and used for preemption decisions.
 With DRF and multi-resource scheduling, assigning a memory share as the fair 
 share metric to every queue no longer makes sense.  It's not obvious what the 
 replacement should be, but probably something like fractional fairness within 
 a queue, or distance from an ideal cluster state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682564#comment-13682564
 ] 

Alejandro Abdelnur commented on YARN-803:
-

Mmh, so I guess we could do a mix of what [~zjshen] and [~sandyr] suggest: have 
a private validateConf() in the schedulers and call it from reinitialize(). 
I'll upload a patch with that approach.
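A minimal sketch of that shape, assuming an illustrative scheduler class; the property names and the check itself are examples, not the actual patch:

{noformat}
// Illustrative only: a private validateConf() invoked from reinitialize().
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class ExampleScheduler {

  public synchronized void reinitialize(Configuration conf) throws IOException {
    validateConf(conf);      // fail fast on inconsistent scheduler settings
    // ... apply the new configuration to queues, calculators, etc. ...
  }

  private void validateConf(Configuration conf) {
    int minMb = conf.getInt("yarn.scheduler.minimum-allocation-mb", 1024);
    int maxMb = conf.getInt("yarn.scheduler.maximum-allocation-mb", 8192);
    if (maxMb < minMb) {
      throw new IllegalArgumentException("maximum-allocation-mb (" + maxMb
          + ") is less than minimum-allocation-mb (" + minMb + ")");
    }
  }
}
{noformat}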

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-773) Move YarnRuntimeException from package api.yarn to api.yarn.exceptions

2013-06-13 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-773:
-

Attachment: YARN-773.1.patch

rebased after YARN-117

 Move YarnRuntimeException from package api.yarn to api.yarn.exceptions
 --

 Key: YARN-773
 URL: https://issues.apache.org/jira/browse/YARN-773
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-773.1.patch, YARN-773.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2013-06-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682565#comment-13682565
 ] 

Arun C Murthy commented on YARN-796:


bq. One thing I forgot to mention before is that labels seem to make sense if 
the resource-requests location is ANY or for a rack, for resource-requests that 
are host specific it does not make sense.

Agreed, makes sense. We should probably throw an InvalidResourceRequestException 
if a user tries to make a host-specific RR with a label.



 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)

2013-06-13 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682571#comment-13682571
 ] 

Chris Douglas commented on YARN-569:


Thanks for the feedback; we revised the patch. We reply below to the questions 
that required explanation; all the small ones are addressed directly in 
the code, following your suggestions.

bq. This doesnt seem to affect the fair scheduler or does it? If not, then it 
can be misleading for users.
bq. How do we envisage multiple policies working together without stepping on 
each other? Better off limiting to 1?

The intent was for orthogonal policies to interact with the scheduler, or, if 
conflicting, to be coordinated by a composite policy. Though you're right, the 
naming toward preemption is confusing; the patch renames the properties to 
refer to monitors only. Because the only example is the 
{{ProportionalCapacityPreemptionPolicy}}, {{null}} seemed like the correct 
default. As for limiting to 1 monitor or not, we are experimenting with other 
policies that focus on different aspects of the schedule (e.g., deadlines and 
automatic tuning of queue capacity), and it seems possible for them to play nicely with 
other policies (e.g., ProportionalCapacityPreemptionPolicy), so we would prefer 
the mechanism to remain capable of loading multiple monitors.

bq. Not joining the thread to make sure its cleaned up?

The contract for shutting down a monitor is not baked into the API, yet. While 
the proportional policy runs quickly, it's not obvious whether other policies 
would be both long running and respond to interrupts. By way of illustration, 
other monitors we've experimented with call into third party code for 
CPU-intensive calculation. Since YARN-117 went in a few hours ago, that might 
be a chance to define that more crisply. Thoughts?

bq. Why no lock here when the other new methods have a lock? Do we not care 
that the app remains in applications during the duration of the operations?

The semantics of the {{\@Lock}} annotation were not entirely clear from the 
examples in the code, so it's possible the inconsistency is our application of 
it. We're probably making the situation worse, so we omitted the annotations in 
the updated patch. To answer your question: we don't care, because the selected 
container already exited (part of the natural termination factor in the policy).

bq. There is one critical difference between old and new behavior. The new code 
will not send the finish event to the container if its not part of the 
liveContainers. This probably is wrong.
bq. FicaSchedulerNode.unreserveResource(). Checks have been added for the 
reserved container but will the code reach that point if there was no 
reservation actually left on that node? In the same vein, can it happen that 
the node has a new reservation that was made out of band of the preemption 
logic cycle. Hence, the reserved container on the node would exist but could be 
from a different application. 

Good catch, these are related. The change to boolean was necessary because 
we're calling the {{unreserve}} logic from a new context. Since only one 
application can have a single reservation on a node, and because we're freeing 
it through that application, we won't accidentally free another application's 
reservation. However, calling {{unreserve}} on a reservation that converted to 
a container will fail, so we need to know whether the state changed before 
updating the metric.

bq. Couldnt quite grok this. What is delta? What is 0.5? A percentage? Whats 
the math behind the calculation? Should it be even absent preemption instead 
of even absent natural termination? Is this applied before or after 
TOTAL_PREEMPTION_PER_ROUND?

The delta is the difference between the computed ideal capacity and the actual 
capacity. A value of 0.5 would preempt only 50% of the containers the policy thinks 
should be preempted, as the rest are expected to exit naturally. The comment 
is saying that, even without any containers exiting on their own, the policy 
will geometrically push capacity into the deadzone: at 50% per 
round, within 5 rounds the policy will be inside a 5% deadzone of the ideal 
capacity. It's applied before the total preemption per round; the latter 
proportionally affects all preemption targets.

Because some containers will complete while the policy runs, it may make sense 
to tune it aggressively (or affect it with observed completion rates), but 
we'll want to get some experience running with this.


 CapacityScheduler: support for preemption (using a capacity monitor)
 

 Key: YARN-569
 URL: https://issues.apache.org/jira/browse/YARN-569
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: 

[jira] [Created] (YARN-808) ApplicationReport does not clearly tell that the attempt is running or not

2013-06-13 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-808:
---

 Summary: ApplicationReport does not clearly tell that the attempt 
is running or not
 Key: YARN-808
 URL: https://issues.apache.org/jira/browse/YARN-808
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha


When an app attempt fails and is being retried, ApplicationReport immediately 
gives the new attemptId and non-null values of host etc. There is no way for 
clients to know whether the attempt is actually running other than connecting to it and 
timing out on an invalid host. A solution would be to expose the attempt state or 
return a null value for host instead of N/A.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Moved] (YARN-809) Enable better parallelism in the Fair Scheduler

2013-06-13 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza moved MAPREDUCE-5321 to YARN-809:


Key: YARN-809  (was: MAPREDUCE-5321)
Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 Enable better parallelism in the Fair Scheduler
 ---

 Key: YARN-809
 URL: https://issues.apache.org/jira/browse/YARN-809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 Currently, the Fair Scheduler is locked on pretty much every operation: node 
 updates, application additions and removals, every time the update thread 
 runs, and every time the RM queries it for information.  Most of this locking 
 is unnecessary, especially as only the core scheduling operations like 
 application additions, removals, and node updates need a consistent view of 
 scheduler state.
 We can probably increase parallelism by using concurrent data structures when 
 applicable, as well as keeping a slightly stale view to serve via the RM 
 APIs. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-803:


Attachment: YARN-803.patch

patch integrating feedback from Zhijie and Sandy as per my previous comment.

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch, YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-809) Enable better parallelism in the Fair Scheduler

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682595#comment-13682595
 ] 

Alejandro Abdelnur commented on YARN-809:
-

One thing to keep in mind is that when using concurrent data structures, doing 
multiple operations on them means acquiring/releasing a lock for each one; depending on 
the operation those are read or write locks. It may be worth considering using 
non-thread-safe data structures and wrapping each logical operation in a single 
read/write lock block.
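A small sketch of that alternative, with illustrative class and field names: plain collections guarded by one ReadWriteLock, so each logical operation takes a single lock instead of one per data structure:

{noformat}
// Illustrative only: non-thread-safe map guarded by a single ReadWriteLock.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SchedulerStateSketch {
  private final Map<String, Integer> queueToRunningApps = new HashMap<String, Integer>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // logical write operation: one write lock around the whole update
  public void addApp(String queue) {
    lock.writeLock().lock();
    try {
      Integer count = queueToRunningApps.get(queue);
      queueToRunningApps.put(queue, count == null ? 1 : count + 1);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // read-mostly query (e.g., serving RM web/REST views) only needs the read lock
  public int getRunningApps(String queue) {
    lock.readLock().lock();
    try {
      Integer count = queueToRunningApps.get(queue);
      return count == null ? 0 : count;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{noformat}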

 Enable better parallelism in the Fair Scheduler
 ---

 Key: YARN-809
 URL: https://issues.apache.org/jira/browse/YARN-809
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Sandy Ryza
Assignee: Sandy Ryza

 Currently, the Fair Scheduler is locked on pretty much every operation: node 
 updates, application additions and removals, every time the update thread 
 runs, and every time the RM queries it for information.  Most of this locking 
 is unnecessary, especially as only the core scheduling operations like 
 application additions, removals, and node updates need a consistent view of 
 scheduler state.
 We can probably increase parallelism by using concurrent data structures when 
 applicable, as well as keeping a slightly stale view to serve via the RM 
 APIs. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-804) mark AbstractService init/start/stop methods as final

2013-06-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-804:


Attachment: YARN-804-001.patch

Marking the methods as final, and fixing a (recent) test that needs to be moved 
to the serviceXXX methods.

This patch breaks mock tests in YARN and mapreduce

 mark AbstractService init/start/stop methods as final
 -

 Key: YARN-804
 URL: https://issues.apache.org/jira/browse/YARN-804
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.1.0-beta
Reporter: Steve Loughran
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-804-001.patch


 Now that YARN-117 and MAPREDUCE-5298 are checked in, we can mark the public 
 AbstractService init/start/stop methods as final.
 Why? It puts the lifecycle check and error handling around the subclass code, 
 ensuring no lifecycle method gets called in the wrong state or gets called 
 more than once. When a {{serviceInit(), serviceStart() & serviceStop()}} 
 method throws an exception, it's caught and auto-triggers stop. 
 Marking the methods as final forces service implementations to move to the 
 stricter lifecycle. It has one side effect: some of the mocking tests play up 
 - I'll need some assistance here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682619#comment-13682619
 ] 

Sandy Ryza commented on YARN-803:
-

Would it make sense to put validateConf in SchedulerUtils, where we also have 
normalizeRequests?  Otherwise, LGTM.

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch, YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682636#comment-13682636
 ] 

Sandy Ryza commented on YARN-803:
-

+1

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch, YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682634#comment-13682634
 ] 

Alejandro Abdelnur commented on YARN-803:
-

Sandy, SchedulerUtils is for all schedulers; the validation is specific to a 
scheduler impl.

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch, YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682635#comment-13682635
 ] 

Sandy Ryza commented on YARN-803:
-

Ok, makes sense.

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch, YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-789) Enable zero capabilities resource requests in fair scheduler

2013-06-13 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-789:


Attachment: YARN-789.patch

[~bikassaha], the FS uses ResourceCalculator methods to normalize resources; I've 
just added method signatures that allow different values for the minimum and the 
increment. I've left the signatures of the existing methods unchanged, so no code 
changes trickle to other schedulers. Otherwise we'd have to reimplement all 
the normalization logic in the FS.

In this patch I'm undoing the introduction of the normalizeInt() from the 
previous patch. The reason I added it was that it seemed easier 
to follow than a min(round(max)), but it is one IF more expensive.

So the only difference is the new method signatures.
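For illustration, the normalization being discussed boils down to rounding a request up to the increment and clamping it between the minimum and the maximum; the sketch below uses plain ints and made-up names rather than the real Resource/ResourceCalculator types:

{noformat}
// Illustrative sketch of normalization with a separate minimum and increment.
// Plain ints for clarity; the real code works on Resource objects.
public final class NormalizeSketch {
  private NormalizeSketch() {}

  public static int normalize(int requested, int minimum, int maximum, int increment) {
    int normalized = Math.max(requested, minimum);
    if (increment > 0) {
      // round up to the nearest multiple of the increment
      normalized = ((normalized + increment - 1) / increment) * increment;
    }
    return Math.min(normalized, maximum);
  }
}

// e.g. normalize(1500, 0, 8192, 512) == 1536, while a zero request with a zero
// minimum stays at zero: normalize(0, 0, 8192, 512) == 0.
{noformat}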

 Enable zero capabilities resource requests in fair scheduler
 

 Key: YARN-789
 URL: https://issues.apache.org/jira/browse/YARN-789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-789.patch, YARN-789.patch, YARN-789.patch


 Per discussion in YARN-689, reposting updated use case:
 1. I have a set of services co-existing with a Yarn cluster.
 2. These services run out of band from Yarn. They are not started as yarn 
 containers and they don't use Yarn containers for processing.
 3. These services use, dynamically, different amounts of CPU and memory based 
 on their load. They manage their CPU and memory requirements independently. 
 In other words, depending on their load, they may require more CPU but not 
 memory or vice-versa.
 By using YARN as the RM for these services I'm able to share and utilize the 
 resources of the cluster appropriately and in a dynamic way. Yarn keeps tabs 
 on all the resources.
 These services run an AM that reserves resources on their behalf. When this 
 AM gets the requested resources, the services bump up their CPU/memory 
 utilization out of band from Yarn. If the Yarn allocations are 
 released/preempted, the services back off on their resource utilization. By 
 doing this, Yarn and these services correctly share the cluster resources, 
 with the Yarn RM being the only one that does the overall resource bookkeeping.
 The services' AM, so as not to break the lifecycle of containers, starts containers 
 in the corresponding NMs. These container processes basically sleep 
 forever (i.e. sleep 1d). They use almost no CPU or memory 
 (less than 1MB). Thus it is reasonable to assume their required CPU and 
 memory utilization is NIL (more on hard enforcement later). Because of this 
 almost NIL utilization of CPU and memory, it is possible to specify, when 
 doing a request, zero as one of the dimensions (CPU or memory).
 The current limitation is that the increment is also the minimum.
 If we set the memory increment to 1MB, then when doing a pure CPU request we 
 would have to specify 1MB of memory. That would work. However, it would allow 
 discretionary memory requests without the desired normalization (increments of 
 256, 512, etc).
 If we set the CPU increment to 1 CPU, then when doing a pure memory request we 
 would have to specify 1 CPU. CPU amounts are much smaller than memory amounts, 
 and because we don't have fractional CPUs, all my pure 
 memory requests would be wasting 1 CPU, thus reducing the overall utilization 
 of the cluster.
 Finally, on hard enforcement:
 * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an 
 absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor we 
 ensure there are enough CPU cycles to run the sleep process. This absolute 
 minimum would only kick in if zero is allowed; otherwise it will never kick in, 
 as the shares for 1 CPU are 1024.
 * For Memory. Hard enforcement is currently done by 
 ProcfsBasedProcessTree.java; using an absolute minimum of 1 or 2 MB would 
 take care of zero memory resources. And again, this absolute minimum would 
 only kick in if zero is allowed; otherwise it will never kick in, as the 
 memory increment is several MB if not 1GB.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682652#comment-13682652
 ] 

Zhijie Shen commented on YARN-803:
--

looks good. +1

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch, YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-789) Enable zero capabilities resource requests in fair scheduler

2013-06-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682656#comment-13682656
 ] 

Alejandro Abdelnur commented on YARN-789:
-

btw, the current patch requires YARN-803.

 Enable zero capabilities resource requests in fair scheduler
 

 Key: YARN-789
 URL: https://issues.apache.org/jira/browse/YARN-789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-789.patch, YARN-789.patch, YARN-789.patch


 Per discussion in YARN-689, reposting updated use case:
 1. I have a set of services co-existing with a Yarn cluster.
 2. These services run out of band from Yarn. They are not started as yarn 
 containers and they don't use Yarn containers for processing.
 3. These services use, dynamically, different amounts of CPU and memory based 
 on their load. They manage their CPU and memory requirements independently. 
 In other words, depending on their load, they may require more CPU but not 
 memory or vice-versa.
 By using YARN as the RM for these services I'm able to share and utilize the 
 resources of the cluster appropriately and in a dynamic way. Yarn keeps tabs 
 on all the resources.
 These services run an AM that reserves resources on their behalf. When this 
 AM gets the requested resources, the services bump up their CPU/memory 
 utilization out of band from Yarn. If the Yarn allocations are 
 released/preempted, the services back off on their resource utilization. By 
 doing this, Yarn and these services correctly share the cluster resources, 
 with the Yarn RM being the only one that does the overall resource bookkeeping.
 The services' AM, so as not to break the lifecycle of containers, starts containers 
 in the corresponding NMs. These container processes basically sleep 
 forever (i.e. sleep 1d). They use almost no CPU or memory 
 (less than 1MB). Thus it is reasonable to assume their required CPU and 
 memory utilization is NIL (more on hard enforcement later). Because of this 
 almost NIL utilization of CPU and memory, it is possible to specify, when 
 doing a request, zero as one of the dimensions (CPU or memory).
 The current limitation is that the increment is also the minimum.
 If we set the memory increment to 1MB, then when doing a pure CPU request we 
 would have to specify 1MB of memory. That would work. However, it would allow 
 discretionary memory requests without the desired normalization (increments of 
 256, 512, etc).
 If we set the CPU increment to 1 CPU, then when doing a pure memory request we 
 would have to specify 1 CPU. CPU amounts are much smaller than memory amounts, 
 and because we don't have fractional CPUs, all my pure 
 memory requests would be wasting 1 CPU, thus reducing the overall utilization 
 of the cluster.
 Finally, on hard enforcement:
 * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an 
 absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor we 
 ensure there are enough CPU cycles to run the sleep process. This absolute 
 minimum would only kick in if zero is allowed; otherwise it will never kick in, 
 as the shares for 1 CPU are 1024.
 * For Memory. Hard enforcement is currently done by 
 ProcfsBasedProcessTree.java; using an absolute minimum of 1 or 2 MB would 
 take care of zero memory resources. And again, this absolute minimum would 
 only kick in if zero is allowed; otherwise it will never kick in, as the 
 memory increment is several MB if not 1GB.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-810) Support CGroup ceiling enforcement on CPU

2013-06-13 Thread Chris Riccomini (JIRA)
Chris Riccomini created YARN-810:


 Summary: Support CGroup ceiling enforcement on CPU
 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini


Problem statement:

YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
Containers are then allowed to request vcores between the minimum and maximum 
defined in the yarn-site.xml.

In the case where a single-threaded container requests 1 vcore, with a 
pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
the core it's using, provided that no other container is also using it. This 
happens, even though the only guarantee that YARN/CGroups is making is that the 
container will get at least 1/4th of the core.

If a second container then comes along, the second container can take resources 
from the first, provided that the first container is still getting at least its 
fair share (1/4th).

There are certain cases where this is desirable. There are also certain cases 
where it might be desirable to have a hard limit on CPU usage, and not allow 
the process to go above the specified resource requirement, even if it's 
available.

Here's an RFC that describes the problem in more detail:

http://lwn.net/Articles/336127/

Solution:

As it happens, when CFS is used in combination with CGroups, you can enforce a 
ceiling using two files in cgroups:

{noformat}
cpu.cfs_quota_us
cpu.cfs_period_us
{noformat}

The usage of these two files is documented in more detail here:

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html

Testing:

I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it 
behaves as described above (it is a soft cap, and allows containers to use more 
than they asked for). I then tested CFS CPU quotas manually with YARN.

First, you can see that CFS is in use in the CGroup, based on the file names:

{noformat}
[criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
total 0
-r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
-r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
-rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
-rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
[criccomi@eat1-qa464 ~]$ sudo -u app cat
/cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
100000
[criccomi@eat1-qa464 ~]$ sudo -u app cat
/cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
-1
{noformat}

Oddly, it appears that the cfs_period_us is set to .1s, not 1s.

We can place hard limits on processes. I have process 4370
running YARN container container_1371141151815_0003_01_03 on a host. By
default, it's running at ~300% cpu usage.

{noformat}
CPU
4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
{noformat}

When I set the CFS quota:

{noformat}
echo 1000 > 
/cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
 CPU
4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
{noformat}

It drops to 1% usage, and you can see the box has room to spare:

{noformat}
Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si,
0.0%st
{noformat}

Turning the quota back to -1:

{noformat}
echo -1 > 
/cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
{noformat}

Burns the cores again:

{noformat}
Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 
0.0%st
CPU
4370 criccomi  20   0 1157m 563m  14m S 253.9  0.8  89:32.31 ...
{noformat}

On my dev box, I was testing CGroups by running a python process eight
times, to burn through all the cores, since it was behaving as described above
(giving extra CPU to the process, even with a cpu.shares limit). Toggling
the cfs_quota_us seems to enforce a hard limit.
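
For anyone repeating the same toggle from the JVM side, here is a minimal sketch. 
This is not YARN code; the cgroup mount point and container directory are simply 
the ones from the session above and would differ on another host, and the 
quota/period arithmetic assumes the 0.1s (100000us) period observed above.

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CfsQuotaToggle {
  public static void main(String[] args) throws IOException {
    // Assumed cgroup mount point and container directory (taken from the session above).
    Path quotaFile = Paths.get(
        "/cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us");

    // 1000us of CPU per 100000us period is roughly 1% of one core (hard cap).
    Files.write(quotaFile, "1000".getBytes(StandardCharsets.UTF_8));
    System.out.println("quota now: "
        + new String(Files.readAllBytes(quotaFile), StandardCharsets.UTF_8).trim());

    // Writing -1 removes the ceiling again, leaving only the soft cpu.shares weighting.
    Files.write(quotaFile, "-1".getBytes(StandardCharsets.UTF_8));
  }
}
{code}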

Implementation:

What do you guys think about introducing a variable to YarnConfiguration:

bq. yarn.nodemanager.linux-container.executor.cgroups.cpu-ceiling-enforcement

The default would be false. Setting it to true would cause YARN's LCE to set:

{noformat}
cpu.cfs_quota_us=(container-request-vcores/nm-vcore-to-pcore-ratio) * 100
cpu.cfs_period_us=100
{noformat}

For example, if a container asks for 2 vcores, and the vcore:pcore ratio 

[jira] [Assigned] (YARN-379) yarn [node,application] command print logger info messages

2013-06-13 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash reassigned YARN-379:
-

Assignee: Ravi Prakash  (was: Abhishek Kapoor)

 yarn [node,application] command print logger info messages
 --

 Key: YARN-379
 URL: https://issues.apache.org/jira/browse/YARN-379
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
Assignee: Ravi Prakash
  Labels: usability
 Attachments: YARN-379.patch, YARN-379.patch


 Running the yarn node and yarn application commands results in annoying log 
 info messages being printed:
 $ yarn node -list
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Total Nodes:1
  Node-IdNode-State  Node-Http-Address   
 Health-Status(isNodeHealthy)Running-Containers
 foo:8041RUNNING  foo:8042   true  
  0
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
 $ yarn application
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Invalid Command Usage : 
 usage: application
  -kill <arg> Kills the application.
  -list   Lists all the Applications from RM.
  -status <arg>   Prints the status of the application.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-379) yarn [node,application] command print logger info messages

2013-06-13 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682666#comment-13682666
 ] 

Ravi Prakash commented on YARN-379:
---

Hi Abhishek! Thanks for the contribution. I can help drive this fix in from 
here. If you feel you can, please feel free to take it back.

Assigning to myself.

 yarn [node,application] command print logger info messages
 --

 Key: YARN-379
 URL: https://issues.apache.org/jira/browse/YARN-379
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Thomas Graves
Assignee: Abhishek Kapoor
  Labels: usability
 Attachments: YARN-379.patch, YARN-379.patch


 Running the yarn node and yarn application commands results in annoying log 
 info messages being printed:
 $ yarn node -list
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Total Nodes:1
  Node-IdNode-State  Node-Http-Address   
 Health-Status(isNodeHealthy)Running-Containers
 foo:8041RUNNING  foo:8042   true  
  0
 13/02/06 02:36:50 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
 $ yarn application
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
 Invalid Command Usage : 
 usage: application
  -kill <arg> Kills the application.
  -list   Lists all the Applications from RM.
  -status <arg>   Prints the status of the application.
 13/02/06 02:38:47 INFO service.AbstractService: 
 Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-773) Move YarnRuntimeException from package api.yarn to api.yarn.exceptions

2013-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682663#comment-13682663
 ] 

Hadoop QA commented on YARN-773:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587687/YARN-773.1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1223//console

This message is automatically generated.

 Move YarnRuntimeException from package api.yarn to api.yarn.exceptions
 --

 Key: YARN-773
 URL: https://issues.apache.org/jira/browse/YARN-773
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-773.1.patch, YARN-773.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-810) Support CGroup ceiling enforcement on CPU

2013-06-13 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated YARN-810:
-

Description: 
Problem statement:

YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
Containers are then allowed to request vcores between the minimum and maximum 
defined in the yarn-site.xml.

In the case where a single-threaded container requests 1 vcore, with a 
pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
the core it's using, provided that no other container is also using it. This 
happens, even though the only guarantee that YARN/CGroups is making is that the 
container will get at least 1/4th of the core.

If a second container then comes along, the second container can take resources 
from the first, provided that the first container is still getting at least its 
fair share (1/4th).

There are certain cases where this is desirable. There are also certain cases 
where it might be desirable to have a hard limit on CPU usage, and not allow 
the process to go above the specified resource requirement, even if it's 
available.

Here's an RFC that describes the problem in more detail:

http://lwn.net/Articles/336127/

Solution:

As it happens, when CFS is used in combination with CGroups, you can enforce a 
ceiling using two files in cgroups:

{noformat}
cpu.cfs_quota_us
cpu.cfs_period_us
{noformat}

The usage of these two files is documented in more detail here:

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html

Testing:

I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, it 
behaves as described above (it is a soft cap, and allows containers to use more 
than they asked for). I then tested CFS CPU quotas manually with YARN.

First, you can see that CFS is in use in the CGroup, based on the file names:

{noformat}
[criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
total 0
-r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
-rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
-r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
-rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
-rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
[criccomi@eat1-qa464 ~]$ sudo -u app cat
/cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
10
[criccomi@eat1-qa464 ~]$ sudo -u app cat
/cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
-1
{noformat}

Oddly, it appears that the cfs_period_us is set to .1s, not 1s.

We can place processes in hard limits. I have process 4370
running YARN container container_1371141151815_0003_01_03 on a host. By
default, it's running at ~300% cpu usage.

{noformat}
CPU
4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
{noformat}

When I set the CFS quota:

{noformat}
echo 1000 > 
/cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
 CPU
4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
{noformat}

It drops to 1% usage, and you can see the box has room to spare:

{noformat}
Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si,
0.0%st
{noformat}

Turning the quota back to -1:

{noformat}
echo -1 > 
/cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
{noformat}

Burns the cores again:

{noformat}
Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 
0.0%st
CPU
4370 criccomi  20   0 1157m 563m  14m S 253.9  0.8  89:32.31 ...
{noformat}

On my dev box, I was testing CGroups by running a python process eight
times, to burn through all the cores, since it was behaving as described above
(giving extra CPU to the process, even with a cpu.shares limit). Toggling
the cfs_quota_us seems to enforce a hard limit.

Implementation:

What do you guys think about introducing a variable to YarnConfiguration:

bq. yarn.nodemanager.linux-container.executor.cgroups.cpu-ceiling-enforcement

The default would be false. Setting it to true would cause YARN's LCE to set:

{noformat}
cpu.cfs_quota_us=(container-request-vcores/nm-vcore-to-pcore-ratio) * 100
cpu.cfs_period_us=100
{noformat}

For example, if a container asks for 2 vcores, and the vcore:pcore ratio is 4, 
you'd get:

{noformat}
cpu.cfs_quota_us=(2/4) * 100 = 50
cpu.cfs_period_us=100
{noformat}

This would cause CFS to cap the process at 50% of clock cycles.
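
As a rough sketch of that computation (this is not the actual LCE code; the period 
constant and the names below are assumptions for illustration, and only the ratio 
quota/period = requested-vcores / vcore-to-pcore-ratio comes from the proposal above):

{code}
public class CfsCeilingSketch {

  // CFS period to apply to every container cgroup, in microseconds (assumed value).
  private static final long CFS_PERIOD_US = 100000L;

  // Cap the container at (requested vcores / vcore:pcore ratio) of one physical core.
  static long cfsQuotaUs(int requestedVcores, int vcoreToPcoreRatio) {
    return CFS_PERIOD_US * requestedVcores / vcoreToPcoreRatio;
  }

  public static void main(String[] args) {
    // The 2-vcore, ratio-4 example from the description: half of one core.
    System.out.println("cpu.cfs_period_us=" + CFS_PERIOD_US);
    System.out.println("cpu.cfs_quota_us=" + cfsQuotaUs(2, 4)); // 50000, i.e. 50%
  }
}
{code}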

What do you guys think?

1. 

[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682677#comment-13682677
 ] 

Sandy Ryza commented on YARN-810:
-

[~criccomini], I'm intending to remove the vcore-pcore ratio in YARN-782.  If 
we did this and set a % ceiling on the amount of CPU that the sum of all 
containers can occupy, would that also satisfy your use case?

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini

 Problem statement:
 YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
 Containers are then allowed to request vcores between the minimum and maximum 
 defined in the yarn-site.xml.
 In the case where a single-threaded container requests 1 vcore, with a 
 pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
 the core it's using, provided that no other container is also using it. This 
 happens, even though the only guarantee that YARN/CGroups is making is that 
 the container will get at least 1/4th of the core.
 If a second container then comes along, the second container can take 
 resources from the first, provided that the first container is still getting 
 at least its fair share (1/4th).
 There are certain cases where this is desirable. There are also certain cases 
 where it might be desirable to have a hard limit on CPU usage, and not allow 
 the process to go above the specified resource requirement, even if it's 
 available.
 Here's an RFC that describes the problem in more detail:
 http://lwn.net/Articles/336127/
 Solution:
 As it happens, when CFS is used in combination with CGroups, you can enforce 
 a ceiling using two files in cgroups:
 {noformat}
 cpu.cfs_quota_us
 cpu.cfs_period_us
 {noformat}
 The usage of these two files is documented in more detail here:
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
 Testing:
 I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
 it behaves as described above (it is a soft cap, and allows containers to use 
 more than they asked for). I then tested CFS CPU quotas manually with YARN.
 First, you can see that CFS is in use in the CGroup, based on the file names:
 {noformat}
 [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
 total 0
 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
 drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
 -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
 -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
 -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
 10
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
 -1
 {noformat}
 Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
 We can place processes in hard limits. I have process 4370 running YARN 
 container container_1371141151815_0003_01_03 on a host. By default, it's 
 running at ~300% cpu usage.
 {noformat}
 CPU
 4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
 {noformat}
 When I set the CFS quota:
 {noformat}
 echo 1000 > 
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
  CPU
 4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
 {noformat}
 It drops to 1% usage, and you can see the box has room to spare:
 {noformat}
 Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si, 
 0.0%st
 {noformat}
 Turning the quota back to -1:
 {noformat}
 echo -1 > 
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
 {noformat}
 Burns the cores again:
 {noformat}
 Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 
 0.0%st
 CPU
 4370 criccomi  20   0 1157m 563m  14m S 253.9  0.8  89:32.31 ...
 {noformat}
 On my dev box, I was testing CGroups by running a python process eight times, 
 to burn through all the cores, since it was doing as described above (giving 
 extra CPU to the process, even with a cpu.shares limit). Toggling the 
 

[jira] [Updated] (YARN-736) Add a multi-resource fair sharing metric

2013-06-13 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-736:


Attachment: YARN-736-1.patch

 Add a multi-resource fair sharing metric
 

 Key: YARN-736
 URL: https://issues.apache.org/jira/browse/YARN-736
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-736-1.patch, YARN-736.patch


 Currently, at a regular interval, the fair scheduler computes a fair memory 
 share for each queue and application inside it.  This fair share is not used 
 for scheduling decisions, but is displayed in the web UI, exposed as a 
 metric, and used for preemption decisions.
 With DRF and multi-resource scheduling, assigning a memory share as the fair 
 share metric to every queue no longer makes sense.  It's not obvious what the 
 replacement should be, but probably something like fractional fairness within 
 a queue, or distance from an ideal cluster state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682684#comment-13682684
 ] 

Sandy Ryza commented on YARN-810:
-

a configurable % ceiling I mean.

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini

 Problem statement:
 YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
 Containers are then allowed to request vcores between the minimum and maximum 
 defined in the yarn-site.xml.
 In the case where a single-threaded container requests 1 vcore, with a 
 pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
 the core it's using, provided that no other container is also using it. This 
 happens, even though the only guarantee that YARN/CGroups is making is that 
 the container will get at least 1/4th of the core.
 If a second container then comes along, the second container can take 
 resources from the first, provided that the first container is still getting 
 at least its fair share (1/4th).
 There are certain cases where this is desirable. There are also certain cases 
 where it might be desirable to have a hard limit on CPU usage, and not allow 
 the process to go above the specified resource requirement, even if it's 
 available.
 Here's an RFC that describes the problem in more detail:
 http://lwn.net/Articles/336127/
 Solution:
 As it happens, when CFS is used in combination with CGroups, you can enforce 
 a ceiling using two files in cgroups:
 {noformat}
 cpu.cfs_quota_us
 cpu.cfs_period_us
 {noformat}
 The usage of these two files is documented in more detail here:
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
 Testing:
 I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
 it behaves as described above (it is a soft cap, and allows containers to use 
 more than they asked for). I then tested CFS CPU quotas manually with YARN.
 First, you can see that CFS is in use in the CGroup, based on the file names:
 {noformat}
 [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
 total 0
 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
 drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
 -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
 -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
 -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
 10
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
 -1
 {noformat}
 Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
 We can place processes in hard limits. I have process 4370 running YARN 
 container container_1371141151815_0003_01_03 on a host. By default, it's 
 running at ~300% cpu usage.
 {noformat}
 CPU
 4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
 {noformat}
 When I set the CFS quota:
 {noformat}
 echo 1000 > 
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
  CPU
 4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
 {noformat}
 It drops to 1% usage, and you can see the box has room to spare:
 {noformat}
 Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si, 
 0.0%st
 {noformat}
 Turning the quota back to -1:
 {noformat}
 echo -1 > 
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
 {noformat}
 Burns the cores again:
 {noformat}
 Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 
 0.0%st
 CPU
 4370 criccomi  20   0 1157m 563m  14m S 253.9  0.8  89:32.31 ...
 {noformat}
 On my dev box, I was testing CGroups by running a python process eight times, 
 to burn through all the cores, since it was doing as described above (giving 
 extra CPU to the process, even with a cpu.shares limit). Toggling the 
 cfs_quota_us seems to enforce a hard limit.
 Implementation:
 What do you guys think about introducing a variable to YarnConfiguration:
 bq. 

[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric

2013-06-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682689#comment-13682689
 ] 

Sandy Ryza commented on YARN-736:
-

Attached an updated patch that takes maximum shares into account.

 Add a multi-resource fair sharing metric
 

 Key: YARN-736
 URL: https://issues.apache.org/jira/browse/YARN-736
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-736-1.patch, YARN-736.patch


 Currently, at a regular interval, the fair scheduler computes a fair memory 
 share for each queue and application inside it.  This fair share is not used 
 for scheduling decisions, but is displayed in the web UI, exposed as a 
 metric, and used for preemption decisions.
 With DRF and multi-resource scheduling, assigning a memory share as the fair 
 share metric to every queue no longer makes sense.  It's not obvious what the 
 replacement should be, but probably something like fractional fairness within 
 a queue, or distance from an ideal cluster state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-13 Thread Adar Dembo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adar Dembo updated YARN-799:


Description: 
The implementation of

bq. 
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java

Tells the container-executor to write PIDs to cgroup.procs:

{code}
  public String getResourcesOption(ContainerId containerId) {
    String containerName = containerId.toString();
    StringBuilder sb = new StringBuilder("cgroups=");

    if (isCpuWeightEnabled()) {
      sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
      sb.append(",");
    }

    if (sb.charAt(sb.length() - 1) == ',') {
      sb.deleteCharAt(sb.length() - 1);
    }
    return sb.toString();
  }
{code}

Apparently, this file has not always been writeable:

https://patchwork.kernel.org/patch/116146/
http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html

The RHEL version of the Linux kernel that I'm using has a CGroup module that 
has a non-writeable cgroup.procs file.

{quote}
$ uname -a
Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
x86_64 x86_64 x86_64 GNU/Linux
{quote}

As a result, when the container-executor tries to run, it fails with this error 
message:

bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",

This is because the executor is given a resource by the 
CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:

{quote}
$ pwd 
/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
$ ls -l
total 0
-r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
-rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
{quote}

I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
and this appears to have fixed the problem.

I can think of several potential resolutions to this ticket:

1. Ignore the problem, and make people patch YARN when they hit this issue.
2. Write to /tasks instead of /cgroup.procs for everyone
3. Check permissions on /cgroup.procs prior to writing to it, and fall back 
to /tasks (see the sketch after this list).
4. Add a config to yarn-site that lets admins specify which file to write to.

Thoughts?
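
A minimal sketch of what option 3 could look like (this is not the real 
CgroupsLCEResourcesHandler; the helper name and the way the path is assembled 
are assumptions for illustration):

{code}
import java.io.File;

final class CgroupTaskFileChooser {

  // Returns the file the container-executor should write the PID to.
  static String chooseTaskFile(String cgroupPath) {
    File procs = new File(cgroupPath, "cgroup.procs");
    if (procs.exists() && procs.canWrite()) {
      return procs.getAbsolutePath();
    }
    // Older kernels (see the links above) expose cgroup.procs read-only.
    return new File(cgroupPath, "tasks").getAbsolutePath();
  }
}
{code}

getResourcesOption() could then append chooseTaskFile(pathForCgroup(CONTROLLER_CPU, 
containerName)) instead of hard-coding /cgroup.procs.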


[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-13 Thread Adar Dembo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682698#comment-13682698
 ] 

Adar Dembo commented on YARN-799:
-

Thread group IDs should be written to /cgroup.procs while individual thread IDs 
are written to /tasks. Effectively this means that /cgroup.procs can be used to 
atomically move entire processes to a cgroup, while /tasks requires cgroup 
movement occur thread by thread, an inherently racy operation (i.e. a new 
thread created during the move will be left behind in the old cgroup unless 
the move is retried). That makes /cgroup.procs more attractive for cgroup 
migration. Once threads are in the new cgroup, it doesn't matter how they got 
there: if they fork, their children will be in the new cgroup too.

However, /cgroup.procs has some issues:
# As identified in the original bug report, /cgroup.procs wasn't always 
writable, so we'd need to fall back to /tasks when /cgroup.procs is read-only.
# On all SLES 11 kernels I've tested (including the upcoming SLES 11 SP3), it's 
trivial to crash the kernel via /cgroup.procs.
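
To make the /cgroup.procs-vs-/tasks distinction above concrete, here is an 
illustrative sketch (not YARN code; the cgroup directory handling is an assumption):

{code}
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class CgroupMoveSketch {

  // Writing the PID to cgroup.procs moves every thread of the process at once.
  static void moveWholeProcess(String cgroupDir, long pid) throws IOException {
    Files.write(Paths.get(cgroupDir, "cgroup.procs"),
        Long.toString(pid).getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.WRITE);
  }

  // The tasks file only accepts one thread ID per write; threads created while
  // this loop runs are exactly the race described above.
  static void moveThreadByThread(String cgroupDir, long pid) throws IOException {
    File[] tids = new File("/proc/" + pid + "/task").listFiles();
    if (tids == null) {
      return; // process already exited
    }
    for (File tid : tids) {
      Files.write(Paths.get(cgroupDir, "tasks"),
          tid.getName().getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.WRITE);
    }
  }
}
{code}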

 CgroupsLCEResourcesHandler tries to write to cgroup.procs
 -

 Key: YARN-799
 URL: https://issues.apache.org/jira/browse/YARN-799
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha, 2.0.5-alpha
Reporter: Chris Riccomini

 The implementation of
 bq. 
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
 Tells the container-executor to write PIDs to cgroup.procs:
 {code}
   public String getResourcesOption(ContainerId containerId) {
     String containerName = containerId.toString();
     StringBuilder sb = new StringBuilder("cgroups=");
     if (isCpuWeightEnabled()) {
       sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
       sb.append(",");
     }
     if (sb.charAt(sb.length() - 1) == ',') {
       sb.deleteCharAt(sb.length() - 1);
     }
     return sb.toString();
   }
 {code}
 Apparently, this file has not always been writeable:
 https://patchwork.kernel.org/patch/116146/
 http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
 https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
 The RHEL version of the Linux kernel that I'm using has a CGroup module that 
 has a non-writeable cgroup.procs file.
 {quote}
 $ uname -a
 Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 {quote}
 As a result, when the container-executor tries to run, it fails with this 
 error message:
 bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
 This is because the executor is given a resource by the 
 CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
 {quote}
 $ pwd 
 /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
 $ ls -l
 total 0
 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
 {quote}
 I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
 and this appears to have fixed the problem.
 I can think of several potential resolutions to this ticket:
 1. Ignore the problem, and make people patch YARN when they hit this issue.
 2. Write to /tasks instead of /cgroup.procs for everyone
 3. Check permissioning on /cgroup.procs prior to writing to it, and fall back 
 to /tasks.
 4. Add a config to yarn-site that lets admins specify which file to write to.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-804) mark AbstractService init/start/stop methods as final

2013-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682699#comment-13682699
 ] 

Hadoop QA commented on YARN-804:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587689/YARN-804-001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

  org.apache.hadoop.yarn.client.TestAMRMClientAsync

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1221//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1221//console

This message is automatically generated.

 mark AbstractService init/start/stop methods as final
 -

 Key: YARN-804
 URL: https://issues.apache.org/jira/browse/YARN-804
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.1.0-beta
Reporter: Steve Loughran
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-804-001.patch


 Now that YARN-117 and MAPREDUCE-5298 are checked in, we can mark the public 
 AbstractService init/start/stop methods as final.
 Why? It puts the lifecycle check and error handling around the subclass code, 
 ensuring no lifecycle method gets called in the wrong state or gets called 
 more than once. When a {{serviceInit(), serviceStart() & serviceStop()}} 
 method throws an exception, it's caught and auto-triggers stop. 
 Marking the methods as final forces service implementations to move to the 
 stricter lifecycle. It has one side effect: some of the mocking tests play up 
 - I'll need some assistance here
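
As a quick illustration of the pattern being described, a simplified sketch (the 
real AbstractService has more states, listeners and locking; the names below are 
illustrative only):

{code}
public abstract class SketchService {

  private enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public final synchronized void start() {
    if (state == State.STARTED) {
      return;                        // duplicate start() calls become no-ops
    }
    if (state != State.INITED) {
      throw new IllegalStateException("cannot start from " + state);
    }
    try {
      serviceStart();                // subclass hook
      state = State.STARTED;
    } catch (Exception e) {
      stop();                        // a failed start auto-triggers stop
      throw new RuntimeException(e);
    }
  }

  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return;                        // stop() is idempotent
    }
    state = State.STOPPED;
    try {
      serviceStop();                 // must cope with partially-initialized fields
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  protected void serviceStart() throws Exception { }

  protected void serviceStop() throws Exception { }
}
{code}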

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-803) factor out scheduler config validation from the ResourceManager to each scheduler implementation

2013-06-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682709#comment-13682709
 ] 

Hadoop QA commented on YARN-803:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587688/YARN-803.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1222//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1222//console

This message is automatically generated.

 factor out scheduler config validation from the ResourceManager to each 
 scheduler implementation
 

 Key: YARN-803
 URL: https://issues.apache.org/jira/browse/YARN-803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-803.patch, YARN-803.patch


 Per discussion in YARN-789 we should factor out from the ResourceManager 
 class the scheduler config validations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2013-06-13 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682710#comment-13682710
 ] 

Chris Riccomini commented on YARN-810:
--

Hey Sandy,

If I understand you correctly, not quite. I think what you're saying is, if we 
set a % ceiling that all containers combined could use (say 80%), then a single 
container running would get 80% usage, but if two containers were running, 
they'd get roughly 40% each, right?

What I'm saying is, if one container is running, it gets a maximum of 40% of a 
core (even if the other 60% is available). If two are running, they still both 
get 40% of a core.

We have a situation where we want very predictable CPU usage. We don't want a 
container to run happily because it's been over-provisioned based on luck, and 
then, when a second container gets allocated on the box, suddenly slow way down 
to its allocated CPU usage. We'd rather it be very 
predictable, and know up front that the allocated CPU resources aren't enough.

Does this make sense? I'm not sure I'm making things as clear as they could be.

Cheers,
Chris

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini

 Problem statement:
 YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
 Containers are then allowed to request vcores between the minimum and maximum 
 defined in the yarn-site.xml.
 In the case where a single-threaded container requests 1 vcore, with a 
 pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
 the core it's using, provided that no other container is also using it. This 
 happens, even though the only guarantee that YARN/CGroups is making is that 
 the container will get at least 1/4th of the core.
 If a second container then comes along, the second container can take 
 resources from the first, provided that the first container is still getting 
 at least its fair share (1/4th).
 There are certain cases where this is desirable. There are also certain cases 
 where it might be desirable to have a hard limit on CPU usage, and not allow 
 the process to go above the specified resource requirement, even if it's 
 available.
 Here's an RFC that describes the problem in more detail:
 http://lwn.net/Articles/336127/
 Solution:
 As it happens, when CFS is used in combination with CGroups, you can enforce 
 a ceiling using two files in cgroups:
 {noformat}
 cpu.cfs_quota_us
 cpu.cfs_period_us
 {noformat}
 The usage of these two files is documented in more detail here:
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
 Testing:
 I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
 it behaves as described above (it is a soft cap, and allows containers to use 
 more than they asked for). I then tested CFS CPU quotas manually with YARN.
 First, you can see that CFS is in use in the CGroup, based on the file names:
 {noformat}
 [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
 total 0
 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
 drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
 -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
 -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
 -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
 10
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
 -1
 {noformat}
 Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
 We can place processes under hard limits. I have process 4370 running YARN 
 container container_1371141151815_0003_01_03 on a host. By default, it's 
 running at ~300% cpu usage.
 {noformat}
 CPU
 4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
 {noformat}
 When I set the CFS quota:
 {noformat}
 echo 1000 > /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
  CPU
 4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
 {noformat}
 It drops to 1% usage (a 1000µs quota against the 100000µs period is 1% of one 
 core), and the rest of the box's CPU is left to spare.
 

[jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs

2013-06-13 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682715#comment-13682715
 ] 

Chris Riccomini commented on YARN-799:
--

Hey Adar,

Thanks, this is super useful. In the case of YARN, I believe it's setting up 
the cgroup.procs file in the container-executor, prior to any user-code being 
executed. I also don't think it's using any threading. In such a case, is it 
safe to write to tasks? It seems that if a single-threaded process moves itself 
into the cgroup via the tasks file, and then spawns the user code, everything 
from then on out should end up in tasks, right?

Cheers,
Chris
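
For illustration only (the real container-executor is native code, and the cgroup path below is hypothetical), here is a minimal Java sketch of the sequence described above: a single-threaded launcher writes its own PID into the cgroup's tasks file and only then spawns the user command, so every process forked afterwards inherits the same cgroup membership.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch only: place ourselves in the container cgroup, then launch user code. */
public class CgroupLauncherSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Hypothetical container cgroup directory.
    String cgroupDir = "/cgroup/cpu/hadoop-yarn/container_XXXX";

    // Write our own PID into tasks before any user code runs (Java 9+ for ProcessHandle).
    long pid = ProcessHandle.current().pid();
    Files.write(Paths.get(cgroupDir, "tasks"),
        Long.toString(pid).getBytes(StandardCharsets.US_ASCII));

    // Children forked from here on start out in the same cgroup.
    // Usage: java CgroupLauncherSketch <user command and args>
    Process userCode = new ProcessBuilder(args).inheritIO().start();
    System.exit(userCode.waitFor());
  }
}
{code}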

 CgroupsLCEResourcesHandler tries to write to cgroup.procs
 -

 Key: YARN-799
 URL: https://issues.apache.org/jira/browse/YARN-799
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha, 2.0.5-alpha
Reporter: Chris Riccomini

 The implementation of
 bq. 
 ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
 tells the container-executor to write PIDs to cgroup.procs:
 {code}
   public String getResourcesOption(ContainerId containerId) {
     String containerName = containerId.toString();
     StringBuilder sb = new StringBuilder("cgroups=");
     if (isCpuWeightEnabled()) {
       sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
       sb.append(",");
     }
     if (sb.charAt(sb.length() - 1) == ',') {
       sb.deleteCharAt(sb.length() - 1);
     }
     return sb.toString();
   }
 {code}
 Apparently, this file has not always been writeable:
 https://patchwork.kernel.org/patch/116146/
 http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
 https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
 The RHEL version of the Linux kernel that I'm using has a CGroup module that 
 has a non-writeable cgroup.procs file.
 {quote}
 $ uname -a
 Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 {quote}
 As a result, when the container-executor tries to run, it fails with this 
 error message:
 bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
 This is because the executor is given a resource by the 
 CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
 {quote}
 $ pwd 
 /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_01
 $ ls -l
 total 0
 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
 -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
 {quote}
 I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, 
 and this appears to have fixed the problem.
 I can think of several potential resolutions to this ticket:
 1. Ignore the problem, and make people patch YARN when they hit this issue.
 2. Write to /tasks instead of /cgroup.procs for everyone
 3. Check permissions on /cgroup.procs prior to writing to it, and fall back 
 to /tasks if it isn't writeable (see the sketch below).
 4. Add a config to yarn-site that lets admins specify which file to write to.
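A minimal sketch of option 3 (illustrative only, not a patch; the helper name is invented): check whether cgroup.procs is writeable and fall back to the tasks file otherwise. getResourcesOption() could then append the result of this helper applied to pathForCgroup(CONTROLLER_CPU, containerName) instead of hard-coding /cgroup.procs.
{code}
import java.io.File;

/** Illustrative sketch of option 3 above; not an actual patch to YARN. */
class CgroupPidFileSketch {
  /** Pick the file the container-executor should write PIDs to for a cgroup dir. */
  static String pidFileForCgroup(String cgroupPath) {
    File procs = new File(cgroupPath, "cgroup.procs");
    if (procs.exists() && procs.canWrite()) {
      return procs.getAbsolutePath();
    }
    // Some older kernels (e.g. certain RHEL 6 builds) expose a read-only
    // cgroup.procs; on those kernels the tasks file is still writeable.
    return new File(cgroupPath, "tasks").getAbsolutePath();
  }
}
{code}
One caveat with this sketch: File.canWrite() checks permissions for the NodeManager's own user, which is only a rough proxy for what the setuid container-executor can actually do, so option 4 (an explicit yarn-site setting) may still be the more predictable choice.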
 Thoughts?


