[jira] [Commented] (YARN-684) ContainerManager.startContainer needs to only have ContainerTokenIdentifier instead of the whole Container
[ https://issues.apache.org/jira/browse/YARN-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671328#comment-13671328 ] Hudson commented on YARN-684: - Integrated in Hadoop-Yarn-trunk #226 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/226/]) MAPREDUCE-5286. Change MapReduce to use ContainerTokenIdentifier instead of the entire Container in the startContainer call - YARN-684. Contributed by Vinod Kumar Vavilapalli. (Revision 1488087) YARN-684. ContainerManager.startContainer should use ContainerTokenIdentifier instead of the entire Container. Contributed by Vinod Kumar Vavilapalli. (Revision 1488085) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1488087 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1488085 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/NMClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationContainerInitEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java *
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671572#comment-13671572 ] Hadoop QA commented on YARN-530: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585627/YARN-530-012.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1048//console This message is automatically generated. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530-005.patch, YARN-530-008.patch, YARN-530-009.patch, YARN-530-010.patch, YARN-530-011.patch, YARN-530-012.patch, YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-692) Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat.
[ https://issues.apache.org/jira/browse/YARN-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671631#comment-13671631 ] Hadoop QA commented on YARN-692: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585639/YARN-692.20130531.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1049//console This message is automatically generated. Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat. -- Key: YARN-692 URL: https://issues.apache.org/jira/browse/YARN-692 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: YARN-692.20130530.1.patch, YARN-692.20130530.2.patch, YARN-692.20130531.patch This is related to YARN-613. Here we will be implementing NMToken generation on the RM side and sharing it with the NM during the RM-NM heartbeat. As a part of this JIRA the master key will only be made available to the NM, but there will be no validation done until AM-NM communication is fixed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-733) TestNMClient fails occasionally
[ https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-733: - Attachment: YARN-733.1.patch In the patch: 1. Update the tests to wait until the expected container status occurs 2. In NMClientImpl, add a piece of javadoc to describe that startContainer/stopContainer returning doesn't mean the container is actually started/stopped; there can be a transient container status. Have run the test tens of times, and no failure occurred. TestNMClient fails occasionally --- Key: YARN-733 URL: https://issues.apache.org/jira/browse/YARN-733 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-733.1.patch The problem happens at:
{code}
// getContainerStatus can be called after stopContainer
try {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(), container.getContainerToken());
  assertEquals(container.getId(), status.getContainerId());
  assertEquals(ContainerState.RUNNING, status.getState());
  assertTrue("" + i, status.getDiagnostics().contains(
      "Container killed by the ApplicationMaster."));
  assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
  fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in async style. Therefore, the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. There will be a similar problem wrt NMClientImpl#startContainer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
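Since start/stop are asynchronous on the NM side, a test (or any NMClient caller) that needs a particular state has to poll rather than assert immediately. Below is a minimal sketch of such a wait loop; it reuses the getContainerStatus signature from the snippet above, but the helper itself (waitForContainerState) is hypothetical and not part of the patch.
{code}
// Poll the NM until it reports the expected state or the timeout expires;
// startContainer/stopContainer only enqueue the transition, so the very next
// getContainerStatus call may still see the old state.
private ContainerStatus waitForContainerState(NMClient nmClient, Container container,
    ContainerState expected, long timeoutMs) throws Exception {
  long deadline = System.currentTimeMillis() + timeoutMs;
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(), container.getContainerToken());
  while (status.getState() != expected && System.currentTimeMillis() < deadline) {
    Thread.sleep(100);
    status = nmClient.getContainerStatus(
        container.getId(), container.getNodeId(), container.getContainerToken());
  }
  return status;
}
{code}
A test can then assert on the returned status (state, diagnostics, exit status) without racing the NM's state machine.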
[jira] [Commented] (YARN-737) ContainerManagerImpl can directly throw NMNotYetReadyException and InvalidContainerException after YARN-142
[ https://issues.apache.org/jira/browse/YARN-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671707#comment-13671707 ] Jian He commented on YARN-737: -- bq. Why are we throwing RuntimeException instead of Exception? any reason? These two exceptions actually extend YarnRemoteException, which extends Exception; what do you mean? ContainerManagerImpl can directly throw NMNotYetReadyException and InvalidContainerException after YARN-142 Key: YARN-737 URL: https://issues.apache.org/jira/browse/YARN-737 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-737.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671715#comment-13671715 ] Trevor Lorimer commented on YARN-696: - Patch submitted, built against trunk and unit test included. Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Priority: Trivial Attachments: 0001-YARN-696.patch Within the YARN Resource Manager REST API the GET call which returns all Applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified all states are returned; however, if a sub-set of states is required then multiple REST calls are required (max. of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
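For illustration, this is the kind of call the proposal enables; the comma-separated states query parameter is the feature being proposed here (the existing API only accepts a single state value), and the host/port are placeholders.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class AppsByStates {
  public static void main(String[] args) throws Exception {
    // One request for two states instead of two separate requests.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster/apps?states=RUNNING,FINISHED");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // JSON listing of the matching applications
      }
    } finally {
      in.close();
      conn.disconnect();
    }
  }
}
{code}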
[jira] [Commented] (YARN-733) TestNMClient fails occasionally
[ https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671712#comment-13671712 ] Omkar Vinit Joshi commented on YARN-733: [~zjshen] small nit: bq. may still need some time to make the container actually started or stopped because of its asynchronous - suggest: may still need some time to either start or stop the container because of its asynchronous I hope we are not doing getContainerStatus after the Application is finished, in which case we won't have tokens at the NM side for authentication. TestNMClient fails occasionally --- Key: YARN-733 URL: https://issues.apache.org/jira/browse/YARN-733 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-733.1.patch The problem happens at:
{code}
// getContainerStatus can be called after stopContainer
try {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(), container.getContainerToken());
  assertEquals(container.getId(), status.getContainerId());
  assertEquals(ContainerState.RUNNING, status.getState());
  assertTrue("" + i, status.getDiagnostics().contains(
      "Container killed by the ApplicationMaster."));
  assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
  fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in async style. Therefore, the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. There will be a similar problem wrt NMClientImpl#startContainer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671738#comment-13671738 ] Sandy Ryza commented on YARN-326: - Attached a rebased patch Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671739#comment-13671739 ] Siddharth Seth commented on YARN-735: - bq. with a local copy, getAppId can save from every time calling convertFromProtoFormat I think Fair enough. Can the setter be simplified a bit? After a build(), can we unset the builder and disallow any subsequent sets? Also, in ApplicationId, the applicationId / appAttemptId instances could be initialized as part of the proto constructor itself, which would allow simpler getter code. Make ApplicationAttemptID, ContainerID, NodeID immutable Key: YARN-735 URL: https://issues.apache.org/jira/browse/YARN-735 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-735.1.patch, YARN-735.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
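To make the suggestion concrete, here is a rough sketch of what an immutable PBImpl looks like when the child id is converted once in the proto constructor; the class and proto names mirror the YARN ones, but the exact shapes are assumptions rather than the committed patch.
{code}
public class ApplicationAttemptIdPBImpl extends ApplicationAttemptId {
  private final ApplicationAttemptIdProto proto;
  private final ApplicationId applicationId;   // built eagerly in the constructor

  public ApplicationAttemptIdPBImpl(ApplicationAttemptIdProto proto) {
    this.proto = proto;
    // Convert once here, so the getter needs no lazy caching or
    // convertFromProtoFormat call on every invocation.
    this.applicationId = new ApplicationIdPBImpl(proto.getApplicationId());
  }

  @Override
  public ApplicationId getApplicationId() {
    return applicationId;
  }

  @Override
  public int getAttemptId() {
    return proto.getAttemptId();
  }

  public ApplicationAttemptIdProto getProto() {
    return proto;   // no builder to unset: the instance is never mutated
  }
}
{code}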
[jira] [Commented] (YARN-710) Add to ser/deser methods to RecordFactory
[ https://issues.apache.org/jira/browse/YARN-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671758#comment-13671758 ] Siddharth Seth commented on YARN-710: - bq. On the 3rd one, that would require assuming that I could remove 'PBImpl' from the class, add 'Proto' and then I get the proto implementation. Plus package switching. While the 'PBImpl' postfix is used as a convention already, the 'Proto' and the package are not used as convention, thus I'd leave it as it is. Is it possible to just use the return type on the getProto method, instead of creating an instance? (Message message = getProto(newRecordInstance(clazz));) Otherwise, patch looks good to me. Add to ser/deser methods to RecordFactory - Key: YARN-710 URL: https://issues.apache.org/jira/browse/YARN-710 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-710.patch, YARN-710.patch In order to do things like AM failover and checkpointing I need to serialize app IDs, app attempt IDs, containers and/or IDs, resource requests, etc. Because we are wrapping/hiding the PB implementation from the APIs, we are hiding the built-in PB ser/deser capabilities. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
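The serialization half of the idea is straightforward once the underlying proto is reachable; a rough sketch, assuming the existing ProtoBase/getProto() convention (the helper class and method name here are made up, not the API added by the patch):
{code}
import com.google.protobuf.Message;
import org.apache.hadoop.yarn.api.records.ProtoBase;

public final class RecordSerDe {
  private RecordSerDe() {}

  /** Every *PBImpl record wraps a protobuf Message; its wire format is the serialized form. */
  public static byte[] toBytes(ProtoBase<? extends Message> record) {
    return record.getProto().toByteArray();
  }
}
{code}
Deserialization is the harder direction, since it needs a way to get from the record interface back to the concrete PBImpl class and its proto parser, which is exactly what the RecordFactory changes in this JIRA are about.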
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671762#comment-13671762 ] Hadoop QA commented on YARN-326: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585656/YARN-326-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1053//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1053//console This message is automatically generated. Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-743) AMRMClient should have a separate setProgress instead of sending progress as part of allocate
Vinod Kumar Vavilapalli created YARN-743: Summary: AMRMClient should have a separate setProgress instead of sending progress as part of allocate Key: YARN-743 URL: https://issues.apache.org/jira/browse/YARN-743 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Progress updates are independent of allocations and so should be set explicitly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671805#comment-13671805 ] Alejandro Abdelnur commented on YARN-326: - LGTM, 3 nits: * the parseResourceConfigValue(String v) should require ###mb and ###vcores present in the config, else fail. As currently the config is only a ### (for mb), we have to mark this as an incompat change. * the DRFPolicy#compare() does not need to do a Math.signum(s1.start - s2.start), it can be just s1.start - s2.start. * the DRFPolicy#compare() should not use the name of the job to determine order; in the unlikely case 2 jobs are started at the same time, the return should be zero. Things should work just fine. Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
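On the start-time tie-break nit, a sketch of the comparator shape being suggested (Schedulable and getStartTime approximate the fair-scheduler classes; the real DRF comparator also weighs dominant shares before falling back to start time). If the start times are long values, the raw subtraction s1.start - s2.start can overflow when narrowed to int, so an explicit comparison is the safer way to drop Math.signum:
{code}
import java.util.Comparator;

class StartTimeTieBreaker implements Comparator<Schedulable> {
  @Override
  public int compare(Schedulable s1, Schedulable s2) {
    long t1 = s1.getStartTime();
    long t2 = s2.getStartTime();
    // No job-name comparison: identical start times simply return 0, which is
    // fine because Collections.sort is stable and preserves the existing order.
    return t1 < t2 ? -1 : (t1 > t2 ? 1 : 0);
  }
}
{code}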
[jira] [Updated] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-735: - Attachment: YARN-735.2.patch New patch fixed above comments Make ApplicationAttemptID, ContainerID, NodeID immutable Key: YARN-735 URL: https://issues.apache.org/jira/browse/YARN-735 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-735.1.patch, YARN-735.2.patch, YARN-735.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-694) AM uses the AMNMToken to authenticate all communication with NM. NM remembers and updates token across RM restart
[ https://issues.apache.org/jira/browse/YARN-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671819#comment-13671819 ] Omkar Vinit Joshi commented on YARN-694: Need to fix NodeId check as a part of this at NM side. (YARN-739) AM uses the AMNMToken to authenticate all communication with NM. NM remembers and updates token across RM restart - Key: YARN-694 URL: https://issues.apache.org/jira/browse/YARN-694 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi AM uses the AMNMToken to authenticate all the AM-NM communication. NM will validate AMNMToken in below manner * If AMNMToken is using current or previous master key then the AMNMToken is valid. In this case it will update its cache with this key corresponding to appId. * If AMNMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If AMNMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with AMNMToken. Also now onwards AM will use AMNMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-694) AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart
[ https://issues.apache.org/jira/browse/YARN-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-694: --- Summary: AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart (was: AM uses the AMNMToken to authenticate all communication with NM. NM remembers and updates token across RM restart) AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart --- Key: YARN-694 URL: https://issues.apache.org/jira/browse/YARN-694 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi AM uses the AMNMToken to authenticate all the AM-NM communication. NM will validate AMNMToken in below manner * If AMNMToken is using current or previous master key then the AMNMToken is valid. In this case it will update its cache with this key corresponding to appId. * If AMNMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If AMNMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with AMNMToken. Also now onwards AM will use AMNMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-694) AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart
[ https://issues.apache.org/jira/browse/YARN-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-694: --- Description: AM uses the NMToken to authenticate all the AM-NM communication. NM will validate NMToken in below manner * If NMToken is using current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId. * If NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If NMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with NMToken. Also now onwards AM will use NMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. was: AM uses the AMNMToken to authenticate all the AM-NM communication. NM will validate AMNMToken in below manner * If AMNMToken is using current or previous master key then the AMNMToken is valid. In this case it will update its cache with this key corresponding to appId. * If AMNMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If AMNMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with AMNMToken. Also now onwards AM will use AMNMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. AM uses the NMToken to authenticate all communication with NM. NM remembers and updates token across RM restart --- Key: YARN-694 URL: https://issues.apache.org/jira/browse/YARN-694 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi AM uses the NMToken to authenticate all the AM-NM communication. NM will validate NMToken in below manner * If NMToken is using current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId. * If NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this. * If NMToken is invalid then NM will reject AM calls. Modification for ContainerToken * At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with NMToken. Also now onwards AM will use NMToken per NM (replacing earlier behavior of ContainerToken per container per NM). * startContainer in case of Secured environment is using ContainerToken from UGI YARN-617; however after this it will use it from the payload (Container). * ContainerToken will exist and it will only be used to validate the AM's container start request. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
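The validation order in the description reduces to a small amount of bookkeeping on the NM. Below is a self-contained sketch with the token and master-key types collapsed to plain ids; the class and method names are illustrative, not the actual YARN secret-manager API.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NMTokenValidationSketch {
  private int currentKeyId;
  private int previousKeyId;
  // master key id remembered per application from the last accepted NMToken
  private final Map<String, Integer> appToKeyId = new ConcurrentHashMap<String, Integer>();

  boolean isValid(String appId, int tokenKeyId) {
    // 1. Signed with the current or previous master key: accept, and remember
    //    this key for the application so it stays valid after future key rolls.
    if (tokenKeyId == currentKeyId || tokenKeyId == previousKeyId) {
      appToKeyId.put(appId, tokenKeyId);
      return true;
    }
    // 2. Otherwise accept only if it matches the key already cached for this app.
    Integer cached = appToKeyId.get(appId);
    if (cached != null && cached == tokenKeyId) {
      return true;
    }
    // 3. Anything else: reject the AM call.
    return false;
  }
}
{code}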
[jira] [Commented] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671828#comment-13671828 ] Hadoop QA commented on YARN-735: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585665/YARN-735.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 41 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1054//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1054//console This message is automatically generated. Make ApplicationAttemptID, ContainerID, NodeID immutable Key: YARN-735 URL: https://issues.apache.org/jira/browse/YARN-735 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-735.1.patch, YARN-735.2.patch, YARN-735.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671833#comment-13671833 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585628/YARN-117-012.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 25 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1055//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1055//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1055//console This message is automatically generated. Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-007.patch, YARN-117-008.patch, YARN-117-009.patch, YARN-117-010.patch, YARN-117-011.patch, YARN-117-012.patch, YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch Having played the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. 
Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner
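A minimal sketch of the lifecycle pattern this points toward: the base class owns the state checks, the public methods are final, and stop() is legal from any state, including a service that never fully started. The serviceStart/serviceStop hook names follow the direction discussed here but are assumptions about the final API.
{code}
public abstract class SketchAbstractService {
  public enum STATE { NOTINITED, INITED, STARTED, STOPPED }
  private volatile STATE state = STATE.NOTINITED;

  public final synchronized void start() {
    if (state == STATE.STARTED) {
      return;                       // duplicate start() is a no-op, not an error
    }
    serviceStart();
    state = STATE.STARTED;
  }

  public final synchronized void stop() {
    if (state == STATE.STOPPED) {
      return;                       // stop() may be called from any state, once
    }
    state = STATE.STOPPED;
    serviceStop();                  // implementations must cope with null fields
  }

  protected void serviceStart() {}
  protected void serviceStop() {}
}
{code}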
[jira] [Assigned] (YARN-720) container-log4j.properties should not refer to mapreduce properties
[ https://issues.apache.org/jira/browse/YARN-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-720: Assignee: Zhijie Shen container-log4j.properties should not refer to mapreduce properties --- Key: YARN-720 URL: https://issues.apache.org/jira/browse/YARN-720 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Zhijie Shen This refers to yarn.app.mapreduce.container.log.dir and yarn.app.mapreduce.container.log.filesize. This should either be moved into the MR codebase. Alternately the parameters should be renamed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-326: Labels: incompatible (was: ) Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: incompatible Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-744) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request)
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha moved MAPREDUCE-3899 to YARN-744: Component/s: (was: resourcemanager) (was: mrv2) resourcemanager Fix Version/s: (was: 0.23.0) Affects Version/s: (was: 0.23.0) Key: YARN-744 (was: MAPREDUCE-3899) Project: Hadoop YARN (was: Hadoop Map/Reduce) Locking not correct in org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(AllocateRequest request) --- Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: MAPREDUCE-3899-branch-0.23.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
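A self-contained sketch of the race described above, with the YARN record types collapsed to strings; the field names (responseMap, attemptLocks) are illustrative, not the actual ApplicationMasterService code.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AllocateLockSketch {
  private final Map<String, String> responseMap = new ConcurrentHashMap<String, String>();
  private final Map<String, Object> attemptLocks = new ConcurrentHashMap<String, Object>();

  void registerAttempt(String appAttemptId) {
    responseMap.put(appAttemptId, "initial-response");
    attemptLocks.put(appAttemptId, new Object());
  }

  // Broken: the lock is held on the *old* response object, but a new object is
  // put into the map, so the next caller locks a different object and two
  // threads can be in the critical section for the same attempt at once.
  String allocateBroken(String appAttemptId) {
    String lastResponse = responseMap.get(appAttemptId);
    synchronized (lastResponse) {
      String newResponse = "response-" + System.nanoTime();
      responseMap.put(appAttemptId, newResponse);
      return newResponse;
    }
  }

  // Suggested direction from the JIRA: serialize on something stable per
  // attempt (the ApplicationAttemptId key, or a dedicated lock object) that
  // does not change when the response is replaced.
  String allocateFixed(String appAttemptId) {
    Object lock = attemptLocks.get(appAttemptId);
    synchronized (lock) {
      String newResponse = "response-" + System.nanoTime();
      responseMap.put(appAttemptId, newResponse);
      return newResponse;
    }
  }
}
{code}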
[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-326: Attachment: YARN-326-7.patch Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: incompatible Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326-7.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671862#comment-13671862 ] Sandy Ryza commented on YARN-326: - Uploaded a patch that addresses Alejandro's comments Add multi-resource scheduling to the fair scheduler --- Key: YARN-326 URL: https://issues.apache.org/jira/browse/YARN-326 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Labels: incompatible Attachments: FairSchedulerDRFDesignDoc-1.pdf, FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, YARN-326-2.patch, YARN-326-3.patch, YARN-326-4.patch, YARN-326-5.patch, YARN-326-6.patch, YARN-326-7.patch, YARN-326.patch, YARN-326.patch With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.3.patch CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.patch, YARN-569.patch There is a tension between the fast-pace reactive role of the CapacityScheduler, which needs to respond quickly to applications resource requests, and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose we opted instead of hacking the delicate mechanisms of the CapacityScheduler directly to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionalities in the fairness scheduler) operates running on intervals (e.g., every 3 seconds), observe the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed, and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not trying to tightly and consistently micromanage container allocations. - Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable, in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular, their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it remove reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exits # (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first) again until necessary or until no containers except the AM container are left, # (if not enough) it moves onto unreserve and preempt from the next application. # containers that have been asked to preempt are tracked across executions. 
If a container is among the ones to be preempted for more than a certain time, the container is moved into the list of containers to be forcibly killed. Notes: (*) at the moment, in order to avoid double-counting of the requests, we only look at the ANY part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not any. (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue, and the function runs to a fixed point. Tunables of the ProportionalCapacityPreemptionPolicy: # observe-only mode (i.e., log the actions it would take, but behave as read-only) # how frequently to run the policy # how long to wait between preemption and kill of a container # which fraction of the containers I would like to obtain should I preempt (has to do with the natural rate at which containers are returned) # deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small % we ignore it) # overall amount of
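The monitor half of the design is simple to picture; below is a rough sketch of the observe/compute/preempt loop. The interface and method names approximate the patch (e.g. ProportionalCapacityPreemptionPolicy would implement the edit-policy hook) and are not the committed API.
{code}
class CapacityMonitorSketch implements Runnable {
  /** The pluggable piece, e.g. ProportionalCapacityPreemptionPolicy. */
  interface EditPolicy {
    void editSchedule();   // observe queues, compute ideal balance, emit preemption events
  }

  private final EditPolicy policy;
  private final long intervalMs;   // e.g. 3000ms

  CapacityMonitorSketch(EditPolicy policy, long intervalMs) {
    this.policy = policy;
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      // Each round: observe capacities and pending demand, compute the ideal
      // balanced state, then de-reserve, preempt or kill as configured.
      policy.editSchedule();
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}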
[jira] [Commented] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671909#comment-13671909 ] Siddharth Seth commented on YARN-735: - +1. Committing this. Thanks Jian. Make ApplicationAttemptID, ContainerID, NodeID immutable Key: YARN-735 URL: https://issues.apache.org/jira/browse/YARN-735 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-735.1.patch, YARN-735.2.patch, YARN-735.2.patch, YARN-735.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671916#comment-13671916 ] Hadoop QA commented on YARN-569: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585682/YARN-569.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1058//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1058//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1058//console This message is automatically generated. CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, preemption.2.patch, YARN-569.1.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.patch, YARN-569.patch There is a tension between the fast-pace reactive role of the CapacityScheduler, which needs to respond quickly to applications resource requests, and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. To this purpose we opted instead of hacking the delicate mechanisms of the CapacityScheduler directly to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionalities in the fairness scheduler) operates running on intervals (e.g., every 3 seconds), observe the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed, and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions: # Container de-reservations # Resource-based preemptions # Container-based preemptions # Container killing The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not trying to tightly and consistently micromanage container allocations. 
- Preemption policy (ProportionalCapacityPreemptionPolicy): - Preemption policies are by design pluggable, in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows: # it gathers from the scheduler the state of the queues, in particular, their current capacity, guaranteed capacity and pending requests (*) # if there are pending requests from queues that are under capacity it computes a new ideal balanced state (**) # it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round) # it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order) # it remove reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exits # (if not enough) it issues
[jira] [Resolved] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved YARN-528. - Resolution: Fixed Fix Version/s: 2.1.0-beta Closing since all the sub-jiras are done. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Robert Joseph Evans Assignee: Siddharth Seth Fix For: 2.1.0-beta Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just, makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is wel received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-528: -- Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Robert Joseph Evans Assignee: Siddharth Seth Fix For: 2.1.0-beta Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error-prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and the YARN APIs can no longer be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-528. -- Resolution: Duplicate Fix Version/s: (was: 2.1.0-beta) Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Robert Joseph Evans Assignee: Siddharth Seth Attachments: y528_AppIdPart_01_Refactor.txt, y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error-prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and the YARN APIs can no longer be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-735) Make ApplicationAttemptID, ContainerID, NodeID immutable
[ https://issues.apache.org/jira/browse/YARN-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671926#comment-13671926 ] Hudson commented on YARN-735: - Integrated in Hadoop-trunk-Commit #3824 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3824/]) YARN-735. Make ApplicationAttemptId, ContainerId and NodeId immutable. Contributed by Jian He. (Revision 1488439) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1488439 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/local/LocalContainerAllocator.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRAppBenchmark.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MockJobs.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRuntimeEstimators.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestStagingCleanup.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServices.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServicesAttempts.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServicesJobConf.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServicesJobs.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/webapp/TestAMWebServicesTasks.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHSWebApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServices.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesAttempts.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobConf.java * 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobs.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesTasks.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationAttemptIdPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationIdPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerIdPBImpl.java *
[jira] [Commented] (YARN-717) Copy BuilderUtil methods into token-related records
[ https://issues.apache.org/jira/browse/YARN-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671950#comment-13671950 ] Hadoop QA commented on YARN-717: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585689/YARN-717.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 28 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 4 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1060//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1060//console This message is automatically generated. Copy BuilderUtil methods into token-related records --- Key: YARN-717 URL: https://issues.apache.org/jira/browse/YARN-717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-717.1.patch, YARN-717.2.patch, YARN-717.3.patch This is separated from YARN-711, as after changing yarn.api.token from an interface to an abstract class, e.g. ClientTokenPBImpl would have to extend two classes, both TokenPBImpl and the ClientToken abstract class, which is not allowed in Java. We may remove the ClientToken/ContainerToken/DelegationToken interfaces and just use the common Token interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
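The single-inheritance constraint mentioned above can be illustrated with a small, self-contained sketch. The class names mirror the ones in the comment, but the bodies below are placeholders, not the actual YARN records.
{code:java}
// A Java class can only extend one superclass, so a protobuf-backed impl
// cannot inherit both the shared TokenPBImpl and a ClientToken abstract class.
abstract class Token {                         // common abstract record
  public abstract byte[] getIdentifier();
}

abstract class ClientToken extends Token { }   // token sub-type as an abstract class

class TokenPBImpl extends Token {              // shared protobuf-backed implementation
  private byte[] identifier = new byte[0];
  @Override
  public byte[] getIdentifier() { return identifier; }
}

// Does not compile: multiple class inheritance is not allowed in Java.
// class ClientTokenPBImpl extends TokenPBImpl, ClientToken { }
//
// Hence the proposal: drop the ClientToken/ContainerToken/DelegationToken
// sub-types and use the single common Token record everywhere.
{code}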
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671954#comment-13671954 ] Vinod Kumar Vavilapalli commented on YARN-713: -- That, and we should do a sweep of YARN RM (and NM too?) to see where else we depend on DNS.. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical Attachments: YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-733) TestNMClient fails occasionally
[ https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-733: - Attachment: YARN-733.2.patch Fix the javadoc and refactor the test. Thanks, Omkar and Vinod, for your review. TestNMClient fails occasionally --- Key: YARN-733 URL: https://issues.apache.org/jira/browse/YARN-733 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-733.1.patch, YARN-733.2.patch The problem happens at:
{code}
// getContainerStatus can be called after stopContainer
try {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(),
      container.getContainerToken());
  assertEquals(container.getId(), status.getContainerId());
  assertEquals(ContainerState.RUNNING, status.getState());
  assertTrue("" + i, status.getDiagnostics().contains(
      "Container killed by the ApplicationMaster."));
  assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
  fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in an asynchronous style, so the container's status is in transition. Calling NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. There will be a similar problem with NMClientImpl#startContainer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
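Because stopContainer is asynchronous on the NodeManager side, one way to make the assertion deterministic is to poll the container status until it leaves RUNNING instead of asserting immediately. The helper below is a hypothetical sketch of that approach, meant to sit inside the existing test class and reuse the nmClient, Container and record types already in scope there; its name, retry count and sleep interval are assumptions, and it is not necessarily what YARN-733.2.patch does.
{code:java}
// Hypothetical test helper: wait for the container to reach COMPLETE after
// stopContainer(), since the NM processes the stop request asynchronously.
ContainerStatus waitForCompletion(Container container, int maxRetries, long sleepMs)
    throws Exception {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(), container.getContainerToken());
  int retries = 0;
  while (status.getState() != ContainerState.COMPLETE && retries++ < maxRetries) {
    Thread.sleep(sleepMs);   // stopContainer has returned, but the NM is still transitioning
    status = nmClient.getContainerStatus(
        container.getId(), container.getNodeId(), container.getContainerToken());
  }
  return status;             // caller asserts on COMPLETE, exit status and diagnostics
}
{code}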
[jira] [Commented] (YARN-733) TestNMClient fails occasionally
[ https://issues.apache.org/jira/browse/YARN-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671964#comment-13671964 ] Hadoop QA commented on YARN-733: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12585696/YARN-733.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1061//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1061//console This message is automatically generated. TestNMClient fails occasionally --- Key: YARN-733 URL: https://issues.apache.org/jira/browse/YARN-733 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-733.1.patch, YARN-733.2.patch The problem happens at:
{code}
// getContainerStatus can be called after stopContainer
try {
  ContainerStatus status = nmClient.getContainerStatus(
      container.getId(), container.getNodeId(),
      container.getContainerToken());
  assertEquals(container.getId(), status.getContainerId());
  assertEquals(ContainerState.RUNNING, status.getState());
  assertTrue("" + i, status.getDiagnostics().contains(
      "Container killed by the ApplicationMaster."));
  assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
  fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in an asynchronous style, so the container's status is in transition. Calling NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. There will be a similar problem with NMClientImpl#startContainer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated YARN-713: - Attachment: YARN-713.patch In the attached patch, the exception is handled in RMContainerTokenSecretManager#createContainerToken by returning null. The null values are supposed to trigger a retry, as in FifoScheduler#assignContainer:
{code:java}
if (containerToken == null) {
  return i; // Try again later.
}
{code}
Regarding the sweep of the RM to find other places where a DNS failure should be handled properly, I guess a cleaner approach is to directly throw UnknownHostException instead of hiding it in an IllegalArgumentException, which is also semantically confusing. This however would result in widespread changes all over the project, as each user of SecurityUtil must either handle the exception or declare it to be caught by its callers. If this approach is fine with you guys, I can give it a go. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Priority: Critical Attachments: YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
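For reference, a minimal sketch of the handling described in the comment: catch the wrapped DNS failure inside RMContainerTokenSecretManager#createContainerToken and return null so the scheduler retries later instead of the exception escaping into the dispatcher. The helper buildToken, the LOG field and the exact parameter list are assumptions for illustration, not the actual patch.
{code:java}
// Hypothetical sketch of the "return null and retry" handling; buildToken and
// LOG are placeholders for whatever the real method uses internally.
public Token createContainerToken(ContainerId containerId, NodeId nodeId,
                                  String appSubmitter, Resource capability) {
  try {
    // Building the token service name may resolve the NM host via DNS.
    return buildToken(containerId, nodeId, appSubmitter, capability);
  } catch (IllegalArgumentException e) {
    // SecurityUtil wraps the UnknownHostException; treat it as transient.
    LOG.warn("Could not create container token for " + containerId + " on "
        + nodeId + ", DNS might be unavailable", e);
    return null;  // callers such as FifoScheduler#assignContainer try again later
  }
}
{code}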