[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService

2014-12-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258990#comment-14258990
 ] 

Junping Du commented on YARN-2993:
--

The test failure and findbugs warning should be unrelated.
+1. The patch looks good to me. Will commit it soon.

 Several fixes (missing acl check, error log msg ...) and some refinement in 
 AdminService
 

 Key: YARN-2993
 URL: https://issues.apache.org/jira/browse/YARN-2993
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2993.001.patch


 This JIRA is to resolve the following issues in 
 {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}:
 *1.* There is no ACLs check for {{refreshServiceAcls}}.
 *2.* The log message in {{refreshAdminAcls}} is incorrect; it should be "... Can 
 not refresh Admin ACLs." instead of "... Can not refresh user-groups."
 *3.* Some unnecessary header imports.
 *4.* {code}
 if (!isRMActive()) {
   RMAuditLogger.logFailure(user.getShortUserName(), argName,
       adminAcl.toString(), "AdminService",
       "ResourceManager is not active. Can not remove labels.");
   throwStandbyException();
 }
 {code}
 is common to lots of methods, with only the message differing; we should refactor 
 it into one common method.
 *5.* {code}
 LOG.info("Exception remove labels", ioe);
 RMAuditLogger.logFailure(user.getShortUserName(), argName,
     adminAcl.toString(), "AdminService", "Exception remove label");
 throw RPCUtil.getRemoteException(ioe);
 {code}
 is common to lots of methods, with only the message differing; we should refactor 
 it into one common method.
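A shared helper along the lines below could cover both patterns. This is only a sketch; the helper names and exact signatures are illustrative, not taken from the attached patch:

{code}
// Hypothetical AdminService helpers; names and signatures are illustrative only.
private void checkRMActive(String user, String argName, String msg)
    throws StandbyException {
  if (!isRMActive()) {
    RMAuditLogger.logFailure(user, argName, adminAcl.toString(),
        "AdminService", "ResourceManager is not active. Can not " + msg);
    throwStandbyException();
  }
}

private YarnException logAndWrapException(Exception e, String user,
    String argName, String msg) {
  LOG.info("Exception " + msg, e);
  RMAuditLogger.logFailure(user, argName, adminAcl.toString(),
      "AdminService", "Exception " + msg);
  return RPCUtil.getRemoteException(e);
}
{code}

Each refresh/remove method would then call these helpers with its own message instead of repeating the boilerplate.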



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService

2014-12-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2993:
-
Hadoop Flags: Reviewed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService

2014-12-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259105#comment-14259105
 ] 

Hudson commented on YARN-2993:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6791 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6791/])
YARN-2993. Several fixes (missing acl check, error log msg ...) and some 
refinement in AdminService. (Contributed by Yi Liu) (junping_du: rev 
40ee4bff65b2bfdabfd16ee7d9be3382a0476565)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-26 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259191#comment-14259191
 ] 

Jian He commented on YARN-2958:
---

[~varun_saxena], thanks for working on this!

I think we can remove the latestSequenceNumber arg from 
{{RMStateStore#updateRMDelegationTokenAndSequenceNumber}} and 
{{RMStateStore#updateRMDelegationTokenAndSequenceNumberInternal}}, and also fix 
all underlying stores to not update the seq number. And rename the method to 
updateRMDelegationToken only. 
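For illustration, the renamed method might end up looking roughly like this (a sketch only, based on the snippet quoted below, not the actual patch):

{code}
// Sketch: updateRMDelegationToken with the latestSequenceNumber argument
// removed, as suggested above. Method names are assumed, not from the patch.
public synchronized void updateRMDelegationToken(
    RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate) {
  if (isFencedState()) {
    LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
    return;
  }
  try {
    // Underlying stores persist only the token and renew date; they no longer
    // rewrite the separately stored sequence number on renewal.
    updateRMDelegationTokenInternal(rmDTIdentifier, renewDate);
  } catch (Exception e) {
    notifyStoreOperationFailed(e);
  }
}
{code}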

 RMStateStore seems to unnecessarily and wronly store sequence number 
 separately
 ---

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch


 It seems that RMStateStore updates the last sequence number when storing or 
 updating each individual DT, in order to recover the latest sequence number when 
 the RM restarts.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
       RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
       int latestSequenceNumber) {
     if (isFencedState()) {
       LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
       return;
     }
     try {
       updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
           latestSequenceNumber);
     } catch (Exception e) {
       notifyStoreOperationFailed(e);
     }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
       long renewDate) {
     try {
       LOG.info("updating RMDelegation token with sequence number: "
           + id.getSequenceNumber());
       rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
           renewDate, id.getSequenceNumber());
     } catch (Exception e) {
       LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
           + id.getSequenceNumber());
       ExitUtil.terminate(1, e);
     }
   }
 {code}
 According to the code above, even when renewing a DT, the last sequence number is 
 updated in the store, which is wrong. For example, with the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2 (seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 the stored and then recovered last sequence number is 1. As a result, the next 
 DT created after the RM restarts will conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number gets overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
     LOG.info("recovering RMDelegationTokenSecretManager.");
     // recover RMDTMasterKeys
     for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
         .getMasterKeyState()) {
       addKey(dtKey);
     }
     // recover RMDelegationTokens
     Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
         rmState.getRMDTSecretManagerState().getTokenState();
     this.delegationTokenSequenceNumber =
         rmState.getRMDTSecretManagerState().getDTSequenceNumber();
     for (Map.Entry<RMDelegationTokenIdentifier, Long> entry :
         rmDelegationTokens.entrySet()) {
       addPersistedDelegationToken(entry.getKey(), entry.getValue());
     }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store, which could be wrong. Fortunately, the following 
 check then updates it to the right number:
 {code}
 if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
   setDelegationTokenSeqNum(identifier.getSequenceNumber());
 }
 {code}
 All the stored identifiers are gone through, and delegationTokenSequenceNumber 
 is set to the largest sequence number among them. Therefore, a new DT will be 
 assigned a sequence number that is always larger than that of all the recovered 
 DTs.
 To sum up, two negatives make a positive, but it's good to fix the issue. 
 Please let me know if I've missed something here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2994) Document work-preserving RM restart

2014-12-26 Thread Jian He (JIRA)
Jian He created YARN-2994:
-

 Summary: Document work-preserving RM restart
 Key: YARN-2994
 URL: https://issues.apache.org/jira/browse/YARN-2994
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2994) Document work-preserving RM restart

2014-12-26 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2994:
--
Attachment: YARN-2994.1.patch

Updated the doc to include work-preserving RM restart

 Document work-preserving RM restart
 ---

 Key: YARN-2994
 URL: https://issues.apache.org/jira/browse/YARN-2994
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2994.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2994) Document work-preserving RM restart

2014-12-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259213#comment-14259213
 ] 

Hadoop QA commented on YARN-2994:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689195/YARN-2994.1.patch
  against trunk revision 40ee4bf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6194//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6194//console

This message is automatically generated.







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry

2014-12-26 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259223#comment-14259223
 ] 

Jian He commented on YARN-2992:
---

lgtm, 
bq. I have observed that RM exits while starting if ZK is not available
I think we have retry built in for this scenario?

 ZKRMStateStore crashes due to session expiry
 

 Key: YARN-2992
 URL: https://issues.apache.org/jira/browse/YARN-2992
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2992-1.patch


 We recently saw the RM crash with the following stacktrace. On session 
 expiry, we should gracefully transition to standby. 
 {noformat}
 2014-12-18 06:28:42,689 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause: 
 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
 = Session expired 
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) 
 at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) 
 at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687)
  
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry

2014-12-26 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259243#comment-14259243
 ] 

Jian He commented on YARN-2992:
---

one question: do we need to create a new zkClient object by calling 
createConnection, or is it OK to re-use the old one?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2014-12-26 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259245#comment-14259245
 ] 

Jian He commented on YARN-2936:
---

[~varun_saxena], thanks for taking this on!

Maybe a simple way is to do this: 
{code}
  public YARNDelegationTokenIdentifierProto getProto() {
builder.setOwner(getOwner().toString());
builder.setRenewer(getRenewer().toString());
builder.setRealUser(getRealUser().toString());
builder.setIssueDate(getIssueDate());
builder.setMaxDate(getMaxDate());
builder.setSequenceNumber(getSequenceNumber());
builder.setMasterKeyId(getMasterKeyId());
return builder.build();
  }
{code}
and create a common method for these setters 
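Something along these lines, for instance (a sketch only; the helper name setBuilderFields below is made up for illustration):

{code}
// Sketch: factor the repeated builder setters into one common method.
private void setBuilderFields() {
  builder.setOwner(getOwner().toString());
  builder.setRenewer(getRenewer().toString());
  builder.setRealUser(getRealUser().toString());
  builder.setIssueDate(getIssueDate());
  builder.setMaxDate(getMaxDate());
  builder.setSequenceNumber(getSequenceNumber());
  builder.setMasterKeyId(getMasterKeyId());
}

public YARNDelegationTokenIdentifierProto getProto() {
  setBuilderFields();
  return builder.build();
}
{code}

The same method could then be reused wherever these fields currently need to be set before serialization.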

 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2936.001.patch


 After YARN-2743, the setters were removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call getProto() on it, we will just get an empty proto object.
 It seems to do no harm to the production code path, as we always call 
 getBytes() before using the proto to persist the DT in the state store, when 
 generating the password.
 I think the setters were removed to avoid duplicating the setting of the fields 
 when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
 properly alone; it is tightly coupled with the 
 logic in the secretManager, and is vulnerable if something changes in the 
 secretManager. For example, in the test case of YARN-2837, I spent time 
 figuring out that we need to execute getBytes() first to make sure the testing 
 DTs can be properly put into the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-26 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2958:
---
Attachment: YARN-2958.003.patch




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-26 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2958:
---
Attachment: (was: YARN-2958.003.patch)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259266#comment-14259266
 ] 

Varun Saxena commented on YARN-2958:


Thanks [~jianhe] for the review. I think latestSequenceNumber is not required 
even in storeRMDT operation. Will change and upload a new patch.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-26 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2958:
---
Attachment: YARN-2958.003.patch




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-26 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259269#comment-14259269
 ] 

Jian He commented on YARN-2958:
---

Thanks [~varun_saxena], 
bq.  latestSequenceNumber is not required even in storeRMDT operation
I think it is required, as we want to persist the seq number separately. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259272#comment-14259272
 ] 

Varun Saxena commented on YARN-2958:


[~jianhe], I mean we can take it from 
RMDelegationTokenIdentifier#getSequenceNumber and remove the latestSequenceNumber 
parameter from RMStateStore#storeRMDelegationTokenAndSequenceNumber. We can 
persist the sequence number by taking it from the RMDelegationTokenIdentifier.
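On the caller side this would look much like the updateStoredToken override quoted in the description, for example (a sketch only; storeRMDelegationToken is the assumed new name of the store method):

{code}
// Sketch of the caller side: only the identifier and renew date are passed;
// the sequence number travels inside the identifier and the store reads it
// from there when persisting. Names are assumed, not from the actual patch.
@Override
protected void storeNewToken(RMDelegationTokenIdentifier id, long renewDate) {
  try {
    LOG.info("storing RMDelegation token with sequence number: "
        + id.getSequenceNumber());
    rmContext.getStateStore().storeRMDelegationToken(id, renewDate);
  } catch (Exception e) {
    LOG.error("Error in storing RMDelegationToken with sequence number: "
        + id.getSequenceNumber());
    ExitUtil.terminate(1, e);
  }
}
{code}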




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259271#comment-14259271
 ] 

Varun Saxena commented on YARN-2958:


[~jianhe], I mean we can take it from 
RMDelegationTokenIdentifier#getSequenceNumber and remove the latestSequenceNumber 
parameter from RMStateStore#storeRMDelegationTokenAndSequenceNumber. We can 
persist the sequence number by taking it from the RMDelegationTokenIdentifier.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-26 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259273#comment-14259273
 ] 

Jian He commented on YARN-2958:
---

I see, makes sense, thanks for your explanation! 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry

2014-12-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259281#comment-14259281
 ] 

Karthik Kambatla commented on YARN-2992:


bq. one question: do we need to create a new zkClient object by calling 
createConnection, or is it OK to re-use the old one ?
Thought about it some while working on the patch. We probably don't 
need the call to createConnection, as the watcher would probably fire before 
the next retry or the one after. However, given how infrequent session expiries 
and lost connections are, I felt it should be okay to explicitly call 
createConnection. I don't think that will add significant overhead or lead to 
inaccuracies.
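For context, the general pattern looks roughly like this (a generic ZooKeeper client sketch, not the ZKRMStateStore code; class and field names are made up). Once a session has expired the old handle is dead, so a new client object must be created before further operations can succeed:

{code}
// Generic sketch: recreate the ZooKeeper client on session expiry.
import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

class ReconnectingClient implements Watcher {
  private volatile ZooKeeper zk;
  private final String connectString;
  private final int sessionTimeoutMs;

  ReconnectingClient(String connectString, int sessionTimeoutMs)
      throws IOException {
    this.connectString = connectString;
    this.sessionTimeoutMs = sessionTimeoutMs;
    this.zk = new ZooKeeper(connectString, sessionTimeoutMs, this);
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getState() == Event.KeeperState.Expired) {
      try {
        // The expired handle cannot be revived; close it and create a new one.
        zk.close();
        zk = new ZooKeeper(connectString, sessionTimeoutMs, this);
      } catch (IOException | InterruptedException e) {
        // In the RM, a failure here is where we would transition to standby
        // instead of crashing the process.
      }
    }
  }
}
{code}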





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry

2014-12-26 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259283#comment-14259283
 ] 

Jian He commented on YARN-2992:
---

sounds good. committing.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry

2014-12-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259286#comment-14259286
 ] 

Hudson commented on YARN-2992:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6792 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6792/])
YARN-2992. ZKRMStateStore crashes due to session expiry. Contributed by Karthik 
Kambatla (jianhe: rev 1454efe5d4fe4214ec5ef9142d55dbeca7dab953)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* hadoop-yarn-project/CHANGES.txt


 ZKRMStateStore crashes due to session expiry
 

 Key: YARN-2992
 URL: https://issues.apache.org/jira/browse/YARN-2992
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Fix For: 2.7.0

 Attachments: yarn-2992-1.patch


 We recently saw the RM crash with the following stacktrace. On session 
 expiry, we should gracefully transition to standby. 
 {noformat}
 2014-12-18 06:28:42,689 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause: 
 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
 = Session expired 
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) 
 at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) 
 at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687)
  
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs

2014-12-26 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2987:
---
Attachment: YARN-2987.001.patch

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2987.001.patch


 ClientRMService#getQueueInfo can return a list of applications belonging to 
 the queue, but doesn't actually check if the user has the permission to view 
 the applications.  
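
As a rough illustration of the missing check, a minimal sketch that filters the queue's 
applications by a view ACL before returning them; AppInfo, AclChecker and hasViewAccess 
are illustrative stand-ins, not the classes touched by the attached patch.

{code}
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the kind of ACL filtering getQueueInfo should perform.
public class QueueInfoAclSketch {

  /** Hypothetical ACL helper; stands in for the real view-ACL check. */
  public interface AclChecker {
    boolean hasViewAccess(String caller, String appId, String appOwner);
  }

  public static class AppInfo {
    public final String appId;
    public final String owner;
    public AppInfo(String appId, String owner) {
      this.appId = appId;
      this.owner = owner;
    }
  }

  /** Return only the queue's applications the caller is allowed to view. */
  public static List<AppInfo> filterByViewAcl(String caller,
      List<AppInfo> queueApps, AclChecker acls) {
    List<AppInfo> visible = new ArrayList<>();
    for (AppInfo app : queueApps) {
      if (acls.hasViewAccess(caller, app.appId, app.owner)) {
        visible.add(app);
      }
    }
    return visible;
  }
}
{code}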



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs

2014-12-26 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2987:
---
Attachment: (was: YARN-2987.001.patch)

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena

 ClientRMService#getQueueInfo can return a list of applications belonging to 
 the queue, but doesn't actually check if the user has the permission to view 
 the applications.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs

2014-12-26 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2987:
---
Attachment: YARN-2987.001.patch

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2987.001.patch


 ClientRMService#getQueueInfo can return a list of applications belonging to 
 the queue, but doesn't actually check if the user has the permission to view 
 the applications.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2014-12-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259288#comment-14259288
 ] 

Hadoop QA commented on YARN-2958:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689205/YARN-2958.003.patch
  against trunk revision 40ee4bf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6195//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6195//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6195//console

This message is automatically generated.

 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 ---

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch


 It seems that RMStateStore updates the last sequence number when storing or 
 updating each individual DT, so that the latest sequence number can be 
 recovered when the RM restarts.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
       RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
       int latestSequenceNumber) {
     if (isFencedState()) {
       LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
       return;
     }
     try {
       updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
           latestSequenceNumber);
     } catch (Exception e) {
       notifyStoreOperationFailed(e);
     }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
       long renewDate) {
     try {
       LOG.info("updating RMDelegation token with sequence number: "
           + id.getSequenceNumber());
       rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
           renewDate, id.getSequenceNumber());
     } catch (Exception e) {
       LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
           + id.getSequenceNumber());
       ExitUtil.terminate(1, e);
     }
   }
 {code}
 According to the code above, the last sequence number is updated in the store 
 even when a DT is merely renewed, which is wrong. For example, consider the 
 following sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2 (seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored, and then recovered, last sequence number is 1, so the next DT 
 created after the RM restart will conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number is overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
     LOG.info("recovering RMDelegationTokenSecretManager.");
     // recover RMDTMasterKeys
     for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
         .getMasterKeyState()) {
       addKey(dtKey);
     }
     // recover RMDelegationTokens
     Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
         rmState.getRMDTSecretManagerState().getTokenState();
     this.delegationTokenSequenceNumber =
         rmState.getRMDTSecretManagerState().getDTSequenceNumber();
     for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
         .entrySet()) {
       addPersistedDelegationToken(entry.getKey(), entry.getValue());
     }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store. It could be wrong. Fortunately, 
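
 For illustration, a small sketch of one way to avoid the rollback described above, under 
 the assumption that the store keeps a single persisted counter: only let the counter move 
 forward, and never touch it on renew. The actual patch may instead drop the separately 
 stored sequence number altogether.

{code}
// Hedged sketch, not the YARN-2958 patch: storedSequenceNumber stands in for
// whatever the state store persists for DT sequence-number recovery.
public class DtSequenceNumberSketch {

  private int storedSequenceNumber;  // value persisted in the state store

  /** Storing a brand-new DT: the counter may only move forward. */
  public synchronized void onStoreNewToken(int tokenSequenceNumber) {
    storedSequenceNumber = Math.max(storedSequenceNumber, tokenSequenceNumber);
  }

  /** Renewing an existing DT: leave the counter alone. */
  public synchronized void onRenewToken(int tokenSequenceNumber) {
    // Intentionally no update: renewing DT 1 (seq = 1) must not rewind the
    // counter below DT 2 (seq = 2).
  }

  /** Value recovered on RM restart. */
  public synchronized int recoveredSequenceNumber() {
    return storedSequenceNumber;
  }
}
{code}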
 

[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2014-12-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259289#comment-14259289
 ] 

Hadoop QA commented on YARN-2958:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689206/YARN-2958.003.patch
  against trunk revision 40ee4bf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6196//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6196//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6196//console

This message is automatically generated.

 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 ---

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch


 It seems that RMStateStore updates the last sequence number when storing or 
 updating each individual DT, so that the latest sequence number can be 
 recovered when the RM restarts.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
       RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
       int latestSequenceNumber) {
     if (isFencedState()) {
       LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
       return;
     }
     try {
       updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
           latestSequenceNumber);
     } catch (Exception e) {
       notifyStoreOperationFailed(e);
     }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
       long renewDate) {
     try {
       LOG.info("updating RMDelegation token with sequence number: "
           + id.getSequenceNumber());
       rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
           renewDate, id.getSequenceNumber());
     } catch (Exception e) {
       LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
           + id.getSequenceNumber());
       ExitUtil.terminate(1, e);
     }
   }
 {code}
 According to the code above, the last sequence number is updated in the store 
 even when a DT is merely renewed, which is wrong. For example, consider the 
 following sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2 (seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored, and then recovered, last sequence number is 1, so the next DT 
 created after the RM restart will conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number is overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
     LOG.info("recovering RMDelegationTokenSecretManager.");
     // recover RMDTMasterKeys
     for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
         .getMasterKeyState()) {
       addKey(dtKey);
     }
     // recover RMDelegationTokens
     Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
         rmState.getRMDTSecretManagerState().getTokenState();
     this.delegationTokenSequenceNumber =
         rmState.getRMDTSecretManagerState().getDTSequenceNumber();
     for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
         .entrySet()) {
       addPersistedDelegationToken(entry.getKey(), entry.getValue());
     }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store. It could be wrong. Fortunately, 
 

[jira] [Commented] (YARN-2987) ClientRMService#getQueueInfo doesn't check app ACLs

2014-12-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259297#comment-14259297
 ] 

Hadoop QA commented on YARN-2987:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689209/YARN-2987.001.patch
  against trunk revision 1454efe.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6197//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6197//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6197//console

This message is automatically generated.

 ClientRMService#getQueueInfo doesn't check app ACLs
 ---

 Key: YARN-2987
 URL: https://issues.apache.org/jira/browse/YARN-2987
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Varun Saxena
 Attachments: YARN-2987.001.patch


 ClientRMService#getQueueInfo can return a list of applications belonging to 
 the queue, but doesn't actually check if the user has the permission to view 
 the applications.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2995) Enhance UI to show cluster resource utilization of various container types

2014-12-26 Thread Sriram Rao (JIRA)
Sriram Rao created YARN-2995:


 Summary: Enhance UI to show cluster resource utilization of 
various container types
 Key: YARN-2995
 URL: https://issues.apache.org/jira/browse/YARN-2995
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager
Reporter: Sriram Rao


This JIRA proposes to extend the ResourceManager UI to show how cluster 
resources are being used to run *guaranteed start* and *queueable* containers. 
For example, a graph that shows, over time, the fraction of running containers 
that are *guaranteed start* and the fraction of running containers that are 
*queueable*. 
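
Purely as an illustration of the data such a graph would plot, here is a tiny sketch of 
one time-series sample; the class and field names are assumptions for this proposal, not 
existing YARN metrics.

{code}
// Hypothetical sketch: one datapoint for a "container mix over time" graph.
public class ContainerMixSample {

  public final long timestampMs;
  public final double guaranteedStartFraction;
  public final double queueableFraction;

  public ContainerMixSample(long timestampMs, int guaranteedStart, int queueable) {
    int total = guaranteedStart + queueable;
    this.timestampMs = timestampMs;
    this.guaranteedStartFraction = total == 0 ? 0.0 : (double) guaranteedStart / total;
    this.queueableFraction = total == 0 ? 0.0 : (double) queueable / total;
  }
}
{code}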



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2995) Enhance UI to show cluster resource utilization of various container types

2014-12-26 Thread Sriram Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriram Rao updated YARN-2995:
-
Issue Type: Sub-task  (was: Task)
Parent: YARN-2877

 Enhance UI to show cluster resource utilization of various container types
 --

 Key: YARN-2995
 URL: https://issues.apache.org/jira/browse/YARN-2995
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sriram Rao

 This JIRA proposes to extend the ResourceManager UI to show how cluster 
 resources are being used to run *guaranteed start* and *queueable* 
 containers. For example, a graph that shows, over time, the fraction of 
 running containers that are *guaranteed start* and the fraction of running 
 containers that are *queueable*. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)