[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-16 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1341:
-

Attachment: YARN-1341v7.patch

bq. Is above code still necessary given currentMasterKey will updated soon from 
RM registration as what we discussed above?
I originally put that in there as a best effort for the NM to try to use a 
recent key if the master key is missing.  This would be relevant if we allowed 
clients to connect before the NM registers with the RM, but that's not the case 
currently.  Updated the patch to remove that code.

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
 YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch, 
 YARN-1341v7.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-07-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1341:
---

Target Version/s: 2.6.0  (was: 2.5.0)









[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-18 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1341:
-

Attachment: YARN-1341v6.patch

Thanks for reviewing, Junping!

bq. The change in BaseContainerTokenSecretManager.java is not necessary and I 
believe that belongs to YARN-1342.

Good catch, removed.

bq. Can we consolidate the code in a separated method together with 
NMContainerTokenSecretManager as we will do similar thing to recover 
ContainerToken staff which make code have duplicated things?

I'm not sure I understand what you're requesting.  Recovering the NM tokens is 
one line of code (3 if we count the if canRecover part), and recovering the 
container tokens in YARN-1342 will add one more line for that (inside the same 
if canRecover block).  I went ahead and factored this into a separate method; 
however, I'm not sure it matches what you were expecting, as I don't see where 
we're saving duplicated code.  If what's in the updated patch isn't what you 
expected, please provide some sample pseudo-code to demonstrate how we can 
avoid duplication of code.
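
For readers following along, the factored-out recovery method described above might look roughly like the sketch below.  This is not the code from the patch: every class, field, and method name here is a hypothetical stand-in, and the container-token line only illustrates the "one more line" that YARN-1342 is expected to add.

```java
// Hypothetical stand-in for the NM state store; only the canRecover check matters here.
interface RecoverableStore {
  boolean canRecover();
}

// Hypothetical stand-in for a token secret manager that can restore its state.
class SketchSecretManager {
  boolean recovered = false;

  void recover() {
    recovered = true;
  }
}

// Hypothetical stand-in for the NodeManager, showing only token recovery.
class SketchNodeManager {
  final SketchSecretManager nmTokenSecretManager = new SketchSecretManager();
  final SketchSecretManager containerTokenSecretManager = new SketchSecretManager();

  // All token recovery lives in one method, guarded by a single canRecover check.
  void recoverTokens(RecoverableStore store) {
    if (store.canRecover()) {
      nmTokenSecretManager.recover();
      // Assumption: the extra line for container tokens would sit in the same block.
      containerTokenSecretManager.recover();
    }
  }
}
```

With a store that reports canRecover() as false, recoverTokens leaves both managers untouched.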

bq. Does log error here is just enough in case of failure in store? If Master 
key is updated but not persistent, then it could cause some inconsistency when 
recover it. I think we should throw some exception here if store get failed and 
rollback the key just set.

The problem with throwing an exception is what to do with the exception -- do 
we take down the NM?  That seems like a drastic answer since the NM will likely 
chug along just fine without the key stored.  It only becomes a problem when 
the NM restarts and restores an old key.  However, if we roll back to the old 
key here then we take that only-breaks-if-we-happened-to-restart case and make 
it an always-breaks scenario.  Eventually the old key will no longer be valid 
to the RM, and none of the AMs will be able to authenticate to the NM.  
Therefore I thought it would be better to log the error, press onward, and hope 
we don't restart before we store a valid key again (maybe the store error was 
transient) rather than either take down the NM or have things start failing 
even without a restart.
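
The log-and-press-onward behavior argued for above can be illustrated with a small sketch.  It is not the code from the patch: MasterKeyStore, SketchTokenSecretManager, and the integer key id are hypothetical simplifications, and java.util.logging stands in for whatever logging the NM actually uses.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the NM state store; not the real YARN interface.
interface MasterKeyStore {
  void storeCurrentMasterKey(int keyId) throws IOException;
}

// Hypothetical secret manager that keeps the new key even if persisting it fails.
class SketchTokenSecretManager {
  private static final Logger LOG =
      Logger.getLogger(SketchTokenSecretManager.class.getName());

  private final MasterKeyStore store;
  private int currentKeyId = -1;
  private boolean currentKeyPersisted = false;

  SketchTokenSecretManager(MasterKeyStore store) {
    this.store = store;
  }

  // Adopt the new key in memory first, then make a best-effort attempt to persist it.
  synchronized void setMasterKey(int newKeyId) {
    currentKeyId = newKeyId;
    try {
      store.storeCurrentMasterKey(newKeyId);
      currentKeyPersisted = true;
    } catch (IOException e) {
      // No rollback and no rethrow: the in-memory key stays current, so
      // authentication keeps working unless a restart happens before the
      // next successful store.
      currentKeyPersisted = false;
      LOG.log(Level.SEVERE, "Unable to store master key " + newKeyId, e);
    }
  }

  synchronized int getCurrentKeyId() {
    return currentKeyId;
  }

  synchronized boolean isCurrentKeyPersisted() {
    return currentKeyPersisted;
  }
}
```

With a store that always throws, setMasterKey(7) still leaves key 7 as the current key; only the persisted flag records that the store failed.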









[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-17 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1341:
-

Attachment: YARN-1341v5.patch

Thanks for taking a look, Junping!  I've updated the patch to trunk.









[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-04-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1341:
-

Attachment: YARN-1341v4-and-YARN-1987.patch

Updating the patch to address the DBException handling that was brought up in 
the MAPREDUCE-5652 review and also applies here. Note that this now depends 
upon YARN-1987, as that provides the utility wrapper for the leveldb iterator 
to translate raw RuntimeException to the more helpful DBException so we can act 
accordingly when errors occur. The other notable change in the patch is 
renaming LevelDB to Leveldb for consistency with the existing 
LeveldbTimelineStore naming convention.
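
The idea of an iterator wrapper that translates raw runtime exceptions can be sketched as follows.  This is not the YARN-1987 utility itself: SketchDBException and TranslatingIterator are hypothetical names, and a plain java.util.Iterator stands in for the leveldb iterator so the example needs no leveldb dependency.

```java
import java.util.Iterator;

// Hypothetical stand-in for a database-specific exception type.
class SketchDBException extends RuntimeException {
  SketchDBException(String message, Throwable cause) {
    super(message, cause);
  }
}

// Wraps an iterator so that callers only ever see SketchDBException on failure.
class TranslatingIterator<T> implements Iterator<T> {
  private final Iterator<T> delegate;

  TranslatingIterator(Iterator<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean hasNext() {
    try {
      return delegate.hasNext();
    } catch (SketchDBException e) {
      throw e; // already the helpful type, pass it through unchanged
    } catch (RuntimeException e) {
      throw new SketchDBException(e.getMessage(), e);
    }
  }

  @Override
  public T next() {
    try {
      return delegate.next();
    } catch (SketchDBException e) {
      throw e; // already the helpful type, pass it through unchanged
    } catch (RuntimeException e) {
      throw new SketchDBException(e.getMessage(), e);
    }
  }
}
```

Callers can then catch the single database exception type around iteration and decide how to react, instead of guarding against arbitrary runtime exceptions.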

This latest patch includes the necessary pieces of YARN-1987 so it can compile 
and Jenkins can comment.










[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-04-08 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1341:
-

Attachment: YARN-1341v3.patch

Updating patch after YARN-1757 was committed.







--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-03-06 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1341:
-

Attachment: YARN-1341.patch

Patch to enable the recovery of NMTokens.  Like YARN-1338, it uses leveldb as 
the state store, or a null state store if recovery is not enabled.
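
The choice between a real state store and a null state store can be sketched as below.  This is not the code from the patch: all class names are hypothetical, the key is reduced to an integer id, and an in-memory map stands in for the leveldb-backed store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal state-store contract; the real NM state store API is richer.
abstract class SketchStateStore {
  abstract boolean canRecover();

  abstract void storeMasterKey(String name, int keyId);

  abstract Map<String, Integer> loadMasterKeys();
}

// Used when recovery is disabled: stores nothing and has nothing to recover.
class SketchNullStateStore extends SketchStateStore {
  @Override
  boolean canRecover() {
    return false;
  }

  @Override
  void storeMasterKey(String name, int keyId) {
    // intentionally a no-op
  }

  @Override
  Map<String, Integer> loadMasterKeys() {
    return new HashMap<>();
  }
}

// In-memory stand-in for the leveldb-backed store used when recovery is enabled.
class SketchPersistentStateStore extends SketchStateStore {
  private final Map<String, Integer> keys = new HashMap<>();

  @Override
  boolean canRecover() {
    return true;
  }

  @Override
  void storeMasterKey(String name, int keyId) {
    keys.put(name, keyId);
  }

  @Override
  Map<String, Integer> loadMasterKeys() {
    return new HashMap<>(keys); // defensive copy
  }
}

// Picks the store implementation from a recovery-enabled flag.
class SketchStateStoreFactory {
  static SketchStateStore create(boolean recoveryEnabled) {
    return recoveryEnabled
        ? new SketchPersistentStateStore()
        : new SketchNullStateStore();
  }
}
```

The benefit of a null store is that callers can store and load unconditionally; only code that restores state needs to consult canRecover().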









[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-03-06 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1341:
-

Attachment: YARN-1341v2.patch

Revised patch without the addition of the state store to the NM context since 
it's not necessary for this change.






