[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535863#comment-13535863
 ] 

Hudson commented on YARN-230:
-

Integrated in Hadoop-Yarn-trunk #70 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/70/])
YARN-230. RM Restart phase 1 - includes support for saving/restarting all 
applications on an RM bounce. Contributed by Bikas Saha. (Revision 1423758)

 Result = SUCCESS
acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1423758
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptStateData.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationStateData.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationAttemptStateDataPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationStateDataPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestGetGroups.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestYarnClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreAppAttemptEvent.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEvent.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreFactory.java
*

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535937#comment-13535937
 ] 

Hudson commented on YARN-230:
-

Integrated in Hadoop-Hdfs-trunk #1259 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1259/])
YARN-230. RM Restart phase 1 - includes support for saving/restarting all 
applications on an RM bounce. Contributed by Bikas Saha. (Revision 1423758)

 Result = FAILURE
acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1423758

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535634#comment-13535634
 ] 

Hudson commented on YARN-230:
-

Integrated in Hadoop-trunk-Commit #3140 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3140/])
YARN-230. RM Restart phase 1 - includes support for saving/restarting all 
applications on an RM bounce. Contributed by Bikas Saha. (Revision 1423758)

 Result = SUCCESS
acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1423758

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-17 Thread Tom White (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534044#comment-13534044
 ] 

Tom White commented on YARN-230:


Arun, yes it looks good to me, +1. We can address any changes that come up in 
later JIRAs. 

 Make changes for RM restart phase 1
 ---

 Key: YARN-230
 URL: https://issues.apache.org/jira/browse/YARN-230
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: PB-impl.patch, Recovery.patch, Store.patch, Test.patch, 
 YARN-230.1.patch, YARN-230.4.patch, YARN-230.5.patch


 As described in YARN-128, phase 1 of RM restart puts in place mechanisms to
 save application state and read it back after restart. Upon restart, the
 NMs are asked to reboot and the previously running AMs are restarted.
 After this is done, RM HA and work-preserving restart can continue in
 parallel. For more details, please refer to the design document in YARN-128.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-17 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534098#comment-13534098
 ] 

Bikas Saha commented on YARN-230:
-

Thanks guys!

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-14 Thread Tom White (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532371#comment-13532371
 ] 

Tom White commented on YARN-230:


OK, in which case please make the default state store the filesystem one with 
the default URI discussed earlier.

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-14 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532499#comment-13532499
 ] 

Bikas Saha commented on YARN-230:
-

Yes. Will do when I refresh the patch on YARN-231 after YARN-230 gets committed.

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-14 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532597#comment-13532597
 ] 

Bikas Saha commented on YARN-230:
-

A note on testing. Among other test changes, there is a functional test that 
takes the RM through different scenarios of applications being stored, run and 
re-run as well as nodes heartbeating and reconnecting on restart. I have 
manually tested the scenarios on a single node setup with ZK and FileSystem 
implementations of the RMStateStore.
Arinto has run the code on a cluster using ZK for storage and verified that it 
works as expected.
https://issues.apache.org/jira/browse/YARN-128?focusedCommentId=13505615&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13505615


[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-13 Thread Tom White (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531147#comment-13531147
 ] 

Tom White commented on YARN-230:


Thanks for addressing my feedback, Bikas. The NullRMStateStore is a good idea.
With it, there is no need for yarn.resourcemanager.recovery.enabled; instead,
make the default yarn.resourcemanager.store.class the NullRMStateStore. For
this to work, NullRMStateStore's loadState method should return an unpopulated
RMState object rather than null.
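
A minimal sketch of the null-object idea described above. The types here
(RMStateSketch, StateStoreSketch, NullStateStoreSketch) are hypothetical
stand-ins, not the classes in the patch. The sketch only illustrates why
returning an empty state object lets the recovery path run unconditionally,
without a null check or a separate "enabled" flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the recovered state; not the real RMState class.
class RMStateSketch {
    private final Map<String, String> applications = new HashMap<>();

    Map<String, String> getApplications() {
        return applications;
    }
}

// Hypothetical store interface, reduced to the one method discussed here.
interface StateStoreSketch {
    RMStateSketch loadState();
}

// Null-object store: persists nothing and always reports an empty state.
class NullStateStoreSketch implements StateStoreSketch {
    @Override
    public RMStateSketch loadState() {
        // Return an unpopulated object rather than null so callers need no special case.
        return new RMStateSketch();
    }
}

class NullStoreDemo {
    // Recovery code can iterate unconditionally; with the null store the loop body never runs.
    static int recover(StateStoreSketch store) {
        int recovered = 0;
        for (String appId : store.loadState().getApplications().keySet()) {
            recovered++;
        }
        return recovered;
    }

    public static void main(String[] args) {
        System.out.println(recover(new NullStateStoreSketch()));
    }
}
```

With this shape, switching recovery on or off would only be a matter of which
store class is configured.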



[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-12-11 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529601#comment-13529601
]

Hadoop QA commented on YARN-230:

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12560499/YARN-230.5.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 20 new
or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
org.apache.hadoop.yarn.server.resourcemanager.TestRM
org.apache.hadoop.yarn.server.resourcemanager.TestAppManager
org.apache.hadoop.yarn.server.resourcemanager.security.TestApplicationTokens
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId
org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientTokens
org.apache.hadoop.yarn.server.TestContainerManagerSecurity

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-YARN-Build/216//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/216//console

This message is automatically generated.

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-11-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507475#comment-13507475
 ] 

Hadoop QA commented on YARN-230:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/1228/YARN-230.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 20 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/201//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/201//console

This message is automatically generated.

 Make changes for RM restart phase 1
 ---

 Key: YARN-230
 URL: https://issues.apache.org/jira/browse/YARN-230
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: PB-impl.patch, Recovery.patch, Store.patch, Test.patch, 
 YARN-230.1.patch, YARN-230.4.patch


 As described in YARN-128, phase 1 of RM restart puts in place mechanisms to
 save application state and read it back after restart. Upon restart, the
 NMs are asked to reboot and the previously running AMs are restarted.
 After this is done, RM HA and work-preserving restart can continue in
 parallel. For more details, please refer to the design document in YARN-128.

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-11-28 Thread Tom White (JIRA)

[
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505427#comment-13505427
]

Tom White commented on YARN-230:

bq. But it could make sense to remove application attempts but not remove the
application, couldn't it? Say we want to remove some attempt from the saved
state before the application is done.

Let's add it when we need it then.

bq. We also need to change the AM retry default to 1. Otherwise, even with RM
restart enabled, the restarted attempts will fail because the previous AM will
delete job files. What is your suggestion for that?

I think this is where the killed/failed distinction comes in. If the app
attempt was killed (because the RM died), then the app will be retried since
the first attempt didn't count (from the point of view of
yarn.resourcemanager.am.max-retries). This should be taken care of in YARN-218
- does that sound OK to you?
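
The killed/failed distinction could be sketched as follows. This is an
illustration only: the enum, the class, and the counting rule are assumptions
made for the example, not the YARN-218 implementation. The idea shown is that
only attempts that failed on their own count against the retry limit, while
attempts killed because the RM died do not.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical outcomes of an application attempt, for illustration only.
enum AttemptOutcome { FAILED, KILLED }

class RetryPolicySketch {
    private final int maxRetries;

    RetryPolicySketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Only FAILED attempts count toward the limit; KILLED attempts are ignored.
    boolean shouldRetry(List<AttemptOutcome> previousAttempts) {
        long failures = previousAttempts.stream()
                .filter(outcome -> outcome == AttemptOutcome.FAILED)
                .count();
        return failures < maxRetries;
    }

    public static void main(String[] args) {
        RetryPolicySketch policy = new RetryPolicySketch(1);
        // An attempt killed by an RM restart leaves the retry budget untouched.
        System.out.println(policy.shouldRetry(Arrays.asList(AttemptOutcome.KILLED)));
        // A genuine failure uses up the single allowed attempt.
        System.out.println(policy.shouldRetry(Arrays.asList(AttemptOutcome.FAILED)));
    }
}
```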

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-11-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505567#comment-13505567
 ] 

Hadoop QA commented on YARN-230:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12554383/Test.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/170//console

This message is automatically generated.

 Make changes for RM restart phase 1
 ---

 Key: YARN-230
 URL: https://issues.apache.org/jira/browse/YARN-230
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: PB-impl.patch, Recovery.patch, Store.patch, Test.patch, 
 YARN-230.1.patch


 As described in YARN-128, phase 1 of RM restart puts in place mechanisms to 
 save application state and read them back after restart. Upon restart, the 
 NM's are asked to reboot and the previously running AM's are restarted.
 After this is done, RM HA and work preserving restart can continue in 
 parallel. For more details please refer to the design document in YARN-128

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-11-27 Thread Tom White (JIRA)

[
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504765#comment-13504765
]

Tom White commented on YARN-230:

{quote}
The store and remove methods have been made mirrors because it helps maintain
symmetry of operations that is logically clear. An actual implementation could
choose to remove the entire app data including attempts in removeApplication()
making removeApplicationAttempt() a no-op. So that alternative is not precluded
in the current interface while still maintaining flexibility at the interface.
{quote}

Why is this flexibility needed? I can't see why it makes sense to remove an
application and leave some application attempts around.

bq. I chose to not use directories for FileSystem because one could put a key
value store behind a FileSystem interface and I am not sure how directories
would work in them.

That's reasonable. With the orphan handling (deletion) on restart, the flat
structure you have should work fine. (However, I don't think you need the
removeApplicationAttempt() method.)
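
A minimal sketch of what orphan deletion on restart could look like for a flat
layout, written in plain Java against in-memory collections rather than a real
FileSystem. OrphanCleanupSketch, its fields and discardOrphans are hypothetical
names for illustration; the actual FileSystemRMStateStore may be structured
quite differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a flat store: app records and attempt records are
// kept side by side, and each attempt record names the app it belongs to.
class OrphanCleanupSketch {
  final Set<String> apps = new HashSet<>();
  final Map<String, String> attemptToApp = new HashMap<>();

  // Called while loading state after a restart. Any attempt whose app record
  // is missing is treated as an orphan and deleted from the store.
  // Returns the number of attempts that were discarded.
  int discardOrphans() {
    int removed = 0;
    Iterator<Map.Entry<String, String>> it = attemptToApp.entrySet().iterator();
    while (it.hasNext()) {
      if (!apps.contains(it.next().getValue())) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    OrphanCleanupSketch store = new OrphanCleanupSketch();
    store.apps.add("app_1");
    store.attemptToApp.put("attempt_1_1", "app_1");
    // app_2 was removed before the RM died, but its attempt was left behind.
    store.attemptToApp.put("attempt_2_1", "app_2");
    System.out.println(store.discardOrphans()); // 1
  }
}
```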

bq. One improvement would be to update the store with an attempts final state
(failed/killed/succeeded) and wait for it to be recorded before completing the
state machine.

I agree this can be done later.

bq. Could you please help by providing a good system path.

How about something like ${hadoop.tmp.dir}/yarn/system/rm-store?

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-11-27 Thread Bikas Saha (JIRA)

[
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504775#comment-13504775
]

Bikas Saha commented on YARN-230:
-

bq. Why is this flexibility needed? I can't see why it makes sense to remove an
application and leave some application attempts around.

I agree. It does not make sense to remove an application but not remove its
application attempts. But it could make sense to remove application attempts
but not remove the application, couldn't it? Say we want to remove some attempt
from the saved state before the application is done.

I can update the patch with defaults for filesystem and the suggested path. On
this note, for MR jobs just enabling these defaults is not enough. We also need
to change the AM retry default to 1. Otherwise, even with RM restart enabled,
the restarted attempts will fail because the previous AM will delete job files.
What is your suggestion for that?

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-11-24 Thread Bikas Saha (JIRA)

[
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503371#comment-13503371
]

Bikas Saha commented on YARN-230:
-

Ignoring orphaned attempts was explicitly added because the current
implementation deletes the app info first, so that its absence acts as a marker
of app completion even if the attempts fail to get deleted after that. The code
should have discarded orphaned attempts from the store but I forgot about that.
Good catch! Will fix.

The store and remove methods have been made mirrors because it helps maintain
symmetry of operations that is logically clear. An actual implementation could
choose to remove the entire app data including attempts in removeApplication()
making removeApplicationAttempt() a no-op. So that alternative is not precluded
in the current interface while still maintaining flexibility at the interface.
Also, the directory implementation can still be done in which case
removeApplication() could call FS.delete(Path_to_dir) and
removeApplicationAttempt() would remove the attempt file under the app
directory or return success if the app dir has already been deleted.
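
As a rough illustration of the directory-style alternative described above,
here is a sketch in plain Java that models the app directory as a map entry.
The method names follow the ones used in this thread, but the signatures, the
StateStore interface and the NestedStore class are assumptions made for the
example and are not the interface from the patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a store whose store/remove operations mirror each
// other while attempts are nested under their application.
class StoreInterfaceSketch {
  interface StateStore {
    void storeApplication(String appId);
    void storeApplicationAttempt(String appId, String attemptId);
    void removeApplication(String appId);
    void removeApplicationAttempt(String appId, String attemptId);
  }

  static class NestedStore implements StateStore {
    // One entry per application; the value plays the role of the app directory.
    final Map<String, Set<String>> attemptsByApp = new HashMap<>();

    public void storeApplication(String appId) {
      attemptsByApp.putIfAbsent(appId, new HashSet<>());
    }

    public void storeApplicationAttempt(String appId, String attemptId) {
      Set<String> attempts = attemptsByApp.get(appId);
      if (attempts == null) {
        throw new IllegalStateException("unknown application: " + appId);
      }
      attempts.add(attemptId);
    }

    // Stands in for deleting the whole app directory: the attempts go with it.
    public void removeApplication(String appId) {
      attemptsByApp.remove(appId);
    }

    // Removes a single attempt, or silently succeeds if the app is already gone.
    public void removeApplicationAttempt(String appId, String attemptId) {
      Set<String> attempts = attemptsByApp.get(appId);
      if (attempts != null) {
        attempts.remove(attemptId);
      }
    }
  }

  public static void main(String[] args) {
    NestedStore store = new NestedStore();
    store.storeApplication("app_1");
    store.storeApplicationAttempt("app_1", "attempt_1");
    store.removeApplication("app_1");
    store.removeApplicationAttempt("app_1", "attempt_1"); // no-op, no error
    System.out.println(store.attemptsByApp.isEmpty()); // true
  }
}
```

With this shape, a caller that always removes attempts and then the
application, or the application alone, ends up with the same stored state.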

I chose to not use directories for FileSystem because one could put a key value
store behind a FileSystem interface and I am not sure how directories would
work in them. Also rmdir is atomic on HDFS but may not be atomic on every file
system.

For HDFS one could certainly write a directory-based file structure for the
store in which apps would have their own directories. But IMO the best
implementation might be a transaction-log type implementation similar, I think,
to what HBase uses. It might also have better HA characteristics because HDFS
guarantees a single writer to a file. That however requires a considerable
investment of time.

It is really hard to guarantee atomicity of removal when we don't know how the
file system is implemented. We could use log-structured implementations or, for
HDFS, we could use atomic rmdir. Also, let's look at the following scenario. We
cannot removeApplication() until we know that the AM has exited and the job is
really done. Just after the RM knows that the job is done, the RM could die
before updating state. So upon restart we can never guarantee that a completed
application was recorded as completed. This is one reason why I chose not to
make the state machine wait for removeApplication() to complete.

One improvement would be to update the store with an attempt's final state
(failed/killed/succeeded) and wait for it to be recorded before completing the
state machine. This would allow us to not count killed as failed and also
complete the application state machine if the last attempt had succeeded. This
would implement the preferable solution in your second point above. This would
still be an optimization since the RM could fail before storing the attempt
state (like above) and we are back to square one. I would like to make this
change after YARN-218 is done so that all related changes can be made together.

I consciously chose to not provide defaults for the store because I think it's
important that users understand and think about it when they enable a store.
Changing the config helps trigger important questions like which store works
for me, what permissions are needed, etc. If you still feel strongly about it
then I could add defaults like you suggest. Could you please help by providing
a good system path. I am not quite familiar with typical rules used to
determine them.

I have explained the temporary choice of Exception on YARN-231.

I will address the remaining comments in the next patch.

Thanks for all the feedback. This is a good discussion. I am sure that there
are improvements to be made. Unless there are big issues with the current state
of the work it would be great if we can commit it and address improvements in
subsequent sub-tasks. This would help keep the changes smaller and easier to
manage. The current code refactors and places basic interface/infrastructure in
place. What do you think?

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-11-23 Thread Tom White (JIRA)

[
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503228#comment-13503228
]

Tom White commented on YARN-230:

Overall, this looks great. General feedback:

* Can we make application removal atomic? If the RM shuts down after a
completed application is removed from the state store, but before the app
attempts are removed from the store, then the app attempts may be orphaned.
(There's a comment about it in FileSystemRMStateStore, but no action is taken
so the attempt files will remain in the store.) It might be better to make
RMStateStore#removeApplicationState responsible for removing the app attempts
(i.e. remove removeApplicationAttemptState). This would solve the orphaning
problem, and it would also make it possible to store the app attempts in a
directory nested under the application directory, which would be nicer from a
scaling point of view, and also for someone having to debug the state on the
filesystem.
* If the RM shuts down before a (successful) completed application is removed
from the state store, will it be rerun on restart, or will the fact that a
successful app attempt was stored mean that it doesn't need to be? Obviously,
the second one would be preferable.
* The exceptions thrown by the public methods of RMStateStore should be more
specific than Exception.
* Let's have a default for yarn.resourcemanager.store.class in
yarn-default.xml. StoreFactory has MemoryRMStateStore as the default, but
that's not useful when running on a cluster; FileSystemRMStateStore would be
better. Similarly it would be good to have the default location for the store
be a system directory on the default file system. With these two changes folks
would only need to set yarn.resourcemanager.recovery.enabled to true to enable
recovery. (We might make that enabled by default at some point too.)
* MemoryRMStateStore#removeApplicationState will fail if asserts are disabled:
the remove method should be called in a separate statement and assigned to a
variable which can be checked in the assert. It's worth checking if this
problem exists elsewhere.
* Naming nit: Store was renamed to RMStateStore, so StoreFactory should be
renamed to RMStateStoreFactory.
* Naming nit: zk.rm-state-store rather than zk.rmstatestore for consistency
with other property names. Also for fs.rmstatestore, and
zk.rmstatestore.parentpath (parent-path).
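
To illustrate the assert point in the list above: in Java, the expression
inside an assert statement is not evaluated when assertions are disabled, which
is the JVM default. The sketch below is a hypothetical stand-in, not the actual
MemoryRMStateStore code; AssertSideEffectSketch, state, removeBuggy and
removeFixed are names invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an in-memory state store.
class AssertSideEffectSketch {
  static final Map<String, String> state = new HashMap<>();

  // Problematic: the remove() call lives inside the assert expression, so it
  // only runs when the JVM is started with assertions enabled (java -ea).
  static void removeBuggy(String appId) {
    assert state.remove(appId) != null;
  }

  // Safer: perform the removal in its own statement and assert on the result.
  static void removeFixed(String appId) {
    String removed = state.remove(appId);
    assert removed != null : "no stored state for " + appId;
  }

  public static void main(String[] args) {
    state.put("app_1", "data");
    removeFixed("app_1");
    // The entry is gone whether or not assertions are enabled.
    System.out.println(state.containsKey("app_1")); // false
  }
}
```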

[jira] [Commented] (YARN-230) Make changes for RM restart phase 1

2012-11-20 Thread Bikas Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501213#comment-13501213
 ] 

Bikas Saha commented on YARN-230:
-

The attached patch applies on top of the patch for YARN-229.

 Make changes for RM restart phase 1
 ---

 Key: YARN-230
 URL: https://issues.apache.org/jira/browse/YARN-230
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: YARN-230.1.patch


 As described in YARN-128, phase 1 of RM restart puts in place mechanisms to 
 save application state and read them back after restart. Upon restart, the 
 NM's are asked to reboot and the previously running AM's are restarted.
 After this is done, RM HA and work preserving restart can continue in 
 parallel. For more details please refer to the design document in YARN-128
