[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-06-04 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

闫昆 updated YARN-514:


Component/s: (was: resourcemanager)

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Fix For: 2.1.0-beta

 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, 
 YARN-514.4.patch, YARN-514.5.patch, YARN-514.6.patch, YARN-514.7.patch, 
 YARN-514.8.patch


 Currently, app submission is the only store operation performed synchronously, 
 because the app must be stored before the request returns with success. This 
 makes the RM susceptible to blocking all client threads on slow store 
 operations, so clients perceive the RM as unavailable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-06-04 Thread Vinod Kumar Vavilapalli (JIRA)

Vinod Kumar Vavilapalli updated YARN-514:

Component/s: resourcemanager

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Fix For: 2.1.0-beta

 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, 
 YARN-514.4.patch, YARN-514.5.patch, YARN-514.6.patch, YARN-514.7.patch, 
 YARN-514.8.patch


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-17 Thread Zhijie Shen (JIRA)

Zhijie Shen updated YARN-514:

Attachment: YARN-514.8.patch

In the newest patch, I use app directly. I checked the patch for the related
MapReduce JIRA; it can be applied and works together with the patch in this JIRA.

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, 
 YARN-514.4.patch, YARN-514.5.patch, YARN-514.6.patch, YARN-514.7.patch, 
 YARN-514.8.patch


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-16 Thread Zhijie Shen (JIRA)

Zhijie Shen updated YARN-514:

Attachment: YARN-514.6.patch

Thanks, @Bikas, for your investigation. I've modified the code. The newest
patch contains the following major updates:

1. FAILED -> FAILED and KILLED -> KILLED transitions on RMAppEventType.APP_SAVED
are defined. This fixes the problem pointed out by @Bikas.

2. In addition, I found a problem with RMApp state transitions in the RM restart
scenario. With the previous patch, the stored application is recovered, an RMApp
instance is created, and that instance transitions to NEW_SAVING and is stored
again. To fix the problem, isRecovered is defined in RMAppImpl and is set to true
when RMAppImpl#recover is called. Then, on receiving RMAppEventType.START, the
RMApp goes NEW -> NEW_SAVING if the instance is not recovered, and NEW ->
SUBMITTED otherwise.

3. Additional test cases are added to TestRMAppTransitions to test the
aforementioned transition rules.

4. TestRMRestart should have caught the problem of saving a recovered RMApp
instance again. However, the test case didn't fail with the previous patch,
because MemoryRMStateStore didn't throw an exception when storing a duplicate
application/attempt. Therefore, in the newest patch, MemoryRMStateStore throws
an IOException when the application/attempt has already been stored, which is
consistent with the behavior of FileSystemRMStateStore. With this change, the
existing TestRMRestart test case catches the problem of saving the RMApp
instance twice.
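Points 1 and 2 can be illustrated with a small hand-written state machine. This is a minimal Java sketch under assumptions, not the RMAppImpl code: the class RMAppTransitionSketch, its recover() method, and the KILL event (added only so the sketch can reach KILLED) are hypothetical. It shows a recovered app skipping NEW_SAVING on START, and a late APP_SAVED being absorbed by a self-loop once the app is already KILLED or FAILED.

```java
// Hypothetical, simplified model of the RMApp rules described in points 1 and 2.
class RMAppTransitionSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED, FAILED, KILLED }
    enum EventType { START, APP_SAVED, KILL }

    private State state = State.NEW;
    private boolean isRecovered = false;

    // Mirrors the idea of RMAppImpl#recover: this app was rebuilt from the store.
    void recover() {
        isRecovered = true;
    }

    State getState() {
        return state;
    }

    void handle(EventType type) {
        switch (state) {
            case NEW:
                if (type == EventType.START) {
                    // A recovered app is already stored, so it must not be saved again.
                    state = isRecovered ? State.SUBMITTED : State.NEW_SAVING;
                    return;
                }
                if (type == EventType.KILL) {
                    state = State.KILLED;
                    return;
                }
                break;
            case NEW_SAVING:
                if (type == EventType.APP_SAVED) {
                    state = State.SUBMITTED;
                    return;
                }
                if (type == EventType.KILL) {
                    // The store write is still in flight; its APP_SAVED arrives later.
                    state = State.KILLED;
                    return;
                }
                break;
            case FAILED: // no event in this sketch leads here; listed to mirror rule 1
            case KILLED:
                // Self-loop: a late APP_SAVED is ignored instead of being an invalid event.
                if (type == EventType.APP_SAVED) {
                    return;
                }
                break;
            default:
                break;
        }
        throw new IllegalStateException("Invalid event " + type + " in state " + state);
    }

    public static void main(String[] args) {
        RMAppTransitionSketch app = new RMAppTransitionSketch();
        app.handle(EventType.START);     // NEW -> NEW_SAVING
        app.handle(EventType.KILL);      // NEW_SAVING -> KILLED
        app.handle(EventType.APP_SAVED); // absorbed by KILLED -> KILLED
        System.out.println(app.getState()); // prints KILLED
    }
}
```

Without the self-loop, the late APP_SAVED in the example would hit the IllegalStateException branch, which is the kind of invalid-event failure rule 1 is meant to avoid.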
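The stricter duplicate check in point 4 boils down to an insert-if-absent that fails loudly. A minimal Java sketch, assuming a plain map keyed by application id; DuplicateRejectingStore is a hypothetical stand-in, not MemoryRMStateStore's actual code.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory store that, like the behavior described for
// FileSystemRMStateStore, refuses to store the same application twice.
class DuplicateRejectingStore {
    // Application id -> serialized application state.
    private final Map<String, String> apps = new HashMap<>();

    synchronized void storeApplication(String appId, String appState) throws IOException {
        Objects.requireNonNull(appState, "appState");
        // putIfAbsent returns the existing value when the id is already present;
        // in that case nothing is overwritten and the caller gets an error.
        if (apps.putIfAbsent(appId, appState) != null) {
            throw new IOException("Application " + appId + " has already been stored");
        }
    }

    synchronized int size() {
        return apps.size();
    }

    public static void main(String[] args) throws IOException {
        DuplicateRejectingStore store = new DuplicateRejectingStore();
        store.storeApplication("application_1_0001", "NEW");
        try {
            // A recovered app being saved a second time, as in the restart bug.
            store.storeApplication("application_1_0001", "NEW");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A store that silently overwrites would let a test like TestRMRestart pass even when the app is saved twice; throwing makes the double save visible.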

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, 
 YARN-514.4.patch, YARN-514.5.patch, YARN-514.6.patch


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-15 Thread Zhijie Shen (JIRA)

Zhijie Shen updated YARN-514:

Attachment: YARN-514.5.patch

I've drafted a newer patch, where YarnApplicationState, 
YarnApplicationStateProto and RMAppState (RMAppState has one more state than 
the other two) have consistent state orders:

  NEW,
  NEW_SAVING,
  SUBMITTED,
  ACCEPTED,
  RUNNING,
  FINISHED,
  FAILED,
  KILLED

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, 
 YARN-514.4.patch, YARN-514.5.patch


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-11 Thread Zhijie Shen (JIRA)

Zhijie Shen updated YARN-514:

Attachment: YARN-514.4.patch

Fixed the incorrect indentation.

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, 
 YARN-514.4.patch


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-10 Thread Zhijie Shen (JIRA)

Zhijie Shen updated YARN-514:

Attachment: YARN-514.2.patch

Updated TestRMAppTransitions to fix a bug in how it determines the transition
into the SAVING state.

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-10 Thread Zhijie Shen (JIRA)

Zhijie Shen updated YARN-514:

Attachment: YARN-514.3.patch

I've updated the patch. The major modifications are as follows:

1. SAVING is renamed to NEW_SAVING to make it clearer.

2. On receiving RMAppEventType.START, RMApp transitions from NEW to NEW_SAVING,
and RMAppSavingTransition is executed, which invokes storeApplication. On
receiving RMAppEventType.APP_SAVED (sent from RMStateStore), RMApp transitions
from NEW_SAVING to SUBMITTED, and StartAppAttemptTransition is executed, which
checks for an application store exception before creating a new attempt.
Therefore, the RMApp states from SUBMITTED onward are simply shifted one step
later, with no other changes.

3. TestRMAppTransitions has been significantly simplified. Only the
transition-related tests for the newly added state are included here.

In addition, I've run a single-node cluster test and verified that the
application store occurs before the attempt store.
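The NEW -> NEW_SAVING -> SUBMITTED flow in point 2 can be sketched as a tiny state machine. This is an illustration under assumptions, not RMAppImpl: RMAppSavingSketch and its method names are hypothetical, only the states and events named above are modeled, and since the comment only says the store exception is checked, the sketch simply rejects the transition by throwing when a store failure is reported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the NEW_SAVING step between NEW and SUBMITTED.
class RMAppSavingSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED }
    enum EventType { START, APP_SAVED }

    private State state = State.NEW;
    // Side effects in the order they happened, so "store before attempt" is checkable.
    final List<String> actions = new ArrayList<>();

    State getState() {
        return state;
    }

    // storeException is only used with APP_SAVED; null means the store succeeded.
    void handle(EventType type, Exception storeException) {
        if (state == State.NEW && type == EventType.START) {
            // Role of RMAppSavingTransition: request the (non-blocking) store, then wait.
            actions.add("storeApplication");
            state = State.NEW_SAVING;
        } else if (state == State.NEW_SAVING && type == EventType.APP_SAVED) {
            // Role of StartAppAttemptTransition: check the store result first.
            if (storeException != null) {
                // Assumption: the sketch surfaces the failure and stays in NEW_SAVING.
                throw new IllegalStateException("Failed to store application", storeException);
            }
            actions.add("createAttempt");
            state = State.SUBMITTED;
        } else {
            throw new IllegalStateException("Invalid event " + type + " in state " + state);
        }
    }

    public static void main(String[] args) {
        RMAppSavingSketch app = new RMAppSavingSketch();
        app.handle(EventType.START, null);
        app.handle(EventType.APP_SAVED, null);
        // prints "SUBMITTED [storeApplication, createAttempt]"
        System.out.println(app.getState() + " " + app.actions);
    }
}
```

The recorded actions make the ordering checked in the single-node test explicit: no attempt is created until the APP_SAVED event confirms the application store.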

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch


[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-09 Thread Zhijie Shen (JIRA)

Zhijie Shen updated YARN-514:

Attachment: YARN-514.1.patch

In this patch, I've changed RMStateStore#storeApplication from a blocking API
to a non-blocking one. Therefore, it is no longer necessary to invoke the API in
ClientRMService#submitApplication. Instead, I defined a new RMApp state, named
SAVING, between NEW and SUBMITTED. TestRMAppTransitions was modified to test the
additional state transition, and to test whether the application is stored
before SUBMITTED and removed after FINISHED.
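To illustrate what a non-blocking storeApplication buys, here is a minimal Java sketch, not taken from the patch. AsyncStateStoreSketch, its single-thread ExecutorService, the arbitrary 50 ms simulated write and the BiConsumer callback (standing in for the APP_SAVED event) are all hypothetical; the real RMStateStore mechanism may differ. The caller's thread returns immediately and the slow write runs on a background thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical non-blocking store: client threads hand off the write and move on.
class AsyncStateStoreSketch {
    // One background writer: slow store calls queue up here, not on client RPC threads.
    private final ExecutorService storeThread = Executors.newSingleThreadExecutor();

    // Returns at once; the outcome (null on success, the exception on failure)
    // is delivered later through onSaved, which plays the APP_SAVED role.
    void storeApplication(String appId, BiConsumer<String, Exception> onSaved) {
        storeThread.execute(() -> {
            Exception failure = null;
            try {
                slowWrite(appId);
            } catch (Exception e) {
                failure = e;
            }
            onSaved.accept(appId, failure);
        });
    }

    // Stand-in for a slow backing store; the 50 ms delay is arbitrary.
    private void slowWrite(String appId) throws Exception {
        Thread.sleep(50);
    }

    void shutdown() throws InterruptedException {
        storeThread.shutdown();
        storeThread.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncStateStoreSketch store = new AsyncStateStoreSketch();
        store.storeApplication("application_1_0001",
                (id, err) -> System.out.println(id + " saved, error = " + err));
        // Reached without waiting for the write to finish.
        System.out.println("submitApplication can return now");
        store.shutdown();
    }
}
```

The trade-off is that success of the store is no longer known when the call returns, which is why the comment introduces an intermediate state (SAVING) that waits for the store's completion event before the app becomes SUBMITTED.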

An additional issue is that the mapping between YARN and MapReduce states needs
to be updated because of the newly added state. This will be filed and fixed in
a separate MapReduce JIRA.

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch

