[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765649#comment-13765649
 ] 

Bikas Saha commented on YARN-1185:
----------------------------------

Yes, since the FileSystem interface does not provide any atomic operations.
The RM will not start if there is anything wrong with the stored state. So if 
some write is partial/empty, it will not start. At that point we can judge 
whether the missing piece is important, purge that piece, and continue. This 
should be ok for job-related data since we only lose a job. However, for global 
data like secret keys we may have to be more careful. In one case we encode the 
info in the file name. In other cases, where the data cannot be encoded in the 
file name, we may have to ensure that the store operation is not partial/empty. 
For HDFS we may assume atomic rename, but will that be true for all filesystems?
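
To illustrate the "encode the info in the file name" case, here is a minimal
sketch. It uses java.nio.file on a local filesystem as a stand-in and is not
the Hadoop FileSystem API or the actual FileSystemRMStateStore code. The class
name, the "SequenceNumber_" prefix and the demo() helper are all hypothetical.
The idea is that the value lives entirely in the name of an empty file, so
there is no file content that could be left partially written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

final class NameEncodedValueSketch {

    // Hypothetical prefix; the value is appended to it to form the file name.
    static final String PREFIX = "SequenceNumber_";

    /** Stores the value by creating an empty file whose name carries it. */
    static void store(Path dir, long value) throws IOException {
        // No bytes are written, so a crash cannot leave partial content.
        Files.createFile(dir.resolve(PREFIX + value));
    }

    /** Recovers the largest value found among the file names, if any. */
    static OptionalLong load(Path dir) throws IOException {
        OptionalLong latest = OptionalLong.empty();
        try (DirectoryStream<Path> entries =
                 Files.newDirectoryStream(dir, PREFIX + "*")) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                long value = Long.parseLong(name.substring(PREFIX.length()));
                if (!latest.isPresent() || value > latest.getAsLong()) {
                    latest = OptionalLong.of(value);
                }
            }
        }
        return latest;
    }

    /** Small self-check: stores 7 and 8, then recovers the latest value. */
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("name-encoded-sketch");
            store(dir, 7L);
            store(dir, 8L);
            return load(dir).getAsLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This only works for data small enough to fit in a file name, which is why the
other global data would still need a store operation that cannot be observed
half done.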

So we could do the following. 
Storing app data may continue to be optimistic, and since that's the main 
workload we continue to do what we do today.
Storing global data (mainly the security stuff) can change to be more atomic.

We can make all store operations more atomic if we feel that we will not slow 
down the RM because of multiple roundtrips to the store.

                
> FileSystemRMStateStore doesn't use temporary files when writing data
> --------------------------------------------------------------------
>
>                 Key: YARN-1185
>                 URL: https://issues.apache.org/jira/browse/YARN-1185
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partially-written file and either outright 
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to 
> the destination file afterwards.
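
The write-to-temporary-then-rename approach from the description can be
sketched roughly as follows. This is a local-filesystem analogue using
java.nio.file, not the Hadoop FileSystem API and not the eventual patch. The
class name, the ".tmp" suffix and the demo() helper are assumptions made for
illustration. Whether the rename is atomic depends on the underlying
filesystem, which is exactly the open question raised in the comment above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class TempFileStoreSketch {

    // Hypothetical suffix marking a file that is still being written.
    static final String TMP_SUFFIX = ".tmp";

    /** Writes data to a sibling temp file, then renames it onto dest. */
    static void store(Path dest, byte[] data) throws IOException {
        Path tmp = dest.resolveSibling(dest.getFileName() + TMP_SUFFIX);
        // A crash during this write leaves only the temp file behind;
        // the destination is never seen half written.
        Files.write(tmp, data);
        // ATOMIC_MOVE requests an atomic rename and throws
        // AtomicMoveNotSupportedException if the filesystem cannot do one.
        // Behavior when dest already exists is implementation specific.
        Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Small self-check: stores one file and reports what is on disk. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("rmstate-sketch");
            Path dest = dir.resolve("app_0001");
            store(dest, "state".getBytes(StandardCharsets.UTF_8));
            boolean tmpLeft =
                Files.exists(dir.resolve("app_0001" + TMP_SUFFIX));
            String content =
                new String(Files.readAllBytes(dest), StandardCharsets.UTF_8);
            return content + " tmpLeft=" + tmpLeft;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this layout, recovery could skip or purge any leftover file ending in the
temp suffix, since such a file was by construction never committed. The cost
is one extra operation per store, which is the roundtrip concern mentioned in
the comment.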

