[ https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798711#comment-13798711 ]

Omkar Vinit Joshi commented on YARN-1185:
-----------------------------------------

Thanks [~vinodkv] and [~jianhe].

bq. Can you please rip apart TestRMStateStore into two tests (files) - 
TestFileSystemRMStateStore and TestZKRMStateStore but use common code?
Done.
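One common way to split the tests while keeping shared code is to use an abstract base class. The base class holds the test logic and declares a factory method for the store, and each backend gets one thin subclass. The sketch below shows that shape with hypothetical names: `AppStateStore`, `AppStateStoreTestBase`, and two toy backends that stand in for the FileSystem and ZooKeeper stores. It uses plain Java instead of JUnit so that it runs on its own. It is not the structure of the attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the state-store API under test.
interface AppStateStore {
    void storeApp(String appId, String state);
    Map<String, String> loadState();
}

// Common test logic, written once against the interface.
abstract class AppStateStoreTestBase {
    protected abstract AppStateStore createStore();

    public void testStoreAndLoad() {
        AppStateStore store = createStore();
        store.storeApp("app_1", "RUNNING");
        store.storeApp("app_2", "FINISHED");
        Map<String, String> state = store.loadState();
        if (state.size() != 2 || !"RUNNING".equals(state.get("app_1"))
                || !"FINISHED".equals(state.get("app_2"))) {
            throw new AssertionError("unexpected recovered state: " + state);
        }
    }
}

// Toy backend 1: keeps state in memory.
class InMemoryStore implements AppStateStore {
    private final Map<String, String> apps = new HashMap<>();
    public void storeApp(String appId, String state) { apps.put(appId, state); }
    public Map<String, String> loadState() { return new TreeMap<>(apps); }
}

// Toy backend 2: one file per application inside a directory.
class DirectoryStore implements AppStateStore {
    private final Path dir;
    DirectoryStore(Path dir) { this.dir = dir; }

    public void storeApp(String appId, String state) {
        try {
            Files.write(dir.resolve(appId), state.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public Map<String, String> loadState() {
        Map<String, String> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                result.put(f.getFileName().toString(),
                        new String(Files.readAllBytes(f), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}

// One thin subclass per backend; in a real project each lives in its own file.
class TestInMemoryStore extends AppStateStoreTestBase {
    @Override
    protected AppStateStore createStore() { return new InMemoryStore(); }
}

class TestDirectoryStore extends AppStateStoreTestBase {
    @Override
    protected AppStateStore createStore() {
        try {
            return new DirectoryStore(Files.createTempDirectory("rmstate"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

public class SharedStoreTests {
    public static void main(String[] args) {
        new TestInMemoryStore().testStoreAndLoad();
        new TestDirectoryStore().testStoreAndLoad();
        System.out.println("both backends passed");
    }
}
```

Adding a backend then only requires a new subclass that overrides `createStore()`. Every shared test method runs against it without copying the test body.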
bq. Also, to indicate corruption, instead of a .tmp file, we can try to do a
state-store write with a partial record and try to recover from that.
I am already doing this.
bq. It may also be better for the test case to assert at the end that the
corrupted application/attempt is not loaded back into RMState and doesn't
exist in the FileSystem.
Done.
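The corruption test discussed above can be sketched as follows. This is a hypothetical illustration, not the code in the attached patch. It assumes each application record lives in its own file with a 4-byte length prefix, so a truncated write can be detected. The class, method, and file names are made up. The sketch simulates a crash mid-write by truncating one record, runs recovery, and checks that the corrupted application is neither loaded nor left on disk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartialRecordRecovery {

    // Hypothetical record format: 4-byte big-endian payload length, then the payload.
    public static byte[] encode(String state) {
        byte[] payload = state.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Loads every complete record in dir. A record whose length prefix does not
    // match the bytes actually present is treated as partial: it is deleted
    // from disk and left out of the recovered state.
    public static Map<String, String> recover(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path f : stream) {
                files.add(f);
            }
        }
        Map<String, String> loaded = new TreeMap<>();
        for (Path f : files) {
            byte[] raw = Files.readAllBytes(f);
            if (raw.length < 4 || ByteBuffer.wrap(raw).getInt() != raw.length - 4) {
                System.err.println("Removing partial record " + f.getFileName());
                Files.delete(f);
                continue;
            }
            loaded.put(f.getFileName().toString(),
                    new String(raw, 4, raw.length - 4, StandardCharsets.UTF_8));
        }
        return loaded;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rmstate");
        Files.write(dir.resolve("app_1"), encode("RUNNING"));
        // Simulate an RM crash mid-write: only the first 6 of 12 bytes reach disk.
        Files.write(dir.resolve("app_2"), Arrays.copyOf(encode("FINISHED"), 6));

        Map<String, String> state = recover(dir);
        System.out.println(state);                              // {app_1=RUNNING}
        System.out.println(Files.exists(dir.resolve("app_2"))); // false
    }
}
```

The two checks at the end match the two assertions requested in the review: the corrupted application is not in the recovered state, and its file is gone from the store directory.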

Attaching a new patch.

> FileSystemRMStateStore can leave partial files that prevent subsequent recovery
> -------------------------------------------------------------------------------
>
>                 Key: YARN-1185
>                 URL: https://issues.apache.org/jira/browse/YARN-1185
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-1185.1.patch, YARN-1185.2.patch
>
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partially-written file and either outright 
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to 
> the destination file afterwards.
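The write-to-temporary-file-then-rename fix described in the issue can be sketched as follows. This is a minimal sketch using `java.nio.file` on a local filesystem. It does not use the Hadoop `FileSystem` API that FileSystemRMStateStore actually works with. The class and method names and the `.tmp` suffix are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class AtomicStateWriter {

    // Writes data to "<name>.tmp" next to dest, forces it to disk, then renames
    // it onto dest. A crash before the rename leaves at most a stray .tmp file;
    // dest itself is never observed half-written.
    public static void writeAtomically(Path dest, byte[] data) throws IOException {
        Path tmp = dest.resolveSibling(dest.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // make the bytes durable before they become visible
        }
        // Throws AtomicMoveNotSupportedException if the filesystem cannot rename
        // atomically. On Linux this maps to rename(2), which replaces an existing
        // dest. (This sketch does not fsync the parent directory afterwards.)
        Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
    }

    // Recovery-side cleanup: any leftover .tmp file is incomplete by definition,
    // so it is deleted instead of being loaded. Returns how many were removed.
    public static int removeTempFiles(Path dir) throws IOException {
        List<Path> leftovers = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.tmp")) {
            for (Path p : stream) {
                leftovers.add(p);
            }
        }
        for (Path p : leftovers) {
            Files.delete(p);
        }
        return leftovers.size();
    }
}
```

The rename makes each store operation all-or-nothing. A reader sees either the previous complete file or the new complete file. Writing the destination in place cannot give that guarantee.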



--
This message was sent by Atlassian JIRA
(v6.1#6144)
