[ https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798711#comment-13798711 ]
Omkar Vinit Joshi commented on YARN-1185:
-----------------------------------------
Thanks [~vinodkv] and [~jianhe].
bq. Can you please split TestRMStateStore into two tests (files),
TestFileSystemRMStateStore and TestZKRMStateStore, that share common code?
Done.
bq. Also, to indicate corruption, instead of a .tmp file, we can try to do a
state-store write with a partial record and try to recover from that.
I am already doing this.
bq. It may also be better for the test case to assert at the end that the
corrupted application/attempt is not loaded back into RMState and doesn't
exist in the FileSystem.
Done.
Attaching a new patch.
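The review asks for a test that writes a partial record, recovers, and then checks that the corrupted application/attempt is neither loaded nor left on disk. A minimal sketch of a recovery path with that behaviour follows. It assumes a hypothetical length-prefixed record format and hypothetical names (PartialRecordRecovery, encode, decode, recover), and it uses java.nio.file on a local directory. It is not the actual FileSystemRMStateStore format or the code in the attached patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: detect and discard partially written state records.
 * The length-prefixed format is an assumption for illustration only.
 */
class PartialRecordRecovery {

    /** Encodes a record as a 4-byte big-endian length followed by the payload. */
    static byte[] encode(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Returns the payload, or null if the record is truncated or malformed. */
    static byte[] decode(byte[] record) {
        if (record.length < 4) {
            return null; // not even a complete length header
        }
        int len = ByteBuffer.wrap(record).getInt();
        if (len < 0 || len != record.length - 4) {
            return null; // declared length disagrees with the bytes on disk
        }
        return Arrays.copyOfRange(record, 4, record.length);
    }

    /**
     * Loads every complete record in dir, keyed by file name, and deletes any
     * file holding a partial record so it is neither loaded nor kept on disk.
     */
    static Map<String, byte[]> recover(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.collect(Collectors.toList());
        }
        Map<String, byte[]> state = new TreeMap<>();
        for (Path p : files) {
            byte[] payload = decode(Files.readAllBytes(p));
            if (payload == null) {
                Files.delete(p); // corrupted entry: skip it and remove it
            } else {
                state.put(p.getFileName().toString(), payload);
            }
        }
        return state;
    }
}
```

A test in the spirit of the review would write one complete record and one record truncated mid-payload, call recover, and assert that only the complete entry comes back and the truncated file is gone.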
> FileSystemRMStateStore can leave partial files that prevent subsequent
> recovery
> -------------------------------------------------------------------------------
>
> Key: YARN-1185
> URL: https://issues.apache.org/jira/browse/YARN-1185
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.1.0-beta
> Reporter: Jason Lowe
> Assignee: Omkar Vinit Joshi
> Attachments: YARN-1185.1.patch, YARN-1185.2.patch
>
>
> FileSystemRMStateStore writes directly to the destination file when storing
> state. However, if the RM were to crash in the middle of the write, the
> recovery method could encounter a partially-written file and either outright
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to
> the destination file afterwards.
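The write-to-temporary-then-rename approach from the issue description can be sketched as follows. This is a minimal illustration using java.nio.file on a local filesystem, not Hadoop's FileSystem API or the YARN-1185 patch. The class name TempThenRenameStore, its methods, and the ".tmp" suffix are hypothetical. Recovery only trusts files without the temporary suffix and deletes any leftover temporaries.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of write-to-temp-then-rename on a local filesystem.
 * Not Hadoop's FileSystem API and not the YARN-1185 patch.
 */
class TempThenRenameStore {

    /** Assumed suffix that marks an in-progress write. */
    static final String TMP_SUFFIX = ".tmp";

    /** Stores data under dir/name so that readers never see a partial file. */
    static void store(Path dir, String name, byte[] data) throws IOException {
        Path tmp = dir.resolve(name + TMP_SUFFIX);
        // A crash during this write leaves only the .tmp file behind.
        Files.write(tmp, data);
        // Publish the complete file in one step. On Linux/macOS this is rename(2),
        // which replaces an existing destination; the Javadoc leaves that
        // implementation-specific. Throws AtomicMoveNotSupportedException if the
        // filesystem cannot rename atomically.
        Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Lists fully written entries and deletes temp files left by a crash. */
    static List<String> recover(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.collect(Collectors.toList());
        }
        List<String> names = new ArrayList<>();
        for (Path p : files) {
            String fileName = p.getFileName().toString();
            if (fileName.endsWith(TMP_SUFFIX)) {
                Files.delete(p); // interrupted write: never loaded
            } else {
                names.add(fileName);
            }
        }
        return names;
    }
}
```

The temporary file is created in the same directory as the destination so that the rename stays within one filesystem. A real store would go through Hadoop's FileSystem, whose rename(Path, Path) reports failure through a boolean return value rather than an exception, so that result would need to be checked.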
--
This message was sent by Atlassian JIRA
(v6.1#6144)