[
https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319725#comment-14319725
]
Tsuyoshi OZAWA commented on YARN-1778:
--------------------------------------
[~zxu], sorry for the confusion. I meant that we should make the retry
period configurable; it is currently hard-coded to 5000 msec.
{code}
The problem for me is the retry in DFSOutputStream#completeFile doesn't work.
{code}
How about increasing the retry count in FileSystemRMStateStore#startInternal?
I think that would work well.
{code}
My patch will work better with retry at both high layer(new code) and low
layer(old code) because it retry in FileSystemRMStateStore#writeFile, if any
exception happen, it will overwrite the file and redo everything.
{code}
What kind of failure do you have in mind? I think retrying completeFile here
is the simpler and more straightforward solution.
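To illustrate the suggestion above, here is a minimal sketch of a retry loop in which both the retry period and the retry count are passed in rather than hard-coded. It is not taken from the patch or from FileSystemRMStateStore: the class RetrySketch, the method runWithRetries, and its parameters are hypothetical names, and in real code the two values would presumably come from configuration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Hadoop or of the YARN-1778 patch. */
final class RetrySketch {

    /**
     * Runs op once and, if it throws IOException, retries it up to maxRetries
     * more times, sleeping retryIntervalMs between attempts. Both values are
     * parameters here; they stand in for what could be configuration settings.
     */
    static <T> T runWithRetries(Callable<T> op, int maxRetries, long retryIntervalMs)
            throws Exception {
        int retriesDone = 0;
        while (true) {
            try {
                return op.call();
            } catch (IOException e) {
                if (retriesDone >= maxRetries) {
                    throw e; // retries exhausted: surface the last failure
                }
                retriesDone++;
                Thread.sleep(retryIntervalMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation that fails twice before succeeding.
        String result = runWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated failure");
            }
            return "done";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With this shape, a larger retry count or a shorter period is just a different argument, so the same loop could wrap a call such as completeFile without changing the surrounding logic.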
> TestFSRMStateStore fails on trunk
> ---------------------------------
>
> Key: YARN-1778
> URL: https://issues.apache.org/jira/browse/YARN-1778
> Project: Hadoop YARN
> Issue Type: Test
> Reporter: Xuan Gong
> Assignee: zhihai xu
> Attachments: YARN-1778.000.patch
>
>