[ 
https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319725#comment-14319725
 ] 

Tsuyoshi OZAWA commented on YARN-1778:
--------------------------------------

[~zxu], sorry for confusing you. I meant that we should make the retry 
interval configurable - it's hard-coded to 5000 msec for now.

{code}
The problem for me is the retry in DFSOutputStream#completeFile doesn't work. 
{code}

How about increasing the retry count in FileSystemRMStateStore#startInternal? 
I think that would work well.

{code}
My patch will work better with retry at both high layer(new code) and low 
layer(old code) because it retry in FileSystemRMStateStore#writeFile, if any 
exception happen, it will overwrite the file and redo everything.
{code}

What kind of failure are you thinking about? I think retrying completeFile here 
is a simpler and more straightforward solution.
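
To make the suggestion concrete, here is a minimal sketch of a retry loop whose 
count and interval are parameters instead of constants. This is not the 
FileSystemRMStateStore or DFSOutputStream code: RetrySketch, retry, maxAttempts 
and intervalMs are hypothetical names, and I am assuming the two values would 
be read from configuration by the caller.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper; illustrates a retry with configurable count and interval. */
final class RetrySketch {

    /**
     * Calls op until it succeeds or maxAttempts is reached.
     * Only IOException is retried; any other exception propagates at once.
     * Sleeps intervalMs between failed attempts (the value that is
     * hard-coded today would be passed in here instead).
     */
    static <T> T retry(Callable<T> op, int maxAttempts, long intervalMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                // Do not sleep after the final attempt; just rethrow below.
                if (attempt < maxAttempts) {
                    Thread.sleep(intervalMs);
                }
            }
        }
        throw last;
    }
}
```

With something like this, the operation that fails to complete would be 
wrapped in the Callable, and raising the retry count or changing the interval 
becomes a configuration change rather than a code change.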

> TestFSRMStateStore fails on trunk
> ---------------------------------
>
>                 Key: YARN-1778
>                 URL: https://issues.apache.org/jira/browse/YARN-1778
>             Project: Hadoop YARN
>          Issue Type: Test
>            Reporter: Xuan Gong
>            Assignee: zhihai xu
>         Attachments: YARN-1778.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
