Tsuyoshi OZAWA commented on YARN-1778:

[~zxu], sorry for the confusion. I meant that we should make the retry period
configurable; it is currently hard-coded as 5000 msec.
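To illustrate what "configurable" could mean here, below is a minimal sketch.
It assumes the interval is looked up by key, with the current 5000 msec as the
default. The key name and the RetryIntervalConfig class are hypothetical. In
Hadoop code the lookup would typically go through
Configuration#getLong(name, defaultValue) rather than a plain Map.

```java
import java.util.Map;

// Hypothetical sketch: read the retry interval from configuration instead of
// hard-coding it. The key name is illustrative, not a real YARN property.
final class RetryIntervalConfig {
    static final String RETRY_INTERVAL_KEY = "example.fs-state-store.retry-interval-ms";
    static final long DEFAULT_RETRY_INTERVAL_MS = 5000L; // today's hard-coded value

    static long retryIntervalMs(Map<String, String> conf) {
        String value = conf.get(RETRY_INTERVAL_KEY);
        if (value == null) {
            return DEFAULT_RETRY_INTERVAL_MS; // fall back to the old behavior
        }
        long parsed = Long.parseLong(value.trim());
        if (parsed < 0) {
            throw new IllegalArgumentException(
                RETRY_INTERVAL_KEY + " must be >= 0, got " + parsed);
        }
        return parsed;
    }
}
```

Keeping 5000 msec as the default means existing deployments behave exactly as
before unless they opt in to a different value.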

The problem, as I see it, is that the retry in DFSOutputStream#completeFile
doesn't work.

How about increasing the retry count in FileSystemRMStateStore#startInternal?
I think that would work well.

My patch will work better because it retries at both the high layer (new code)
and the low layer (old code): it retries in FileSystemRMStateStore#writeFile,
so if any exception happens, it overwrites the file and redoes everything.
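A minimal sketch of that high-layer retry, assuming every attempt rewrites the
whole file (CREATE + TRUNCATE_EXISTING) so that a partial file left by a failed
attempt is simply overwritten. OverwritingWriter, WriteAttempt and the
parameters are hypothetical names, not FileSystemRMStateStore's actual code,
and only IOExceptions trigger a retry in this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: retry a whole-file write, overwriting on every attempt.
final class OverwritingWriter {

    /** One attempt at writing the file; may throw on failure. */
    interface WriteAttempt {
        void write(Path path, byte[] data) throws IOException;
    }

    /** Local-filesystem attempt that truncates any previous (partial) content. */
    static final WriteAttempt LOCAL_WRITE = (path, data) ->
        Files.write(path, data,
            StandardOpenOption.CREATE,
            StandardOpenOption.TRUNCATE_EXISTING,
            StandardOpenOption.WRITE);

    /** Returns the number of attempts used; rethrows the last failure if all fail. */
    static int writeWithRetries(Path path, byte[] data, WriteAttempt attempt,
                                int maxAttempts, long intervalMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.write(path, data); // redo everything from scratch
                return i;
            } catch (IOException e) {
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(intervalMs); // wait before the next full rewrite
                }
            }
        }
        throw last;
    }
}
```

Because each attempt starts over, a retry cannot be confused by leftovers from
an earlier failure; the trade-off is that the full payload is rewritten every
time.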

What kind of failure are you thinking about? I think retrying completeFile here
is a more straightforward and simpler solution.
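For comparison, a sketch of retrying only the completion step. It assumes a
"complete" call that returns false while the file cannot be closed yet (for
example, while the last block is still under-replicated). CompleteFileRetry
and its parameters are hypothetical; this is not DFSOutputStream's
implementation.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: bounded retry around a completion call that returns
// false until the file can be closed.
final class CompleteFileRetry {

    /** Returns the attempt number that succeeded; throws after maxRetries retries. */
    static int completeWithRetries(BooleanSupplier complete, int maxRetries, long intervalMs)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            if (complete.getAsBoolean()) {
                return attempt; // file closed successfully
            }
            if (attempt <= maxRetries) {
                Thread.sleep(intervalMs); // give the cluster time before asking again
            }
        }
        throw new IOException(
            "File could not be completed after " + (maxRetries + 1) + " attempts");
    }
}
```

Only the final step is repeated here, so nothing is rewritten. The cost is that
a failure earlier in the write is not covered by this retry.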

> TestFSRMStateStore fails on trunk
> ---------------------------------
>                 Key: YARN-1778
>                 URL: https://issues.apache.org/jira/browse/YARN-1778
>             Project: Hadoop YARN
>          Issue Type: Test
>            Reporter: Xuan Gong
>            Assignee: zhihai xu
>         Attachments: YARN-1778.000.patch

This message was sent by Atlassian JIRA
