[ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306866#comment-14306866 ]
Tsuyoshi OZAWA commented on YARN-1778:
--------------------------------------
[~zxu] cc: [~jlowe] Thank you for the investigation.
DFSOutputStream#completeFile already contains retry logic, but it is hard-coded
for now:
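{code}
// Excerpt from the retry loop in DFSOutputStream#completeFile.
// Give up once no retries remain.
if (retries == 0) {
  throw new IOException("Unable to close file because the last block"
      + " does not have enough number of replicas.");
}
retries--;
// Wait, then double the wait for the next attempt (exponential backoff).
Thread.sleep(localTimeout);
localTimeout *= 2;
// Log only once more than 5000 ms have elapsed since localstart.
if (Time.now() - localstart > 5000) {
  DFSClient.LOG.info("Could not complete " + src + " retrying...");
}
{code}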
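The excerpt is a plain exponential-backoff loop: each failed attempt consumes one retry, sleeps, and doubles the next sleep. A minimal sketch of the same pattern with the retry count and the initial sleep passed in as parameters, rather than fixed inside the method, might look like the following. The class name BackoffRetry and the method retryWithBackoff are hypothetical, not HDFS code, and unlike the original this sketch propagates InterruptedException instead of catching it.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical helper (not part of HDFS): the retry pattern from
 * DFSOutputStream#completeFile with the retry count and the initial
 * sleep supplied by the caller.
 */
public class BackoffRetry {

  /**
   * Calls attempt until it returns true. After each failed attempt it
   * sleeps and then doubles the sleep. Throws once maxRetries is used up.
   */
  static void retryWithBackoff(BooleanSupplier attempt, int maxRetries,
      long initialIntervalMs) throws IOException, InterruptedException {
    int retries = maxRetries;
    long interval = initialIntervalMs;
    while (!attempt.getAsBoolean()) {
      if (retries == 0) {
        throw new IOException("Giving up after " + maxRetries + " retries");
      }
      retries--;
      Thread.sleep(interval);
      interval *= 2; // exponential backoff, as in the excerpt above
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Succeeds on the third call, so it sleeps 10 ms and then 20 ms first.
    retryWithBackoff(() -> ++calls[0] >= 3, 5, 10);
    System.out.println("succeeded after " + calls[0] + " attempts");
  }
}
```

With the parameters exposed this way, a caller such as a state store could pass in values read from its own configuration instead of relying on fixed constants.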
How about making the timeout and the number of retries configurable, and setting
them via fs.state-store.num-retries and fs.state-store.retry-interval-ms? That
would be a simpler way to deal with this problem.
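As a rough illustration of the proposal, the two settings could be looked up with fallbacks along these lines. This is a sketch under stated assumptions: java.util.Properties stands in for Hadoop's Configuration, the class StateStoreRetryConfig is hypothetical, and the default values are placeholders, not values taken from YARN.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: read the retry settings proposed above from a
 * key/value configuration. Properties stands in for Hadoop's Configuration.
 */
public class StateStoreRetryConfig {
  static final String NUM_RETRIES_KEY = "fs.state-store.num-retries";
  static final String RETRY_INTERVAL_KEY = "fs.state-store.retry-interval-ms";

  // Placeholder defaults for illustration only, not YARN's values.
  static final int DEFAULT_NUM_RETRIES = 3;
  static final long DEFAULT_RETRY_INTERVAL_MS = 1000L;

  final int numRetries;
  final long retryIntervalMs;

  StateStoreRetryConfig(Properties conf) {
    // Fall back to the placeholder default when a key is not set.
    this.numRetries = Integer.parseInt(conf.getProperty(
        NUM_RETRIES_KEY, String.valueOf(DEFAULT_NUM_RETRIES)).trim());
    this.retryIntervalMs = Long.parseLong(conf.getProperty(
        RETRY_INTERVAL_KEY, String.valueOf(DEFAULT_RETRY_INTERVAL_MS)).trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(NUM_RETRIES_KEY, "10");
    StateStoreRetryConfig c = new StateStoreRetryConfig(conf);
    // num-retries comes from conf; retry-interval-ms uses the placeholder.
    System.out.println(c.numRetries + " retries, " + c.retryIntervalMs + " ms interval");
  }
}
```

With Hadoop's Configuration class the same lookups would be conf.getInt(key, default) and conf.getLong(key, default).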
> TestFSRMStateStore fails on trunk
> ---------------------------------
>
> Key: YARN-1778
> URL: https://issues.apache.org/jira/browse/YARN-1778
> Project: Hadoop YARN
> Issue Type: Test
> Reporter: Xuan Gong
> Assignee: zhihai xu
> Attachments: YARN-1778.000.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)