[ 
https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306866#comment-14306866
 ] 

Tsuyoshi OZAWA commented on YARN-1778:
--------------------------------------

[~zxu] cc: [~jlowe] Thank you for the investigation. 
DFSOutputStream#completeFile already includes retry logic, but it is 
hard-coded for now:

{code}
          if (retries == 0) {
            throw new IOException("Unable to close file because the last block"
                + " does not have enough number of replicas.");
          }
          retries--;
          Thread.sleep(localTimeout);
          localTimeout *= 2;
          if (Time.now() - localstart > 5000) {
            DFSClient.LOG.info("Could not complete " + src + " retrying...");
          }
{code}

How about making the timeout and the number of retries configurable, set 
via fs.state-store.num-retries and fs.state-store.retry-interval-ms? That 
would be a simpler way to deal with this problem.
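
As a rough illustration of the idea, here is a minimal sketch of the same retry loop with the retry count and initial interval passed in rather than fixed. RetrySketch and completeWithRetries are hypothetical names, not Hadoop APIs. The BooleanSupplier stands in for the call that asks the NameNode to complete the file, and the two parameters are assumed to be read from the proposed properties.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Hadoop.
final class RetrySketch {

    /**
     * Calls attempt until it returns true, sleeping between attempts and
     * doubling the sleep each time, as the quoted loop does.
     *
     * numRetries      assumed to come from fs.state-store.num-retries
     * retryIntervalMs assumed to come from fs.state-store.retry-interval-ms
     */
    static void completeWithRetries(BooleanSupplier attempt,
                                    int numRetries,
                                    long retryIntervalMs)
            throws IOException, InterruptedException {
        int retries = numRetries;
        long sleepMs = retryIntervalMs;
        while (!attempt.getAsBoolean()) {
            if (retries == 0) {
                throw new IOException("Unable to complete after "
                    + numRetries + " retries");
            }
            retries--;
            Thread.sleep(sleepMs);
            sleepMs *= 2; // exponential backoff
        }
    }
}
```

With numRetries = 5 and retryIntervalMs = 400, for example, the loop would sleep 400, 800, 1600, 3200 and 6400 ms before giving up.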

> TestFSRMStateStore fails on trunk
> ---------------------------------
>
>                 Key: YARN-1778
>                 URL: https://issues.apache.org/jira/browse/YARN-1778
>             Project: Hadoop YARN
>          Issue Type: Test
>            Reporter: Xuan Gong
>            Assignee: zhihai xu
>         Attachments: YARN-1778.000.patch
>
>




