[ https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Li Lu updated YARN-2673:
------------------------
    Attachment: YARN-2673-101714.patch

Hi [~zjshen], I've updated my patch according to your comments. I've also fixed a bug in the previous version: the previous patch confused "maxRetries" with "maxTries", so the retry filter issued one fewer attempt than intended.

In response to your comments:

1. Made retried, maxRetries and retryInterval @VisibleForTesting.
bq. After retried is set to true first time. It is always true, which means it's not useful for asserting the second request.
This was a bug. retried should indicate whether a retry happened in the most recent Jersey request. I've fixed this in this patch by resetting retried every time a request is launched (that is, every time the client filter is called).
2. Fixed.
3. maxRetries can be -1 to indicate that there is no limit on the number of retries (described in TimelineJerseyRetryFilter). I've added a comment line here to make this clearer, and another in the original configuration.
4. Fixed.
5. I think you raised a very valid point. I've removed this new API.

> Add retry for timeline client put APIs
> --------------------------------------
>
>                 Key: YARN-2673
>                 URL: https://issues.apache.org/jira/browse/YARN-2673
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch,
>                      YARN-2673-101414.patch, YARN-2673-101714.patch
>
>
> The timeline client currently does not handle a server outage gracefully.
> Jobs from distributed shell may fail due to an ATS restart. We may need to
> add some retry mechanisms to the client.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
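
For readers following the thread, the retry semantics discussed in the comment can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in YARN-2673-101714.patch: the class name RetrySketch, the method run, the constant UNLIMITED and the attempts counter are hypothetical, and the sketch uses a plain Callable rather than the Jersey client filter. It tries to show three points from the comment: maxRetries counts retries after the first attempt (so the total number of attempts is maxRetries + 1), -1 means no limit, and retried is reset at the start of every request.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of the retry semantics; not the code from the patch. */
final class RetrySketch {
    /** Sentinel meaning "retry without limit" (assumed to mirror maxRetries == -1). */
    static final int UNLIMITED = -1;

    final int maxRetries;        // retries allowed after the first attempt
    final long retryIntervalMs;  // pause between attempts, in milliseconds
    boolean retried;             // true only if the most recent request retried
    int attempts;                // attempts made by the most recent request

    RetrySketch(int maxRetries, long retryIntervalMs) {
        this.maxRetries = maxRetries;
        this.retryIntervalMs = retryIntervalMs;
    }

    <T> T run(Callable<T> request) throws Exception {
        retried = false;  // reset per request so the flag never leaks across calls
        attempts = 0;
        while (true) {
            try {
                attempts++;
                return request.call();
            } catch (Exception e) {
                // (attempts - 1) retries have been used so far; comparing retries,
                // not tries, avoids the off-by-one mentioned in the comment.
                boolean exhausted =
                    maxRetries != UNLIMITED && attempts - 1 >= maxRetries;
                if (exhausted) {
                    throw e;
                }
                retried = true;
                Thread.sleep(retryIntervalMs);
            }
        }
    }
}
```

With maxRetries set to 2, a request that fails twice and then succeeds makes three attempts and leaves retried set to true; a following request that succeeds immediately leaves retried set to false, which is what makes the flag usable for asserting on a second request in a test.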