[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: (was: YARN-2673-101914.patch)

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-102014.patch

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-19 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101914.patch

Hi [~zjshen], thanks for your review! I addressed your comments and rebased 
the patch on the latest trunk. If you have time, please feel free to take a 
look. Thanks! 

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-101914.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-17 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101714.patch

For some unknown reason, Jenkins executed the wrong set of unit tests. Kicking 
it again to see if the problem is temporary. 

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-17 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: (was: YARN-2673-101714.patch)

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-17 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101714.patch

Hi [~zjshen], I've updated my patch according to your comments. I've also fixed 
a bug in the previous version: the previous patch confused "maxRetries" with 
"maxTries" and therefore issued one fewer attempt in the retry filter. 

According to your comments:

1. Made retried, maxRetries and retryInterval @VisibleForTesting. 
bq. After retried is set to true first time. It is always true, which means 
it's not useful for asserting the second request.
This is a bug. retried should indicate whether a retry happened in the last 
Jersey request. I've fixed this in this patch by resetting retried every time 
a request is launched (and the client filter is called). 

2. Fixed. 

3. maxRetries can be -1 to indicate that there is no limit on the number of 
retries (described in TimelineJerseyRetryFilter). I've added a comment line 
here to make this clearer (and a line in the original configuration as well). 

4. Fixed.

5. I think you raised a very valid point. I've removed this new API. 
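
The semantics discussed in points 1 and 3 can be illustrated with a small, 
self-contained sketch. This is not the code from the patch or the actual 
TimelineJerseyRetryFilter: the class RetrySketch and its methods call and 
wasRetried are hypothetical names, and a plain Supplier stands in for the 
Jersey request. It assumes that maxRetries counts retries only (so the total 
number of attempts is maxRetries + 1), that -1 means no limit, and that the 
retried flag is reset at the start of every request. 

```java
import java.util.function.Supplier;

/** Hypothetical illustration of the retry semantics; not the patch's filter. */
final class RetrySketch {
    private final int maxRetries;        // retries after the first attempt; -1 = no limit
    private final long retryIntervalMs;  // pause between consecutive attempts
    private boolean retried;             // did the most recent request need a retry?

    RetrySketch(int maxRetries, long retryIntervalMs) {
        if (maxRetries < -1 || retryIntervalMs < 0) {
            throw new IllegalArgumentException("invalid retry settings");
        }
        this.maxRetries = maxRetries;
        this.retryIntervalMs = retryIntervalMs;
    }

    boolean wasRetried() {
        return retried;
    }

    <T> T call(Supplier<T> request) {
        retried = false;                 // reset per request, so the flag describes this call only
        int retriesLeft = maxRetries;
        while (true) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                // Give up once the retry budget is spent, unless retries are unlimited.
                if (maxRetries != -1 && retriesLeft == 0) {
                    throw e;
                }
                if (retriesLeft > 0) {
                    retriesLeft--;
                }
                retried = true;
                pause();
            }
        }
    }

    private void pause() {
        try {
            Thread.sleep(retryIntervalMs);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

With maxRetries set to 2, a request that fails twice and then succeeds is 
attempted three times in total. Counting tries instead of retries here would 
stop one attempt early, which is the off-by-one described above. 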

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch, YARN-2673-101714.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2673:
--
Summary: Add retry for timeline client put APIs  (was: Add retry for 
timeline client)

> Add retry for timeline client put APIs
> --
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101414-2.patch

> Add retry for timeline client
> -
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: (was: YARN-2673-101414-2.patch)

> Add retry for timeline client
> -
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101414-2.patch

Debugging the UT failure. 

> Add retry for timeline client
> -
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
> YARN-2673-101414.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101414-1.patch

Addressed the comments from findbugs and retriggered the failing unit test. I 
could not reproduce the UT failure locally. 

> Add retry for timeline client
> -
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2673-101414-1.patch, YARN-2673-101414.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101414.patch

Uploaded a patch for this issue. By default, TimelineClient will retry for a 
given amount of time before throwing an exception when posting to the server. 
A few notes:

1. Retrying vs. discarding timeline data: Without this retry, the timeline 
client drops the posted data if the first attempt fails. I had an offline 
discussion with [~vinodkv], and we agreed that blocking the timeline client 
for a short while is better, since we may not want to drop critical timeline 
data. 

2. Retry behavior configuration: Users can define the maximum retry count and 
the time interval between consecutive retries. We may want two levels of retry 
settings: a cluster-wide setting, managed by yarn-site.xml, and a 
per-application custom setting. For the cluster setting, I've added two 
configuration properties, yarn.timeline-service.client.max-retries (default 30) 
and yarn.timeline-service.client.retry-interval-ms (default 1000). I've also 
provided a customizeRetrySettings method for application-specific retry 
settings. 

3. Retry implementation: The timeline client does not use RPC; it uses RESTful 
APIs. I'm implementing retry as a Jersey filter in this patch. 

4. Tests: I added two new unit tests, one to test the customizeRetrySettings 
API and the other to test whether a retry actually happens when we try to 
post timeline entities. 
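
The two-level settings described in note 2 could look roughly like the sketch 
below. It is only an illustration under stated assumptions, not the code in 
the patch: java.util.Properties stands in for the yarn-site.xml-backed 
configuration, and the class TimelineRetrySettings with its methods 
fromClusterConfig and customize is hypothetical (the signature of the real 
customizeRetrySettings method is not shown in this thread). The property names 
and default values are the ones listed in note 2. 

```java
import java.util.Properties;

/** Hypothetical holder for retry settings; Properties stands in for the YARN configuration. */
final class TimelineRetrySettings {
    static final String MAX_RETRIES_KEY =
        "yarn.timeline-service.client.max-retries";
    static final String RETRY_INTERVAL_KEY =
        "yarn.timeline-service.client.retry-interval-ms";
    static final int DEFAULT_MAX_RETRIES = 30;
    static final long DEFAULT_RETRY_INTERVAL_MS = 1000L;

    final int maxRetries;
    final long retryIntervalMs;

    private TimelineRetrySettings(int maxRetries, long retryIntervalMs) {
        this.maxRetries = maxRetries;
        this.retryIntervalMs = retryIntervalMs;
    }

    /** Cluster-wide level: read both properties, falling back to the defaults. */
    static TimelineRetrySettings fromClusterConfig(Properties conf) {
        int maxRetries = Integer.parseInt(
            conf.getProperty(MAX_RETRIES_KEY, String.valueOf(DEFAULT_MAX_RETRIES)));
        long intervalMs = Long.parseLong(
            conf.getProperty(RETRY_INTERVAL_KEY, String.valueOf(DEFAULT_RETRY_INTERVAL_MS)));
        return new TimelineRetrySettings(maxRetries, intervalMs);
    }

    /** Per-application level: return a copy with application-specific values. */
    TimelineRetrySettings customize(int appMaxRetries, long appRetryIntervalMs) {
        return new TimelineRetrySettings(appMaxRetries, appRetryIntervalMs);
    }
}
```

An application would start from fromClusterConfig and call customize only when 
it needs values that differ from the cluster-wide ones. 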

> Add retry for timeline client
> -
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-2673-101414.patch
>
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 





[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-09 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2673:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-1530

> Add retry for timeline client
> -
>
> Key: YARN-2673
> URL: https://issues.apache.org/jira/browse/YARN-2673
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Li Lu
> Assignee: Li Lu
>
> Timeline client now does not handle the case gracefully when the server is 
> down. Jobs from distributed shell may fail due to ATS restart. We may need to 
> add some retry mechanisms to the client. 


