[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-102014.patch

 Add retry for timeline client put APIs
 --

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-20 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: (was: YARN-2673-101914.patch)

 Add retry for timeline client put APIs
 --

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-102014.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-19 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101914.patch

Hi [~zjshen], thanks for your review! I addressed your comments, and rebased 
the patch with the latest trunk. If you have time please feel free to take a 
look. Thanks! 

 Add retry for timeline client put APIs
 --

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch, YARN-2673-101714.patch, YARN-2673-101914.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-17 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101714.patch

Hi [~zjshen], I've updated my patch according to your comments. I've also fixed 
a bug in the previous version: in the previous patch I confused maxRetries 
with maxTries, and issues one less attempt in the retry filter. 

According to your comments:

1. Made retried, maxRetries and retryInterval \@VisibleForTesting. 
bq. After retried is set to true first time. It is always true, which means 
it's not useful for asserting the second request.
This is a bug. retried should indicate if retry happened in the last jersey 
request. I've fixed this issue in this patch by resetting retried every time a 
request is launched (and the client filter is called). 

2. Fixed. 

3. maxRetries can be -1 to indicate there is no limit for the number of retries 
(described in TimelineJerseyRetryFilter). I've added a line of comment here to 
make it clearer (also a line in the original configuration). 

4. Fixed.

5. I think you raised a very valid point. I've removed this new API. 

 Add retry for timeline client put APIs
 --

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch, YARN-2673-101714.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-17 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: (was: YARN-2673-101714.patch)

 Add retry for timeline client put APIs
 --

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-17 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101714.patch

For some unknown reasons, Jenkins executed a wrong set of unit tests. Try to 
kick it again to see if the problem is temporary. 

 Add retry for timeline client put APIs
 --

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch, YARN-2673-101714.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client put APIs

2014-10-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2673:
--
Summary: Add retry for timeline client put APIs  (was: Add retry for 
timeline client)

 Add retry for timeline client put APIs
 --

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101414.patch

Upload a patch for this issue. TimelineClient will by default retry for a given 
amount of time before throw the exception on posting to server. There are a few 
notes:

1. Retrying vs. discarding timeline data: If we do not adding this retry, 
timeline client will drop the posted data if the first attempt has failed. Had 
a offline discussion with [~vinodkv]. We agreed that blocking the timeline 
client for a short while is better, since we may not want to drop some critical 
timeline data. 

2. Retry behavior configurations: Users can define maximum retry counts, and 
time interval between consecutive retries. We may want to have two levels of 
retry settings: a cluster global settings, managed by yarn-site.xml, and a 
per-application customize setting. For the cluster setting, I've added two 
configuration properties, yarn.timeline-service.client.max-retries (default 30) 
and yarn.timeline-service.client.retry-interval-ms (default 1000). I've also 
provide a customizeRetrySettings method for application specific retry 
settings. 

3. Retry implementation: timeline client does not use RPC, but uses RESTful 
APIs. I'm implementing retry as a jersey filter in this patch. 

4. Tests: I added two new unit tests, one to test the customizeRetrySettings 
API and the other to test if the retry has actually happened when we try to 
post  timeline entities. 

 Add retry for timeline client
 -

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101414-1.patch

Address the comments from findbugs, and retry the unit test failure. Could not 
reproduce the UT failure locally. 

 Add retry for timeline client
 -

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101414-2.patch

Debugging the UT failure. 

 Add retry for timeline client
 -

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: (was: YARN-2673-101414-2.patch)

 Add retry for timeline client
 -

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2673:

Attachment: YARN-2673-101414-2.patch

 Add retry for timeline client
 -

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-2673-101414-1.patch, YARN-2673-101414-2.patch, 
 YARN-2673-101414.patch


 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2673) Add retry for timeline client

2014-10-09 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2673:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-1530

 Add retry for timeline client
 -

 Key: YARN-2673
 URL: https://issues.apache.org/jira/browse/YARN-2673
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu

 Timeline client now does not handle the case gracefully when the server is 
 down. Jobs from distributed shell may fail due to ATS restart. We may need to 
 add some retry mechanisms to the client. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)