[jira] [Updated] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2015-04-15 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HADOOP-10597:
-
Attachment: HADOOP-10597-6.patch

Here is the updated patch with Steve's suggestion. Thanks, Steve and Arpit.

 Evaluate if we can have RPC client back off when server is under heavy load
 ---

 Key: HADOOP-10597
 URL: https://issues.apache.org/jira/browse/HADOOP-10597
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HADOOP-10597-2.patch, HADOOP-10597-3.patch, 
 HADOOP-10597-4.patch, HADOOP-10597-5.patch, HADOOP-10597-6.patch, 
 HADOOP-10597.patch, MoreRPCClientBackoffEvaluation.pdf, 
 RPCClientBackoffDesignAndEvaluation.pdf


 Currently, if an application hits the NN too hard, RPC requests will be in a 
 blocking state, assuming OS connections don't run out. Alternatively, RPC or 
 the NN can throw a well-defined exception back to the client, based on certain 
 policies, when it is under heavy load; the client will understand such an 
 exception and do exponential backoff, as another implementation of 
 RetryInvocationHandler.
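
 To make the "exponential backoff" part of the description concrete, here is a 
 minimal sketch of how a client could compute its retry delays. The class 
 ExponentialBackoffSketch, its method names, and the jitter step are 
 assumptions made for illustration; they are not taken from the patch or from 
 Hadoop's retry classes.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper (not a Hadoop class): capped exponential backoff with jitter. */
final class ExponentialBackoffSketch {
    private final long baseMillis; // delay ceiling for the first retry
    private final long maxMillis;  // upper bound for any delay

    ExponentialBackoffSketch(long baseMillis, long maxMillis) {
        if (baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("need 0 < baseMillis <= maxMillis");
        }
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    /** Ceiling for retry number attempt (0-based): base * 2^attempt, capped at max. */
    long ceilingMillis(int attempt) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Double the delay, jumping straight to the cap when doubling would pass it.
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return delay;
    }

    /** Random delay in [1, ceiling], so that many clients do not retry in lockstep. */
    long jitteredDelayMillis(int attempt) {
        return ThreadLocalRandom.current().nextLong(ceilingMillis(attempt)) + 1;
    }
}
```

 With a base of 100 ms and a cap of 5000 ms, the ceilings grow as 100, 200, 400, 
 800, 1600, 3200 and then stay at 5000. The random jitter is one common way to 
 keep a large number of backed-off clients from retrying at the same instant.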



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2015-04-13 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HADOOP-10597:
-
Attachment: HADOOP-10597-5.patch

Updated the patch based on Arpit's suggestion to remove the server-side retry 
policy.

 Evaluate if we can have RPC client back off when server is under heavy load
 ---

 Key: HADOOP-10597
 URL: https://issues.apache.org/jira/browse/HADOOP-10597
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HADOOP-10597-2.patch, HADOOP-10597-3.patch, 
 HADOOP-10597-4.patch, HADOOP-10597-5.patch, HADOOP-10597.patch, 
 MoreRPCClientBackoffEvaluation.pdf, RPCClientBackoffDesignAndEvaluation.pdf




[jira] [Updated] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2015-01-06 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HADOOP-10597:
-
Attachment: HADOOP-10597-4.patch

Thanks, [~ste...@apache.org]! Here is the new patch with your suggestions.

Regarding the serialization of {{RetryAction}} via the {{RetriableException}} 
message string, I agree it is not necessarily the best approach. We need to 
serialize {{RetryAction}} and have the RPC server send it back to the RPC 
client. The options I know of:

* The current RPC header structure {{RpcHeaderProtos}} includes an exception 
message field, so it is convenient to use the {{RetriableException}} message.
* We can consider adding an optional {{RetryAction}} field to the RPC header 
{{RpcHeaderProtos}}. That requires more changes.
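
As an illustration of the first option, the sketch below shows one way a retry 
hint could be carried in an exception message string and parsed back on the 
client. The class RetryHintCodecSketch, the "retry-hint:" prefix, and the 
action:delay layout are all hypothetical; the actual encoding used by the patch 
is not shown in this thread.

```java
import java.util.Optional;

/** Hypothetical codec (not from the patch): carries a retry hint in a message string. */
final class RetryHintCodecSketch {
    // Assumed marker so hinted messages can be told apart from ordinary ones.
    static final String PREFIX = "retry-hint:";

    /** Minimal stand-in for the information a RetryAction would carry. */
    static final class RetryHint {
        final String action;     // e.g. "RETRY"; must not contain ':'
        final long delayMillis;  // suggested wait before the next attempt

        RetryHint(String action, long delayMillis) {
            this.action = action;
            this.delayMillis = delayMillis;
        }
    }

    /** Server side: render the hint as "retry-hint:ACTION:DELAY". */
    static String encode(RetryHint hint) {
        return PREFIX + hint.action + ":" + hint.delayMillis;
    }

    /** Client side: recover the hint, or empty if the message does not carry a valid one. */
    static Optional<RetryHint> decode(String message) {
        if (message == null || !message.startsWith(PREFIX)) {
            return Optional.empty();
        }
        String[] parts = message.substring(PREFIX.length()).split(":", -1);
        if (parts.length != 2 || parts[0].isEmpty()) {
            return Optional.empty();
        }
        try {
            long delay = Long.parseLong(parts[1]);
            if (delay < 0) {
                return Optional.empty();
            }
            return Optional.of(new RetryHint(parts[0], delay));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

The fragility of this kind of string parsing (delimiters, malformed input, 
messages that happen to match the prefix) is what makes a typed optional field 
in the header attractive, at the cost of the wider changes mentioned above.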

 Evaluate if we can have RPC client back off when server is under heavy load
 ---

 Key: HADOOP-10597
 URL: https://issues.apache.org/jira/browse/HADOOP-10597
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Steve Loughran
 Attachments: HADOOP-10597-2.patch, HADOOP-10597-3.patch, 
 HADOOP-10597-4.patch, HADOOP-10597.patch, MoreRPCClientBackoffEvaluation.pdf, 
 RPCClientBackoffDesignAndEvaluation.pdf




[jira] [Updated] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2014-12-19 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HADOOP-10597:
-
Attachment: HADOOP-10597-3.patch

Rebased on trunk and addressed Chris' comments. I would appreciate any 
additional input from [~ste...@apache.org], [~jingzhao], [~arpitagarwal] and 
others. We have enabled this feature with FairCallQueue in one of our 
production clusters.

 Evaluate if we can have RPC client back off when server is under heavy load
 ---

 Key: HADOOP-10597
 URL: https://issues.apache.org/jira/browse/HADOOP-10597
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HADOOP-10597-2.patch, HADOOP-10597-3.patch, 
 HADOOP-10597.patch, MoreRPCClientBackoffEvaluation.pdf, 
 RPCClientBackoffDesignAndEvaluation.pdf




[jira] [Updated] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2014-09-22 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HADOOP-10597:
-
Attachment: MoreRPCClientBackoffEvaluation.pdf

Here are some more evaluation results comparing the FIFO RPC queue, 
FairCallQueue without backoff, and FairCallQueue with backoff. In general, the 
more RPC connections there are, the more useful RPC client backoff is.

 Evaluate if we can have RPC client back off when server is under heavy load
 ---

 Key: HADOOP-10597
 URL: https://issues.apache.org/jira/browse/HADOOP-10597
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HADOOP-10597-2.patch, HADOOP-10597.patch, 
 MoreRPCClientBackoffEvaluation.pdf, RPCClientBackoffDesignAndEvaluation.pdf




[jira] [Updated] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2014-07-04 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HADOOP-10597:
-

Attachment: HADOOP-10597-2.patch
RPCClientBackoffDesignAndEvaluation.pdf

[~lohit] provided some feedback. Here is the design document with some 
evaluation results. The updated patch also includes unit tests and makes the 
server-side retry policy pluggable.

 Evaluate if we can have RPC client back off when server is under heavy load
 ---

 Key: HADOOP-10597
 URL: https://issues.apache.org/jira/browse/HADOOP-10597
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HADOOP-10597-2.patch, HADOOP-10597.patch, 
 RPCClientBackoffDesignAndEvaluation.pdf




[jira] [Updated] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2014-07-04 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HADOOP-10597:
-

Status: Patch Available  (was: Open)

 Evaluate if we can have RPC client back off when server is under heavy load
 ---

 Key: HADOOP-10597
 URL: https://issues.apache.org/jira/browse/HADOOP-10597
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HADOOP-10597-2.patch, HADOOP-10597.patch, 
 RPCClientBackoffDesignAndEvaluation.pdf




[jira] [Updated] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load

2014-06-02 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HADOOP-10597:
-

Attachment: HADOOP-10597.patch

Steve, thanks for the suggestions. Yes, it is useful to have the server provide 
backoff parameters to the client.

Here is the initial patch covering the basic idea. Once we agree this is the 
right direction, unit tests and load tests will follow.

1. Enhance RetriableException for this purpose. When the RPC queue is full, 
RetriableException will be thrown back to the client with the backoff 
parameters in PB. The backoff parameters are based on exponential backoff.

2. The client/RetryPolicy can decide if and how to use the server hint. It is 
up to each retry policy implementation. The NN HA RetryPolicy 
failoverOnNetworkException will use the server hint when it is available. 
For the NN non-HA scenario, HDFS-6478 needs to be fixed first. A new policy 
called RetryUpToMaximumCountWithFixedSleepAndServerHint is provided as an 
example.

3. The backoff feature is turned off by default in the RPC server.
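
The three points above can be sketched roughly as follows. This is a 
simplified model under stated assumptions, not the patch itself: 
BackoffCallQueueSketch, ServerBusyException (standing in for the enhanced 
RetriableException), the fixed hint value, and chooseSleepMillis are all 
hypothetical names, and the real server would carry the parameters in the 
RPC response rather than in a Java field.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical model of a call queue that rejects with a backoff hint when full. */
final class BackoffCallQueueSketch {

    /** Stand-in for the enhanced RetriableException carrying a server hint. */
    static final class ServerBusyException extends Exception {
        final long suggestedDelayMillis;

        ServerBusyException(long suggestedDelayMillis) {
            super("call queue is full, retry after " + suggestedDelayMillis + " ms");
            this.suggestedDelayMillis = suggestedDelayMillis;
        }
    }

    private final BlockingQueue<Runnable> queue;
    private final boolean backoffEnabled; // point 3: callers would default this to false
    private final long hintMillis;        // assumed fixed hint, for simplicity

    BackoffCallQueueSketch(int capacity, boolean backoffEnabled, long hintMillis) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.backoffEnabled = backoffEnabled;
        this.hintMillis = hintMillis;
    }

    /** Point 1: with backoff on, a full queue rejects the call instead of blocking. */
    void submit(Runnable call) throws ServerBusyException, InterruptedException {
        if (!backoffEnabled) {
            queue.put(call); // original behavior: block until there is room
            return;
        }
        if (!queue.offer(call)) {
            throw new ServerBusyException(hintMillis);
        }
    }

    /** Point 2: a client policy may prefer the server hint over its own fixed sleep. */
    static long chooseSleepMillis(long fixedSleepMillis, ServerBusyException e) {
        if (e != null && e.suggestedDelayMillis > 0) {
            return e.suggestedDelayMillis;
        }
        return fixedSleepMillis;
    }
}
```

A policy in the spirit of RetryUpToMaximumCountWithFixedSleepAndServerHint 
would call something like chooseSleepMillis on each failed attempt and give up 
after its maximum retry count.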

 Evaluate if we can have RPC client back off when server is under heavy load
 ---

 Key: HADOOP-10597
 URL: https://issues.apache.org/jira/browse/HADOOP-10597
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Ming Ma
 Attachments: HADOOP-10597.patch

