[
https://issues.apache.org/jira/browse/HDDS-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hanisha Koneru resolved HDDS-4068.
----------------------------------
Target Version/s: 0.7.0
Resolution: Fixed
> Client should not retry same OM on network connection failure
> -------------------------------------------------------------
>
> Key: HDDS-4068
> URL: https://issues.apache.org/jira/browse/HDDS-4068
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: OM HA, Ozone Client
> Reporter: Bharat Viswanadham
> Assignee: Hanisha Koneru
> Priority: Major
> Labels: pull-request-available
>
> Currently, the client-to-OM retry logic works as follows: the client tries
> to connect to OM1; if OM1 is the leader, the request proceeds, otherwise the
> client tries the next OM, and so on. If OM1 is down, the client retries the
> connection 50 times when ipc.client.connect.max.retries is set to 50, and
> with ipc.client.connect.retry.interval at its default of 1 second, a total
> of 50 seconds is spent retrying before the client moves on to the next OM.
> I think the client -> OM connection should have its own retry policy, so
> that if the first OM is down, the user does not need to wait 50 seconds for
> the request to complete.
> Since ipc.client.connect.retry.interval and ipc.client.connect.max.retries
> are common RPC configurations, it would be nice to create a new default
> retry policy with smaller values.
> {code:java}
> 20/08/06 00:21:29 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 0 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:30 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 1 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:31 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 2 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:32 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 3 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:33 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 4 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 20/08/06 00:21:34 INFO ipc.Client: Retrying connect to server:
> bv-oz-2.bv-oz.root.hwx.site/172.27.23.204:9862. Already tried 5 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> {code}
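The behavior requested in the description can be illustrated with a small,
self-contained sketch. This is not the Ozone client code or the actual fix for
this issue; the class OmFailoverSketch, its methods, and the node IDs om1/om2/om3
are hypothetical names chosen for illustration, and leader detection is left
out. It shows the cost of retrying one unreachable OM with the values from the
log above, and a loop that moves to the next OM on a connection failure instead
of retrying the same one.

```java
import java.net.ConnectException;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; all names here are hypothetical. */
class OmFailoverSketch {

  /** Worst-case time spent retrying one unreachable node before moving on. */
  static long worstCaseWaitMillis(int maxRetries, long retryIntervalMillis) {
    return (long) maxRetries * retryIntervalMillis;
  }

  /** Stand-in for an RPC connect attempt: fails if the node is marked down. */
  static void connect(String omNodeId, Set<String> downNodes)
      throws ConnectException {
    if (downNodes.contains(omNodeId)) {
      throw new ConnectException("Connection refused: " + omNodeId);
    }
  }

  /**
   * Tries each OM once, in order. On a connection failure it fails over to
   * the next OM immediately rather than retrying the same one.
   * Returns the first node that accepts the connection, or null if none does.
   */
  static String firstReachableOm(List<String> omNodeIds,
      Set<String> downNodes) {
    for (String omNodeId : omNodeIds) {
      try {
        connect(omNodeId, downNodes);
        return omNodeId;
      } catch (ConnectException e) {
        // Network-level failure: skip this OM and try the next one.
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // 50 retries at a 1000 ms interval, as in the log above.
    System.out.println(worstCaseWaitMillis(50, 1000)); // prints 50000
    // om1 is down, so the client moves straight to om2.
    System.out.println(
        firstReachableOm(List.of("om1", "om2", "om3"), Set.of("om1"))); // prints om2
  }
}
```

With a per-node retry count this small, the time lost to a down OM is bounded by
a single failed connection attempt rather than by
maxRetries * retry interval.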