[ https://issues.apache.org/jira/browse/HADOOP-14828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hung resolved HADOOP-14828.
------------------------------------
    Resolution: Duplicate

> RetryUpToMaximumTimeWithFixedSleep is not bounded by maximum time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-14828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14828
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Jonathan Hung
>
> In RetryPolicies.java, RetryUpToMaximumTimeWithFixedSleep is converted to a
> RetryUpToMaximumCountWithFixedSleep whose count is maxTime / sleepTime:
> {noformat}
>     public RetryUpToMaximumTimeWithFixedSleep(long maxTime, long sleepTime,
>         TimeUnit timeUnit) {
>       super((int) (maxTime / sleepTime), sleepTime, timeUnit);
>       this.maxTime = maxTime;
>       this.timeUnit = timeUnit;
>     }
> {noformat}
> Only the number of retries is bounded, not the elapsed time. So if each
> retry attempt itself takes a long time, the maxTime passed to
> RetryUpToMaximumTimeWithFixedSleep is exceeded.
>
> As an example, while doing NM restarts, we saw an issue where the NMProxy
> creates a retry policy which specifies a maximum wait time of 15 minutes and
> a 10 sec interval (which is converted to a MaximumCount policy with 15 min /
> 10 sec = 90 tries). But each NMProxy retry in turn invokes o.a.h.ipc.Client's
> retry policy:
> {noformat}
>       if (connectionRetryPolicy == null) {
>         final int max = conf.getInt(
>             CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY,
>             CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_DEFAULT);
>         final int retryInterval = conf.getInt(
>             CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_RETRY_INTERVAL_KEY,
>             CommonConfigurationKeysPublic
>                 .IPC_CLIENT_CONNECT_RETRY_INTERVAL_DEFAULT);
>         connectionRetryPolicy =
>             RetryPolicies.retryUpToMaximumCountWithFixedSleep(
>                 max, retryInterval, TimeUnit.MILLISECONDS);
>       }
> {noformat}
> So the time it takes the NMProxy to fail is actually (90 retries) * (10 sec
> NMProxy interval + o.a.h.ipc.Client retry time).
> In the default case, the ipc client retries 10 times with a 1 sec interval
> (10 * 1 sec = 10 sec per NMProxy attempt), meaning the time it takes for
> NMProxy to fail is (90)(10 sec + 10 sec) = 1800 sec = 30 min instead of the
> 15 min specified by the NMProxy configuration.
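
The arithmetic above can be reproduced with a small simulation. The sketch below is illustrative only and is not Hadoop code: the class and method names (RetryBoundSketch, retryCount, elapsedWithCountBound, elapsedWithDeadline) are hypothetical, time is simulated rather than slept, and each failed attempt is assumed to cost a fixed number of seconds. It contrasts the count-based bound described in the issue with a hypothetical bound on elapsed time, which is shown only for comparison and is not claimed to be the fix adopted in the duplicate issue.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: simulates elapsed time under two retry bounds.
class RetryBoundSketch {

    /** Mirrors the conversion quoted in the issue: count = maxTime / sleepTime. */
    static int retryCount(long maxTimeSec, long sleepSec) {
        return (int) (maxTimeSec / sleepSec);
    }

    /** Simulated seconds until failure when only the retry count is bounded. */
    static long elapsedWithCountBound(long maxTimeSec, long sleepSec,
                                      long attemptSec) {
        int retries = retryCount(maxTimeSec, sleepSec);
        long elapsed = 0;
        for (int i = 0; i < retries; i++) {
            elapsed += attemptSec; // time spent inside one failed attempt
            elapsed += sleepSec;   // fixed sleep of the outer policy
        }
        return elapsed;
    }

    /**
     * Simulated seconds until failure when retrying stops once maxTimeSec
     * has elapsed (assumption: the deadline is checked before each attempt,
     * so the result can overshoot by at most one attempt plus one sleep).
     */
    static long elapsedWithDeadline(long maxTimeSec, long sleepSec,
                                    long attemptSec) {
        long elapsed = 0;
        while (elapsed < maxTimeSec) {
            elapsed += attemptSec;
            elapsed += sleepSec;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long maxTimeSec = TimeUnit.MINUTES.toSeconds(15); // NMProxy max wait
        long sleepSec = 10;                               // NMProxy interval
        long attemptSec = 10 * 1;                         // 10 ipc retries, 1 sec apart

        System.out.println(retryCount(maxTimeSec, sleepSec));                        // 90
        System.out.println(elapsedWithCountBound(maxTimeSec, sleepSec, attemptSec)); // 1800
        System.out.println(elapsedWithDeadline(maxTimeSec, sleepSec, attemptSec));   // 900
    }
}
```

With the values from the report, the count-bounded loop runs for 1800 simulated seconds (30 min), while the deadline-bounded loop stops at 900 seconds (15 min).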