[GitHub] dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient
dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient

URL: https://github.com/apache/incubator-openwhisk/pull/4141#discussion_r248720331

File path: common/scala/src/main/scala/org/apache/openwhisk/core/containerpool/ApacheBlockingContainerClient.scala

@@ -149,10 +152,9 @@ protected class ApacheBlockingContainerClient(hostname: String,
     } match {
       case Success(response) => response
       case Failure(t: RetryableConnectionError) if retry =>
-        val sleepTime = 50.milliseconds
         if (timeout > Duration.Zero) {
-          Thread.sleep(sleepTime.toMillis)
-          val newTimeout = timeout - sleepTime
+          Thread.sleep(50)

Review comment:
@sven-lange-last, added a comment denoting the units used for `Thread.sleep()`.

----
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient
dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient

URL: https://github.com/apache/incubator-openwhisk/pull/4141#discussion_r247631399

File path: common/scala/src/main/scala/org/apache/openwhisk/core/containerpool/ApacheBlockingContainerClient.scala

@@ -149,11 +156,13 @@ protected class ApacheBlockingContainerClient(hostname: String,
     } match {
       case Success(response) => response
       case Failure(t: RetryableConnectionError) if retry =>
+        val waitTime = (Instant.now.toEpochMilli - start.toEpochMilli).milliseconds
         val sleepTime = 50.milliseconds
         if (timeout > Duration.Zero) {
           Thread.sleep(sleepTime.toMillis)
-          val newTimeout = timeout - sleepTime
-          execute(request, newTimeout, maxConcurrent, retry = true)
+          val newTimeout = timeout - sleepTime - waitTime
+          val newRetryCount = retryCount + 1
+          execute(request, newTimeout, maxConcurrent, retry = newRetryCount < 3, retryCount = newRetryCount)

Review comment:
While there may be a better way to handle crashed containers, I think measuring the amount of time the connection attempt takes is useful. Currently, if a single failed connection attempt takes 5 minutes for some reason, the existing logic continues to retry, deducting only 50 milliseconds from the max timeout value. Right now, the execute method will actually fail only once the action's max timeout limit has been consumed in 50 millisecond increments.
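The change discussed above deducts both the fixed sleep and the time the failed connection attempt itself took from the remaining timeout budget. A minimal sketch of that idea, in Python rather than the project's Scala (the names `execute_with_retry`, `RetryableConnectionError`, and the cap of 3 attempts mirror the diff but are illustrative, not the actual OpenWhisk implementation):

```python
import time

RETRY_SLEEP_S = 0.050  # 50 ms pause between retries, as in the PR
MAX_RETRIES = 3        # hypothetical cap, mirroring `newRetryCount < 3`

class RetryableConnectionError(Exception):
    """Stand-in for the Scala RetryableConnectionError."""

def execute_with_retry(attempt, timeout_s, retry_count=0):
    """Run `attempt`; on a retryable failure, deduct both the sleep time
    and the duration of the failed attempt itself from the timeout."""
    start = time.monotonic()
    try:
        return attempt()
    except RetryableConnectionError:
        wait_time = time.monotonic() - start  # how long the failed attempt took
        if timeout_s > 0 and retry_count + 1 < MAX_RETRIES:
            time.sleep(RETRY_SLEEP_S)
            # Deduct the sleep AND the attempt's own duration, so a slow
            # failing attempt cannot stretch the overall retry budget.
            new_timeout = timeout_s - RETRY_SLEEP_S - wait_time
            return execute_with_retry(attempt, new_timeout, retry_count + 1)
        raise

# Example: an attempt that fails twice, then succeeds on the third try.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RetryableConnectionError()
    return "ok"

result = execute_with_retry(flaky, timeout_s=1.0)
```

With the cap at 3, a permanently failing attempt raises after the third try instead of being retried for the full action timeout.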
[GitHub] dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient
dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient

URL: https://github.com/apache/incubator-openwhisk/pull/4141#discussion_r247628388

File path: common/scala/src/main/scala/org/apache/openwhisk/core/containerpool/ApacheBlockingContainerClient.scala

@@ -149,11 +156,13 @@ protected class ApacheBlockingContainerClient(hostname: String,
     } match {
       case Success(response) => response
       case Failure(t: RetryableConnectionError) if retry =>
+        val waitTime = (Instant.now.toEpochMilli - start.toEpochMilli).milliseconds
         val sleepTime = 50.milliseconds
         if (timeout > Duration.Zero) {
           Thread.sleep(sleepTime.toMillis)
-          val newTimeout = timeout - sleepTime
-          execute(request, newTimeout, maxConcurrent, retry = true)
+          val newTimeout = timeout - sleepTime - waitTime
+          val newRetryCount = retryCount + 1
+          execute(request, newTimeout, maxConcurrent, retry = newRetryCount < 3, retryCount = newRetryCount)

Review comment:
@markusthoemmes, I believe the container crashes before the action proxy can make a connection to the container. Are you suggesting the invoker check whether the container crashed before making another connection attempt?
[GitHub] dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient
dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient

URL: https://github.com/apache/incubator-openwhisk/pull/4141#discussion_r247613413

File path: common/scala/src/main/scala/org/apache/openwhisk/core/containerpool/ApacheBlockingContainerClient.scala

@@ -149,11 +156,13 @@ protected class ApacheBlockingContainerClient(hostname: String,
     } match {
       case Success(response) => response
       case Failure(t: RetryableConnectionError) if retry =>
+        val waitTime = (Instant.now.toEpochMilli - start.toEpochMilli).milliseconds
         val sleepTime = 50.milliseconds
         if (timeout > Duration.Zero) {
           Thread.sleep(sleepTime.toMillis)
-          val newTimeout = timeout - sleepTime
-          execute(request, newTimeout, maxConcurrent, retry = true)
+          val newTimeout = timeout - sleepTime - waitTime
+          val newRetryCount = retryCount + 1
+          execute(request, newTimeout, maxConcurrent, retry = newRetryCount < 3, retryCount = newRetryCount)

Review comment:
Exit code is `-1` I believe.
[GitHub] dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient
dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient

URL: https://github.com/apache/incubator-openwhisk/pull/4141#discussion_r247582645

File path: common/scala/src/main/scala/org/apache/openwhisk/core/containerpool/ApacheBlockingContainerClient.scala

@@ -149,11 +156,13 @@ protected class ApacheBlockingContainerClient(hostname: String,
     } match {
       case Success(response) => response
       case Failure(t: RetryableConnectionError) if retry =>
+        val waitTime = (Instant.now.toEpochMilli - start.toEpochMilli).milliseconds
         val sleepTime = 50.milliseconds
         if (timeout > Duration.Zero) {
           Thread.sleep(sleepTime.toMillis)
-          val newTimeout = timeout - sleepTime
-          execute(request, newTimeout, maxConcurrent, retry = true)
+          val newTimeout = timeout - sleepTime - waitTime
+          val newRetryCount = retryCount + 1
+          execute(request, newTimeout, maxConcurrent, retry = newRetryCount < 3, retryCount = newRetryCount)

Review comment:
@markusthoemmes, three retry attempts is correct. We can use more retry attempts if desired. The idea is simply not to keep retrying for the maximum time the action is allowed to run. Certain blackbox containers might not start up correctly, resulting in an invoker being rendered useless.
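Some rough arithmetic on why a small retry cap matters (the 5-minute action limit here is a hypothetical example, not a value taken from the PR): with only the 50 ms sleep deducted per attempt, retrying for the full action timeout would probe a container that will never come up thousands of times.

```python
# Hypothetical numbers: a blackbox action allowed to run for 5 minutes,
# combined with the 50 ms sleep between retries used in this PR.
action_timeout_ms = 5 * 60 * 1000  # 300000 ms
sleep_ms = 50

# Without a retry cap, only the sleep is deducted from the timeout,
# so a never-starting container would be probed this many times:
attempts_without_cap = action_timeout_ms // sleep_ms
```

Capping at three attempts bounds this to a few hundred milliseconds regardless of the action's configured timeout.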
[GitHub] dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient
dubee commented on a change in pull request #4141: Use Stricter Retry Logic in ApacheBlockingContainerClient

URL: https://github.com/apache/incubator-openwhisk/pull/4141#discussion_r237238880

File path: common/scala/src/main/scala/org/apache/openwhisk/core/containerpool/ApacheBlockingContainerClient.scala

@@ -149,11 +152,13 @@ protected class ApacheBlockingContainerClient(hostname: String,
     } match {
       case Success(response) => response
       case Failure(t: RetryableConnectionError) if retry =>
+        val waitTime = (Instant.now.toEpochMilli - start.toEpochMilli).milliseconds
         val sleepTime = 50.milliseconds
         if (timeout > Duration.Zero) {
           Thread.sleep(sleepTime.toMillis)
-          val newTimeout = timeout - sleepTime
-          execute(request, newTimeout, maxConcurrent, retry = true)
+          val newTimeout = timeout - sleepTime - waitTime
+          val newRetryCount = retryCount + 1
+          execute(request, newTimeout, maxConcurrent, retry = newRetryCount < 3, retryCount = newRetryCount)

Review comment:
Three may be too low a threshold?