juliuszsompolski commented on code in PR #42399:
URL: https://github.com/apache/spark/pull/42399#discussion_r1291416362
##########
python/pyspark/sql/connect/client/core.py:
##########
@@ -1630,35 +1643,25 @@ def __iter__(self) -> Generator[AttemptManager, None, None]:
         A generator that yields the current attempt.
         """
         retry_state = RetryState()
-        while True:
-            # Check if the operation was completed successfully.
-            if retry_state.done():
-                break
-
-            # If the number of retries have exceeded the maximum allowed retries.
-            if retry_state.count() > self._max_retries:
-                e = retry_state.exception()
-                if e is not None:
-                    raise e
-                else:
-                    raise PySparkRuntimeError(
-                        error_class="EXCEED_RETRY",
-                        message_parameters={},
-                    )
+        next_backoff: float = self._initial_backoff
+
+        if self._max_retries < 0:
+            raise ValueError("Can't have negative number of retries")
+        while not retry_state.done() and retry_state.count() <= self._max_retries:
             # Do backoff
             if retry_state.count() > 0:
-                backoff = random.randrange(
-                    0,
-                    int(
-                        min(
-                            self._initial_backoff * self._backoff_multiplier ** retry_state.count(),
-                            self._max_backoff,
-                        )
-                    ),
-                )
-                logger.debug(f"Retrying call after {backoff} ms sleep")
-                # Pythons sleep takes seconds as arguments.
-                time.sleep(backoff / 1000.0)
+                # Randomize backoff for this iteration
+                backoff = next_backoff
+                next_backoff = min(self._max_backoff, next_backoff * self._backoff_multiplier)
+
+                if backoff >= self._min_jitter_threshold:
+                    backoff += random.uniform(0, self._jitter)
+                logger.debug(f"Retrying call after {backoff} ms sleep")
+                self._sleep(backoff / 1000.0)
             yield AttemptManager(self._can_retry, retry_state)
+
+        if not retry_state.done():
+            # Exceeded number of retries, throw last exception we had
+            raise retry_state.exception()
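A note on the new backoff logic above: it replaces the fully randomized `random.randrange(0, min(...))` delay with a deterministic exponential schedule plus bounded jitter. A minimal standalone sketch of that schedule (the constants here are illustrative placeholders, not pyspark's actual defaults):

```python
import random

# Illustrative constants only; the real values are fields on the client.
initial_backoff = 50.0         # ms
max_backoff = 60000.0          # ms
backoff_multiplier = 4.0
min_jitter_threshold = 2000.0  # ms
jitter = 500.0                 # ms
max_retries = 6

next_backoff = initial_backoff
for attempt in range(1, max_retries + 1):
    backoff = next_backoff
    next_backoff = min(max_backoff, next_backoff * backoff_multiplier)
    # Jitter is only added once the deterministic part is large enough,
    # mirroring the _min_jitter_threshold check in the diff.
    if backoff >= min_jitter_threshold:
        backoff += random.uniform(0, jitter)
    print(f"attempt {attempt}: sleep {backoff / 1000.0:.3f} s")
```

Keeping the exponential part deterministic makes the worst-case total wait predictable, while the jitter still spreads out concurrent retries.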
Review Comment:
Yeah, I agree that getting it thrown can only be the result of a bug, so it's
maybe a sanity check / IllegalStateException kind of thing. The only
justification I see for it is that the Python code is more complex and less
localized, so the risk of a bug existing there (not necessarily now, but in the
future if someone changes it and breaks it) is bigger than in the corresponding
Scala code. But then, should this be just an `assert`? There are currently 4
non-test places in the pyspark codebase that use `assert`, so there is existing
precedent.
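For concreteness, the `assert` variant under discussion might look like this (a sketch of the alternative, not what the PR currently does):

```python
if not retry_state.done():
    e = retry_state.exception()
    # Exiting the loop without success means the retries were exhausted, so a
    # recorded exception must exist; anything else is a bug in the loop itself.
    assert e is not None, "retry loop exited without success or exception"
    raise e
```

That keeps the raise-on-failure path while turning the impossible no-exception case into an explicit invariant check.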