itholic commented on code in PR #42399:
URL: https://github.com/apache/spark/pull/42399#discussion_r1295811402
##########
python/pyspark/sql/connect/client/core.py:
##########
@@ -1630,35 +1643,25 @@ def __iter__(self) -> Generator[AttemptManager, None,
None]:
A generator that yields the current attempt.
"""
retry_state = RetryState()
- while True:
- # Check if the operation was completed successfully.
- if retry_state.done():
- break
-
- # If the number of retries have exceeded the maximum allowed
retries.
- if retry_state.count() > self._max_retries:
- e = retry_state.exception()
- if e is not None:
- raise e
- else:
- raise PySparkRuntimeError(
- error_class="EXCEED_RETRY",
- message_parameters={},
- )
+ next_backoff: float = self._initial_backoff
+
+ if self._max_retries < 0:
+ raise ValueError("Can't have negative number of retries")
+ while not retry_state.done() and retry_state.count() <=
self._max_retries:
# Do backoff
if retry_state.count() > 0:
- backoff = random.randrange(
- 0,
- int(
- min(
- self._initial_backoff * self._backoff_multiplier
** retry_state.count(),
- self._max_backoff,
- )
- ),
- )
- logger.debug(f"Retrying call after {backoff} ms sleep")
- # Pythons sleep takes seconds as arguments.
- time.sleep(backoff / 1000.0)
+ # Randomize backoff for this iteration
+ backoff = next_backoff
+ next_backoff = min(self._max_backoff, next_backoff *
self._backoff_multiplier)
+
+ if backoff >= self._min_jitter_threshold:
+ backoff += random.uniform(0, self._jitter)
+ logger.debug(f"Retrying call after {backoff} ms sleep")
+ self._sleep(backoff / 1000.0)
yield AttemptManager(self._can_retry, retry_state)
+
+ if not retry_state.done():
+ # Exceeded number of retries, throw last exception we had
+ raise retry_state.exception()
Review Comment:
> If we don't use it here, we can also just remove it from error_classes.py,
because this was the only place that uses it.
Yeah, we should remove the `EXCEED_RETRY` from the `error_classes.py` if
it's not used.
I just want to point out that `retry_state.exception()` can return either `None` or
a `BaseException` according to the definition of the `RetryState` class, so I believe
we should at least do a nullability check regardless of what exception we
raise. Otherwise, we may face an unexpected situation like the one below:
```
>>> raise None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: exceptions must derive from BaseException
```
But if you are really sure here that `retry_state.exception()` is unlikely
to return `None`, we should change the type of `self._exception` defined in
`RetryState` to `BaseException` instead of `Optional[BaseException]` because
`RetryState` is only used here.
So, I would suggest you choose one of two things:
1. keep the current change as it is, remove `EXCEED_RETRY` from
`error_classes.py`, and update the type of `self._exception` in `RetryState`
from `Optional[BaseException]` to `BaseException`
2. check whether `retry_state.exception()` is `None`, and raise an error
if it is `None`. (If the error is a user-facing error, PySpark uses PySpark custom
errors such as `PySparkRuntimeError` as a policy. If the error is not
user-facing, then we can use a Python built-in exception such as `RuntimeError`.)
I'm fine with either way.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]