Re: [PR] [SPARK-45851][CONNECT][SCALA] Support multiple policies in scala client [spark]

via GitHub Mon, 13 Nov 2023 03:37:04 -0800


juliuszsompolski commented on code in PR #43757:
URL: https://github.com/apache/spark/pull/43757#discussion_r1390996395



##########
connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectStubState.scala:
##########
@@ -26,17 +26,15 @@ import org.apache.spark.internal.Logging
 // that the same stub instance is used for all requests from the same client. 
In addition,
 // this class provides access to the commonly configured retry policy and 
exception conversion
 // logic.
-class SparkConnectStubState(
-    channel: ManagedChannel,
-    val retryPolicy: GrpcRetryHandler.RetryPolicy)
+class SparkConnectStubState(channel: ManagedChannel, val retryHandler: 
GrpcRetryHandler)
     extends Logging {
 
+  def this(channel: ManagedChannel, retryPolicies: 
Seq[GrpcRetryHandler.RetryPolicy]) =
+    this(channel, new GrpcRetryHandler(retryPolicies))
+
   // Responsible to convert the GRPC Status exceptions into Spark exceptions.
   lazy val exceptionConverter: GrpcExceptionConverter = new 
GrpcExceptionConverter(channel)
 
-  // Manages the retry handler logic used by the stubs.
-  lazy val retryHandler = new GrpcRetryHandler(retryPolicy)
-

Review Comment:
   Repeating, since this thread seems to have disappeared after subsequent 
commits:
   
   You mentioned that your argument for passing retryPolicy in a constructor is 
that you don't want to create multiple.
   But, reverting these changes, there is still one retryHandler shared across 
everyone who wants it.
   There is only one SparkConnectStubState per client (that's why it got 
refactored out of BlockingStub, and Stub - so that all the stubs etc. and every 
use of it within a client share it). So it's only one retryHandler either way.
   If you have multiple clients, they have multiple retryHandlers anyway, 
because they can be created with different policies.
   
   The only place (outside testing) that creates it is in SparkConnectClient, 
uses the constructor that passes the list of policies anyway: 
https://github.com/apache/spark/pull/43757/files#diff-0c8d29ae3ad350098135b81104c0029f1c54d4ccbde5bad3ee2cf7a412ed9a59R46
   
   so the only change that changes in this files do is:
   * make retryHandler a val instead of lazy val - since the other objects in 
this state are lazy, it would be more consistent if it stayed lazy; the idea of 
lazy initialization in this Stub is that we don't want the stub to initialize 
the session on the server "lazily", only during first request. I know that 
initializing RetryHandler will not initialize anything on the server, but 
having everything be lazy here helps reason about this invariant that nothing 
should call the server in the stub until it's first used. Think about e.g. 
potentially adding a Heartbeat to keep the session alive; that you would want 
to start lazy. Better to have everything lazy to keep the convention.
   * have two constructors, and the new default one passing the 
GrpcRetryHandler is unused. Why have an unused constructor?
   
   This is why I think it would be better to have a `lazy val retryHandler` and 
only have one constructor that passes the list of policies, like it was before.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-45851][CONNECT][SCALA] Support multiple policies in scala client [spark]

Reply via email to