otterc commented on a change in pull request #30312:
URL: https://github.com/apache/spark/pull/30312#discussion_r523304450



##########
File path: 
common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java
##########
@@ -254,7 +254,7 @@ TransportClient createClient(InetSocketAddress address)
       // Disable Nagle's Algorithm since we don't want packets to wait
       .option(ChannelOption.TCP_NODELAY, true)
       .option(ChannelOption.SO_KEEPALIVE, true)
-      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, conf.connectionTimeoutMs())
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 
conf.connectionCreationTimeoutMs())

Review comment:
       The `connectionTimeoutMs()` has been overloaded to serve as two 
different types of timeout:
   1. It's used as a connection idle timeout. My understanding is that the 120s 
default was chosen to satisfy this timeout because in some cases the shuffle 
server can take much longer to respond to a request than usual.
   2. It is also used as connection creation timeout. The timeout to create the 
connection should not be this high. This especially becomes a problem because 
the thread which is creating a connection is also blocked for this specified 
time when it tries to connect to a bad node 
[here](https://github.com/apache/spark/blob/0046222a758fda2aead4a77bd19b19b913276693/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L283).
 With push-based shuffle, an executor can be running a task that fetches 
shuffle data from the same shuffle service to which another task is trying to 
push the data. Since both push and fetch share the connections to the shuffle 
server, whichever succeeds to get the connection lock 
[here](https://github.com/apache/spark/blob/0046222a758fda2aead4a77bd19b19b913276693/common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L198)
 will block the other one. Blocking for a short time is okay bu
 t the larger timeout causes the other to be blocked for a longer time. It 
makes these tasks fail very slow. Also fetches are retried even when connection 
fails to establish, so the higher timeout for creation increases the task 
duration. One of the intentions here is that we didn't want the push side to 
interfere in any way with the fetch side. When a task that pushes acquires the 
connection lock before a task that wants to fetch from the same shuffle server 
node, we want the push to relinquish the lock sooner. This is why the 
connection creation timeout was introduced and we change the default in the 
code to be smaller.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to