Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9227#discussion_r43096557
  
    --- Diff: network/common/src/main/java/org/apache/spark/network/server/TransportChannelHandler.java ---
    @@ -109,17 +109,20 @@ public void channelRead0(ChannelHandlerContext ctx, Message request) {
       public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
         if (evt instanceof IdleStateEvent) {
           IdleStateEvent e = (IdleStateEvent) evt;
    -      // See class comment for timeout semantics. In addition to ensuring we only timeout while
    -      // there are outstanding requests, we also do a secondary consistency check to ensure
    -      // there's no race between the idle timeout and incrementing the numOutstandingRequests.
    -      boolean hasInFlightRequests = responseHandler.numOutstandingRequests() > 0;
    +      // While an IdleStateEvent has been triggered, we can close idle connection
    +      // because it has no read/write events for requestTimeoutNs.
           boolean isActuallyOverdue =
             System.nanoTime() - responseHandler.getTimeOfLastRequestNs() > requestTimeoutNs;
    -      if (e.state() == IdleState.ALL_IDLE && hasInFlightRequests && isActuallyOverdue) {
    -        String address = NettyUtils.getRemoteAddress(ctx.channel());
    -        logger.error("Connection to {} has been quiet for {} ms while there are outstanding " +
    -          "requests. Assuming connection is dead; please adjust spark.network.timeout if this " +
    -          "is wrong.", address, requestTimeoutNs / 1000 / 1000);
    +      if (e.state() == IdleState.ALL_IDLE && isActuallyOverdue) {
    +        // In addition to ensuring we only timeout while there are outstanding requests, we also
    --- End diff --
    
    Not sure what's going on here. You're not fixing a race, because there's no
synchronization anywhere. And now you're closing the connection any time it is
idle, regardless of whether there are outstanding requests. This is not good for
the netty-based RpcEnv implementation, which does not like it when its sockets
are closed (e.g. executors will die and things like that).
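
To make the concern concrete, here is a minimal sketch of the close decision as the removed lines of the diff expressed it: close only when the channel is fully idle, requests are still in flight, and the last request is overdue. The class `IdleTimeoutPolicy` and the method `shouldClose` are hypothetical names used for illustration only; they are not part of Spark or Netty, and the real handler reads these values from `responseHandler` and the `IdleStateEvent`.

```java
// Hypothetical helper, not part of Spark: isolates the three-way check
// from the removed lines of the diff so it can be reasoned about on its own.
final class IdleTimeoutPolicy {
  private IdleTimeoutPolicy() {}

  /**
   * Returns true only if the channel is fully idle, at least one request is
   * still outstanding, and more than timeoutNs has elapsed since the last request.
   */
  static boolean shouldClose(
      boolean allIdle,
      int numOutstandingRequests,
      long nowNs,
      long timeOfLastRequestNs,
      long timeoutNs) {
    boolean hasInFlightRequests = numOutstandingRequests > 0;
    boolean isActuallyOverdue = nowNs - timeOfLastRequestNs > timeoutNs;
    return allIdle && hasInFlightRequests && isActuallyOverdue;
  }
}
```

With this shape, an idle connection with no outstanding requests is left open, which appears to be the behavior the netty-based RpcEnv depends on. Dropping the `hasInFlightRequests` term, as the patch does, would close it.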


