Github user sarutak commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2019#discussion_r16631373
  
    --- Diff: core/src/main/scala/org/apache/spark/network/Connection.scala ---
    @@ -118,14 +118,33 @@ abstract class Connection(val channel: SocketChannel, 
val selector: Selector,
       }
     
       def close() {
    -    closed = true
    -    val k = key()
    -    if (k != null) {
    -      k.cancel()
    +    synchronized {
    +      /**
    +       * We should avoid executing closing sequence
    +       * double by a same thread.
    +       * Otherwise we can fail to call connectionsById.get() in
    +       * ConnectionManager#removeConnection() at the 2nd time
    +       */
    +      if (!closed) {
    +        disposeSasl()
    +
    +        /**
    +          * callOnCloseCallback() should be invoked
    +          * before k.cancel() and channel.close()
    +          * to avoid key() returns null.
    +          * If key() returns null before callOnCloseCallback(),
    +          * We cannot remove entry from connectionsByKey in 
ConnectionManager
    +          * and end up being threw CancelledKeyException.
    +          */
    +        callOnCloseCallback()
    +        val k = key()
    +        if (k != null) {
    +          k.cancel()
    +        }
    +        channel.close()
    +        closed = true
    +      }
    --- End diff --
    
    SendingConnection#close is called from 3 threads on the same instance.
    For example, 1st thread of handle-read-write-executor calls 
ReceivingConnection#close -> SendingConnection#close,  2nd thread of 
handle-read-write-executor calles SendingConnection#close and 3rd thread of 
connection-manager-thread calls ConnectionManager#run -> 
SendingConnection#close.
    
    I think, if it threw exception from any methods in close(), connection is 
not marked as closed because one of those thread is expected to close resources 
even if another thread fail to close.
    
    And synchronized block is for protect being called SendingConnection#close 
from 3 threads.
    It can be one of following situation.
    (1) One thread of handle-read-write-execuor evaluates key.cancel in 
SendingConnection#close
    (2) Then, connection-manager-thread calls removeConnection via 
callOnCloseCallback and evaluates "connectionsyKey -= connection.key". This 
should be fail because connection.key is null at this time.
    
    After (2) above, connection-manager-thread expects connectionsByKey.size != 
0 in ConnectionManager#stop but that size cannot be 0 and we get log message 
"All connections not cleaned up".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to