Move blocking read/write calls to SendWorker and RecvWorker Threads
-------------------------------------------------------------------

                 Key: ZOOKEEPER-932
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-932
             Project: Zookeeper
          Issue Type: Sub-task
            Reporter: Vishal K
            Assignee: Vishal K
             Fix For: 3.4.0


Copying relevant comments:

Vishal K added a comment - 02/Nov/10 02:09 PM
Hi Flavio,

I have a suggestion for changing the blocking IO code in QuorumCnxManager. It 
keeps the current code structure and requires a small amount of changes. I am 
not sure if these comments should go in ZOOKEEPER-901. ZOOKEEPER-901 is 
probably addressing netty as well. Please feel free to close this JIRA if you 
intend to make all the changes as a part of ZOOKEEPER-901.

Basically we just need to move parts of initiateConnection and 
receiveConnection to SenderWorker and ReceiveWorker.

A. Current flow for receiving connection:
1. accept connection in Listener.run()
2. receiveConnection()

    * Read remote server's ID
    * Take action based on my ID and remote server's ID (disconnect and
      reconnect if my ID is > remote server's ID).
    * Kill current set of SenderWorker and ReceiveWorker threads
    * Start a new pair

B. Current flow for initiating connection:
1. In connectOne(), connect if not already connected; otherwise return.
2. Send my ID to the remote server
3. If my ID < remote server's ID, disconnect and return
4. If my ID > remote server's ID

    * Kill current set of SenderWorker and ReceiveWorker threads for the
      remote server
    * Start a new pair
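
The ID rule in flows A and B above can be sketched in Java as follows. This is
only an illustration: the class, enum, and method names are hypothetical and
are not taken from QuorumCnxManager, and treating equal IDs as "keep" is an
assumption, since the flows above do not cover that case.

```java
// Hypothetical helper; not part of the ZooKeeper code base.
final class ConnectionTieBreak {
    enum Action { KEEP, DROP, DROP_AND_RECONNECT }

    /** Flow B: decision for a connection that this server initiated. */
    static Action onInitiated(long myId, long remoteId) {
        // The lower ID gives up; the higher ID keeps the connection.
        // Equal IDs fall through to KEEP (assumption).
        return myId < remoteId ? Action.DROP : Action.KEEP;
    }

    /** Flow A: decision for a connection accepted from a remote server. */
    static Action onAccepted(long myId, long remoteId) {
        // The higher ID closes the accepted socket and connects back itself.
        return myId > remoteId ? Action.DROP_AND_RECONNECT : Action.KEEP;
    }
}
```

With this rule, a connection between two servers always ends up being the one
opened by the server with the higher ID.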

Proposed changes:
Move all code that performs blocking IO into SenderWorker and ReceiveWorker.

A. Proposed flow for receiving connection:
1. accept connection in Listener.run()
2. receiveConnection()

    * Kill current set of SenderWorker and ReceiveWorker threads
    * Start a new pair

Proposed changes to SenderWorker:

    * Read remote server's ID
    * Take action based on my ID and remote server's ID (disconnect and
      reconnect if my ID is > remote server's ID).
    * Proceed to normal operation
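
A rough Java sketch of the worker-side steps above, so that the blocking read
happens in the worker's run() rather than in the listener thread. The class
name, the string outcomes, and the assumption that the remote ID arrives as an
8-byte big-endian long are all illustrative and not taken from the existing
code.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.CompletableFuture;

// Hypothetical worker; stands in for the proposed SenderWorker behaviour.
final class HandshakeWorker implements Runnable {
    private final long myId;
    private final InputStream in;
    // Completed with "PROCEED", "DROP_AND_RECONNECT", or "CLOSED".
    final CompletableFuture<String> outcome = new CompletableFuture<>();

    HandshakeWorker(long myId, InputStream in) {
        this.myId = myId;
        this.in = in;
    }

    @Override
    public void run() {
        try {
            // Blocking read of the remote server's ID, done in the worker.
            long remoteId = new DataInputStream(in).readLong();
            // Higher ID drops the accepted connection and reconnects.
            outcome.complete(myId > remoteId ? "DROP_AND_RECONNECT" : "PROCEED");
        } catch (IOException e) {
            // Remote closed the stream or the read failed before 8 bytes arrived.
            outcome.complete("CLOSED");
        }
    }
}
```

In this sketch the listener would only accept the socket, hand its input
stream to the worker, and start the worker's thread; a slow or silent peer
then stalls that one worker instead of the accept loop.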

B. Proposed flow for initiating connection:
1. In connectOne(), return if already connected
2. Start a new SenderWorker and ReceiveWorker pair
3. In SenderWorker

    * connect to remote server
    * write my ID
    * if my ID < remote server's ID, disconnect and return (shut down the pair).
    * Proceed to normal operation
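
One possible shape for the non-blocking connectOne() in flow B, sketched in
Java. The Connector class, the map of workers, and the Runnable standing in
for "connect, write my ID, proceed to normal operation" are hypothetical; the
real method has a different signature.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; not the QuorumCnxManager implementation.
final class Connector {
    private final ConcurrentMap<Long, Thread> workers = new ConcurrentHashMap<>();

    /**
     * Returns immediately. The blocking work (connect, write my ID, normal
     * operation) runs inside the worker thread, not in the caller.
     *
     * @return true if a new worker was started, false if one already exists
     */
    boolean connectOne(long remoteId, Runnable blockingWork) {
        Thread worker = new Thread(() -> {
            try {
                blockingWork.run();
            } finally {
                // Worker finished or gave up: allow a later reconnect.
                workers.remove(remoteId);
            }
        }, "SenderWorker-" + remoteId);
        // Step 1: return if a worker for this server is already registered.
        if (workers.putIfAbsent(remoteId, worker) != null) {
            return false;
        }
        // Step 2: start the new worker; step 3 happens inside it.
        worker.start();
        return true;
    }
}
```

Registering the worker before starting it means a second connectOne() call for
the same server returns without creating a duplicate pair.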

Questions:

    * In QuorumCnxManager, is it necessary to kill the current pair and
      restart a new one every time we receive a connect request?
    * In receiveConnection we may choose to reject an accepted connection if
      a thread in SenderWorker is in the process of connecting. Otherwise a
      server with ID < remote server's ID may keep sending frequent connect
      requests, which will result in the remote server closing connections
      for this peer. But I think we add a delay before sending
      notifications, which might be good enough to prevent this problem.

Let me know what you think about this. I can also help with the implementation.

Flavio Junqueira added a comment - 03/Nov/10 05:28 PM
Hi Vishal, I like your proposal, it seems reasonable and not difficult to 
implement.

On your questions:

   1. I don't think it is necessary to kill a SenderWorker/RecvWorker pair 
every time, and I'd certainly support changing it;
   2. I'm not sure where you're suggesting to introduce a delay. In the FLE 
code, a server sends a new batch of notifications if it changes its vote or if 
it times out waiting for a new notification. This timeout value increases over 
time. I was actually thinking that we should reset the timeout value upon 
receiving a notification. I think this is a bug....
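
The timeout behaviour discussed in point 2 could look roughly like the Java
sketch below: the wait grows on each timeout and is reset when a notification
arrives. The class name, the doubling policy, and the bounds passed to the
constructor are assumptions for illustration, not the actual FLE code.

```java
// Hypothetical sketch of a notification timeout with reset-on-receive.
final class NotificationTimeout {
    private final int initialMs;
    private final int maxMs;
    private int currentMs;

    NotificationTimeout(int initialMs, int maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.currentMs = initialMs;
    }

    /** Called when the wait for a notification timed out: back off. */
    int onTimeout() {
        currentMs = Math.min(currentMs * 2, maxMs);
        return currentMs;
    }

    /** Called when a notification was received: reset to the initial wait. */
    int onNotification() {
        currentMs = initialMs;
        return currentMs;
    }
}
```

For example, starting from 200 ms, two timeouts give 400 ms and then 800 ms,
and the next received notification brings the wait back to 200 ms.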

Given that it is your proposal, I'd be happy to let you take a stab at it and 
help you out if you need a hand. Does that make sense to you?

