Move blocking read/write calls to SendWorker and RecvWorker Threads -------------------------------------------------------------------
Key: ZOOKEEPER-932 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-932 Project: Zookeeper Issue Type: Sub-task Reporter: Vishal K Assignee: Vishal K Fix For: 3.4.0 Copying relevant comments: Vishal K added a comment - 02/Nov/10 02:09 PM Hi Flavio, I have a suggestion for changing the blocking IO code in QuorumCnxManager. It keeps the current code structure and requires a small amount of changes. I am not sure if these comments should go in ZOOKEEPER-901. ZOOKEEPER-901 is probably addressing netty as well. Please feel free to close this JIRA if you intend to make all the changes as a part of ZOOKEEPER-901. Basically we jusy need to move parts of initiateConnection and receiveConnection to SenderWorker and ReceiveWorker. A. Current flow for receiving connection: 1. accept connection in Listener.run() 2. receiveConnection() * Read remote server's ID * Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID). * kill current set of SenderWorker and ReciveWorker threads * Start a new pair B Current flow for initiating connection: 1. In connectOne(), connect if not already connected. else return. 2. send my ID to the remote server 3. if my ID < remote server disconnect and return 4. if my ID > remote server * kill current set of SenderWorker and ReceiveWorkter threads for the remote server * Start a new pair Proposed changes: Move the code that performs any blocking IO in SenderWorker and ReceiveWorker. A. Proposed flow for receiving connection: 1. accept connection in Listener.run() 2. receiveConnection() * kill current set of SenderWorker and ReciveWorker threads * Start a new pair Proposed changed to SenderWorker: * Read remote server's ID * Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID). * Proceed to normal operation B Proposed flow for initiating connection: 1. in connectOne(), return if already connected 2. Start a new SenderWorker and ReceiveWorker pair 2. In SenderWorker * connect to remote server * write my ID * if my ID < remote server disconnect and return (shutdown the pair). * Proceed to normal operation Questions: * In QuorumCnxManager, is it necessary to kill the current pair and restart a new one every time we receive a connect request? * In receiveConnection we may choose to reject an accepted connection if a thread in SenderWorker is in the process of connecting. Otherwise a server with ID < remote server may keep sending frequent connect request that will result in the remote server closing connections for this peer. But I think we add a delay before sending notifications, which might be good enough to prevent this problem. Let me know what you think about this. I can also help with the implementation. Flavio Junqueira added a comment - 03/Nov/10 05:28 PM Hi Vishal, I like your proposal, it seems reasonable and not difficult to implement. On your questions: 1. I don't think it is necessary to kill a pair SenderWorker/RecvWorker every time, and I'd certainly support changing it; 2. I'm not sure where you're suggesting to introduce a delay. In the FLE code, a server sends a new batch of notifications if it changes its vote or if it times out waiting for a new notification. This timeout value increases over time. I was actually thinking that we should reset the timeout value upon receiving a notification. I think this is a bug.... Given that it is your proposal, I'd be happy to let you take a stab at it and help you out if you need a hand. Does it make sense for you? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.