[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932995#action_12932995 ]
Vishal K commented on ZOOKEEPER-900:
------------------------------------

Hi Flavio,

Thanks. I will take up the shutting down of worker threads in a separate jira. I am cleaning up QCM further to incorporate the change mentioned in my comment on 02/Nov/10 02:09 PM. I have a few more questions:

1. I have a question about the following piece of code in QCM:

    if (remoteSid == QuorumPeer.OBSERVER_ID) {
        /*
         * Choose identifier at random. We need a value to identify
         * the connection.
         */
        remoteSid = observerCounter--;
        LOG.info("Setting arbitrary identifier to observer: " + remoteSid);
    }

Should we allow this? The problem with this code is that if a peer connects twice with QuorumPeer.OBSERVER_ID, we will end up creating threads for this peer twice, which could leave redundant SendWorker/RecvWorker threads behind. I haven't used observers yet. The documentation at http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html says that, just like followers, observers should have server IDs. In that case, why do we want to provide a wildcard?

2. Should I add a check to reject connections from peers that are not listed in the configuration file? Currently, we do no sanity check on server IDs. I think this might fix ZOOKEEPER-851. The fix is simple; however, I am not sure whether anyone in the community is relying on this ability.

-Vishal

> FLE implementation should be improved to use non-blocking sockets
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-900
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
>             Project: Zookeeper
>          Issue Type: Bug
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Critical
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2
>
>
> From earlier email exchanges:
> 1. Blocking connects and accepts:
> a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect.
> While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect. AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne() starts a timer. connectOne() continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires it interrupts the AsyncConnect thread and returns. In this way, I can put an upper bound on the amount of time we need to wait for a connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later, once we at least have a quick fix for the problem and consensus from others on the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SendWorker and RecvWorker threads, since they do blocking IO only to their respective peer.
> b) The blocking IO problem is not restricted to connectOne(); it also appears in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection() does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that sent the connection request. All of this happens on the Listener thread. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). The code also has an inherent cycle: initiateConnection() and receiveConnection() will have to be very carefully synchronized; otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify.
> Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
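The "quick fix" described in the quoted text (run the blocking connect in a helper thread, interrupt it when a timer expires) can be sketched roughly as follows. This is illustrative only; the class and method names are hypothetical and this is not the attached patch. One caveat the sketch makes explicit: a plain java.net.Socket.connect() ignores thread interrupts, so the timed-out socket must also be closed.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Rough sketch of the timed-connect workaround: a helper task plays the
// AsyncConnect role, and the caller bounds how long it waits for it.
// All names here are made up for illustration.
public class TimedConnectSketch {

    public static Socket connectOne(SocketAddress addr, long timeoutMs) throws IOException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<Socket> ref = new AtomicReference<>();
        Future<Socket> pending = pool.submit(() -> {
            Socket s = new Socket();
            ref.set(s);
            s.connect(addr);          // the blocking call we are bounding
            return s;
        });
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true);     // interrupt the helper thread...
            Socket s = ref.get();
            if (s != null) {
                s.close();            // ...and close the socket, since a plain
            }                         // Socket.connect() ignores interrupts
            throw new IOException("connect to " + addr + " timed out");
        } catch (InterruptedException | ExecutionException e) {
            throw new IOException("connect to " + addr + " failed", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Note that java.net.Socket also offers connect(addr, timeoutMs) directly; the thread-plus-timer form above just mirrors the AsyncConnect description in the comment.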
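For the "real fix" the comment proposes (a Selector doing non-blocking connects), the core pattern looks roughly like this. It is a minimal sketch, not QuorumCnxManager code; connectWithTimeout is a made-up name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Sketch of a Selector-driven non-blocking connect: connect() returns
// immediately, and the Selector tells us when the handshake completes,
// so no thread is ever stuck inside connect().
public class NonBlockingConnectSketch {

    /** Attempts a non-blocking connect, waiting at most timeoutMs. */
    public static SocketChannel connectWithTimeout(InetSocketAddress addr, long timeoutMs)
            throws IOException {
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);          // connect() no longer blocks
        Selector selector = Selector.open();
        try {
            channel.register(selector, SelectionKey.OP_CONNECT);
            if (!channel.connect(addr)) {
                // Wait for OP_CONNECT instead of blocking in connect()
                if (selector.select(timeoutMs) == 0 || !channel.finishConnect()) {
                    channel.close();
                    throw new IOException("connect to " + addr + " timed out");
                }
            }
            return channel;                        // connected within the bound
        } finally {
            selector.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Loopback demo: accept on an ephemeral port, then connect to it.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
        SocketChannel c = connectWithTimeout(new InetSocketAddress("127.0.0.1", port), 2000);
        System.out.println("connected=" + c.isConnected());
        c.close();
        server.close();
    }
}
```

The same Selector can also carry OP_ACCEPT and OP_READ keys, which is what would let the Listener keep accepting peers while a slow handshake is still in flight.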
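The sanity check proposed in point 2 of the comment could look roughly like this. The names are hypothetical; the only assumption taken from the code base is that QuorumPeer.OBSERVER_ID is a wildcard value (Long.MAX_VALUE).

```java
import java.util.Set;

// Hypothetical sketch of the point-2 sanity check: reject handshakes whose
// advertised server id is not in the configured view. Rejecting the
// OBSERVER_ID wildcard also closes the duplicate-observer hole from point 1,
// since a peer can no longer connect twice under the wildcard id and leave
// redundant SendWorker/RecvWorker threads behind.
public class ServerIdCheckSketch {
    static final long OBSERVER_ID = Long.MAX_VALUE; // wildcard, as in QuorumPeer

    /** Returns true if a connection from remoteSid should be accepted. */
    static boolean acceptRemoteSid(long remoteSid, Set<Long> configuredSids) {
        if (remoteSid == OBSERVER_ID) {
            return false; // no wildcard: observers must use their real id
        }
        return configuredSids.contains(remoteSid);
    }
}
```

Whether to actually drop the wildcard is the open question in the comment; the sketch only shows where such a check would sit.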