Hi Flavio,

On Fri, Sep 3, 2010 at 3:02 PM, Flavio Junqueira <f...@yahoo-inc.com> wrote:

> Vishal, 60-80 seconds is definitely high, and I would expect people to
> complain if they were observing such an amount of time to recover. I
> personally haven't seen such a case.
>

Can you describe how you were trying to reproduce the bug? On physical
machines, it took me 15 retries (reboot -n leader) to reproduce the problem.
On VMs it is a lot more frequent.


>
> On my end, you have good points, but I'm not entirely convinced that we
> need the changes as you're proposing them. Seeing a patch would definitely
> help us decide. If you can't provide a patch due to legal issues, we should
> work on one or more patches to fix at least some of the issues you observed.
>
>
You are right, my fixes may not be the best approach. My intention was to
have a quick fix for our internal use and then start off a discussion about
the real fix. I will send out the diff soon.


> I also agree that it would be nice to have the numbers you are requesting.
> I would love to see them.
>

> Thanks,
> -Flavio
>
>
Thanks.
-Vishal


> On Sep 3, 2010, at 8:51 PM, Vishal K wrote:
>
> Hi Mahadev,
>
> To be honest, yes, we need the quick fix. I am really surprised that no one
> else is seeing this problem. There is nothing special about our setup. If
> you look at the JIRA, I have posted logs from various setups (different OS,
> physical machines, virtual machines, etc.). Also, the bug is evident from
> the code. Pretty much every developer on our team has hit it.
>
> Now, we have an application that is highly time-sensitive. Maybe most of
> the applications that ZK runs under today can tolerate 60-80 seconds of
> FLE convergence. For us, such long delays (under normal conditions) are
> not acceptable.
>
> It would be nice if people could provide some feedback on how
> time-sensitive their applications are. Is a 60-80 second delay in FLE
> acceptable? What has been your experience with running ZK in production?
> How often do you have leader reboots?
>
> Feedback will be greatly appreciated.
>
> Thanks.
> -Vishal
>
> On Fri, Sep 3, 2010 at 1:44 PM, Mahadev Konar <maha...@yahoo-inc.com>
> wrote:
>
> Hi Vishal,
>
> Thanks for picking this up. My comments are inline:
>
> On 9/2/10 3:31 PM, "Vishal K" <vishalm...@gmail.com> wrote:
>
>
> Hi All,
>
> I had posted this message as a comment for ZOOKEEPER-822. I thought it
> might be a good idea to give it wider attention so that it will be easier
> to collect feedback.
>
> I found a few problems in the FLE implementation while debugging
> https://issues.apache.org/jira/browse/ZOOKEEPER-822. Following the email
> below might require some background. If necessary, please browse the JIRA.
> I have a patch for 1. a) and 2). I will send them out soon.
>
> 1. Blocking connects and accepts:
>
> a) The first problem is in manager.toSend(). This invokes connectOne(),
> which does a blocking connect. While testing, I changed the code so that
> connectOne() starts a new thread called AsyncConnect. AsyncConnect.run()
> does a socketChannel.connect(). After starting AsyncConnect, connectOne()
> starts a timer. connectOne() continues with normal operations if the
> connection is established before the timer expires; otherwise, when the
> timer expires it interrupts the AsyncConnect thread and returns. In this
> way, I can put an upper bound on the amount of time we wait for the
> connect to succeed. Of course, this was a quick fix for my testing.
> Ideally, we should use a Selector to do non-blocking connects/accepts. I
> am planning to do that later, once we at least have a quick fix for the
> problem and consensus from others on the real fix (this problem is a big
> blocker for us). Note that it is OK to do blocking IO in the SendWorker
> and RecvWorker threads, since each of them handles IO to a single peer.
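>
> For concreteness, here is a minimal sketch of the selector-based bounded
> connect I have in mind (illustrative only, not the actual patch; the
> method name and the timeout parameter are made up):
>
> import java.io.IOException;
> import java.net.InetSocketAddress;
> import java.nio.channels.SelectionKey;
> import java.nio.channels.Selector;
> import java.nio.channels.SocketChannel;
>
> // Connect with an upper bound instead of a blocking connect.
> // Returns the connected channel, or null if the bound expired.
> static SocketChannel boundedConnect(InetSocketAddress addr, long timeoutMs)
>         throws IOException {
>     SocketChannel channel = SocketChannel.open();
>     channel.configureBlocking(false);
>     if (channel.connect(addr)) {
>         return channel;  // connected immediately (e.g. loopback)
>     }
>     Selector selector = Selector.open();
>     try {
>         channel.register(selector, SelectionKey.OP_CONNECT);
>         // wait at most timeoutMs for the connect to complete
>         if (selector.select(timeoutMs) > 0 && channel.finishConnect()) {
>             return channel;  // caller hands this to SendWorker/RecvWorker
>         }
>     } catch (IOException e) {
>         // connect failed (e.g. connection refused): fall through and close
>     } finally {
>         selector.close();
>     }
>     channel.close();  // timed out or failed: give up, FLE can retry later
>     return null;
> }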
>
> Vishal, I am really concerned about starting up new threads in the server.
> We really need a total revamp of this code (using NIO and a selector). Is
> the quick fix really required? Zookeeper servers have been running in
> production for a while, and this problem hasn't been noticed by anyone.
> Shouldn't we fix it with NIO then?
>
> b) The blocking IO problem is not just restricted to connectOne(), but is
> also in receiveConnection(). The Listener thread calls receiveConnection()
> for each incoming connection request. receiveConnection() does blocking IO
> to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne()
> back to the peer that had sent the connection request. All of this happens
> on the Listener thread. In short, if a peer fails after initiating a
> connection, the Listener thread won't be able to accept connections from
> other peers, because it will be stuck in read() or connectOne(). Also, the
> code has an inherent cycle: initiateConnection() and receiveConnection()
> will have to be very carefully synchronized, otherwise we could run into
> deadlocks. This code is going to be difficult to maintain/modify.
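>
> As an interim illustration of keeping the Listener responsive, the accept
> loop could hand each accepted socket off to a worker, so the read of the
> peer's info and any connectOne() back to the peer never run on the
> Listener itself (a sketch, assuming receiveConnection(Socket) keeps its
> current signature; the executor is illustrative, not the real fix):
>
> import java.net.Socket;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> ExecutorService handlers = Executors.newCachedThreadPool();
> while (running) {
>     // only accept() blocks here; the handshake IO happens off-thread
>     final Socket client = serverSocket.accept();
>     handlers.submit(new Runnable() {
>         public void run() {
>             receiveConnection(client);
>         }
>     });
> }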
>
> 2. Buggy senderWorkerMap handling:
>
> The code that manages senderWorkerMap is very buggy. It is causing
> multiple election rounds. While debugging, I found that sometimes after
> FLE a node will have its senderWorkerMap empty even though it has
> SendWorker and RecvWorker threads for each peer.
>
> It would be great to clean it up!! I'd be happy to see this class be
> cleaned up! :)
>
> a) The receiveConnection() method calls the finish() method, which removes
> an entry from the map. Additionally, the thread itself calls finish(),
> which could remove the newly added entry from the map. In short,
> receiveConnection() is causing the exact condition that you mentioned
> above.
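>
> A small guard would at least make finish() safe to call twice: remove the
> map entry only if it still points at this worker (a sketch; it assumes
> senderWorkerMap is a ConcurrentHashMap, so the two-argument remove is
> atomic):
>
> // inside SendWorker.finish(): a late or duplicate finish() can then no
> // longer evict a newer worker registered under the same sid
> senderWorkerMap.remove(sid, this);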
>
> b) Apart from the bug in finish(), receiveConnection() makes an entry in
> senderWorkerMap at the wrong place. Here's the buggy code:
>
> SendWorker vsw = senderWorkerMap.get(sid);
> senderWorkerMap.put(sid, sw);
> if (vsw != null)
>     vsw.finish();
>
> It makes an entry for the new thread and then calls finish(), which causes
> the new thread to be removed from the map. The old thread will also get
> terminated, since finish() will interrupt the thread.
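>
> The straightforward fix is to retire the old worker before registering the
> new one, so its finish() cannot evict the fresh entry (sketch):
>
> SendWorker vsw = senderWorkerMap.get(sid);
> if (vsw != null)
>     vsw.finish();              // tear down the old worker first...
> senderWorkerMap.put(sid, sw);  // ...then register the new one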
>
> 3. Race condition in receiveConnection and initiateConnection:
>
> *In theory*, two peers can keep disconnecting each other's connection.
>
> Example:
> T0: Peer 0 initiates a connection (request 1)
> T1: Peer 1 receives the connection from peer 0
> T2: Peer 1 calls receiveConnection()
> T2: Peer 0 closes its connection to Peer 1 because its ID is lower
> T3: Peer 0 re-initiates the connection to Peer 1 from manager.toSend()
>     (request 2)
> T3: Peer 1 terminates the older connection to peer 0
> T4: Peer 1 calls connectOne(), which starts new SendWorker threads for
>     peer 0
> T5: Peer 1 kills the connection created in T3 because it receives another
>     connect request (request 2) from peer 0
>
> The problem here is that while Peer 0 is accepting a connection from Peer
> 1, it can also be initiating a connection to Peer 1. So if they hit the
> right frequencies, they could sit in a connect/disconnect loop and cause
> multiple rounds of leader election.
>
> I think the cause here is again blocking connects/accepts. A peer starts
> to take action (killing existing threads and starting new threads) as soon
> as a connection is established at the TCP level. That is, the current code
> gives us no way to synchronize connects and accepts. We could use
> non-blocking connects and accepts instead. This would allow us to a) tell
> a thread not to initiate a connection because the listener is about to
> accept a connection from the remote peer (using the isAcceptable() and
> isConnectable() methods of SelectionKey), and b) prevent a thread from
> initiating multiple connect requests to the same peer. It would simplify
> synchronization.
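>
> Whatever the IO model, the deterministic tie-break has to be enforced in
> one place. Made explicit, the rule "a connection survives only if its
> initiator has the higher id" looks roughly like this on the accepting side
> (a sketch; remoteSid is the id read during the handshake, self is the
> local QuorumPeer, and closeSocket()/startWorkers() are illustrative
> names):
>
> if (remoteSid < self.getId()) {
>     // the initiator lost the tie-break: drop its connection and
>     // connect back, so exactly one direction survives
>     closeSocket(sock);
>     connectOne(remoteSid);
> } else {
>     startWorkers(sock, remoteSid);  // initiator won; keep this socket
> }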
>
> Any thoughts?
>
> I am all for cleaning up the code :).
>
> Thanks
> mahadev
>
> -Vishal
