Re: Zookeeper leader election takes a long time.

2016-10-14 Thread Ryan Zhang
From that log, it looks like Node 3 and Node 2 formed a quorum (Node 1 can't
talk to Node 2 for some reason). Even if Node 3 didn't hear from Node 2 after
12:46, it is still a special case, since we had something like a network
partition for about 10 minutes. I wonder if you can repro this in an
environment where Node 2 is definitely down? I will take a look at the log
after 12:46 when I get a chance.
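
For reproducing it, a minimal 3-node ensemble config along these lines should
be enough (hostnames other than node2.controller.local and all ports below are
illustrative, not taken from your deployment): use the same config on nodes 1
and 3, and simply never start the node 2 container, so its election port
refuses connections the same way it does in your log.

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=node1.controller.local:2888:3888
server.2=node2.controller.local:2888:3888
server.3=node3.controller.local:2888:3888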

> On Oct 14, 2016, at 1:40 PM, Anand Parthasarathy  
> wrote:
> 
> Hi Michael and Ryan,
> 
> Thanks for looking into the logs.
> 
> Actually, in this case, node 2 was brought down ungracefully and was down
> permanently. In this customer deployment, our product runs as a Docker
> container in a bare-metal environment. It is possible that the Docker proxy
> that provides external connectivity to the ZooKeeper ports is down while the
> container itself is still up and running for several more minutes. If you grep
> for "2 (n.sid)" you will see that there are no packets after 12:46.
> Convergence after that did not happen until 15:57 or so.
> 
> Thanks,
> Anand.
> 
> On Thu, Oct 13, 2016 at 5:10 PM, Michael Han  wrote:
> 
>> >> it started a new round but then I seem to see the election messages
>> >> from Node 2 again. Any idea why?
>> 
>> My guess is node 2 is back online and ZK service was started. In this use
>> case node 2 does not get offline permanently, IIUC.
>> 
>> 
>> On Thu, Oct 13, 2016 at 3:41 PM, Ryan Zhang 
>> wrote:
>> 
>>> Hi Anand, I took a look, and I wonder how you would explain the following:
>>> 
>>> The N1.log starts at around
>>> 2016-10-03 12:37:38,073 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:QuorumPeer@714] - LOOKING
>>> 
>>> and it failed to talk to Node 2 (id: 2)
>>> 2016-10-03 12:37:38,136 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCn
>>> xManager@382] - Cannot open channel to 2 at election address
>>> node2.controller.local/127.0.0.8:5004
>>> java.net.ConnectException: Connection refused
>>>at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSock
>>> etImpl.java:339)
>>>at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPl
>>> ainSocketImpl.java:200)
>>>at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocket
>>> Impl.java:182)
>>>at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>at java.net.Socket.connect(Socket.java:579)
>>> 
>>> However, after failing to get any election messages from Node 3, it
>>> started a new round, but then I seem to see the election messages from Node
>>> 2 again. Any idea why?
>>> 
>>> 2016-10-03 12:38:32,815 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:QuorumPeer@714] - LOOKING
>>> 2016-10-03 12:38:32,816 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/s
>>> napshot.20002aa0b
>>> 2016-10-03 12:38:32,875 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:FastLeaderElection@815] - New election. My id =  1, proposed
>>> zxid=0x20002bfb5
>>> 2016-10-03 12:38:32,876 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 1
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 2
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 2
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 2
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,917 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 3
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,117 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:FastLeaderElection@849] - Notification time out: 400
>>> 2016-10-03 12:38:33,118 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 2
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,159 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 3
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,559 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:FastLeaderElection@849] - Notification 

Re: Zookeeper leader election takes a long time.

2016-10-14 Thread Anand Parthasarathy
Hi Michael and Ryan,

Thanks for looking into the logs.

Actually, in this case, node 2 was brought down ungracefully and was down
permanently. In this customer deployment, our product runs as a Docker
container in a bare-metal environment. It is possible that the Docker proxy
that provides external connectivity to the ZooKeeper ports is down while the
container itself is still up and running for several more minutes. If you grep
for "2 (n.sid)" you will see that there are no packets after 12:46.
Convergence after that did not happen until 15:57 or so.

Thanks,
Anand.

On Thu, Oct 13, 2016 at 5:10 PM, Michael Han  wrote:

> >> it started a new round but then I seem to see the election messages
> >> from Node 2 again. Any idea why?
>
> My guess is node 2 is back online and ZK service was started. In this use
> case node 2 does not get offline permanently, IIUC.
>
>
> On Thu, Oct 13, 2016 at 3:41 PM, Ryan Zhang 
> wrote:
>
>> Hi Anand, I took a look, and I wonder how you would explain the following:
>>
>> The N1.log starts at around
>> 2016-10-03 12:37:38,073 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>> 5002:QuorumPeer@714] - LOOKING
>>
>> and it failed to talk to Node 2 (id: 2)
>> 2016-10-03 12:37:38,136 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCn
>> xManager@382] - Cannot open channel to 2 at election address
>> node2.controller.local/127.0.0.8:5004
>> java.net.ConnectException: Connection refused
>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSock
>> etImpl.java:339)
>> at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPl
>> ainSocketImpl.java:200)
>> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocket
>> Impl.java:182)
>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>> at java.net.Socket.connect(Socket.java:579)
>>
>> However, after failing to get any election messages from Node 3, it
>> started a new round, but then I seem to see the election messages from Node
>> 2 again. Any idea why?
>>
>> 2016-10-03 12:38:32,815 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>> 5002:QuorumPeer@714] - LOOKING
>> 2016-10-03 12:38:32,816 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>> 5002:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/s
>> napshot.20002aa0b
>> 2016-10-03 12:38:32,875 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>> 5002:FastLeaderElection@815] - New election. My id =  1, proposed
>> zxid=0x20002bfb5
>> 2016-10-03 12:38:32,876 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>> aderElection@597] - Notification: 1 (message format version), 1
>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>> 2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>> aderElection@597] - Notification: 1 (message format version), 2
>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 2
>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>> 2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>> aderElection@597] - Notification: 1 (message format version), 2
>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>> 2016-10-03 12:38:32,917 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>> aderElection@597] - Notification: 1 (message format version), 3
>> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>> 2016-10-03 12:38:33,117 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>> 5002:FastLeaderElection@849] - Notification time out: 400
>> 2016-10-03 12:38:33,118 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>> aderElection@597] - Notification: 1 (message format version), 2
>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>> 2016-10-03 12:38:33,159 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>> aderElection@597] - Notification: 1 (message format version), 3
>> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>> 2016-10-03 12:38:33,559 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>> 5002:FastLeaderElection@849] - Notification time out: 800
>> 2016-10-03 12:38:33,560 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>> aderElection@597] - Notification: 1 (message format version), 2
>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>> 2016-10-03 12:38:33,561 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>> aderElection@597] - Notification: 1 (message format version), 2
>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LEADING (n.state), 2
>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>> 2016-10-03 12:38:33,601 [myid:1] - INFO  

Re: outstandingChanges queue grows without bound

2016-10-14 Thread Mike Solomon
Thanks for the comments - I'll incorporate them in a future fix. There
is actually a flaw in this code as it's currently implemented - it
does not match the original behavior and I need to think more
carefully.

Arshad, I think ZOOKEEPER-2570 is a somewhat different issue.  The
root cause in both cases is that the ProcessRequestThread is
overloaded, but large multi-op transactions are probably a degenerate
case.
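
For anyone skimming the thread, here is a rough sketch of the general idea
(class and field names below are made up for illustration; this is not the
code from the commit, nor the actual PrepRequestProcessor internals): keep a
per-session index alongside the outstandingChanges queue so that closeSession
becomes a map lookup instead of a full scan under the lock.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class PendingChangeIndex {
    static final class Change {
        final String path;
        final long ownerSession; // 0 if the change is not tied to a session's ephemerals
        Change(String path, long ownerSession) {
            this.path = path;
            this.ownerSession = ownerSession;
        }
    }

    private final Deque<Change> outstandingChanges = new ArrayDeque<>();
    // Secondary index: session id -> paths with pending changes for that session.
    private final Map<Long, Set<String>> changesBySession = new HashMap<>();

    synchronized void add(Change c) {
        outstandingChanges.addLast(c);
        if (c.ownerSession != 0) {
            changesBySession
                .computeIfAbsent(c.ownerSession, k -> new HashSet<>())
                .add(c.path);
        }
    }

    // Old behavior: closeSession scans every outstanding change under the lock,
    // which is O(n) with n around 250k in the overload case described above.
    synchronized Set<String> pathsForSessionLinear(long sessionId) {
        Set<String> paths = new HashSet<>();
        for (Change c : outstandingChanges) {
            if (c.ownerSession == sessionId) {
                paths.add(c.path);
            }
        }
        return paths;
    }

    // New behavior: one map lookup per closeSession.
    synchronized Set<String> pathsForSession(long sessionId) {
        Set<String> paths = changesBySession.get(sessionId);
        return paths == null ? new HashSet<>() : new HashSet<>(paths);
    }

    // The queue size could also feed a throttling decision, as a form of
    // backpressure, which is the other idea raised in the original mail.
    synchronized int size() {
        return outstandingChanges.size();
    }
}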

On Thu, Oct 13, 2016 at 1:12 PM, Edward Ribeiro
 wrote:
> Very interesting patch, Mike.
>
> I've left a couple of review comments (hope you don't mind) in the
> https://github.com/msolo/zookeeper/commit/75da352d506c2e3b0001d28acc058c422b3c8f0c
> commit. :)
>
> Cheers,
> Eddie
>
>
> On Thu, Oct 13, 2016 at 4:06 PM, Arshad Mohammad <
> arshad.mohamma...@gmail.com> wrote:
>
>> Hi Mike,
>> I also faced the same issue. There is a test patch in ZOOKEEPER-2570 which
>> can be used to quickly check the performance gains from each modification.
>> Hope it is useful.
>>
>> -Arshad
>>
>> On Thu, Oct 13, 2016 at 1:27 AM, Mike Solomon  wrote:
>>
>> > I've been performance testing 3.5.2 and hit an interesting unavailability
>> > issue.
>> >
>> > When the server is very busy (64k connections, 16k writes per
>> > second) the leader can get busy enough that connections get throttled.
>> > Enough throttling causes sessions to expire. As sessions expire, the
>> > CPU consumption rises and the quorum is effectively unavailable.
>> > Interestingly, if you shut down all the clients, the quorum won't heal
>> > for nearly 10 minutes.
>> >
>> > The issue is that the outstandingChanges queue has 250k items in it
>> > and the closeSession code scans this linearly under a lock. Replacing
>> > the linear scan with a hash table lookup improves this, but the real
>> > solution is likely some backpressure on clients in response to an
>> > oversized outstandingChanges queue.
>> >
>> > Here is a sample fix:
>> > https://github.com/msolo/zookeeper/commit/75da352d506c2e3b0001d28acc058c422b3c8f0c
>> >
>> > This results in the quorum healing about 30 seconds after the clients
>> > disconnect.
>> >
>> > Is there a way to prevent runaway growth in this queue? I'm wondering
>> > if changing the definition of "throttling" to take into account the
>> > size of this queue might help mitigate this. The end goal is that some
>> > stable amount of traffic is reached asymptotically without suffering a
>> > collapse.
>> >
>> > Thanks,
>> > -Mike
>> >
>>


Extending session lifetime with client requests

2016-10-14 Thread Peng Li
Hi there,

Suppose a ZooKeeper session timeout is configured to be 10 seconds. If, at time
t, a client sends a getData() request to the ensemble and it succeeds, can I
assume the client's ZooKeeper session will stay live for the next (10 - delta)
seconds according to the client's local clock? In other words, will a client
request extend the lifetime of the session by another session timeout?

The delta accounts for clock skew between the client and the server. In
practice, the clock skew is bounded and very small (<< 10 seconds) over a
10-second interval. The absolute clock skew doesn't matter; only the
clock-rate skew matters.

Cheers
Peng