Chang, if the problem is on client startup, then it isn't a heartbeat stampede, it is session establishment. The heartbeats are very lightweight, so I can't imagine them causing any issues.

The two key things we need to know are: 1) the version of the server you are running, and 2) whether you are using a dedicated device for the transaction log.

ben
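A dedicated transaction log device is pointed at via the dataLogDir setting in zoo.cfg, separate from dataDir (which holds the snapshots). A minimal sketch; the paths and values below are hypothetical, not taken from this thread:

    # zoo.cfg (sketch; paths are hypothetical)
    tickTime=2000
    dataDir=/var/lib/zookeeper          # snapshots; can share a disk
    dataLogDir=/txnlog/zookeeper        # transaction log on its own device
    clientPort=2181

Keeping the transaction log on its own device matters because every session create and close is a quorum write that must be synced to this log before the server can acknowledge it.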
2011/4/14 Patrick Hunt <ph...@apache.org>:
> 2011/4/14 Chang Song <tru64...@me.com>:
>>> 2) Regarding IO, if you run 'iostat -x 2' on the zk servers while your
>>> issue is happening, what's the %util of the disk? What does the iowait
>>> look like?
>>>
>>
>> Again, no I/O at all. 0%
>>
>
> This is simply not possible.
>
> Sessions are persistent. Each time a session is created, and each time
> it is closed, a transaction is written by the zk server to the data
> directory. Additionally, log4j-based logs are also being streamed to
> disk. Each of these activities will cause disk IO that will show up
> in iostat.
>
>> Patrick, they are not continuously logging in and out.
>> Maybe a couple of times a week, and before they push a new feature.
>> When this happens, clients in group A drop out of the cluster, which
>> causes problems for other, unrelated services.
>
> Ok, good to know.
>
>> It is not about the use case, because the ZK clients simply tried to
>> connect to the ZK ensemble. No use case applies. Just many clients
>> logging in at the same time, expiring at the same time, or closing
>> sessions at the same time.
>
> As I mentioned, I've seen cluster sizes of 10,000 clients (10x what
> you report) that didn't have this issue. While bugs might be lurking,
> I've also worked with many teams deploying clusters (probably close
> to 100 by now), some of which had problems; the suggestions I'm
> making to you are based on that experience.
>
>> Heartbeats should be handled in an isolated queue and a
>> dedicated thread. I don't think we need to keep strict ordering
>> of heartbeats, do we?
>
> ZK is purposely architected this way; it is not a mistake/bug. It is
> a fallacy for a highly available service to respond quickly to a
> heartbeat when it cannot service regular requests in a timely
> fashion. This is one of the main reasons why heartbeats are handled
> this way.
>
> Patrick
>
>>> Patrick
>>>
>>>> It's about CommitProcessor thread queueing (in the leader).
>>>> QueuedRequests goes up to 800, and so do CommittedRequests and
>>>> PendingRequestElapsedTime. PendingRequestElapsedTime goes up to
>>>> 8.8 seconds during this flood.
>>>>
>>>> To reproduce this scenario exactly, the easiest way is to
>>>>
>>>> - suspend all client JVMs with a debugger
>>>> - cause all client JVMs to OOME, creating heap dumps
>>>>
>>>> in group B. All clients in group A will then fail to receive a
>>>> ping response within 5 seconds.
>>>>
>>>> We need to fix this as soon as possible.
>>>> What we do as a workaround is to raise the sessionTimeout to 40
>>>> sec. At least the clients in group A survive. But this increases
>>>> our cluster failover time significantly.
>>>>
>>>> Thank you, Patrick.
>>>>
>>>> P.S. We actually push the ping request to the FinalRequestProcessor
>>>> as soon as the packet identifies itself as a ping. No dice.
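The workaround Chang mentions, raising the sessionTimeout to 40 sec, is requested on the client side when the ZooKeeper handle is constructed; the server negotiates the final value within its configured bounds (by default between 2x and 20x the tickTime). A minimal sketch in Java, with a hypothetical connect string:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class SessionTimeoutExample {
        public static void main(String[] args) throws Exception {
            final CountDownLatch connected = new CountDownLatch(1);
            // Request a 40000 ms session timeout (the workaround described above).
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 40000,
                    new Watcher() {
                        public void process(WatchedEvent event) {
                            if (event.getState() == Event.KeeperState.SyncConnected) {
                                connected.countDown();
                            }
                        }
                    });
            connected.await();
            // The server may have negotiated the timeout down; check what applies.
            System.out.println("negotiated timeout: "
                    + zk.getSessionTimeout() + " ms");
            zk.close();
        }
    }

As Chang notes, the cost of this workaround is slower failure detection, and therefore a longer cluster failover time.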
>>>>
>>>> On Apr 14, 2011, at 12:21 AM, Patrick Hunt wrote:
>>>>
>>>>> Hi Chang, it sounds like you may have an issue with your cluster
>>>>> environment/setup, or perhaps a resource (GC/mem) issue. Have you
>>>>> looked through the troubleshooting guide?
>>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting
>>>>>
>>>>> In particular, 1000 clients connecting should be fine; I've
>>>>> personally seen clusters of 7-10 thousand clients. Keep in mind
>>>>> that each session establishment is essentially a write (so the
>>>>> quorum is involved), and what we typically see in such cases is
>>>>> that the cluster configuration has issues. 14 seconds for a ping
>>>>> response is huge and indicates one of the following may be an
>>>>> underlying cause:
>>>>>
>>>>> 1) are you running in a virtualized environment?
>>>>> 2) are you co-locating other services on the same host(s) that
>>>>> make up the ZK serving cluster?
>>>>> 3) have you followed the admin guide's "things to avoid"?
>>>>> http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_commonProblems
>>>>> In particular, ensure that you are not swapping or going into GC
>>>>> pause (both on the server and the client):
>>>>> a) try turning on GC logging and ensure that you are not going
>>>>> into GC pause; see the troubleshooting guide. This is the most
>>>>> common cause of high latency for the clients.
>>>>> b) ensure that you are not swapping
>>>>> c) ensure that other processes are not causing log writing
>>>>> (transactional logging) to be slow.
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Wed, Apr 13, 2011 at 6:35 AM, Chang Song <tru64...@me.com> wrote:
>>>>>> Hello, folks.
>>>>>>
>>>>>> We have run into a very serious issue with ZooKeeper.
>>>>>> Here's a brief scenario.
>>>>>>
>>>>>> We have some ZooKeeper clients with a session timeout of 15 sec
>>>>>> (and thus a 5 sec ping interval); let's call these clients
>>>>>> group A.
>>>>>>
>>>>>> Now 1000 new clients (let's call these group B) start up at the
>>>>>> same time, trying to connect to a three-node ZK ensemble and
>>>>>> creating a ZK createSession stampede.
>>>>>>
>>>>>> Now almost all clients in group A are unable to exchange pings
>>>>>> within the session expiry time (15 sec).
>>>>>> Thus clients in group A drop out of the cluster.
>>>>>>
>>>>>> We have looked into this issue a bit, and it is mostly due to the
>>>>>> synchronous nature of session queue processing.
>>>>>> Latency between a ping request and its response ranges from 10 ms
>>>>>> up to 14 seconds during this login stampede.
>>>>>>
>>>>>> Since the session timeout is a serious matter for our cluster,
>>>>>> pings should be handled in a pseudo-realtime fashion.
>>>>>>
>>>>>> I don't know exactly how the ping timeout policy works in the
>>>>>> clients and the server, but clients failing to receive ping
>>>>>> responses because of ZooKeeper session logins seems like nonsense
>>>>>> to me.
>>>>>>
>>>>>> Shouldn't we have a separate ping/heartbeat queue and thread?
>>>>>> Or even multiple ping queues/threads to keep heartbeats realtime?
>>>>>>
>>>>>> This is a very serious issue with ZooKeeper for our
>>>>>> mission-critical system. Could anyone look into this?
>>>>>>
>>>>>> I will try to file a bug.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Chang
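What group A experiences when pings go unanswered is visible to the application through the default watcher: the client first reports Disconnected (recoverable; the library retries the other servers on its own) and, once the ensemble expires the session, Expired (unrecoverable; ephemeral nodes are gone and a new handle must be created). A minimal Java sketch of a client distinguishing the two; the connect string is hypothetical:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ExpiryAwareClient implements Watcher {
        private volatile ZooKeeper zk;

        public void start() throws Exception {
            // 15000 ms matches group A's session timeout from the report above.
            zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, this);
        }

        public void process(WatchedEvent event) {
            switch (event.getState()) {
            case Disconnected:
                // Recoverable: the client library reconnects on its own while
                // the session is still alive on the ensemble.
                break;
            case Expired:
                // Unrecoverable: the server timed the session out. A brand-new
                // handle (and therefore a new session) is required.
                try {
                    start();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                break;
            default:
                break;
            }
        }
    }

This is also why the timeout trade-off in the thread is real: a 15 sec timeout expires group A during the stampede, while the 40 sec workaround keeps their sessions alive at the cost of slower failure detection.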