Sure, I will. Thank you.
Chang

On Apr 15, 2011, at 7:16 AM, Benjamin Reed wrote:

> when you file the jira can you also note the logging level you are using?
>
> thanx
> ben
>
> 2011/4/14 Chang Song <[email protected]>:
>>
>> Yes, Ben.
>>
>> If you read my emails carefully, I already said it is not the heartbeat,
>> it is session establishment / closing that gets stampeded.
>> Since every request's response gets delayed, heartbeats are delayed
>> as well.
>>
>>
>> You need to understand that most apps can tolerate delay in connect/close,
>> but we cannot tolerate ping delay, since we use the ZK heartbeat timeout
>> as our sole failure detection.
>> We use 15 seconds (5 sec for each ensemble) for the session timeout;
>> an important server will drop out of the cluster even if it is not
>> malfunctioning, and in some cases it wreaks havoc on certain services.
>>
>>
>> 1. 3.3.3 (latest)
>>
>> 2. We have a boot disk and a usr disk.
>> But as I said, disk I/O is not the issue causing the 8-second delay.
>>
>> My team will file a JIRA today, we'll have to discuss on JIRA ;)
>>
>> Thank you.
>>
>> Chang
>>
>>
>> On Apr 15, 2011, at 2:59 AM, Benjamin Reed wrote:
>>
>>> chang,
>>>
>>> if the problem is on client startup, then it isn't the heartbeat
>>> being stampeded, it is session establishment. the heartbeats are very
>>> lightweight, so i can't imagine them causing any issues.
>>>
>>> the two key things we need to know are: 1) the version of the server
>>> you are running, and 2) if you are using a dedicated device for the
>>> transaction log.
>>>
>>> ben
>>>
>>> 2011/4/14 Patrick Hunt <[email protected]>:
>>>> 2011/4/14 Chang Song <[email protected]>:
>>>>>> 2) regarding IO, if you run 'iostat -x 2' on the zk servers while your
>>>>>> issue is happening, what's the %util of the disk? what's the iowait
>>>>>> look like?
>>>>>>
>>>>>
>>>>> Again, no I/O at all. 0%
>>>>>
>>>>
>>>> This is simply not possible.
>>>>
>>>> Sessions are persistent. Each time a session is created, and each time
>>>> it is closed, a transaction is written by the zk server to the data
>>>> directory. Additionally, log4j-based logs are also being streamed to
>>>> the disk. Each of these activities will cause disk IO that will show
>>>> up in iostat.
>>>>
>>>>> Patrick, they are not continuously logging in/out.
>>>>> Maybe a couple of times a week, and before they push a new feature.
>>>>> When this happens, clients in group A drop out of the cluster, which causes
>>>>> problems for other unrelated services.
>>>>>
>>>>
>>>> Ok, good to know.
>>>>
>>>>>
>>>>> It is not about the use case, because the ZK clients simply tried to connect
>>>>> to the ZK ensemble. No use case applies. Just many clients logging in at the
>>>>> same time, expiring at the same time, or closing sessions at the same time.
>>>>>
>>>>
>>>> As I mentioned, I've seen cluster sizes of 10,000 clients (10x what
>>>> you report) that didn't have this issue. While bugs might be lurking,
>>>> I've also worked with many teams deploying clusters (probably close to
>>>> 100 by now), some of which had problems; the suggestions I'm making to
>>>> you are based on that experience.
>>>>
>>>>> Heartbeats should be handled in an isolated queue and a
>>>>> dedicated thread. I don't think we need strict ordering
>>>>> of heartbeats, do we?
>>>>
>>>> ZK is purposely architected this way, it is not a mistake/bug. It is a
>>>> fallacy for a highly available service to respond quickly to a
>>>> heartbeat when it cannot service regular requests in a timely fashion.
>>>> This is one of the main reasons why heartbeats are handled in this
>>>> way.
>>>>
>>>> Patrick
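For reference, a minimal client-side sketch against the standard org.apache.zookeeper Java client (the connect string and timeout below are placeholder values) showing where the two situations discussed above surface on the client: a Disconnected event while ping responses are merely late, and an Expired event once the server has actually expired the session, which is the group A drop-out described in this thread.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class SessionStateLogger implements Watcher {

    public void process(WatchedEvent event) {
        // Session state changes arrive on the default watcher with event type None.
        if (event.getType() == Watcher.Event.EventType.None) {
            switch (event.getState()) {
            case SyncConnected:
                System.out.println("connected (or reconnected within the session timeout)");
                break;
            case Disconnected:
                // Ping responses are late; the client is hunting for another server.
                System.out.println("disconnected; session may still be alive on the server");
                break;
            case Expired:
                // The server expired the session: the group A drop-out described above.
                System.out.println("session expired; a new ZooKeeper handle is required");
                break;
            default:
                break;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder ensemble; 15000 ms matches the 15-second timeout in the thread.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000,
                new SessionStateLogger());
        Thread.sleep(Long.MAX_VALUE); // keep the process alive to observe state changes
    }
}
```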
>>>>
>>>>>> Patrick
>>>>>>
>>>>>>> It's about CommitProcessor thread queueing (in the leader).
>>>>>>> QueuedRequests goes up to 800, and so do commitedRequests and
>>>>>>> PendingRequestElapsedTime. PendingRequestElapsedTime
>>>>>>> goes up to 8.8 seconds during this flood.
>>>>>>>
>>>>>>> To reproduce this scenario exactly, the easiest way is to
>>>>>>>
>>>>>>> - suspend all client JVMs with a debugger
>>>>>>> - cause all client JVMs to OOME and create heap dumps
>>>>>>>
>>>>>>> in group B. All clients in group A will then fail to receive a
>>>>>>> ping response within 5 seconds.
>>>>>>>
>>>>>>> We need to fix this as soon as possible.
>>>>>>> What we do as a workaround is to raise sessionTimeout to 40 sec.
>>>>>>> At least the clients in group A survive. But this increases
>>>>>>> our cluster failover time significantly.
>>>>>>>
>>>>>>> Thank you, Patrick.
>>>>>>>
>>>>>>>
>>>>>>> ps. We actually tried pushing the ping request to FinalRequestProcessor as
>>>>>>> soon as the packet identifies itself as a ping. No dice.
>>>>>>>
>>>>>>>
>>>>>>> On Apr 14, 2011, at 12:21 AM, Patrick Hunt wrote:
>>>>>>>
>>>>>>>> Hi Chang, it sounds like you may have an issue with your cluster
>>>>>>>> environment/setup, or perhaps a resource (GC/mem) issue. Have you
>>>>>>>> looked through the troubleshooting guide?
>>>>>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting
>>>>>>>>
>>>>>>>> In particular, 1000 clients connecting should be fine; I've personally
>>>>>>>> seen clusters of 7-10 thousand clients. Keep in mind that each session
>>>>>>>> establishment is essentially a write (so the quorum is involved), and
>>>>>>>> what we typically see there is that the cluster configuration has
>>>>>>>> issues. 14 seconds for a ping response is huge and indicates one of
>>>>>>>> the following may be an underlying cause:
>>>>>>>>
>>>>>>>> 1) are you running in a virtualized environment?
>>>>>>>> 2) are you co-locating other services on the same host(s) that make up
>>>>>>>> the ZK serving cluster?
>>>>>>>> 3) have you followed the admin guide's "things to avoid"?
>>>>>>>> http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_commonProblems
>>>>>>>> In particular, ensure that you are not swapping or going into GC
>>>>>>>> pause (both on the server and the client):
>>>>>>>> a) try turning on GC logging and ensure that you are not going into GC
>>>>>>>> pause; see the troubleshooting guide, this is the most common cause of
>>>>>>>> high latency for the clients
>>>>>>>> b) ensure that you are not swapping
>>>>>>>> c) ensure that other processes are not causing log writing
>>>>>>>> (transactional logging) to be slow.
>>>>>>>>
>>>>>>>> Patrick
>>>>>>>>
>>>>>>>> On Wed, Apr 13, 2011 at 6:35 AM, Chang Song <[email protected]> wrote:
>>>>>>>>> Hello, folks.
>>>>>>>>>
>>>>>>>>> We have run into a very serious issue with ZooKeeper.
>>>>>>>>> Here's a brief scenario.
>>>>>>>>>
>>>>>>>>> We have some ZooKeeper clients with a session timeout of 15 sec
>>>>>>>>> (thus a 5 sec ping); let's call these clients group A.
>>>>>>>>>
>>>>>>>>> Now 1000 new clients (let's call these group B) start up at the
>>>>>>>>> same time, trying to connect to a three-node ZK ensemble and
>>>>>>>>> creating a ZK createSession stampede.
>>>>>>>>>
>>>>>>>>> Now almost all clients in group A are unable to exchange a ping within
>>>>>>>>> the session expiry time (15 sec).
>>>>>>>>> Thus clients in group A drop out of the cluster.
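A rough sketch of the group B side of the stampede just described, assuming the standard Java client; the ensemble address, client count, and 15000 ms timeout are placeholders taken from the numbers quoted in the thread. It simply releases all connect attempts at once to produce the createSession flood, while previously started group A clients can be watched for Disconnected/Expired events.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ConnectStampede {

    public static void main(String[] args) throws Exception {
        final String ensemble = "zk1:2181,zk2:2181,zk3:2181"; // placeholder ensemble
        final int clients = 1000;                              // group B size from the thread
        final CountDownLatch start = new CountDownLatch(1);
        final ZooKeeper[] handles = new ZooKeeper[clients];

        for (int i = 0; i < clients; i++) {
            final int id = i;
            new Thread(new Runnable() {
                public void run() {
                    try {
                        start.await(); // hold every thread here, then release them together
                        handles[id] = new ZooKeeper(ensemble, 15000, new Watcher() {
                            public void process(WatchedEvent event) { /* ignore */ }
                        });
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }

        start.countDown();   // the createSession stampede begins here
        Thread.sleep(60000); // keep sessions open long enough to observe group A pings
    }
}
```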
>>>>>>>>>
>>>>>>>>> We have looked into this issue a bit, and it seems mostly due to the
>>>>>>>>> synchronous nature of session queue processing.
>>>>>>>>> Latency between a ping request and its response ranges from 10 ms up to 14
>>>>>>>>> seconds during this login stampede.
>>>>>>>>>
>>>>>>>>> Since the session timeout is a serious matter for our cluster, pings
>>>>>>>>> should be handled in a pseudo-realtime fashion.
>>>>>>>>>
>>>>>>>>> I don't know exactly how the ping timeout policy works in the clients and
>>>>>>>>> the server, but clients failing to receive a ping response because of
>>>>>>>>> ZooKeeper login sessions makes no sense to me.
>>>>>>>>>
>>>>>>>>> Shouldn't we have a separate ping/heartbeat queue and thread?
>>>>>>>>> Or even multiple ping queues/threads to keep the heartbeat realtime?
>>>>>>>>>
>>>>>>>>> This is a very serious issue with ZooKeeper for our mission-critical
>>>>>>>>> system. Could anyone look into this?
>>>>>>>>>
>>>>>>>>> I will try to file a bug.
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> Chang
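One way to watch the server side while reproducing this, sketched against ZooKeeper's four-letter admin commands on the client port (the host below is a placeholder): the probe polls `stat` every two seconds and prints the latency and outstanding-request lines, which climb sharply during a connect flood. The QueuedRequests and PendingRequestElapsedTime figures quoted earlier come from the server's JMX MBeans rather than from `stat`, so this is only a rough substitute.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

public class StatProbe {

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "zk1"; // placeholder server host
        while (true) {
            Socket sock = new Socket(host, 2181);
            try {
                OutputStream out = sock.getOutputStream();
                out.write("stat".getBytes()); // four-letter admin command
                out.flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(sock.getInputStream()));
                String line;
                while ((line = in.readLine()) != null) {
                    // Latency and Outstanding rise sharply during a connect stampede.
                    if (line.startsWith("Latency") || line.startsWith("Outstanding")) {
                        System.out.println(line);
                    }
                }
            } finally {
                sock.close();
            }
            Thread.sleep(2000); // poll every two seconds, like 'iostat -x 2'
        }
    }
}
```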
