when you file the jira can you also note the logging level you are using? thanx ben
2011/4/14 Chang Song <[email protected]>:
>
> Yes, Ben.
>
> If you read my emails carefully, I already said it is not the heartbeats,
> it is session establishment / closing that gets stampeded.
> Since all of the requests' responses get delayed, heartbeats are delayed
> as well.
>
> You need to understand that most apps can tolerate delay in connect/close,
> but we cannot tolerate ping delay, since we use the ZK heartbeat timeout
> as our sole failure detection. We use 15 seconds for the session timeout
> (5 sec for each ensemble member), so an important server will drop out of
> the cluster even if it is not malfunctioning; in some cases this wreaks
> havoc on certain services.
>
> 1. 3.3.3 (latest)
>
> 2. We have a boot disk and a usr disk.
> But as I said, disk I/O is not the issue causing the 8-second delay.
>
> My team will file a JIRA today; we'll have to discuss it on JIRA ;)
>
> Thank you.
>
> Chang
>
>
> On Apr 15, 2011, at 2:59 AM, Benjamin Reed wrote:
>
>> chang,
>>
>> if the problem is on client startup, then it isn't the heartbeats that
>> are getting stampeded, it is session establishment. the heartbeats are
>> very lightweight, so i can't imagine them causing any issues.
>>
>> the two key things we need to know are: 1) the version of the server
>> you are running, and 2) whether you are using a dedicated device for the
>> transaction log.
>>
>> ben
>>
>> 2011/4/14 Patrick Hunt <[email protected]>:
>>> 2011/4/14 Chang Song <[email protected]>:
>>>>> 2) regarding IO, if you run 'iostat -x 2' on the zk servers while your
>>>>> issue is happening, what's the %util of the disk? what does the iowait
>>>>> look like?
>>>>
>>>> Again, no I/O at all. 0%
>>>
>>> This is simply not possible.
>>>
>>> Sessions are persistent. Each time a session is created, and each time
>>> it is closed, a transaction is written by the zk server to the data
>>> directory. Additionally, log4j-based logs are also being streamed to
>>> disk. Each of these activities will cause disk IO that will show up
>>> in iostat.
>>>
>>>> Patrick, they are not continuously logging in/out.
>>>> Maybe a couple of times a week, and before they push a new feature.
>>>> When this happens, clients in group A drop out of the cluster, which
>>>> causes problems for other, unrelated services.
>>>
>>> Ok, good to know.
>>>
>>>> It is not about the use case, because the ZK clients simply tried to
>>>> connect to the ZK ensemble. No use case applies. Just many clients
>>>> logging in at the same time, or expiring at the same time, or closing
>>>> sessions at the same time.
>>>
>>> As I mentioned, I've seen cluster sizes of 10,000 clients (10x what
>>> you report) that didn't have this issue. While bugs might be lurking,
>>> I've also worked with many teams deploying clusters (probably close to
>>> 100 by now), some of which had problems; the suggestions I'm making to
>>> you are based on that experience.
>>>
>>>> Heartbeats should be handled in an isolated queue and a
>>>> dedicated thread. I don't think we need to keep strict ordering
>>>> of heartbeats, do we?
>>>
>>> ZK is purposely architected this way; it is not a mistake/bug. It is a
>>> fallacy for a highly available service to respond quickly to a
>>> heartbeat when it cannot service regular requests in a timely fashion.
>>> This is one of the main reasons why heartbeats are handled the way
>>> they are.
>>>
>>> Patrick
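For context, a rough sketch of the client-side setup Chang describes: a 15-second session timeout used as the sole failure detector. It assumes the standard org.apache.zookeeper Java client API; the class name and latch handling are illustrative, not taken from their code. The client heartbeats at roughly a third of the session timeout (the "5 sec ping" above), so a server-side stall of several seconds is enough to see Disconnected and, eventually, Expired.

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class GroupAClient implements Watcher {
        // 15s session timeout; the client pings when idle for about a third
        // of this, i.e. roughly every 5 seconds.
        private static final int SESSION_TIMEOUT_MS = 15000;

        private final CountDownLatch connected = new CountDownLatch(1);

        public ZooKeeper connect(String connectString) throws Exception {
            ZooKeeper zk = new ZooKeeper(connectString, SESSION_TIMEOUT_MS, this);
            connected.await();
            return zk;
        }

        public void process(WatchedEvent event) {
            switch (event.getState()) {
            case SyncConnected:
                connected.countDown();
                break;
            case Disconnected:
                // Pings (or their responses) did not make it in time; the client
                // tries another server, but the session clock keeps running.
                break;
            case Expired:
                // The ensemble expired the session. With timeout-based failure
                // detection this node is now treated as dead, even if the
                // process itself is healthy.
                break;
            default:
                break;
            }
        }
    }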
>>>>> Patrick
>>>>>
>>>>>> It's about CommitProcessor thread queueing (in the leader).
>>>>>> QueuedRequests goes up to 800, and so do CommittedRequests and
>>>>>> PendingRequestElapsedTime. PendingRequestElapsedTime goes up to
>>>>>> 8.8 seconds during this flood.
>>>>>>
>>>>>> The easiest way to reproduce this scenario exactly is to:
>>>>>>
>>>>>> - suspend all client JVMs with a debugger
>>>>>> - cause all client JVMs to OOME so they write heap dumps
>>>>>>
>>>>>> in group B. All clients in group A will then fail to receive a
>>>>>> ping response within 5 seconds.
>>>>>>
>>>>>> We need to fix this as soon as possible.
>>>>>> What we do as a workaround is to raise the sessionTimeout to 40 sec.
>>>>>> At least the clients in group A survive, but this increases
>>>>>> our cluster failover time significantly.
>>>>>>
>>>>>> Thank you, Patrick.
>>>>>>
>>>>>> ps. We actually tried pushing ping requests to the FinalRequestProcessor
>>>>>> as soon as the packet identifies itself as a ping. No dice.
>>>>>>
>>>>>> On Apr 14, 2011, at 12:21 AM, Patrick Hunt wrote:
>>>>>>
>>>>>>> Hi Chang, it sounds like you may have an issue with your cluster
>>>>>>> environment/setup, or perhaps a resource (GC/mem) issue. Have you
>>>>>>> looked through the troubleshooting guide?
>>>>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting
>>>>>>>
>>>>>>> In particular, 1000 clients connecting should be fine; I've personally
>>>>>>> seen clusters of 7-10 thousand clients. Keep in mind that each session
>>>>>>> establishment is essentially a write (so the quorum is involved), and
>>>>>>> what we typically see there is that the cluster configuration has
>>>>>>> issues. 14 seconds for a ping response is huge and indicates one of
>>>>>>> the following may be an underlying cause:
>>>>>>>
>>>>>>> 1) are you running in a virtualized environment?
>>>>>>> 2) are you co-locating other services on the same host(s) that make up
>>>>>>> the ZK serving cluster?
>>>>>>> 3) have you followed the admin guide's "things to avoid"?
>>>>>>> http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_commonProblems
>>>>>>> In particular, ensure that you are not swapping or going into GC
>>>>>>> pause (both on the server and the client):
>>>>>>> a) try turning on GC logging and make sure you are not going into GC
>>>>>>> pause; see the troubleshooting guide, this is the most common cause of
>>>>>>> high latency for the clients
>>>>>>> b) ensure that you are not swapping
>>>>>>> c) ensure that other processes are not causing log writing
>>>>>>> (transactional logging) to be slow.
>>>>>>>
>>>>>>> Patrick
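As a rough diagnostic along the lines Patrick suggests, a throwaway probe (hostname is a placeholder) that sends the 'stat' four-letter command to one server's client port; its "Outstanding" and "Latency min/avg/max" lines give a cheap view of the backlog and latency Chang reports (QueuedRequests around 800, PendingRequestElapsedTime up to 8.8 seconds) while the group B connection storm is replayed.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.Socket;

    public class StatProbe {
        public static void main(String[] args) throws Exception {
            // zk1.example.com is a placeholder for one of the ensemble members.
            Socket sock = new Socket("zk1.example.com", 2181);
            try {
                OutputStream out = sock.getOutputStream();
                out.write("stat".getBytes("US-ASCII"));
                out.flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(sock.getInputStream(), "US-ASCII"));
                String line;
                // The server answers the four-letter command and closes the
                // connection; watch the "Outstanding" and "Latency" lines.
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            } finally {
                sock.close();
            }
        }
    }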
>>>>>>> On Wed, Apr 13, 2011 at 6:35 AM, Chang Song <[email protected]> wrote:
>>>>>>>> Hello, folks.
>>>>>>>>
>>>>>>>> We have run into a very serious issue with ZooKeeper.
>>>>>>>> Here's a brief scenario.
>>>>>>>>
>>>>>>>> We have some ZooKeeper clients with a session timeout of 15 sec (thus
>>>>>>>> a 5 sec ping); let's call these clients group A.
>>>>>>>>
>>>>>>>> Now 1000 new clients (let's call these group B) start up at the same
>>>>>>>> time, trying to connect to a three-node ZK ensemble, creating a ZK
>>>>>>>> createSession stampede.
>>>>>>>>
>>>>>>>> Now almost all clients in group A are unable to exchange pings within
>>>>>>>> the session expiry time (15 sec).
>>>>>>>> Thus the clients in group A drop out of the cluster.
>>>>>>>>
>>>>>>>> We have looked into this issue a bit and found it is mostly due to the
>>>>>>>> synchronous nature of session queue processing.
>>>>>>>> Latency between a ping request and its response ranges from 10 ms up
>>>>>>>> to 14 seconds during this login stampede.
>>>>>>>>
>>>>>>>> Since the session timeout is a serious matter for our cluster, pings
>>>>>>>> should be handled in a pseudo-realtime fashion.
>>>>>>>>
>>>>>>>> I don't know exactly how the ping timeout policy works in the clients
>>>>>>>> and the server, but clients failing to receive ping responses because
>>>>>>>> of ZooKeeper session logins seems like nonsense to me.
>>>>>>>>
>>>>>>>> Shouldn't we have a separate ping/heartbeat queue and thread?
>>>>>>>> Or even multiple ping queues/threads to keep the heartbeat realtime?
>>>>>>>>
>>>>>>>> This is a very serious issue with ZooKeeper for our mission-critical
>>>>>>>> system. Could anyone look into this?
>>>>>>>>
>>>>>>>> I will try to file a bug.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Chang
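For reference only, a hypothetical sketch of the "separate fast path for pings" idea proposed above and mentioned in the ps. (pushing ping requests straight to the FinalRequestProcessor). It assumes the 3.3.x server-side RequestProcessor, Request, and ZooDefs.OpCode types; the class name and wiring are invented for illustration, and this is not how ZooKeeper is built. As Patrick notes above, the project deliberately avoids this: a server that answers heartbeats quickly while it cannot serve real requests gives a misleading liveness signal.

    import org.apache.zookeeper.ZooDefs.OpCode;
    import org.apache.zookeeper.server.Request;
    import org.apache.zookeeper.server.RequestProcessor;

    // Hypothetical pipeline stage: pings skip the normal (ordered) path and go
    // straight to a "fast" processor, while everything else keeps strict order.
    public class PingBypassProcessor implements RequestProcessor {
        private final RequestProcessor next;     // normal path, e.g. CommitProcessor
        private final RequestProcessor fastPath; // e.g. FinalRequestProcessor

        public PingBypassProcessor(RequestProcessor next, RequestProcessor fastPath) {
            this.next = next;
            this.fastPath = fastPath;
        }

        public void processRequest(Request request)
                throws RequestProcessor.RequestProcessorException {
            if (request.type == OpCode.ping) {
                fastPath.processRequest(request); // answer the heartbeat immediately
            } else {
                next.processRequest(request);     // ordered path for real requests
            }
        }

        public void shutdown() {
            next.shutdown();
            fastPath.shutdown();
        }
    }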
