Hey everyone, did I mention that I'm a newbie to ZooKeeper and also to Java? :)

I enabled some Java GC logs via the "java.env" file:

export JVMFLAGS="-Xms1024m -Xmx1024m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"

and confirmed that the periodic latency is due to Java GC operations. For example, below is a 26ms pause which corresponds to a 26ms delay that my test app also saw (it uses the C API and connects to ZK remotely) and that was also reported by ZK, which is the only Java app running in the ZK cluster:

2014-02-24T10:29:51.905-0600: [GC [PSYoungGen: 275424K->12128K(305152K)] 325542K->73974K(1004544K), 0.0255720 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
2014-02-24T10:29:51.931-0600: Total time for which application threads were stopped: 0.0261350 seconds

JVM tuning seems to be more of a black art than a science with respect to GC and other settings. I was wondering if anyone has any practical advice for JVM settings for the following configuration:

a) ZK 3-node cluster running OpenJDK 1.7; ZK is the only app running Java.
b) Application znode data and watches will fit into < 100MB of RAM (say 250k znodes with ~150 bytes per znode with 2 watchers per znode).

Consistent and fast read / write latency - say 5ms or less - is critical for the small dataset above. I'm trying to understand if this is obtainable with ZK & Java. I realize that other factors come into play as well (hardware / network).

Thanks in advance for any advice.

On Fri, Feb 21, 2014 at 7:51 AM, jmmec <[email protected]> wrote:

> Thanks Camille, I definitely understand! :)
>
> The two questions at the top of mind regarding ZooKeeper are:
> 1. How does it calculate latencies? I can dig into its code to see.
> 2. Is there anything in particular that might cause it to have the spiky
>    latency I've experienced? I think I ruled out the snapshot behavior by
>    having a high snapCount.
>
> Some other things I am planning to explore:
> 1. My test software is rightfully suspect, so I'll review it carefully
>    again and will simplify it further so that it is doing the absolute bare
>    minimum.
> 2. I'm running OpenJDK 1.7.0_60-ea so might swap to an earlier and/or
>    different distribution.
> 3. I'm running ZooKeeper 3.4.5 and might fall back to the 3.3.6 release.
>
> Hopefully one of the items above will reveal the root cause. Any other
> suggestions are welcome.
>
> On Thu, Feb 20, 2014 at 7:57 PM, Camille Fournier <[email protected]> wrote:
>
>> I might suggest that you create a personal github and mock up a
>> replication there :) I understand employers that own your code but unless
>> someone knows the answer off the top of their head, odds of finding the
>> cause are low without something that replicates it, and knowing how busy
>> most of us are here I don't know that we'll have time to do that for you.
>>
>> C
>>
>> On Thu, Feb 20, 2014 at 9:41 PM, jmmec <[email protected]> wrote:
>>
>> > Thanks again,
>> >
>> > Unfortunately I can't share the test code since it is technically the
>> > property of my employer.
>> >
>> > It's very strange behavior. I think I've said that several times now.
>> > ha...
>> >
>> > Appreciate any additional help or advice or suggestions from everyone
>> > and anyone and their brother or sister.
>> >
>> > On Thu, Feb 20, 2014 at 8:10 PM, Camille Fournier <[email protected]> wrote:
>> >
>> > > Can you share the test code somewhere (github maybe?)?
>> > >
>> > > Thanks,
>> > > C
>> > >
>> > > On Thu, Feb 20, 2014 at 9:08 PM, jmmec <[email protected]> wrote:
>> > >
>> > > > Thanks for the quick reply.
>> > > >
>> > > > I did not try the "slow" test using a normal disk drive, however I
>> > > > first discovered this problem when writing to a 7200RPM disk drive
>> > > > at a much higher messaging rate (e.g. 1500 to 3000 creates/sec
>> > > > rather than 84 creates/sec). This is what caused me to start
>> > > > simplifying the configuration trying to find the root cause. As part
>> > > > of that investigation, I created a RAM disk to avoid the hard drive,
>> > > > but the hard drive wasn't the problem. I just haven't switched back
>> > > > to the hard drive.
>> > > >
>> > > > I don't know what ZooKeeper is doing internally, or how & why it is
>> > > > deriving 76ms MAX latency. The very regular periodic pattern
>> > > > suggests something odd.
>> > > >
>> > > > Hmmmm.....
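
P.S. In case it helps anyone line up GC pauses with client-side latency spikes the same way, here is a minimal Python sketch that pulls the "Total time for which application threads were stopped" lines out of a GC log and flags pauses above a latency budget. The function names, the SAMPLE list and the 5ms default budget are my own illustrative choices, and the regex only assumes the line format shown in the log excerpt above (other JVM versions may format these lines differently).

```python
import re

# Matches the line written by -XX:+PrintGCApplicationStoppedTime when
# -XX:+PrintGCDateStamps is also enabled, as in the log excerpt above.
STOPPED_RE = re.compile(
    r"^(?P<ts>\S+): Total time for which application threads were stopped: "
    r"(?P<secs>\d+\.\d+) seconds"
)


def stopped_pauses_ms(lines):
    """Return (timestamp, pause_in_ms) for every stopped-time line found."""
    pauses = []
    for line in lines:
        m = STOPPED_RE.match(line.strip())
        if m:
            pauses.append((m.group("ts"), float(m.group("secs")) * 1000.0))
    return pauses


def over_budget(pauses, budget_ms=5.0):
    """Keep only the pauses longer than budget_ms (hypothetical 5ms target)."""
    return [(ts, ms) for ts, ms in pauses if ms > budget_ms]


# The two log lines quoted in the message, used here as sample input.
SAMPLE = [
    "2014-02-24T10:29:51.905-0600: [GC [PSYoungGen: 275424K->12128K(305152K)] "
    "325542K->73974K(1004544K), 0.0255720 secs] "
    "[Times: user=0.09 sys=0.00, real=0.03 secs]",
    "2014-02-24T10:29:51.931-0600: Total time for which application threads "
    "were stopped: 0.0261350 seconds",
]

if __name__ == "__main__":
    for ts, ms in over_budget(stopped_pauses_ms(SAMPLE)):
        print(f"{ts}  pause {ms:.1f} ms exceeds budget")
```

To run it against a real log, pass the open file object instead of SAMPLE, e.g. stopped_pauses_ms(open("gc.log")), where "gc.log" stands in for wherever your JVM writes its GC output.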
