Everything else notwithstanding, if you see any swap space being used, you need to adjust things to prevent swapping first.

My 2 cents.
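For what it's worth, a quick way to see whether swap is in play on a Linux box (standard commands; adjust the pgrep pattern to however your tserver shows up in ps):

    # Is anything swapped out at all?
    free -m
    swapon -s

    # How much of the tserver itself is swapped (recent kernels expose VmSwap)
    grep VmSwap /proc/$(pgrep -f tserver | head -1)/status

    # Bias the kernel away from paging out the tserver heap...
    sudo sysctl vm.swappiness=1

    # ...or, as Eric suggests below, turn swap off entirely
    sudo swapoff -a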
On Thu, Nov 5, 2015 at 2:12 PM, Eric Newton <[email protected]> wrote:

> Comments inline:
>
> On Thu, Nov 5, 2015 at 2:18 AM, mohit.kaushik <[email protected]>
> wrote:
>
>> I have a 3-node cluster (Accumulo 1.6.3, ZooKeeper 3.4.6) which was
>> working fine before I ran into this issue. Whenever I start writing data
>> with a BatchWriter, the tablet servers lose their locks one by one. In
>> the zookeeper logs I found it repeatedly accepting and closing socket
>> connections for the servers, and the log has endless repetitions of the
>> following lines.
>
> By far, the most common reason why locks are lost is java gc pauses. In
> turn, these pauses are almost always due to memory pressure within the
> entire system. The OS sees a nice big hunk of memory in the tserver and
> swaps it out. Over the years we've tuned various settings to prevent this,
> and other memory-hogging, but if you are pushing the system hard, you may
> have to tune your existing memory settings.
>
> The tserver occasionally prints some gc stats in the debug log. If you see
> a >30s pause between these messages, memory pressure is probably the
> problem.
>
>> 2015-11-05 12:11:23,860 [myid:3] - INFO [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
>> connection from /192.168.10.124:47503
>> 2015-11-05 12:11:23,861 [myid:3] - INFO [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@827] - Processing stat command from /
>> 192.168.10.124:47503
>> 2015-11-05 12:11:23,869 [myid:3] - INFO
>> [Thread-244:NIOServerCnxn$StatCommand@663] - Stat command output
>> 2015-11-05 12:11:23,870 [myid:3] - INFO [Thread-244:NIOServerCnxn@1007]
>> - Closed socket connection for client /192.168.10.124:47503 (no session
>> established for client)
>
> Yes, this is quite annoying: you get these messages when the monitor grabs
> the zookeeper status EVERY 5s. Your monitor is running on 192.168.10.124,
> right?
>
> These messages are expected.
>
>> It looks similar to ZOOKEEPER-832. There is one thread discussing the
>> socket connections, but it does not provide much help in my case:
>> http://mail-archives.apache.org/mod_mbox/accumulo-user/201208.mbox/%3ccam1_12yvaxoe+kq9-qcqtpv1vegpwqvtkhn3ictifw6vq7l...@mail.gmail.com%3E
>>
>> There are no exceptions in the tserver logs; the tablet servers simply
>> lose their locks.
>
> Ah, is it possible the JVM is killing itself because GC overhead is
> climbing too high? You can check the .out (or .err) file for this error.
>
>> I can scan data without any problem/exception. I need to know the cause
>> of the problem and a workaround. Would upgrading resolve the issue, or
>> does it need some configuration changes?
>
> Check all your system processes. I know old versions of the SNMP servers
> would leak resources, putting memory pressure on the system after a few
> months. Check to see if your tserver is approximately the size you need.
> If you aren't already doing it, you will want to monitor system memory/swap
> usage, and see if it correlates to the lost servers. Zookeeper itself is
> also subject to gc pauses, so it can die from the same cause, although
> it's a much smaller process.
>
>> My current zoo.cfg is as follows:
>>
>> clientPort=2181
>> syncLimit=5
>> tickTime=2000
>> initLimit=10
>> maxClientCnxns=100
>
> That's all fine, but you may want to turn on the zookeeper clean-up:
>
> http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_advancedConfiguration
>
> Search for "autopurge".
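(For reference, the clean-up Eric points at is the pair of autopurge settings in zoo.cfg; something like the following keeps three snapshots and purges the rest every 24 hours, tune to taste. ZooKeeper needs a restart to pick them up.)

    autopurge.snapRetainCount=3
    autopurge.purgeInterval=24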
>> I can upload full logs if anyone needs them. Please do let me know if you
>> need any other info.
>
> How much memory is allocated to the various processes? Do you have swap
> turned on? Do you see the delay in the debug GC messages?
>
> You could try turning off swap, so the OS will kill your process instead
> of killing itself. :-)
>
> -Eric

--
Josef Roehrl
Senior Software Developer
*PHEMI Systems*
180-887 Great Northern Way
Vancouver, BC V5T 4T5
604-336-1119
Website <http://www.phemi.com/>
Twitter <https://twitter.com/PHEMISystems>
Linkedin <http://www.linkedin.com/company/3561810?trk=tyah&trkInfo=tarId%3A1403279580554%2Ctas%3Aphemi%20hea%2Cidx%3A1-1-1>
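P.S. One quick way to look for the GC pauses and the GC-overhead error Eric mentions. The file names below are the usual 1.6 defaults and assume ACCUMULO_LOG_DIR is set in your shell; adjust to your install, and note the exact format of the gc stats lines varies by version and collector:

    # Periodic gc stats; long time gaps between consecutive lines suggest long pauses
    grep ' gc ' $ACCUMULO_LOG_DIR/tserver_*.debug.log | tail -20

    # Did the JVM give up because collection overhead climbed too high?
    grep 'GC overhead limit exceeded' $ACCUMULO_LOG_DIR/tserver_*.out $ACCUMULO_LOG_DIR/tserver_*.err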
