u17 was released a year and a half ago. The latest is u25 (we run u24). What kind of 'crash' are you seeing? What is your OS?
St.Ack
On Mon, May 23, 2011 at 8:19 AM, Wayne <wav...@gmail.com> wrote:
> Zookeeper is not on the same nodes...and yes we could up to 120 seconds but
> then we are back to AWOL nodes for 118 seconds is OK which it is not.
>
> Bottom line is the JVM is our enemy here (as it always has been) and we had
> high hopes for Todd's fix, and it is not panning out for us...yet.
>
>
> On Mon, May 23, 2011 at 11:07 AM, Michael Segel
> <michael_se...@hotmail.com> wrote:
>
>>
>> Besides this...
>> JRE version: 6.0_17-b17
>>
>> Just a silly question ...
>> What happens if you double the zookeeper time out to 120 seconds?
>>
>> Also I'm going to assume that you're not running your ZK on the same nodes
>> as your data nodes, but you know what they say about assumptions...
>>
>>
>> > From: tdunn...@maprtech.com
>> > Date: Mon, 23 May 2011 07:33:05 -0700
>> > Subject: Re: mslab enabled jvm crash
>> > To: user@hbase.apache.org
>> >
>> > Do you have the same problem with a more recent JVM?
>> >
>> > On Mon, May 23, 2011 at 4:52 AM, Wayne <wav...@gmail.com> wrote:
>> >
>> > > I have switched to using the mslab enabled java setting to try to avoid GC
>> > > causing nodes to go awol but it almost appears to be worse. Below is the
>> > > latest problem with the JVM apparently actually crashing. I am using 0.90.1
>> > > with an 8GB heap. Is there a recommended JVM and recommended settings to be
>> > > used? As it stands right now we can not run 24 hours under heavy write load
>> > > without a node being taken out by zookeeper for GCing > 60 sec or other
>> > > problems like below.
>> > >
>> > > Any help would be greatly appreciated.
>> > >
>> > >
>> > > 2011-05-23T02:34:51.626+0000: 13902.361: [GC 13902.361: [ParNew: 249216K->27648K(249216K), 0.1119520 secs] 7546544K->7433319K(8360960K), 0.1120390 secs] [Times: user=1.14 sys=0.05, real=0.11 secs]
>> > > 2011-05-23T02:34:52.292+0000: 13903.027: [GC 13903.027: [ParNew: 249216K->27648K(249216K), 0.0732800 secs] 7654887K->7506032K(8360960K), 0.0733690 secs] [Times: user=0.76 sys=0.02, real=0.08 secs]
>> > > 2011-05-23T02:34:52.721+0000: 13903.456: [CMS-concurrent-mark: 8.137/10.065 secs] [Times: user=60.86 sys=2.98, real=10.06 secs]
>> > > 2011-05-23T02:34:52.721+0000: 13903.456: [CMS-concurrent-preclean-start]
>> > > 2011-05-23T02:34:52.839+0000: 13903.574: [GC 13903.574: [ParNew: 249216K->27648K(249216K), 0.0575510 secs] 7727600K->7562758K(8360960K), 0.0576420 secs] [Times: user=0.62 sys=0.02, real=0.06 secs]
>> > > 2011-05-23T02:34:53.190+0000: 13903.925: [GC 13903.925: [ParNew: 249171K->27648K(249216K), 0.1108480 secs] 7784281K->7661505K(8360960K), 0.1109440 secs] [Times: user=1.10 sys=0.03, real=0.11 secs]
>> > > 2011-05-23T02:34:53.539+0000: 13904.274: [GC 13904.274: [ParNew (promotion failed): 249216K->249216K(249216K), 0.1207770 secs]13904.395: [CMS2011-05-23T02:34:54.310+0000: 13905.045: [CMS-concurrent-preclean: 1.245/1.589 secs] [Times: user=5.99 sys=0.13, real=1.59 secs]
>> > > (concurrent mode failure)#
>> > > # A fatal error has been detected by the Java Runtime Environment:
>> > > #
>> > > # SIGSEGV (0xb) at pc=0x00002b19debbe665, pid=25868, tid=1078290752
>> > > #
>> > > # JRE version: 6.0_17-b17
>> > > # Java VM: OpenJDK 64-Bit Server VM (14.0-b16 mixed mode linux-amd64 )
>> > > # Derivative: IcedTea6 1.7.10
>> > > # Distribution: Custom build (Wed May 4 23:17:24 EDT 2011)
>> > > # Problematic frame:
>> > > # V [libjvm.so+0x29d665]
>> > > #
>> > > # An error report file with more information is saved as:
>> > > # .../hbase-0.90.1/hs_err_pid25868.log
>> > > #
>> > > # If you would like to submit a bug report, please include
>> > > # instructions how to reproduce the bug and visit:
>> > > # http://icedtea.classpath.org/bugzilla
>> > >
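
As an aid to reading the quoted GC log, here is a minimal Python sketch that pulls the whole-heap figures out of each ParNew line and reports how full the heap is after the young collection. The function name `heap_after_young_gc` and the regular expression are illustrative assumptions, not part of HBase or the JVM, and the sample contains only two lines copied from the log above with the quote markers and wrapping removed.

```python
import re

# Whole-heap figures that follow a ParNew collection, e.g.
# "7546544K->7433319K(8360960K)" = before -> after (capacity), in KB.
# The pattern is an assumption based on the log lines quoted in this thread.
PARNEW = re.compile(
    r"\[ParNew:\s+\d+K->\d+K\(\d+K\),\s+[\d.]+\s+secs\]\s+"
    r"(\d+)K->(\d+)K\((\d+)K\)"
)


def heap_after_young_gc(log_text):
    """Return (heap_after_kb, capacity_kb, fraction_used) per ParNew collection."""
    rows = []
    for match in PARNEW.finditer(log_text):
        after = int(match.group(2))
        capacity = int(match.group(3))
        rows.append((after, capacity, after / capacity))
    return rows


# Two entries copied from the quoted log (timestamps and quote markers dropped).
sample = (
    "[GC 13902.361: [ParNew: 249216K->27648K(249216K), 0.1119520 secs] "
    "7546544K->7433319K(8360960K), 0.1120390 secs]\n"
    "[GC 13903.925: [ParNew: 249171K->27648K(249216K), 0.1108480 secs] "
    "7784281K->7661505K(8360960K), 0.1109440 secs]\n"
)

for after, capacity, fraction in heap_after_young_gc(sample):
    print(f"{after}K of {capacity}K in use after young GC ({fraction:.1%})")
```

On these lines the heap is already around nine-tenths full after each young collection while the CMS cycle is still running, which is consistent with the promotion failure and concurrent mode failure that appear just before the crash.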