Even if there is a ZooKeeper timeout due to GC, there should be logging related to that, right? Check your ‘/var/log/messages’; the kernel might have killed the process due to OOM or something else.
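
As a rough illustration of that check, here is a minimal Python sketch that filters syslog lines for signs of the kernel OOM killer. The function name `find_oom_lines` and the match patterns are my own assumptions; the exact wording the kernel logs varies between versions, so treat the patterns as a starting point rather than a complete list.

```python
import re
from typing import Iterable, List

# Phrases the Linux kernel commonly logs when the OOM killer fires.
# Assumption: wording differs between kernel versions, so adjust as needed.
OOM_PATTERN = re.compile(
    r"out of memory|oom-killer|killed process \d+", re.IGNORECASE
)


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like OOM-killer activity, in original order."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

To use it, pass an open file handle, for example `find_oom_lines(open("/var/log/messages", errors="replace"))`; reading that file usually requires root. A hit that names a `java` process around the crash time would suggest the kernel, not HBase, ended the region server.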

On Thu, Nov 7, 2013 at 8:21 AM, Dhaval Shah <[email protected]> wrote:

> Operation too slow is generally in the .log file while the GC logs (if you
> enabled GC logging) are in the .out file. You have a very small heap for a
> 1GB HFile size. You are probably running your region server out of memory.
> Try increasing the heap size and see if that helps.
>
> Regards,
> Dhaval
>
> ________________________________
> From: John <[email protected]>
> To: [email protected]; Dhaval Shah <[email protected]>
> Sent: Thursday, 7 November 2013 11:09 AM
> Subject: Re: RegionServer crash without any errors (compaction?)
>
> There are not really any other logs before that. There is an
> "operationTooSlow" message earlier, but that log entry is ~50 mins before
> the other:
> http://pastebin.com/EAAubqGB
>
> 2013/11/7 John <[email protected]>
>
> > Hi,
> >
> > thanks for your fast answer. If I take a look at the Cloudera Manager,
> > the %-time spent in GC increases at this time, so I think you are right.
> > The max heap size is 1GB for this node. The hbase.hregion.max.filesize
> > is also 1GB.
> >
> > regards
> >
> > 2013/11/7 Dhaval Shah <[email protected]>
> >
> >> Did you look at your GC logs? Probably the compaction process is running
> >> your region server out of memory. Can you provide more details on your
> >> setup? Max heap size? Max Region HFile size?
> >>
> >> Regards,
> >> Dhaval
> >>
> >> ________________________________
> >> From: John <[email protected]>
> >> To: [email protected]
> >> Sent: Thursday, 7 November 2013 10:51 AM
> >> Subject: RegionServer crash without any errors (compaction?)
> >>
> >> Hi,
> >>
> >> I have a cluster with 7 regionservers. Some of them are crashing from
> >> time to time without any error message in the hbase log. If I take a
> >> look at the log at that time, I find this:
> >>
> >> 2013-11-07 15:29:02,511 INFO org.apache.hadoop.hbase.regionserver.Store: Starting compaction of 2 file(s) in 1 of P_SO,<http://xmlns.com/foaf/0.1/homepage>,1383188177383.59d0259c87c07dc666a5600ba4d6c916. i$
> >> 2013-11-07 15:29:10,471 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://pc08.pool.ifis.uni-luebeck.de:8020/hbase/P_SO/59d0259c87c07dc666a5600ba4d6c916/.tmp/f$
> >> 2013-11-07 15:31:05,944 INFO org.apache.hadoop.hbase.util.VersionInfo: HBase 0.94.6-cdh4.4.0
> >> .... restart
> >>
> >> At this time 2 of the 7 RS crashed; both have this compaction message
> >> before they crashed. I don't know exactly what compaction is, but it
> >> seems that this compaction has to do with the crash. What can I do to
> >> avoid this restart/crash?
> >>
> >> best regards

--
*Ishan Chhabra* | Rocket Scientist | RocketFuel Inc.
