Hi Henry,

Are you using automatic major compaction, or have you turned it off? From your logs I can see that the issue appeared after a compaction was triggered. If several compactions run at the same time, they can severely affect IO performance and lead to errors on the RS (which may be your case). Here is the relevant part from the HBase book:
http://hbase.apache.org/0.94/book.html#compaction

If you want to turn off automatic major compaction, you need to add this to hbase-site.xml:

  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>

After you turn off automatic major compaction, you can write your own script and schedule major compactions for whenever suits you best.

Cheers
Samir

On Wed, Oct 23, 2013 at 9:29 AM, Henry Hung <[email protected]> wrote:

> @xieliang: I will try the PrintGCApplicationStoppedTime, thank you.
>
> About loading, total requestPerSeconds is around 15000~30000 for 9
> servers, with numberOfOnlineRegions = 136.
>
> I also just uploaded the log files of gc and regionserver into dropbox:
> https://dl.dropboxusercontent.com/u/60149953/gc-hbase.log.20131023
> https://dl.dropboxusercontent.com/u/60149953/hbase-hadoop-regionserver-fchddn2.log
>
> My setup is:
> CentOS release 6.1 (Final)
> Kernel 2.6.32-131.0.15.el6.x86_64 on an x86_64
> Hadoop 1.0.4
> HBase 0.94.6
>
> HBASE_REGIONSERVER_OPTS="-XX:+UseParNewGC -Xmn256m
> -XX:CMSInitiatingOccupancyFraction=70 -Xmx6000m -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=2 -XX:GCLogFileSize=256M
> -Xloggc:/data1/hadoop/gc-hbase.log"
>
> ulimit -a:
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 62853
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 65535
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 32639
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> -----Original Message-----
> From: 谢良 [mailto:[email protected]]
> Sent: Wednesday, October 23, 2013 2:54 PM
> To: [email protected]
> Subject: Re: What cause region server to timeout other than long gc?
>
> Maybe you can try to add "-XX:+PrintGCApplicationStoppedTime", then if
> other ops(not gc) caused the long safepoint duration, you could find the
> log.
> btw, did you have a high load during that time:)
>
> Best,
> Liang
>
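
P.S. On the "make your own script" suggestion at the top of this mail: here is a rough sketch of what such a script could look like, not a tested production tool. It assumes the `hbase` launcher is on the PATH and that the HBase shell reads its commands from stdin. The function names and any table names passed in are made up for illustration; `major_compact` is the regular HBase shell command.

```python
import re
import subprocess

# Characters accepted here for a table name (an assumption of this sketch);
# anything else is rejected so a name cannot break out of the quoted argument.
_TABLE_NAME = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]*$")


def build_shell_input(tables):
    """Return the text to feed to `hbase shell` on stdin."""
    lines = []
    for table in tables:
        if not _TABLE_NAME.match(table):
            raise ValueError("unexpected table name: %r" % table)
        lines.append("major_compact '%s'" % table)
    lines.append("exit")
    return "\n".join(lines) + "\n"


def run_major_compaction(tables, hbase_bin="hbase"):
    """Pipe the major_compact commands into the HBase shell."""
    subprocess.run(
        [hbase_bin, "shell"],
        input=build_shell_input(tables),
        text=True,    # send the input as str rather than bytes
        check=True,   # raise if the shell exits with a non-zero status
    )
```

Calling run_major_compaction(["mytable"]) from a cron job at a quiet hour would be one way to schedule it ("mytable" is a placeholder). Keep in mind that major_compact only requests the compaction and returns; the compaction itself keeps running on the region servers afterwards, so it is probably safer to stagger tables than to request them all at once.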
