Extremely high IO is bad and hints that there's a problem, since it means that most of the processes are waiting for either disk or network. That's what I see in the datanode's log: lots of threads timing out.
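To put rough numbers on it (back-of-the-envelope only, based on the rates in your mail below and ignoring compression):

  nominal ingest:    3 mappers x 80 rows/s x ~10 KB/row  ~= 2.4 MB/s
  WAL appends:       2.4 MB/s x 3 HDFS replicas          ~= 7.2 MB/s
  memstore flushes:  another 2.4 MB/s x 3 replicas       ~= 7.2 MB/s
  compactions:       the flushed files get re-read and rewritten
                     several times over as the regions grow

So the disks can easily see many times the nominal 2.4 MB/s, on machines that are also serving the datanode and the map tasks from a single 250G disk.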
20 column families is usually unreasonable (it also has a deep impact on flushing when doing massive imports); you should review your schema. I have never seen good reasons to use more than 2-3 families. Here is an example of what 20 families does:

2010-07-13 04:49:55,981 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.. Current region memstore size 65.7m
2010-07-13 04:49:57,314 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/agent/1177093464238110730, entries=8788, sequenceid=35, memsize=2.0m, filesize=144.1k to Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
...
2010-07-13 04:49:57,486 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/cluster/4834392580329156904, entries=8788, sequenceid=35, memsize=2.0m, filesize=141.6k to Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
...

As you can see, we flush when we hit a global size of 64MB on a region, but each family is flushed to a different (very small) file. This is by design: HBase is column-oriented, and using all the families at the same time usually points to bad usage or schema design. This must also account for a LOT of the IO you are seeing.

And about the compression, I do see a warning in the region server log:

2010-07-13 04:49:57,137 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Something's missing in your configuration. Please review http://wiki.apache.org/hadoop/UsingLzoCompression

To make the schema point concrete, I've put a rough sketch of a trimmed-down table definition below, after your quoted message.

J-D

On Wed, Jul 14, 2010 at 1:16 PM, Jinsong Hu <[email protected]> wrote:
> the test has stopped, so I can't really tell now. What I noticed is that I
> was running "iostat -x 5" all the time, and there are lots of times that the
> %idle is less than 5%, or even close to 0%. I also notice that the disk %util
> is higher than 90%.
>
> This indicates that the disk is pretty busy. On the other hand, I was running
> 3 mappers to insert records into hbase, and I artificially restrict the
> insertion rate to 80 records/second. The table has 20 column families, and
> each family has 4-5 columns. By my estimate, each record is about 10K bytes,
> so the pumping rate is not really that high.
>
> the table columns are compressed. I do notice that when I disable the
> compression, the insertion rate is even slower. This indicates that the disk
> usage probably is indeed the bottleneck, but it is just hard to believe that
> I can't even insert records at 3x80=240 records/second with 10K-byte records.
> The load isn't heavy at all. Right now I have reduced the speed to an even
> lower 20 records/second to test and see if it will be in better shape.
>
> Did you check the datanode log? There are lots of errors there. Can you
> tell why it happens?
>
> Jimmy.
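Here is that sketch, using the 0.20-era Java client API. The family names ("meta" and "payload") are made up for illustration only; the point is simply to collapse the 20 families into a handful that are written and read together, and to enable LZO explicitly per family (which only takes effect once the native libraries from the wiki page above are installed):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.io.hfile.Compression;

  public class CreateMsgEventTable {
    public static void main(String[] args) throws Exception {
      HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

      // One table, two families instead of twenty: columns that are
      // written and read together belong in the same family.
      HTableDescriptor desc = new HTableDescriptor("Spam_MsgEventTable");

      HColumnDescriptor meta = new HColumnDescriptor("meta");       // illustrative name
      meta.setCompressionType(Compression.Algorithm.LZO);
      desc.addFamily(meta);

      HColumnDescriptor payload = new HColumnDescriptor("payload"); // illustrative name
      payload.setCompressionType(Compression.Algorithm.LZO);
      desc.addFamily(payload);

      admin.createTable(desc);
    }
  }

A column only deserves its own family if it really has a very different access pattern or retention; otherwise fewer families means fewer (and bigger) files per flush.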
> --------------------------------------------------
> From: "Jean-Daniel Cryans" <[email protected]>
> Sent: Wednesday, July 14, 2010 12:10 PM
> To: <[email protected]>
> Subject: Re: regionserver crash under heavy load
>
>> Thanks for the zookeeper log, I was able to pinpoint the problem further.
>> So I see this (it seems I didn't dig deep enough in the logs; I wasn't
>> expecting a 3-minute difference):
>>
>> 2010-07-13 21:49:17,864 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>> 2010-07-13 22:03:46,313 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.3544312MB (3517376), Free=405.42056MB (425114272), Max=408.775MB (428631648), Counts: Blocks=0, Access=6580365, Hit=0, Miss=6580365, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, Evicted/Run=NaN
>> 2010-07-13 22:03:46,483 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>> 2010-07-13 22:03:46,404 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 908177ms for sessionid 0x229c87a7f4e0011, closing socket connection and attempting reconnect
>> 2010-07-13 22:03:46,400 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>> 2010-07-13 22:03:46,313 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 2 on 60020 took 868374ms appending an edit to hlog; editcount=4274
>>
>> At the same time, we can see that the zookeeper log does expire the session:
>>
>> 2010-07-13 21:52:00,001 - INFO [SessionTracker:zookeeperser...@315] - Expiring session 0x229c87a7f4e0011, timeout of 120000ms exceeded
>> 2010-07-13 21:52:00,001 - INFO [SessionTracker:zookeeperser...@315] - Expiring session 0x129c87a7f980015, timeout of 120000ms exceeded
>> 2010-07-13 21:52:00,002 - INFO [ProcessThread:-1:preprequestproces...@385] - Processed session termination for sessionid: 0x229c87a7f4e0011
>> 2010-07-13 21:52:00,002 - INFO [ProcessThread:-1:preprequestproces...@385] - Processed session termination for sessionid: 0x129c87a7f980015
>>
>> So your region server process was paused for more than 10 minutes; the logs
>> show that gap, and the lines that follow talk about that big pause. Are your
>> nodes swapping?
>>
>> J-D
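For reference, the knob that controls how long ZooKeeper waits before expiring a region server is zookeeper.session.timeout in hbase-site.xml. Raising it only papers over pauses like the one above (and it delays detection of real failures; ZooKeeper's own maxSessionTimeout also has to allow the larger value), so fixing the swapping/GC is the real answer. A sketch, with an illustrative value only:

  <property>
    <name>zookeeper.session.timeout</name>
    <!-- 5 minutes, illustrative only -->
    <value>300000</value>
  </property>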
>> On Wed, Jul 14, 2010 at 11:48 AM, Jinsong Hu <[email protected]> wrote:
>>> the zookeepers are on 3 separate physical machines, colocated with the 3
>>> masters. I have put the logs for them here:
>>> http://t-collectors1.proofpoint.com:8079/zookeeper.tar.gz
>>> the version is also from the CDH3 June 10 release.
>>>
>>> I checked the datanode in that period; there are lots of exceptions. I have
>>> put it here:
>>> http://t-collectors1.proofpoint.com:8079/backup.tar.gz
>>>
>>> Jinsong
>>>
>>> --------------------------------------------------
>>> From: "Jean-Daniel Cryans" <[email protected]>
>>> Sent: Wednesday, July 14, 2010 11:16 AM
>>> To: <[email protected]>
>>> Subject: Re: regionserver crash under heavy load
>>>
>>>> So your region servers had their sessions expired, but I don't see any
>>>> sign of GC activity. Usually that points to a case where zookeeper isn't
>>>> able to answer requests fast enough because it is IO starved. Are the
>>>> zookeeper quorum members on the same nodes as the region servers?
>>>>
>>>> J-D
>>>>
>>>> On Wed, Jul 14, 2010 at 10:16 AM, Jinsong Hu <[email protected]> wrote:
>>>>> I have uploaded the crashed log files to
>>>>>
>>>>> http://somewhere
>>>>> http://somewhere
>>>>>
>>>>> It includes the GC log and config files too. Both of these regionservers
>>>>> crashed when I used only 3 mappers to insert records into hbase, and for
>>>>> each task I limit the data rate to 80 records/second.
>>>>>
>>>>> The machine I use is relatively old, having only 4G of ram, a 4-core CPU,
>>>>> and a 250G disk. I run the tasktracker, datanode and regionserver on them.
>>>>> I have 3 machines; only 1 regionserver is still up after the continuous
>>>>> insertion overnight. Before the test I created the hbase tables, so I
>>>>> started with an empty table. I saw it running fine for several hours, and
>>>>> this morning I checked again and 2 regionservers had died. I backed up the
>>>>> log and restarted the regionserver, and then I found the regionserver
>>>>> process is up but not listening on any port.
>>>>>
>>>>> the log doesn't show any error:
>>>>> 2010-07-14 16:23:45,769 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
>>>>> 2010-07-14 16:23:45,951 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.110.24.48:60000
>>>>> 2010-07-14 16:23:45,957 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.110.24.48:60000 that we are up
>>>>> 2010-07-14 16:23:46,283 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
>>>>>
>>>>> I am using the June 10 release of the cloudera distribution for both
>>>>> hadoop and hbase.
>>>>>
>>>>> Jimmy
>>>>>
>>>>> --------------------------------------------------
>>>>> From: "Jean-Daniel Cryans" <[email protected]>
>>>>> Sent: Tuesday, July 13, 2010 4:24 PM
>>>>> To: <[email protected]>
>>>>> Subject: Re: regionserver crash under heavy load
>>>>>
>>>>>> Your region server doesn't look very loaded from the metrics POV. But
>>>>>> I specifically asked for the lines around that, not just the dump,
>>>>>> since they will contain the reason for the shutdown.
>>>>>>
>>>>>>> I do notice that the disk usage is pretty high. I am just thinking that
>>>>>>> our problem probably is a hardware limit, but the server should not
>>>>>>> crash when the hardware limit is reached.
>>>>>>
>>>>>> We still don't know why it crashed, and it may not even be related to
>>>>>> HW limits; we need those bigger log traces. Also use pastebin.com or
>>>>>> anything like that.
>>>>>>
>>>>>>> do you have any idea when the CDH3 official release will be out ?
>>>>>>
>>>>>> I don't work for cloudera, but IIRC the next beta for CDH3 is due for
>>>>>> September.
>>>>>>
>>>>>>> Jimmy
>>>>>>>
>>>>>>> --------------------------------------------------
>>>>>>> From: "Jean-Daniel Cryans" <[email protected]>
>>>>>>> Sent: Tuesday, July 13, 2010 2:55 PM
>>>>>>> To: <[email protected]>
>>>>>>> Subject: Re: regionserver crash under heavy load
>>>>>>>
>>>>>>>> Please use a pasting service for the log traces. I personally use
>>>>>>>> pastebin.com
>>>>>>>>
>>>>>>>> You probably had a GC that lasted too long; this is something out of
>>>>>>>> the control of the application (apart from trying to keep as little
>>>>>>>> data in memory as possible, but you are inserting, so...). Your log
>>>>>>>> doesn't contain enough information for us to tell; please look for a
>>>>>>>> "Dump of metrics" line and paste the lines around that.
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Tue, Jul 13, 2010 at 2:49 PM, Jinsong Hu <[email protected]> wrote:
>>>>>>>>> Hi, Todd:
>>>>>>>>> I downloaded hadoop-0.20.2+320 and hbase-0.89.20100621+17 from CDH3 and
>>>>>>>>> inserted data under full load; after a while the hbase regionserver
>>>>>>>>> crashed. I checked the system with "iostat -x 5" and noticed the disk is
>>>>>>>>> pretty busy. Then I modified my client code and reduced the insertion
>>>>>>>>> rate by 6 times, and the test runs fine. Is there any way the
>>>>>>>>> regionserver can be modified so that at least it doesn't crash under
>>>>>>>>> heavy load? I used the apache hbase 0.20.5 distribution and the same
>>>>>>>>> problem happens. I am thinking that when the regionserver is too busy,
>>>>>>>>> it should throttle the incoming data rate to protect the server. Could
>>>>>>>>> this be done?
>>>>>>>>> Do you also know when the CDH3 official release will come out? The one
>>>>>>>>> I downloaded is a beta version.
>>>>>>>>>
>>>>>>>>> Jimmy
>>>>>>>>>
>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Spam_MsgEventTable,56-2010-05-19 10:09:02\x099a420f4f31748828fd24aeea1d06b294,1278973678315.01dd22f517dabf53ddd135709b68ba6c.
>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: m0002029.ppops.net,60020,1278969481450
>>>>>>>>> 2010-07-13 02:24:34,389 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper; /hbase/root-region-server
>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
>>>>>>>>> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
>>>>>>>>> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
>>>>>>>>> 2010-07-13 02:24:34,608 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/m0002029.ppops.net,60020,1278969481450/10.110.24.79%3A60020.1278987220794 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
>>>>>>>>> java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
>>>>>>>>>         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>>>>>>>>>         at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>>>>>>>>>         at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>>>>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:826)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1004)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:817)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1531)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1447)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1703)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2361)
>>>>>>>>>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
>>>>>>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
>>>>>>>>> 2010-07-13 02:24:34,610 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/Spam_MsgEventTable/079c7de876422e57e5f09fef5d997e06/.tmp/6773658134549268273 : java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
>>>>>>>>> java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2603)
>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
>>>>>>>>> 2010-07-13 02:24:34,729 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
