Sure, I will test with a single column family and see how it goes.
Jimmy.

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Wednesday, July 14, 2010 2:11 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load
After discussing your issue on the IRC channel, the biggest problem is probably all the compactions taking place on all the small files being generated. If all your files are 100k big, it will take a hell of a lot of compactions before any of your families is able to reach 256MB, which basically means rewriting your data internally hundreds of times. We are wondering what could be done to ease that problem, like flushing each family individually, but that wouldn't be an ideal situation either and it would require a lot of work.

In the meantime, I would recommend either using a very small number of families or, if you really need to have 20 families, loading them one by one. You'll be impressed by the speedup.

J-D

On Wed, Jul 14, 2010 at 1:32 PM, Jean-Daniel Cryans <[email protected]> wrote:

Extremely high IO is bad; it hints that there's a problem (since it means that most of the processes are waiting for either disk or network). That's what I see in the datanode's log: lots of threads timing out.

20 column families is usually unreasonable (it also has a deep impact on flushing when doing massive imports); you should review your schema. I've never seen good reasons to use more than 2-3 families. Here is an example of what 20 families does:

2010-07-13 04:49:55,981 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112. Current region memstore size 65.7m
2010-07-13 04:49:57,314 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/agent/1177093464238110730, entries=8788, sequenceid=35, memsize=2.0m, filesize=144.1k to Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
...
2010-07-13 04:49:57,486 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/cluster/4834392580329156904, entries=8788, sequenceid=35, memsize=2.0m, filesize=141.6k to Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
...

As you can see, we flush when we hit a global size of 64MB on a region, but each family is flushed to a different (very small) file. This is by design: HBase is column-oriented, and using all families at the same time usually points to bad usage/schema design. This must also account for a LOT of the IO you are seeing.

And about the compression, I do see an error in the region server log:

2010-07-13 04:49:57,137 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Something's missing in your configuration. Please review http://wiki.apache.org/hadoop/UsingLzoCompression

J-D

On Wed, Jul 14, 2010 at 1:16 PM, Jinsong Hu <[email protected]> wrote:

The test has stopped, so I can't really tell now. What I noticed is that I was running "iostat -x 5" all the time, and there were lots of times when the %idle was less than 5%, or even close to 0%. I also noticed the disk %util was higher than 90%. This indicates that the disk is pretty busy.

On the other hand, I was running 3 mappers to insert records into hbase, and I artificially restricted the insertion rate to 80 records/second. The table has 20 column families, and each family has 4-5 columns. By my estimate, each record is about 10K bytes, so the pumping rate is not really that high. The table columns are compressed. I do notice that when I disable the compression, the insertion rate is even slower. This indicates that disk usage probably is indeed the bottleneck, but it is just hard to believe that I can't even insert records at 3x80=240 records/second with 10K byte records.
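The write amplification J-D describes can be sketched as a back-of-the-envelope calculation (illustrative only, not code from the thread; the 64MB global flush size, 20 families, and ~144KB file size come from the log excerpt above, and 256MB is the target size J-D mentions):

```java
// Rough sketch of why 20 families generate so much compaction IO:
// a 64 MB region-wide flush is split across 20 families, so each
// family writes only a tiny (~144 KB compressed) file per flush.
public class FlushMath {
    public static void main(String[] args) {
        long globalFlushBytes = 64L * 1024 * 1024; // region-wide memstore flush threshold
        int families = 20;
        long perFamilyMemstoreBytes = globalFlushBytes / families; // ~3.2 MB in memory per family
        long perFamilyFileBytes = 144L * 1024;                     // ~144 KB on disk (from the log)
        long targetBytes = 256L * 1024 * 1024;                     // size one family must reach

        // Number of tiny flush files one family produces before holding 256 MB
        // of data, i.e. the raw material compactions must rewrite over and over.
        long filesPerFamily = targetBytes / perFamilyFileBytes;

        System.out.println("memstore per family at flush: "
                + perFamilyMemstoreBytes / 1024 + " KB");
        System.out.println("~144 KB files per family before 256 MB: " + filesPerFamily);
    }
}
```

With one or two families instead, each flush file would be tens of megabytes rather than ~144KB, which is why loading families one by one speeds things up so much.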
The load isn't heavy at all. Right now I have reduced the speed even lower, to 20 records/second, and will test and see if it is in better shape.

Did you check the datanode log? There are lots of errors there. Can you tell why they happen?

Jimmy.

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Wednesday, July 14, 2010 12:10 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

Thanks for the zookeeper log, I was able to pinpoint the problem further. So I see this (didn't dig enough in the logs it seems, wasn't expecting a 3 minute difference):

2010-07-13 21:49:17,864 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2010-07-13 22:03:46,313 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.3544312MB (3517376), Free=405.42056MB (425114272), Max=408.775MB (428631648), Counts: Blocks=0, Access=6580365, Hit=0, Miss=6580365, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, Evicted/Run=NaN
2010-07-13 22:03:46,483 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2010-07-13 22:03:46,404 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 908177ms for sessionid 0x229c87a7f4e0011, closing socket connection and attempting reconnect
2010-07-13 22:03:46,400 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2010-07-13 22:03:46,313 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 2 on 60020 took 868374ms appending an edit to hlog; editcount=4274

At the same time, we can see that the zookeeper log does expire the session:

2010-07-13 21:52:00,001 - INFO [SessionTracker:zookeeperser...@315] - Expiring session 0x229c87a7f4e0011, timeout of 120000ms exceeded
2010-07-13 21:52:00,001 - INFO [SessionTracker:zookeeperser...@315] - Expiring session 0x129c87a7f980015, timeout of 120000ms exceeded
2010-07-13 21:52:00,002 - INFO
[ProcessThread:-1:preprequestproces...@385] - Processed session termination for sessionid: 0x229c87a7f4e0011
2010-07-13 21:52:00,002 - INFO [ProcessThread:-1:preprequestproces...@385] - Processed session termination for sessionid: 0x129c87a7f980015

So your region server process was paused for more than 10 minutes; the logs show that gap, and the lines that follow talk about that big pause. Are your nodes swapping?

J-D

On Wed, Jul 14, 2010 at 11:48 AM, Jinsong Hu <[email protected]> wrote:

The zookeepers are on 3 separate physical machines, colocated with the 3 masters. I have put the logs for them here: http://t-collectors1.proofpoint.com:8079/zookeeper.tar.gz The version is also from the CDH3 June 10 release.

I checked the datanode in that period; there are lots of exceptions. I have put them here: http://t-collectors1.proofpoint.com:8079/backup.tar.gz

Jinsong

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Wednesday, July 14, 2010 11:16 AM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

So your region servers had their sessions expired, but I don't see any sign of GC activity. Usually that points to a case where zookeeper isn't able to answer requests fast enough because it is IO starved. Are the zookeeper quorum members on the same nodes as the region servers?

J-D

On Wed, Jul 14, 2010 at 10:16 AM, Jinsong Hu <[email protected]> wrote:

I have uploaded the crashed log files to http://somewhere It includes the GC log and config files too. Both of these regionservers crashed when I used only 3 mappers to insert records into hbase, with each task limited to a data rate of 80 records/second.

The machines I use are relatively old, having only 4G of ram, a 4 core CPU, and a 250G disk. I run tasktracker, datanode and regionserver on them. I have 3 machines; only 1 regionserver is still up after the continuous overnight insertion. Before the test I created the hbase tables, so I started with an empty table.
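The session-expiry math from the logs above can be double-checked with a quick sketch (illustrative only; the timestamps are the last log line before the gap and the first one after it, and 120000ms is the timeout the zookeeper log reports):

```java
import java.time.Duration;
import java.time.LocalTime;

// Compare the region server's pause (the gap in its log) against the
// ZooKeeper session timeout that the quorum enforced.
public class PauseCheck {
    public static void main(String[] args) {
        LocalTime before = LocalTime.parse("21:49:17"); // last line before the gap
        LocalTime after  = LocalTime.parse("22:03:46"); // first line after the gap
        Duration pause = Duration.between(before, after);
        Duration sessionTimeout = Duration.ofMillis(120000); // from the zookeeper log

        System.out.println("pause: " + pause.toMinutes() + " min, timeout: "
                + sessionTimeout.toMinutes() + " min, expired: "
                + (pause.compareTo(sessionTimeout) > 0));
    }
}
```

A ~14 minute pause against a 2 minute timeout means the quorum was right to declare the server dead; the client-side "have not heard from server in 908177ms" line tells the same story.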
I saw it running fine for several hours, and when I checked again this morning, 2 regionservers had died. I backed up the logs and restarted the regionservers, and then I found the regionserver process is up but not listening on any port. The log doesn't show any error:

2010-07-14 16:23:45,769 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
2010-07-14 16:23:45,951 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.110.24.48:60000
2010-07-14 16:23:45,957 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.110.24.48:60000 that we are up
2010-07-14 16:23:46,283 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020

I am using the June 10 release of the cloudera distribution for both hadoop and hbase.

Jimmy

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Tuesday, July 13, 2010 4:24 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

Your region server doesn't look very loaded from the metrics POV. But I specifically asked for the lines around that, not just the dump, since they will contain the reason for the shutdown.

> I do notice that the disk usage is pretty high. I am just thinking that our problem probably is a hardware limit. But the server should not crash when the hardware limit is reached.

We still don't know why it crashed, and it may not even be related to HW limits; we need those bigger log traces.
Also use pastebin.com or anything like that.

> do you have any idea when the CDH3 official release will be out?

I don't work for cloudera, but IIRC the next beta for CDH3 is due for September.

Jimmy

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Tuesday, July 13, 2010 2:55 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

Please use a pasting service for the log traces; I personally use pastebin.com

You probably had a GC that lasted too long; this is something out of the control of the application (apart from trying to put as little data in memory as possible, but you are inserting, so...). Your log doesn't contain enough information for us to tell; please look for a "Dump of metrics" line and paste the lines around it.

J-D

On Tue, Jul 13, 2010 at 2:49 PM, Jinsong Hu <[email protected]> wrote:

Hi, Todd:

I downloaded hadoop-0.20.2+320 and hbase-0.89.20100621+17 from CDH3 and inserted data with full load; after a while the hbase regionserver crashed. I checked the system with "iostat -x 5" and noticed the disk was pretty busy. Then I modified my client code and reduced the insertion rate by 6 times, and the test runs fine.

Is there any way the regionserver could be modified so that at least it doesn't crash under heavy load? I used the apache hbase 0.20.5 distribution and the same problem happens. I am thinking that when the regionserver is too busy, it should throttle the incoming data rate to protect the server. Could this be done?

Do you also know when the CDH3 official release will come out? The one I downloaded is a beta version.

Jimmy

2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Spam_MsgEventTable,56-2010-05-19 10:09:02\x099a420f4f31748828fd24aeea1d06b294,1278973678315.01dd22f517dabf53ddd135709b68ba6c.
2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: m0002029.ppops.net,60020,1278969481450
2010-07-13 02:24:34,389 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper; /hbase/root-region-server
2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2010-07-13 02:24:34,608 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/m0002029.ppops.net,60020,1278969481450/10.110.24.79%3A60020.1278987220794 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
        at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
        at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:826)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1004)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:817)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1531)
        at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1447)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1703)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2361)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
2010-07-13 02:24:34,610 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/Spam_MsgEventTable/079c7de876422e57e5f09fef5d997e06/.tmp/6773658134549268273 : java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2603)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
2010-07-13 02:24:34,729 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
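The throttling Jimmy asks about can also be done entirely on the client side, the way he already limits his mappers. A minimal sketch of such a pacing loop (hypothetical illustration, not an HBase feature or code from the thread; the 80 records/second figure is the rate from his test, and the `table.put` placeholder marks where the real insert would go):

```java
// Client-side insert throttle: space records out so the aggregate rate
// never exceeds a fixed records-per-second budget.
public class InsertThrottle {
    private final long minNanosPerRecord;
    private long nextSlot = System.nanoTime();

    public InsertThrottle(int recordsPerSecond) {
        this.minNanosPerRecord = 1_000_000_000L / recordsPerSecond;
    }

    /** Blocks until the next record may be sent. */
    public synchronized void acquire() throws InterruptedException {
        long now = System.nanoTime();
        if (nextSlot > now) {
            Thread.sleep((nextSlot - now) / 1_000_000);
        }
        nextSlot = Math.max(now, nextSlot) + minNanosPerRecord;
    }

    public static void main(String[] args) throws InterruptedException {
        InsertThrottle throttle = new InsertThrottle(80); // 80 records/second, as in the test run
        long start = System.nanoTime();
        for (int i = 0; i < 40; i++) {
            throttle.acquire();
            // table.put(...) would go here
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("40 records in ~" + elapsedMs + " ms"); // roughly half a second at 80/s
    }
}
```

Server-side backpressure would be the cleaner fix, but until then a throttle like this keeps a busy regionserver from being pushed over the edge by import jobs.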
