Extremely high IO is bad and hints that there's a problem, since it means that most of the processes are waiting for either disk or network. That's what I see in the datanode's log: lots of threads timing out.
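To put rough numbers on it (back-of-the-envelope only, based on the rates in your mail below and ignoring compression):

  nominal ingest:    3 mappers x 80 rows/s x ~10 KB/row  ~= 2.4 MB/s
  WAL appends:       2.4 MB/s x 3 HDFS replicas          ~= 7.2 MB/s
  memstore flushes:  another 2.4 MB/s x 3 replicas       ~= 7.2 MB/s
  compactions:       the flushed files get re-read and rewritten
                     several times over as the regions grow

So the disks can easily see many times the nominal 2.4 MB/s, on machines that are also serving the datanode and the map tasks from a single 250G disk.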
20 column families is usually unreasonable (it also has a deep impact on flushing when doing massive imports); you should review your schema. I have never seen good reasons to use more than 2-3 families. Here is an example of what 20 families does:

2010-07-13 04:49:55,981 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.. Current region memstore size 65.7m
2010-07-13 04:49:57,314 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/agent/1177093464238110730, entries=8788, sequenceid=35, memsize=2.0m, filesize=144.1k to Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
...
2010-07-13 04:49:57,486 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/cluster/4834392580329156904, entries=8788, sequenceid=35, memsize=2.0m, filesize=141.6k to Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
...

As you can see, we flush when we hit a global size of 64MB on a region, but each family is flushed to a different (very small) file. This is by design: HBase is column-oriented, and using all the families at the same time usually points to bad usage or schema design. This must also account for a LOT of the IO you are seeing.

And about the compression, I do see a warning in the region server log:

2010-07-13 04:49:57,137 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Something's missing in your configuration. Please review http://wiki.apache.org/hadoop/UsingLzoCompression

To make the schema point concrete, I've put a rough sketch of a trimmed-down table definition below, after your quoted message.

J-D

On Wed, Jul 14, 2010 at 1:16 PM, Jinsong Hu <[email protected]> wrote:
> the test has stopped, so I can't really tell now. What I noticed is that I
> was running "iostat -x 5" all the time, and there are lots of times that the
> %idle is less than 5%, or even close to 0%. I also notice that the disk %util
> is higher than 90%.
>
> This indicates that the disk is pretty busy. On the other hand, I was running
> 3 mappers to insert records into hbase, and I artificially restrict the
> insertion rate to 80 records/second. The table has 20 column families, and
> each family has 4-5 columns. By my estimate, each record is about 10K bytes,
> so the pumping rate is not really that high.
>
> the table columns are compressed. I do notice that when I disable the
> compression, the insertion rate is even slower. This indicates that the disk
> usage probably is indeed the bottleneck, but it is just hard to believe that
> I can't even insert records at 3x80=240 records/second with 10K-byte records.
> The load isn't heavy at all. Right now I have reduced the speed to an even
> lower 20 records/second to test and see if it will be in better shape.
>
> Did you check the datanode log? There are lots of errors there. Can you
> tell why it happens?
>
> Jimmy.
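Here is that sketch, using the 0.20-era Java client API. The family names ("meta" and "payload") are made up for illustration only; the point is simply to collapse the 20 families into a handful that are written and read together, and to enable LZO explicitly per family (which only takes effect once the native libraries from the wiki page above are installed):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.io.hfile.Compression;

  public class CreateMsgEventTable {
    public static void main(String[] args) throws Exception {
      HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

      // One table, two families instead of twenty: columns that are
      // written and read together belong in the same family.
      HTableDescriptor desc = new HTableDescriptor("Spam_MsgEventTable");

      HColumnDescriptor meta = new HColumnDescriptor("meta");       // illustrative name
      meta.setCompressionType(Compression.Algorithm.LZO);
      desc.addFamily(meta);

      HColumnDescriptor payload = new HColumnDescriptor("payload"); // illustrative name
      payload.setCompressionType(Compression.Algorithm.LZO);
      desc.addFamily(payload);

      admin.createTable(desc);
    }
  }

A column only deserves its own family if it really has a very different access pattern or retention; otherwise fewer families means fewer (and bigger) files per flush.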
> --------------------------------------------------
> From: "Jean-Daniel Cryans" <[email protected]>
> Sent: Wednesday, July 14, 2010 12:10 PM
> To: <[email protected]>
> Subject: Re: regionserver crash under heavy load
>
>> Thanks for the zookeeper log, I was able to pinpoint the problem further.
>> So I see this (it seems I didn't dig deep enough in the logs; I wasn't
>> expecting a 3-minute difference):
>>
>> 2010-07-13 21:49:17,864 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>> 2010-07-13 22:03:46,313 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.3544312MB (3517376), Free=405.42056MB (425114272), Max=408.775MB (428631648), Counts: Blocks=0, Access=6580365, Hit=0, Miss=6580365, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, Evicted/Run=NaN
>> 2010-07-13 22:03:46,483 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>> 2010-07-13 22:03:46,404 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 908177ms for sessionid 0x229c87a7f4e0011, closing socket connection and attempting reconnect
>> 2010-07-13 22:03:46,400 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>> 2010-07-13 22:03:46,313 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 2 on 60020 took 868374ms appending an edit to hlog; editcount=4274
>>
>> At the same time, we can see that the zookeeper log does expire the session:
>>
>> 2010-07-13 21:52:00,001 - INFO [SessionTracker:zookeeperser...@315] - Expiring session 0x229c87a7f4e0011, timeout of 120000ms exceeded
>> 2010-07-13 21:52:00,001 - INFO [SessionTracker:zookeeperser...@315] - Expiring session 0x129c87a7f980015, timeout of 120000ms exceeded
>> 2010-07-13 21:52:00,002 - INFO [ProcessThread:-1:preprequestproces...@385] - Processed session termination for sessionid: 0x229c87a7f4e0011
>> 2010-07-13 21:52:00,002 - INFO [ProcessThread:-1:preprequestproces...@385] - Processed session termination for sessionid: 0x129c87a7f980015
>>
>> So your region server process was paused for more than 10 minutes; the logs
>> show that gap, and the lines that follow talk about that big pause. Are your
>> nodes swapping?
>>
>> J-D
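For reference, the knob that controls how long ZooKeeper waits before expiring a region server is zookeeper.session.timeout in hbase-site.xml. Raising it only papers over pauses like the one above (and it delays detection of real failures; ZooKeeper's own maxSessionTimeout also has to allow the larger value), so fixing the swapping/GC is the real answer. A sketch, with an illustrative value only:

  <property>
    <name>zookeeper.session.timeout</name>
    <!-- 5 minutes, illustrative only -->
    <value>300000</value>
  </property>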
>> On Wed, Jul 14, 2010 at 11:48 AM, Jinsong Hu <[email protected]> wrote:
>>> the zookeepers are on 3 separate physical machines, colocated with the 3
>>> masters. I have put the logs for them here:
>>> http://t-collectors1.proofpoint.com:8079/zookeeper.tar.gz
>>> the version is also from the CDH3 June 10 release.
>>>
>>> I checked the datanode in that period; there are lots of exceptions. I have
>>> put it here:
>>> http://t-collectors1.proofpoint.com:8079/backup.tar.gz
>>>
>>> Jinsong
>>>
>>> --------------------------------------------------
>>> From: "Jean-Daniel Cryans" <[email protected]>
>>> Sent: Wednesday, July 14, 2010 11:16 AM
>>> To: <[email protected]>
>>> Subject: Re: regionserver crash under heavy load
>>>
>>>> So your region servers had their sessions expired, but I don't see any
>>>> sign of GC activity. Usually that points to a case where zookeeper isn't
>>>> able to answer requests fast enough because it is IO starved. Are the
>>>> zookeeper quorum members on the same nodes as the region servers?
>>>>
>>>> J-D
>>>>
>>>> On Wed, Jul 14, 2010 at 10:16 AM, Jinsong Hu <[email protected]> wrote:
>>>>> I have uploaded the crashed log files to
>>>>>
>>>>> http://somewhere
>>>>> http://somewhere
>>>>>
>>>>> It includes the GC log and config files too. Both of these regionservers
>>>>> crashed when I used only 3 mappers to insert records into hbase, and for
>>>>> each task I limit the data rate to 80 records/second.
>>>>>
>>>>> The machine I use is relatively old, having only 4G of ram, a 4-core CPU,
>>>>> and a 250G disk. I run the tasktracker, datanode and regionserver on them.
>>>>> I have 3 machines; only 1 regionserver is still up after the continuous
>>>>> insertion overnight. Before the test I created the hbase tables, so I
>>>>> started with an empty table. I saw it running fine for several hours, and
>>>>> this morning I checked again and 2 regionservers had died. I backed up the
>>>>> log and restarted the regionserver, and then I found the regionserver
>>>>> process is up but not listening on any port.
>>>>>
>>>>> the log doesn't show any error:
>>>>> 2010-07-14 16:23:45,769 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
>>>>> 2010-07-14 16:23:45,951 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.110.24.48:60000
>>>>> 2010-07-14 16:23:45,957 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.110.24.48:60000 that we are up
>>>>> 2010-07-14 16:23:46,283 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
>>>>>
>>>>> I am using the June 10 release of the cloudera distribution for both
>>>>> hadoop and hbase.
>>>>>
>>>>> Jimmy
>>>>>
>>>>> --------------------------------------------------
>>>>> From: "Jean-Daniel Cryans" <[email protected]>
>>>>> Sent: Tuesday, July 13, 2010 4:24 PM
>>>>> To: <[email protected]>
>>>>> Subject: Re: regionserver crash under heavy load
>>>>>
>>>>>> Your region server doesn't look very loaded from the metrics POV. But
>>>>>> I specifically asked for the lines around that, not just the dump,
>>>>>> since they will contain the reason for the shutdown.
>>>>>>
>>>>>>> I do notice that the disk usage is pretty high. I am just thinking that
>>>>>>> our problem probably is a hardware limit, but the server should not
>>>>>>> crash when the hardware limit is reached.
>>>>>>
>>>>>> We still don't know why it crashed, and it may not even be related to
>>>>>> HW limits; we need those bigger log traces. Also use pastebin.com or
>>>>>> anything like that.
>>>>>>
>>>>>>> do you have any idea when the CDH3 official release will be out ?
>>>>>>
>>>>>> I don't work for cloudera, but IIRC the next beta for CDH3 is due for
>>>>>> September.
>>>>>>
>>>>>>> Jimmy
>>>>>>>
>>>>>>> --------------------------------------------------
>>>>>>> From: "Jean-Daniel Cryans" <[email protected]>
>>>>>>> Sent: Tuesday, July 13, 2010 2:55 PM
>>>>>>> To: <[email protected]>
>>>>>>> Subject: Re: regionserver crash under heavy load
>>>>>>>
>>>>>>>> Please use a pasting service for the log traces. I personally use
>>>>>>>> pastebin.com
>>>>>>>>
>>>>>>>> You probably had a GC that lasted too long; this is something out of
>>>>>>>> the control of the application (apart from trying to keep as little
>>>>>>>> data in memory as possible, but you are inserting, so...). Your log
>>>>>>>> doesn't contain enough information for us to tell; please look for a
>>>>>>>> "Dump of metrics" line and paste the lines around that.
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Tue, Jul 13, 2010 at 2:49 PM, Jinsong Hu <[email protected]> wrote:
>>>>>>>>> Hi, Todd:
>>>>>>>>> I downloaded hadoop-0.20.2+320 and hbase-0.89.20100621+17 from CDH3 and
>>>>>>>>> inserted data under full load; after a while the hbase regionserver
>>>>>>>>> crashed. I checked the system with "iostat -x 5" and noticed the disk is
>>>>>>>>> pretty busy. Then I modified my client code and reduced the insertion
>>>>>>>>> rate by 6 times, and the test runs fine. Is there any way the
>>>>>>>>> regionserver can be modified so that at least it doesn't crash under
>>>>>>>>> heavy load? I used the apache hbase 0.20.5 distribution and the same
>>>>>>>>> problem happens. I am thinking that when the regionserver is too busy,
>>>>>>>>> it should throttle the incoming data rate to protect the server. Could
>>>>>>>>> this be done?
>>>>>>>>> Do you also know when the CDH3 official release will come out? The one
>>>>>>>>> I downloaded is a beta version.
>>>>>>>>>
>>>>>>>>> Jimmy
>>>>>>>>>
>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Spam_MsgEventTable,56-2010-05-19 10:09:02\x099a420f4f31748828fd24aeea1d06b294,1278973678315.01dd22f517dabf53ddd135709b68ba6c.
>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: m0002029.ppops.net,60020,1278969481450
>>>>>>>>> 2010-07-13 02:24:34,389 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper; /hbase/root-region-server
>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
>>>>>>>>> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
>>>>>>>>> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
>>>>>>>>> 2010-07-13 02:24:34,608 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/m0002029.ppops.net,60020,1278969481450/10.110.24.79%3A60020.1278987220794 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
>>>>>>>>> java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
>>>>>>>>>         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>>>>>>>>>         at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>>>>>>>>>         at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>>>>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:826)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1004)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:817)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1531)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1447)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1703)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2361)
>>>>>>>>>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
>>>>>>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
>>>>>>>>> 2010-07-13 02:24:34,610 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/Spam_MsgEventTable/079c7de876422e57e5f09fef5d997e06/.tmp/6773658134549268273 : java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
>>>>>>>>> java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2603)
>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
>>>>>>>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
>>>>>>>>> 2010-07-13 02:24:34,729 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
