Here is a grep of the metrics dump:

2010-07-13 02:22:45,818 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=305.0, regions=14, stores=167, storefiles=287, storefileIndexSize=54, memstoreSize=489, compactionQueueSize=1, usedHeap=488, maxHeap=2043, blockCacheSize=5800680, blockCacheFree=422830968, blockCacheCount=244, blockCacheHitRatio=29
2010-07-13 02:22:48,286 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=14, stores=167, storefiles=287, storefileIndexSize=54, memstoreSize=489, compactionQueueSize=1, usedHeap=491, maxHeap=2043, blockCacheSize=5800680, blockCacheFree=422830968, blockCacheCount=244, blockCacheHitRatio=29
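For anyone reproducing this, lines like the two above can be extracted with something along these lines (the log file name here is a placeholder; point it at the actual regionserver log):

```shell
# "hbase-regionserver.log" is a placeholder file name.
# Pull the "Dump of metrics" lines and extract the block cache hit ratio.
grep "Dump of metrics" hbase-regionserver.log |
  sed -n 's/.*blockCacheHitRatio=\([0-9]*\).*/\1/p'
```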

I logged all GC activity and the longest pause was 8.8 seconds, but most of them are not that long. I used the "-XX:+UseConcMarkSweepGC" flag in the JVM options, so GC doesn't look like the problem.
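For reference, this is roughly how the GC logging was set up alongside CMS. The hbase-env.sh placement and log path are assumptions; adjust for your deployment:

```shell
# Hypothetical hbase-env.sh additions: CMS collector plus verbose GC logging,
# so long pauses can be correlated with regionserver timeouts in the logs.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hbase/gc-hbase.log"
```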

I do notice that the disk usage is pretty high. I am thinking that our problem is probably a hardware limit, but the server should not crash when the hardware limit is reached.

Do you have any idea when the CDH3 official release will be out?

Jimmy

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Tuesday, July 13, 2010 2:55 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

Please use a pasting service for the log traces. I personally use pastebin.com

You probably had a GC pause that lasted too long; this is something out of
the control of the application (apart from trying to keep as little data
in memory as possible, but you are inserting, so...). Your log doesn't
contain enough information for us to tell; please look for a "Dump of
metrics" line and paste the lines around it.

J-D

On Tue, Jul 13, 2010 at 2:49 PM, Jinsong Hu <[email protected]> wrote:
Hi, Todd:
 I downloaded hadoop-0.20.2+320 and hbase-0.89.20100621+17 from CDH3 and
inserted data at full load; after a while the HBase regionserver crashed.
I checked the system with "iostat -x 5" and noticed the disk was pretty busy.
Then I modified my client code to reduce the insertion rate by a factor of 6,
and the test ran fine. Is there any way the regionserver could be modified so
that at least it doesn't crash under heavy load? I used the Apache HBase
0.20.5 distribution and the same problem happened. I am thinking that when
the regionserver is too busy, it should throttle the incoming data rate to
protect the server. Could this be done?
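On the throttling point, one option that doesn't require regionserver changes is rate-limiting on the client side. Below is a minimal fixed-window sketch; SimpleThrottle and all names are hypothetical, not an HBase API, and the rates are placeholders:

```java
// Hypothetical client-side throttle: caps operations per time window so a
// busy regionserver gets breathing room. Not part of any HBase API.
public class SimpleThrottle {
    private final int maxPerWindow;   // max operations allowed per window
    private final long windowMillis;  // window length in milliseconds
    private long windowStart;
    private int count;
    private boolean started = false;

    public SimpleThrottle(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the caller may proceed at time nowMillis. */
    public boolean tryAcquire(long nowMillis) {
        if (!started || nowMillis - windowStart >= windowMillis) {
            started = true;
            windowStart = nowMillis;  // start a fresh window
            count = 0;
        }
        if (count < maxPerWindow) {
            count++;
            return true;
        }
        return false;  // caller should back off (e.g. sleep and retry)
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleThrottle throttle = new SimpleThrottle(1000, 1000); // ~1000 ops/sec
        for (int i = 0; i < 5000; i++) {
            while (!throttle.tryAcquire(System.currentTimeMillis())) {
                Thread.sleep(5);  // back off until the next window opens
            }
            // table.put(...) would go here
        }
        System.out.println("done");
    }
}
```

A fixed window is crude (it allows bursts at window edges); a token bucket smooths that out, but the idea is the same: cap client throughput below what "iostat -x 5" says the disks can sustain.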
Do you also know when the CDH3 official release will come out? The one I
downloaded is a beta version.

Jimmy






2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Spam_MsgEventTable,56-2010-05-19 10:09:02\x099a420f4f31748828fd24aeea1d06b294,1278973678315.01dd22f517dabf53ddd135709b68ba6c.
2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: m0002029.ppops.net,60020,1278969481450
2010-07-13 02:24:34,389 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper; /hbase/root-region-server
2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2010-07-13 02:24:34,608 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/m0002029.ppops.net,60020,1278969481450/10.110.24.79%3A60020.1278987220794 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times.  Pipeline was 10.110.24.80:50010. Aborting...
java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times.  Pipeline was 10.110.24.80:50010. Aborting...
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
      at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
      at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
      at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
      at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:826)
      at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1004)
      at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:817)
      at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1531)
      at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1447)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1703)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2361)
      at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
2010-07-13 02:24:34,610 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/Spam_MsgEventTable/079c7de876422e57e5f09fef5d997e06/.tmp/6773658134549268273 : java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2603)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
2010-07-13 02:24:34,729 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.

