About 1000 records per second; each record is around 10 KB in size.
There are 3 regionservers, doubling as datanodes and tasktrackers. Each machine has 4 GB of memory and a 4-core CPU;
not very powerful machines.

Jimmy.
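A back-of-envelope calculation makes it clearer why three modest disks look saturated in iostat at this rate. This is a sketch from the numbers quoted above; the x3 factor assumes the HDFS default replication, and it ignores WAL and compaction traffic, which only add to the load:

```java
// Rough write load for ~1000 records/s at ~10 KB each (numbers from the thread).
// Replication factor 3 is the HDFS default and an assumption here.
public class WriteLoad {
    static double ingestMBPerSec(double recordsPerSec, double recordKB) {
        return recordsPerSec * recordKB / 1024.0; // KB/s -> MB/s
    }

    public static void main(String[] args) {
        double ingest = ingestMBPerSec(1000, 10); // ~9.8 MB/s into HBase
        double hdfs = ingest * 3;                 // x3 HDFS replication
        double perNode = hdfs / 3;                // spread over the 3 datanodes
        System.out.printf("ingest=%.1f MB/s, hdfs=%.1f MB/s, per node=%.1f MB/s%n",
                ingest, hdfs, perNode);
    }
}
```

So each node is sustaining on the order of 10 MB/s of replicated writes before counting the WAL, flushes, and compactions, which is a lot for a single-disk 4 GB box.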

--------------------------------------------------
From: "Veeramachaneni, Ravi" <[email protected]>
Sent: Tuesday, July 13, 2010 7:04 PM
To: <[email protected]>
Subject: RE: regionserver crash under heavy load

Just curious, how big is the load we are talking about, 100s or 1000s of inserts/second? We are planning on moving to CDH3 with HBase soon.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: Tuesday, July 13, 2010 6:24 PM
To: [email protected]
Subject: Re: regionserver crash under heavy load

Your region server doesn't look much loaded from the metrics POV. But
I specifically asked for the lines around that, not just the dump,
since it will contain the reason for the shutdown.

I do notice that the disk usage is pretty high. I am now thinking that our problem is probably a hardware limit, but the server should not crash when
that limit is reached.

We still don't know why it crashed and it may not even be related to
HW limits, we need those bigger log traces. Also use pastebin.com or
anything like that.


Do you have any idea when the official CDH3 release will be out?

I don't work for Cloudera, but IIRC the next beta for CDH3 is due in September.


Jimmy

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Tuesday, July 13, 2010 2:55 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

Please use a pasting service for the log traces. I personally use
pastebin.com

You probably had a GC pause that lasted too long; this is something out of
the control of the application (apart from trying to keep as little data
in memory as possible, but you are inserting, so...). Your log doesn't
contain enough information for us to tell, please look for a "Dump of
metrics" line and paste the lines around it.

J-D
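One way to check the GC-pause theory is to turn on GC logging in the regionserver JVM. A sketch, assuming the stock conf/hbase-env.sh layout; the flags are standard HotSpot options and the log path is an arbitrary choice:

```shell
# In conf/hbase-env.sh: log GC activity with timestamps so long pauses are visible.
# CMS was the usual low-pause collector for HBase at the time.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log \
  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
```

If a pause in that log exceeds the ZooKeeper session timeout, the regionserver loses its session and aborts itself, which would match the "aborting server" line in the dump below.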

On Tue, Jul 13, 2010 at 2:49 PM, Jinsong Hu <[email protected]>
wrote:

Hi, Todd:
I downloaded hadoop-0.20.2+320 and hbase-0.89.20100621+17 from CDH3 and
inserted data at full load; after a while the HBase regionserver crashed.
I checked the system with "iostat -x 5" and noticed the disk is pretty busy.
Then I modified my client code and reduced the insertion rate by a factor of 6,
and the test runs fine.  Is there any way the regionserver could be modified
so that at least it doesn't crash under heavy load?  I used the Apache HBase
0.20.5 distribution and the same problem happens. I am thinking that when
the regionserver is too busy, it should throttle the incoming data rate to
protect itself.  Could this be done?
Do you also know when the official CDH3 release will come out? The one I
downloaded is a beta version.

Jimmy
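The throttling asked about above can also be done on the client side while waiting for a server-side answer. A minimal token-bucket sketch; this is a hypothetical helper, not part of the HBase API, and the writer would call acquire() before each put:

```java
// Minimal client-side rate limiter (token bucket). Hypothetical helper class,
// not an HBase API: call acquire() before each put() to cap the insert rate.
public class PutThrottle {
    private final double ratePerSec; // sustained permits per second
    private final double burst;      // max tokens that can accumulate
    private double tokens;
    private long lastNanos;

    public PutThrottle(double ratePerSec, double burst) {
        this.ratePerSec = ratePerSec;
        this.burst = burst;
        this.tokens = burst;
        this.lastNanos = System.nanoTime();
    }

    public synchronized void acquire() throws InterruptedException {
        refill();
        while (tokens < 1.0) {
            // Sleep roughly long enough for one token to accumulate.
            Thread.sleep((long) Math.ceil((1.0 - tokens) * 1000.0 / ratePerSec));
            refill();
        }
        tokens -= 1.0;
    }

    private void refill() {
        long now = System.nanoTime();
        tokens = Math.min(burst, tokens + (now - lastNanos) / 1e9 * ratePerSec);
        lastNanos = now;
    }
}
```

For example, new PutThrottle(170, 50) would hold one writer to roughly 1/6 of the original 1000 inserts/second, which is about the reduction that made the test run fine.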






2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Spam_MsgEventTable,56-2010-05-19 10:09:02\x099a420f4f31748828fd24aeea1d06b294,1278973678315.01dd22f517dabf53ddd135709b68ba6c.
2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: m0002029.ppops.net,60020,1278969481450
2010-07-13 02:24:34,389 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper; /hbase/root-region-server
2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2010-07-13 02:24:34,608 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/m0002029.ppops.net,60020,1278969481450/10.110.24.79%3A60020.1278987220794 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
     at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
     at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
     at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:826)
     at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1004)
     at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:817)
     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1531)
     at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1447)
     at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1703)
     at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2361)
     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
2010-07-13 02:24:34,610 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/Spam_MsgEventTable/079c7de876422e57e5f09fef5d997e06/.tmp/6773658134549268273 : java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2603)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
2010-07-13 02:24:34,729 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.





