Sure, I will test with a single column family and see how it goes.
Jimmy.

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Wednesday, July 14, 2010 2:11 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load
After discussing your issue on the IRC channel, the biggest problem is probably all the compactions taking place on all the small files being generated. If all your files are 100k big, it will take a hell of a lot of compactions before any of your families is able to reach 256MB, which basically means rewriting your data internally hundreds of times. We are wondering what could be done to ease that problem, like flushing each family individually, but that wouldn't be an ideal situation either and it would require a lot of work.

In the meantime, I would recommend either using a very small number of families or, if you really need to have 20 families, loading them one by one. You'll be impressed by the speedup.

J-D

On Wed, Jul 14, 2010 at 1:32 PM, Jean-Daniel Cryans <[email protected]> wrote:

Extremely high IO is bad; it hints that there's a problem (since it means that most of the processes are waiting for either disk or network). That's what I see in the datanode's log: lots of threads timing out.

20 column families is usually unreasonable (it also has a deep impact on flushing when doing massive imports); you should review your schema. I've never seen good reasons to use more than 2-3 families. Here is an example of what 20 families does:

2010-07-13 04:49:55,981 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112. Current region memstore size 65.7m
2010-07-13 04:49:57,314 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/agent/1177093464238110730, entries=8788, sequenceid=35, memsize=2.0m, filesize=144.1k to Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
...
2010-07-13 04:49:57,486 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/cluster/4834392580329156904, entries=8788, sequenceid=35, memsize=2.0m, filesize=141.6k to Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
...

As you can see, we flush when we hit a global size of 64MB on a region, but each family is flushed to a different (very small) file. This is by design: HBase is column-oriented, and using all families at the same time usually points to bad usage/schema design. This must also account for a LOT of the IO you are seeing.

And about the compression, I do see an error in the region server log:

2010-07-13 04:49:57,137 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Something's missing in your configuration. Please review http://wiki.apache.org/hadoop/UsingLzoCompression

J-D

On Wed, Jul 14, 2010 at 1:16 PM, Jinsong Hu <[email protected]> wrote:

The test has stopped, so I can't really tell now. What I noticed is that I was running "iostat -x 5" all the time, and there were lots of times when the %idle was less than 5%, or even close to 0%. I also noticed the disk %util was higher than 90%. This indicates that the disk is pretty busy.

On the other hand, I was running 3 mappers to insert records into hbase, and I artificially restricted the insertion rate to 80 records/second. The table has 20 column families, and each family has 4-5 columns. By my estimate, each record is about 10K bytes, so the pumping rate is not really that high. The table columns are compressed. I do notice that when I disable the compression, the insertion rate is even slower. This indicates that disk usage probably is indeed the bottleneck, but it is just hard to believe that I can't even insert records at 3x80=240 records/second with 10K byte records.
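The write amplification J-D describes can be sketched as a back-of-the-envelope calculation (illustrative only, not code from the thread; the 64MB global flush size, 20 families, and ~144KB file size come from the log excerpt above, and 256MB is the target size J-D mentions):

```java
// Rough sketch of why 20 families generate so much compaction IO:
// a 64 MB region-wide flush is split across 20 families, so each
// family writes only a tiny (~144 KB compressed) file per flush.
public class FlushMath {
    public static void main(String[] args) {
        long globalFlushBytes = 64L * 1024 * 1024; // region-wide memstore flush threshold
        int families = 20;
        long perFamilyMemstoreBytes = globalFlushBytes / families; // ~3.2 MB in memory per family
        long perFamilyFileBytes = 144L * 1024;                     // ~144 KB on disk (from the log)
        long targetBytes = 256L * 1024 * 1024;                     // size one family must reach

        // Number of tiny flush files one family produces before holding 256 MB
        // of data, i.e. the raw material compactions must rewrite over and over.
        long filesPerFamily = targetBytes / perFamilyFileBytes;

        System.out.println("memstore per family at flush: "
                + perFamilyMemstoreBytes / 1024 + " KB");
        System.out.println("~144 KB files per family before 256 MB: " + filesPerFamily);
    }
}
```

With one or two families instead, each flush file would be tens of megabytes rather than ~144KB, which is why loading families one by one speeds things up so much.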
The load isn't heavy at all. Right now I have reduced the speed even lower, to 20 records/second, and will test and see if it is in better shape.

Did you check the datanode log? There are lots of errors there. Can you tell why they happen?

Jimmy.

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Wednesday, July 14, 2010 12:10 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

Thanks for the zookeeper log, I was able to pinpoint the problem further. So I see this (didn't dig enough in the logs it seems, wasn't expecting a 3 minute difference):

2010-07-13 21:49:17,864 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2010-07-13 22:03:46,313 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.3544312MB (3517376), Free=405.42056MB (425114272), Max=408.775MB (428631648), Counts: Blocks=0, Access=6580365, Hit=0, Miss=6580365, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, Evicted/Run=NaN
2010-07-13 22:03:46,483 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2010-07-13 22:03:46,404 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 908177ms for sessionid 0x229c87a7f4e0011, closing socket connection and attempting reconnect
2010-07-13 22:03:46,400 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2010-07-13 22:03:46,313 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 2 on 60020 took 868374ms appending an edit to hlog; editcount=4274

At the same time, we can see that the zookeeper log does expire the session:

2010-07-13 21:52:00,001 - INFO [SessionTracker:zookeeperser...@315] - Expiring session 0x229c87a7f4e0011, timeout of 120000ms exceeded
2010-07-13 21:52:00,001 - INFO [SessionTracker:zookeeperser...@315] - Expiring session 0x129c87a7f980015, timeout of 120000ms exceeded
2010-07-13 21:52:00,002 - INFO
[ProcessThread:-1:preprequestproces...@385] - Processed session termination for sessionid: 0x229c87a7f4e0011
2010-07-13 21:52:00,002 - INFO [ProcessThread:-1:preprequestproces...@385] - Processed session termination for sessionid: 0x129c87a7f980015

So your region server process was paused for more than 10 minutes; the logs show that gap, and the lines that follow talk about that big pause. Are your nodes swapping?

J-D

On Wed, Jul 14, 2010 at 11:48 AM, Jinsong Hu <[email protected]> wrote:

The zookeepers are on 3 separate physical machines, colocated with the 3 masters. I have put the logs for them here: http://t-collectors1.proofpoint.com:8079/zookeeper.tar.gz The version is also from the CDH3 June 10 release.

I checked the datanode in that period; there are lots of exceptions. I have put them here: http://t-collectors1.proofpoint.com:8079/backup.tar.gz

Jinsong

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Wednesday, July 14, 2010 11:16 AM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

So your region servers had their sessions expired, but I don't see any sign of GC activity. Usually that points to a case where zookeeper isn't able to answer requests fast enough because it is IO starved. Are the zookeeper quorum members on the same nodes as the region servers?

J-D

On Wed, Jul 14, 2010 at 10:16 AM, Jinsong Hu <[email protected]> wrote:

I have uploaded the crashed log files to http://somewhere It includes the GC log and config files too. Both of these regionservers crashed when I used only 3 mappers to insert records into hbase, with each task limited to a data rate of 80 records/second.

The machines I use are relatively old, having only 4G of ram, a 4 core CPU, and a 250G disk. I run tasktracker, datanode and regionserver on them. I have 3 machines; only 1 regionserver is still up after the continuous overnight insertion. Before the test I created the hbase tables, so I started with an empty table.
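The session-expiry math from the logs above can be double-checked with a quick sketch (illustrative only; the timestamps are the last log line before the gap and the first one after it, and 120000ms is the timeout the zookeeper log reports):

```java
import java.time.Duration;
import java.time.LocalTime;

// Compare the region server's pause (the gap in its log) against the
// ZooKeeper session timeout that the quorum enforced.
public class PauseCheck {
    public static void main(String[] args) {
        LocalTime before = LocalTime.parse("21:49:17"); // last line before the gap
        LocalTime after  = LocalTime.parse("22:03:46"); // first line after the gap
        Duration pause = Duration.between(before, after);
        Duration sessionTimeout = Duration.ofMillis(120000); // from the zookeeper log

        System.out.println("pause: " + pause.toMinutes() + " min, timeout: "
                + sessionTimeout.toMinutes() + " min, expired: "
                + (pause.compareTo(sessionTimeout) > 0));
    }
}
```

A ~14 minute pause against a 2 minute timeout means the quorum was right to declare the server dead; the client-side "have not heard from server in 908177ms" line tells the same story.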
I saw it running fine for several hours, and when I checked again this morning, 2 regionservers had died. I backed up the logs and restarted the regionservers, and then I found the regionserver process is up but not listening on any port. The log doesn't show any error:

2010-07-14 16:23:45,769 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
2010-07-14 16:23:45,951 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.110.24.48:60000
2010-07-14 16:23:45,957 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.110.24.48:60000 that we are up
2010-07-14 16:23:46,283 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020

I am using the June 10 release of the cloudera distribution for both hadoop and hbase.

Jimmy

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Tuesday, July 13, 2010 4:24 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

Your region server doesn't look very loaded from the metrics POV. But I specifically asked for the lines around that, not just the dump, since they will contain the reason for the shutdown.

> I do notice that the disk usage is pretty high. I am just thinking that our problem probably is a hardware limit. But the server should not crash when the hardware limit is reached.

We still don't know why it crashed, and it may not even be related to HW limits; we need those bigger log traces.
Also use pastebin.com or anything like that.

> do you have any idea when the CDH3 official release will be out?

I don't work for cloudera, but IIRC the next beta for CDH3 is due for September.

Jimmy

--------------------------------------------------
From: "Jean-Daniel Cryans" <[email protected]>
Sent: Tuesday, July 13, 2010 2:55 PM
To: <[email protected]>
Subject: Re: regionserver crash under heavy load

Please use a pasting service for the log traces; I personally use pastebin.com

You probably had a GC that lasted too long; this is something out of the control of the application (apart from trying to put as little data in memory as possible, but you are inserting, so...). Your log doesn't contain enough information for us to tell; please look for a "Dump of metrics" line and paste the lines around it.

J-D

On Tue, Jul 13, 2010 at 2:49 PM, Jinsong Hu <[email protected]> wrote:

Hi, Todd:

I downloaded hadoop-0.20.2+320 and hbase-0.89.20100621+17 from CDH3 and inserted data with full load; after a while the hbase regionserver crashed. I checked the system with "iostat -x 5" and noticed the disk was pretty busy. Then I modified my client code and reduced the insertion rate by 6 times, and the test runs fine.

Is there any way the regionserver could be modified so that at least it doesn't crash under heavy load? I used the apache hbase 0.20.5 distribution and the same problem happens. I am thinking that when the regionserver is too busy, it should throttle the incoming data rate to protect the server. Could this be done?

Do you also know when the CDH3 official release will come out? The one I downloaded is a beta version.

Jimmy

2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Spam_MsgEventTable,56-2010-05-19 10:09:02\x099a420f4f31748828fd24aeea1d06b294,1278973678315.01dd22f517dabf53ddd135709b68ba6c.
2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: m0002029.ppops.net,60020,1278969481450
2010-07-13 02:24:34,389 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper; /hbase/root-region-server
2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2010-07-13 02:24:34,608 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/m0002029.ppops.net,60020,1278969481450/10.110.24.79%3A60020.1278987220794 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed because recovery from primary datanode 10.110.24.80:50010 failed 6 times. Pipeline was 10.110.24.80:50010. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
        at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
        at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:826)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1004)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:817)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1531)
        at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1447)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1703)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2361)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
2010-07-13 02:24:34,610 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/Spam_MsgEventTable/079c7de876422e57e5f09fef5d997e06/.tmp/6773658134549268273 : java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2603)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
2010-07-13 02:24:34,729 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
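The throttling Jimmy asks about can also be done entirely on the client side, the way he already limits his mappers. A minimal sketch of such a pacing loop (hypothetical illustration, not an HBase feature or code from the thread; the 80 records/second figure is the rate from his test, and the `table.put` placeholder marks where the real insert would go):

```java
// Client-side insert throttle: space records out so the aggregate rate
// never exceeds a fixed records-per-second budget.
public class InsertThrottle {
    private final long minNanosPerRecord;
    private long nextSlot = System.nanoTime();

    public InsertThrottle(int recordsPerSecond) {
        this.minNanosPerRecord = 1_000_000_000L / recordsPerSecond;
    }

    /** Blocks until the next record may be sent. */
    public synchronized void acquire() throws InterruptedException {
        long now = System.nanoTime();
        if (nextSlot > now) {
            Thread.sleep((nextSlot - now) / 1_000_000);
        }
        nextSlot = Math.max(now, nextSlot) + minNanosPerRecord;
    }

    public static void main(String[] args) throws InterruptedException {
        InsertThrottle throttle = new InsertThrottle(80); // 80 records/second, as in the test run
        long start = System.nanoTime();
        for (int i = 0; i < 40; i++) {
            throttle.acquire();
            // table.put(...) would go here
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("40 records in ~" + elapsedMs + " ms"); // roughly half a second at 80/s
    }
}
```

Server-side backpressure would be the cleaner fix, but until then a throttle like this keeps a busy regionserver from being pushed over the edge by import jobs.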
