Yes, I am indeed testing the sustained write rate. The channel I/O exception shows that I/O is what killed the regionserver.

The DataNode side shows:

2010-08-28 23:46:27,854 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_7209586757797236713_2442298 java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.110.24.89:50010 remote=/10.110.24.89:42524]. 0 millis timeout left.


The regionserver side shows:

2010-08-28 23:47:13,148 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_7209586757797236713_2442298java.io.EOFException


I agree that with a slower insertion rate we could store more data in HBase. In this case, though, I do want to stress test HBase and find its limit. Our application continuously collects data from the network and inserts it into HBase, and I want to see what happens in the extreme cases.
It looks like channel I/O doesn't become the bottleneck under such a stress test.

dfs -dus showed 1.17 TB of data when one of the regionservers crashed. The data is gzip-compressed, as I found that gzip compression actually gives a better write rate.

I may test a larger region size later. A previous test with 2 GB regions also caused lots of I/O,
and eventually the HBase regionserver crashed too.

Jimmy.

--------------------------------------------------
From: "Jean-Daniel Cryans" <jdcry...@apache.org>
Sent: Wednesday, September 01, 2010 11:35 AM
To: <user@hbase.apache.org>
Subject: Re: how many regions a regionserver can support

Is that really a good test? Unless you are planning to write about 1TB
of new data per day into HBase, I don't see how you are testing
capacity; you're more likely testing how HBase can sustain a constant
import of a lot of data. Regarding that, I'd be interested in knowing
exactly the circumstances of the region server failure.

Regarding a real-life example, one of our clusters has about 2.5TB of
LZOed data (not sure about the raw size) according to dfs -du, on 20
nodes (FWIW). When trying to reach high density on your nodes, be sure
to compress your data and set the split size bigger than the default
of 256MB or you'll end up with too many regions.
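To make the split-size advice concrete, here is a back-of-the-envelope sketch using the 2.5 TB / 20-node figures above. The helper names are hypothetical (nothing here is an HBase API), and it assumes that, because a region splits into two roughly equal daughters once it outgrows the max size, the average region sits somewhere between half and all of that max size. Real counts depend on key distribution.

```python
# Rough region-count estimate for a given amount of stored (compressed) data.
# Assumption: regions average between max_region_mb / 2 (just split) and
# max_region_mb (about to split). Hypothetical helpers, not part of HBase.

MB_PER_TB = 1024 * 1024


def region_count_range(data_mb: float, max_region_mb: float) -> tuple[int, int]:
    """Return (fewest, most) regions expected for data_mb of stored data."""
    fewest = data_mb / max_region_mb        # every region completely full
    most = data_mb / (max_region_mb / 2)    # every region freshly split
    return round(fewest), round(most)


def per_server(regions: int, servers: int) -> float:
    """Average number of regions each server has to host."""
    return regions / servers


if __name__ == "__main__":
    data_mb = 2.5 * MB_PER_TB   # ~2.5 TB of LZO-compressed data, per the thread
    servers = 20
    for max_mb in (256, 1024, 2048):
        lo, hi = region_count_range(data_mb, max_mb)
        print(f"max {max_mb:>4} MB: {lo}-{hi} regions, "
              f"{per_server(lo, servers):.0f}-{per_server(hi, servers):.0f} per server")
```

With the 256MB default this works out to roughly 512-1024 regions per node, while a 1GB split size brings it down to roughly 128-256.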

J-D

On Wed, Sep 1, 2010 at 11:21 AM, Jinsong Hu <jinsong...@hotmail.com> wrote:
I did some testing on a 6-regionserver cluster with a key design that
spreads the incoming data across all regions.
I noticed that after pumping data for 3-4 days (about 3 TB of data), one
of the regionservers shut down because of a channel I/O error. On a
3-regionserver cluster with the same key design, the regionservers shut
down after only 45 GB of data had been inserted.

I noticed that if the key is designed so that data goes not to all
regions but only to a small portion of them, and that portion is spread
approximately evenly among all regionservers, then the HDFS size becomes
the limit on the total number of regions that can be supported, and I
don't run into this I/O issue.

Can anybody show us an actual example of HBase data size and cluster
size?

Jimmy.

--------------------------------------------------
From: "Jonathan Gray" <jg...@facebook.com>
Sent: Friday, August 27, 2010 10:55 AM
To: <user@hbase.apache.org>
Subject: RE: how many regions a regionserver can support

There is no fixed limit; it has much more to do with the read/write load
than with the actual dataset size.

HBase is usually fine with very densely packed RegionServers if much of
the data is rarely accessed. If you have extremely high numbers of regions
per server and you are writing to all of these regions, or even reading
from all of them, you could have issues. Though storage capacity needs to
be considered, capacity planning often has much more to do with how much
memory you need to support the read/write load you expect. For reads this
is mostly a performance concern, but for writes there are some important
considerations related to the number of regions per server (and thus data
density and your choice of max region size).
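One way to see why the number of regions being written matters for memory is simple division: the heap a RegionServer sets aside for buffering writes (memstores) is shared by every region receiving writes, so each region's share tends to shrink as more regions are written concurrently. The sketch below is illustrative only; the 4 GB heap and the 40% memstore share are assumptions, not figures from this thread, so substitute your own settings. The 450-region case corresponds to 2700 regions spread over 6 servers.

```python
# Illustrative only: how thinly the memstore budget is spread when every
# region on a server receives writes. heap_mb and memstore_fraction are
# assumptions, not values taken from the thread.


def memstore_mb_per_region(heap_mb: float, memstore_fraction: float,
                           active_regions: int) -> float:
    """Average memstore space per actively written region, in MB."""
    if active_regions <= 0:
        raise ValueError("active_regions must be positive")
    return heap_mb * memstore_fraction / active_regions


if __name__ == "__main__":
    heap_mb = 4096    # assumed RegionServer heap on an 8 GB machine
    fraction = 0.4    # assumed share of heap available to memstores
    for regions in (10, 100, 450):  # 450 = 2700 regions / 6 servers
        mb = memstore_mb_per_region(heap_mb, fraction, regions)
        print(f"{regions:>3} active regions -> ~{mb:.1f} MB of memstore each")
```

When writes are confined to a small set of regions, as in the key design described earlier in the thread, each active region keeps a much larger share of that budget.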

In any case, you should probably increase your max region size to 1GB or
so, and you can go higher if necessary.

JG

-----Original Message-----
From: Jinsong Hu [mailto:jinsong...@hotmail.com]
Sent: Friday, August 27, 2010 10:03 AM
To: user@hbase.apache.org
Subject: how many regions a regionserver can support

Hi there,
  Does anybody know how many regions a regionserver can support? I have
regionservers with 8 GB of RAM, 1.5 TB of disk and a 4-core CPU.
I searched http://www.facebook.com/note.php?note_id=142473677002 and it
says Google's target is 100 regions of 200 MB for each regionserver.
  In my case, I have 2700 regions spread across 6 regionservers, each
region set to the default size of 256 MB, and it still seems to be running
fine. I am running CDH3. I just wonder what the upper limit is so that I
can do capacity planning. Does anybody know?
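For comparison, the density in this cluster can be set against the quoted target with a little arithmetic. The helper below is hypothetical and gives upper bounds only, since it assumes every region has grown to its configured size.

```python
# Compare the cluster described above with the quoted target of
# "100 regions of 200 MB per regionserver". Upper bounds only: assumes
# every region has reached its configured size. Hypothetical helper.


def max_data_per_server_gb(regions_per_server: float, region_mb: float) -> float:
    """Largest amount of data (GB) the given regions can hold on one server."""
    return regions_per_server * region_mb / 1024


if __name__ == "__main__":
    regions, servers, region_mb = 2700, 6, 256
    density = regions / servers  # 450 regions per server
    print(f"this cluster: {density:.0f} regions/server, "
          f"up to ~{max_data_per_server_gb(density, region_mb):.1f} GB/server")
    print(f"quoted target: 100 regions/server, "
          f"up to ~{max_data_per_server_gb(100, 200):.1f} GB/server")
```

Even fully grown, 450 regions of 256 MB would use only a small fraction of a 1.5 TB disk, which is consistent with the replies that follow: region count and write load, not raw capacity, tend to be the constraint.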

Jimmy.
