"be sure to compress your data and set the split size bigger than the default
of 256MB or you'll end up with too many regions."

How many regions are too many? I have a decent-sized cluster (~30 nodes) and
started inserting new data. After a day, I went from 30 regions on each
server to 60, using the default region size. I haven't tested increasing the
region file size, as I'm concerned about the effect on scan performance.
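
For a rough sense of how the split size drives region count, here is a
back-of-the-envelope sketch in Python. The function name and the assumption
that every region is filled to the maximum size are mine, not anything taken
from HBase; freshly split regions are smaller than the maximum, so real
counts will be higher. The 2.5TB / 20-node and 2700-region / 6-server figures
are the ones quoted below.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def regions_per_server(total_bytes, max_region_bytes, num_servers):
    """Lower-bound estimate of regions per server.

    Assumes (hypothetically) that every region is completely full and that
    regions are spread evenly across servers.
    """
    total_regions = math.ceil(total_bytes / max_region_bytes)
    return math.ceil(total_regions / num_servers)


# J-D's cluster: about 2.5TB of compressed data on 20 nodes.
for label, max_size in (("256MB", 256 * MB), ("1GB", 1 * GB), ("4GB", 4 * GB)):
    n = regions_per_server(2.5 * TB, max_size, 20)
    print(f"max region size {label}: at least {n} regions per server")

# Jimmy's cluster: 2700 regions at the 256MB default on 6 regionservers.
print(regions_per_server(2700 * 256 * MB, 256 * MB, 6), "regions per server")
```

Under these assumptions, going from 256MB to 1GB cuts the per-server region
count by a factor of four for the same amount of data.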

On Wed, Sep 1, 2010 at 2:35 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:

> Is that really a good test? Unless you are planning to write about 1TB
> of new data per day into HBase I don't see how you are testing
> capacity, you're more likely testing how HBase can sustain a constant
> import of a lot of data. Regarding that, I'd be interested in knowing
> exactly the circumstances of the region server failure.
>
> Regarding real life example, one of our cluster has about 2.5TB of
> LZOed data (not sure about the raw size) according to dfs -du, on 20
> nodes (FWIW). When trying to reach high density on your nodes, be sure
> to compress your data and set the split size bigger than the default
> of 256MB or you'll end up with too many regions.
>
> J-D
>
> On Wed, Sep 1, 2010 at 11:21 AM, Jinsong Hu <jinsong...@hotmail.com>
> wrote:
> > I did a testing with 6 regionserver cluster with a key design that spread
> > the incoming data to all regions.
> > I noticed after pumping data for 3-4 days for about 3 TB data, one of the
> > regionserver shuts down because of channel IO error.  on a 3 regionserver
> > cluster and same key design, the regionservers shuts down after only
> > 45G data insertion.
> >
> > I notice that if the key is designed so that it doesn't spread to all
> > regions, but only to small portion of regions and that portion of regions
> > spread approximately evenly among all regionservers, then the HDFS  size
> > becomes the limit of the total number of regions that can be supported and
> > I don't run into this IO issue.
> >
> > Can any body show us the actual example of the hbase data size and
> > cluster size ?
> >
> > Jimmy.
> >
> > --------------------------------------------------
> > From: "Jonathan Gray" <jg...@facebook.com>
> > Sent: Friday, August 27, 2010 10:55 AM
> > To: <user@hbase.apache.org>
> > Subject: RE: how many regions a regionserver can support
> >
> >> There is no fixed limit, it has much more to do with the read/write load
> >> than the actual dataset size.
> >>
> >> HBase is usually fine having very densely packed RegionServers, if much
> >> of the data is rarely accessed.  If you have extremely high numbers of
> >> regions per server and you are writing to all of these regions, or even
> >> reading from all of them, you could have issues.  Though storage capacity
> >> needs to be considered, capacity planning often has much more to do with
> >> how much memory you need to support the read/write load you expect.
> >> Reads mostly from a performance POV but for writes, there are some
> >> important considerations related to the number of regions per server (and
> >> thus data density and determining your max region size).
> >>
> >> In any case, you should probably increase your max size to 1GB or so and
> >> can go higher if necessary.
> >>
> >> JG
> >>
> >>> -----Original Message-----
> >>> From: Jinsong Hu [mailto:jinsong...@hotmail.com]
> >>> Sent: Friday, August 27, 2010 10:03 AM
> >>> To: user@hbase.apache.org
> >>> Subject: how many regions a regionserver can support
> >>>
> >>> Hi, There :
> >>>   Does anybody know how many region a regionserver can support ? I have
> >>> regionservers with 8G ram and 1.5T disk and 4 core CPU.
> >>> I searched http://www.facebook.com/note.php?note_id=142473677002 and they
> >>> say google target is 100 regions of 200M for each regionserver.
> >>>  In my case, I have 2700 regions spread to 6 regionservers. each region is
> >>> set to default size of 256M . and it seems it is still running fine. I am
> >>> running CDH3.  I just wonder what is the upper limit so that I can do
> >>> capacity planning. Does anybody know this ?
> >>>
> >>> Jimmy.
> >>
> >>
> >
>