"Be sure to compress your data and set the split size bigger than the default of 256MB or you'll end up with too many regions."
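To put J-D's advice in numbers: if regions split once they reach the max region size, the region count can be no lower than the stored data divided by that size. The sketch below is a rough estimate that assumes data is spread evenly across servers and ignores any temporary overshoot before a split. The function name is hypothetical, and the 2.5TB / 20-node figures are J-D's, from his message quoted in this thread.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def min_regions_per_server(data_bytes, max_region_bytes, servers):
    """Rough lower bound on regions per server (hypothetical helper).

    Assumes regions split once they reach max_region_bytes, so no region
    holds more than that, and that regions are spread evenly over servers.
    """
    total_regions = math.ceil(data_bytes / max_region_bytes)
    return math.ceil(total_regions / servers)


if __name__ == "__main__":
    # J-D's cluster: ~2.5TB (as reported by dfs -du) on 20 nodes.
    for size in (256 * MB, 1 * GB):
        n = min_regions_per_server(2.5 * TB, size, 20)
        print(size // MB, "MB split size ->", n, "regions/server")
```

Quadrupling the split size cuts the per-server region count by a factor of four. Regions just after a split are closer to half full, so real counts land somewhat higher than this bound.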
How many regions are too many? I have a decent-sized cluster (~30 nodes) and started inserting new data, and noticed that after a day I went from 30 regions on each server to 60. That is using the default region size. I haven't tested increasing the region file size, as I'm concerned about scan performance.

On Wed, Sep 1, 2010 at 2:35 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Is that really a good test? Unless you are planning to write about 1TB
> of new data per day into HBase, I don't see how you are testing
> capacity; you're more likely testing how HBase can sustain a constant
> import of a lot of data. Regarding that, I'd be interested in knowing
> exactly the circumstances of the region server failure.
>
> Regarding a real-life example, one of our clusters has about 2.5TB of
> LZOed data (not sure about the raw size) according to dfs -du, on 20
> nodes (FWIW). When trying to reach high density on your nodes, be sure
> to compress your data and set the split size bigger than the default
> of 256MB or you'll end up with too many regions.
>
> J-D
>
> On Wed, Sep 1, 2010 at 11:21 AM, Jinsong Hu <jinsong...@hotmail.com> wrote:
> > I did a test with a 6-regionserver cluster and a key design that spreads
> > the incoming data to all regions. After pumping about 3TB of data over
> > 3-4 days, one of the regionservers shut down because of a channel IO
> > error. On a 3-regionserver cluster with the same key design, the
> > regionservers shut down after only 45GB of data insertion.
> >
> > I noticed that if the key is designed so that it doesn't spread to all
> > regions, but only to a small portion of regions, and that portion is
> > spread approximately evenly among all regionservers, then HDFS size
> > becomes the limit on the total number of regions that can be supported,
> > and I don't run into this IO issue.
> >
> > Can anybody show us an actual example of HBase data size and cluster
> > size?
> >
> > Jimmy.
> >
> > --------------------------------------------------
> > From: "Jonathan Gray" <jg...@facebook.com>
> > Sent: Friday, August 27, 2010 10:55 AM
> > To: <user@hbase.apache.org>
> > Subject: RE: how many regions a regionserver can support
> >
> >> There is no fixed limit; it has much more to do with the read/write load
> >> than the actual dataset size.
> >>
> >> HBase is usually fine having very densely packed RegionServers if much of
> >> the data is rarely accessed. If you have extremely high numbers of regions
> >> per server and you are writing to all of these regions, or even reading
> >> from all of them, you could have issues. Though storage capacity needs to
> >> be considered, capacity planning often has much more to do with how much
> >> memory you need to support the read/write load you expect. Reads matter
> >> mostly from a performance POV, but for writes there are some important
> >> considerations related to the number of regions per server (and thus data
> >> density and determining your max region size).
> >>
> >> In any case, you should probably increase your max size to 1GB or so, and
> >> you can go higher if necessary.
> >>
> >> JG
> >>
> >>> -----Original Message-----
> >>> From: Jinsong Hu [mailto:jinsong...@hotmail.com]
> >>> Sent: Friday, August 27, 2010 10:03 AM
> >>> To: user@hbase.apache.org
> >>> Subject: how many regions a regionserver can support
> >>>
> >>> Hi there,
> >>> Does anybody know how many regions a regionserver can support? I have
> >>> regionservers with 8GB RAM, 1.5TB disk and a 4-core CPU.
> >>> I searched http://www.facebook.com/note.php?note_id=142473677002 and
> >>> it says Google's target is 100 regions of 200MB for each regionserver.
> >>> In my case, I have 2700 regions spread over 6 regionservers. Each region
> >>> is set to the default size of 256MB, and it still seems to be running
> >>> fine. I am running CDH3. I just wonder what the upper limit is so that
> >>> I can do capacity planning. Does anybody know this?
> >>>
> >>> Jimmy.
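JG's point that write load, more than dataset size, limits regions per server can be sketched with memstore arithmetic. Each region buffers writes in memory until it reaches a flush size, and the regionserver caps total memstore usage at a fraction of its heap. The numbers below are assumptions, not measurements from this thread: a 64MB flush size and a 0.4 heap fraction (which I believe match the defaults of hbase.hregion.memstore.flush.size and hbase.regionserver.global.memstore.upperLimit in HBase of this era; check your own hbase-site.xml), a hypothetical 4GB heap on Jimmy's 8GB machines, and his 2700 regions over 6 servers all taking writes. The function name is hypothetical.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def memstore_budget(heap_bytes, upper_limit_fraction, flush_size_bytes, active_regions):
    """Estimate how far the global memstore budget stretches (hypothetical helper).

    Returns (number of regions that can each reach flush_size before the
    global cap is hit, average memstore bytes per region when all active
    regions are written evenly).
    """
    budget = heap_bytes * upper_limit_fraction
    full_regions = int(budget // flush_size_bytes)
    avg_per_region = budget / active_regions
    return full_regions, avg_per_region


if __name__ == "__main__":
    # Assumed: 4GB heap, 0.4 global limit, 64MB flush size,
    # 2700 regions / 6 servers = 450 regions per server, all taking writes.
    full, avg = memstore_budget(4 * GB, 0.4, 64 * MB, 2700 // 6)
    print(f"{full} regions can fill to 64MB; ~{avg / MB:.1f}MB each if all 450 are written")
```

Under these assumptions only about 25 regions can reach a full flush at once, so writing evenly to 450 regions leaves each with roughly 3.6MB of memstore on average. That likely means early, small flushes and more compaction work, which is one reason fewer, larger regions help write-heavy servers.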