How about Column Families? We have 4 column families per table due to different settings (versions, etc.). They are sparse in that a given row will only ever write to a single CF, and even regions usually have only one CF's data/store file, except at the border between row-key naming conventions (each CF has its own convention). I recently read in the online book (see below) that more CFs are bad and you should stick with only one. Is this true given that there is really only ever data for one CF in a given region? Are we wasting disk I/O and memory because of empty CFs being flushed and compacted?
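For concreteness, our layout is roughly the sketch below (hypothetical table and family names, made-up versions values, using the 0.90-era Java client API; the real schema differs in the details):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateMultiCfTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Four families, each with its own versions setting; any given
        // row only ever writes to one of them, so the other three stay
        // empty for that row (and usually for the whole region).
        HTableDescriptor desc = new HTableDescriptor("events");
        String[] families = {"raw", "hourly", "daily", "meta"};
        int[] versions = {1, 3, 10, 1};
        for (int i = 0; i < families.length; i++) {
          HColumnDescriptor cf = new HColumnDescriptor(families[i]);
          cf.setMaxVersions(versions[i]);
          desc.addFamily(cf);
        }
        admin.createTable(desc);
      }
    }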
Thanks as always, Stack, for your help.

8.2. On the number of column families

HBase currently does not do well with anything above two or three column families, so keep the number of column families in your schema low. Currently, flushing and compactions are done on a per-region basis, so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small. Compaction is currently triggered by the total number of files under a column family; it is not size-based. When there are many column families, the flushing and compaction interaction can make for a bunch of needless i/o loading (to be addressed by changing flushing and compaction to work on a per-column-family basis). Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e., you query one column family or the other, but usually not both at the same time.

On Wed, May 18, 2011 at 10:46 AM, Stack <[email protected]> wrote:
> It's not the number of tables that is of import, it's the number of
> regions. You can have your regions in as many tables as you like. I
> do not believe there is a cost to having more tables.
>
> St.Ack
>
> On Wed, May 18, 2011 at 5:54 AM, Wayne <[email protected]> wrote:
> > How many tables can a cluster realistically handle, or how many
> > tables/node can be supported? I am looking for a realistic idea of
> > whether a 10-node cluster can support 100 or even 500 tables. I
> > realize it is recommended to have a few tables at most (and to use
> > the row key to add everything to one table), but that is not an
> > option for us at this point. What are the settings that need to be
> > tweaked, and where are the issues going to occur in terms of
> > resource limitations, memory constraints, and OOM problems? Do most
> > resource limitations fall back to total active region count
> > regardless of the table count? Where do things get scary in terms
> > of a large number of tables?
> >
> > Thanks in advance for any advice that can be provided.
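Following the book's advice quoted above, a minimal single-family variant might look like the sketch below (again hypothetical names). One caveat: VERSIONS is a per-family setting, so collapsing to one family means one retention policy for the whole table unless that is handled at the application level.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleCfSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // One family; the old family name moves into the row key
        // prefix, which the existing per-CF key conventions already
        // effectively encode.
        HTableDescriptor desc = new HTableDescriptor("events_single");
        HColumnDescriptor d = new HColumnDescriptor("d");
        d.setMaxVersions(3); // single retention policy for the table
        desc.addFamily(d);
        admin.createTable(desc);

        // Writes carry the former family name as a key prefix instead.
        HTable table = new HTable(conf, "events_single");
        Put put = new Put(Bytes.toBytes("hourly|2011-05-18T10|sensor42"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("value"),
            Bytes.toBytes("payload"));
        table.put(put);
        table.close();
      }
    }

The row-key prefix plays the role the separate families played, so a read scoped to one former family becomes a prefix scan, and flushes/compactions only ever touch the one store that actually holds data.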
