On Mon, Oct 11, 2010 at 4:20 AM, Andrey Stepachev <[email protected]> wrote:
> Hi.
>
> One additional issue with column families: the number of memstores.
> Each family uses one memstore on insert. If you write to several
> memstores at once, you get more memstores, and more memory will be
> used by your region server. Especially with random inserts, you can
> easily get GC timeouts or OOME.

It's very unlikely you'll get an OOME here, since there's a limit on the
total size of all the memstores inside a single region server (the
default is 40% of the configured heap). But you don't really want to
hit that limit, since it blocks all inserts until enough room has been
cleared.
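To make the arithmetic concrete, here is a minimal Java sketch, assuming the 40% default mentioned above and a hypothetical 4 GB heap. The class and method names are made up for illustration and are not HBase's API. It only models "sum of all memstores vs. global ceiling", and it shows why more families per region push a server toward the blocking point (not toward an OOME).

```java
import java.util.Arrays;

class GlobalMemstoreLimit {
    // Assumed default: all memstores on a region server together may use
    // 40% of the max heap (hypothetical model, not HBase code).
    static final double UPPER_LIMIT_FRACTION = 0.4;

    // Global memstore ceiling in bytes for a given max heap size.
    static long limitBytes(long maxHeapBytes) {
        return (long) (maxHeapBytes * UPPER_LIMIT_FRACTION);
    }

    // True if inserts would be blocked: the sum of every memstore on the
    // server has reached the global ceiling.
    static boolean blocksInserts(long maxHeapBytes, long[] memstoreBytes) {
        long total = 0;
        for (long b : memstoreBytes) total += b;
        return total >= limitBytes(maxHeapBytes);
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024;
        long heap = 4096 * MB; // assume a 4 GB heap
        System.out.println("global limit: " + limitBytes(heap) / MB + " MB");

        // 30 regions, each holding 20 MB in every family's memstore.
        long[] oneFamily = new long[30];
        long[] threeFamilies = new long[30 * 3];
        Arrays.fill(oneFamily, 20 * MB);
        Arrays.fill(threeFamilies, 20 * MB);
        System.out.println("1 family blocks: " + blocksInserts(heap, oneFamily));
        System.out.println("3 families block: " + blocksInserts(heap, threeFamilies));
    }
}
```

With these made-up numbers the ceiling is 1638 MB. One family per region totals 600 MB and stays under it, while three families total 1800 MB and would block inserts.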

But the "number of memstores" argument also implies that, since regions
flush based on the total size of all their memstores, filling up
several of them at the same time is very inefficient: each flush writes
one file per family, however small. The worst case is filling up one
family with really big cells while also inserting much smaller cells
into other families. On one troublesome cluster I saw regions flushing
one ~58MB file along with five files of ~100KB-1MB each.
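A rough Java simulation of that worst case follows. It assumes a 64 MB per-region flush threshold and hypothetical cell sizes (100 KB in one "big" family, 2 KB in each of five small ones), chosen so the numbers land near the case above. The class, family names and sizes are illustrative only and are not HBase's implementation. The key behaviour modelled is that the flush decision uses the region's total, and every non-empty family then gets its own file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RegionFlushSketch {
    // Assumed per-region flush threshold: 64 MB, expressed in KB.
    static final long FLUSH_SIZE_KB = 64 * 1024;

    // Per-family memstore sizes for one region, in KB (insertion order kept).
    final Map<String, Long> memstores = new LinkedHashMap<>();

    RegionFlushSketch(String... families) {
        for (String f : families) memstores.put(f, 0L);
    }

    long totalKb() {
        long t = 0;
        for (long v : memstores.values()) t += v;
        return t;
    }

    // Adds a cell to one family; returns true once the region's total
    // memstore size has reached the flush threshold.
    boolean put(String family, long cellKb) {
        memstores.merge(family, cellKb, Long::sum);
        return totalKb() >= FLUSH_SIZE_KB;
    }

    // Region-level flush: every non-empty family is written out as its own
    // file, no matter how small, then all memstores are reset.
    Map<String, Long> flush() {
        Map<String, Long> files = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : memstores.entrySet()) {
            if (e.getValue() > 0) files.put(e.getKey(), e.getValue());
            e.setValue(0L);
        }
        return files;
    }

    public static void main(String[] args) {
        RegionFlushSketch region =
            new RegionFlushSketch("big", "s1", "s2", "s3", "s4", "s5");
        boolean mustFlush = false;
        while (!mustFlush) {
            // One "row": a 100 KB cell in the big family, 2 KB in each small
            // one. As a simplification, the row is finished before flushing.
            mustFlush = region.put("big", 100);
            for (int i = 1; i <= 5; i++) {
                mustFlush |= region.put("s" + i, 2);
            }
        }
        region.flush().forEach((f, kb) -> System.out.println(f + ": " + kb + " KB"));
    }
}
```

Running it prints one 59600 KB file for `big` and five 1192 KB files for `s1` to `s5`: one large flush dragging five tiny ones along with it.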

Flushing individual families instead of whole regions would fix this
case, but it has other side effects.

I personally don't recommend using multiple families unless they are
used separately almost all the time.

J-D
