On Mon, Oct 11, 2010 at 4:20 AM, Andrey Stepachev <[email protected]> wrote:
> Hi.
>
> One additional issue with column families: number of memstores. Each
> family on insert utilizes one memstore. If you write to several
> memstores at once, you get more memstores and more memory will be
> used by your region server. Especially with random inserts you can
> easily get GC timeouts or OOME.

It's very unlikely you'd get an OOME here, since there's a limit on the total size of all the memstores inside a single region server (default is 40% of configured heap). But you don't really want to hit it, since it blocks all inserts until enough room has been cleared.

But the "number of memstores" argument also implies that, since regions flush on the total size of their memstores, filling up a few of them at the same time is very inefficient. The worst case is filling up one family with really big cells while also inserting much smaller cells into other families. In one case on a troublesome cluster I saw regions flushing one ~58MB file along with 5 files of ~100KB-1MB each.

Flushing individual families instead of whole regions would be a fix in this case, but it has other side effects. I personally don't recommend using multiple families unless they are used separately almost all the time.

J-D
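
To make the uneven-flush arithmetic concrete, here is a rough Python sketch. It is not HBase code. It only assumes what is described above: every family in a region is flushed together once the combined memstore size reaches a threshold. The 64 MB threshold, the family names, and the per-round write sizes are all made up for illustration.

```python
KB = 1024
MB = 1024 * KB


def simulate_flush(write_sizes, flush_threshold):
    """Return each family's memstore size at the moment the region flushes.

    write_sizes: dict of family name -> bytes written to that family per round
                 (hypothetical workload, one write per family per round).
    flush_threshold: combined memstore size in bytes that triggers the flush.
    """
    if sum(write_sizes.values()) <= 0:
        raise ValueError("at least one family must receive data")

    memstores = {family: 0 for family in write_sizes}
    # The flush decision looks at the region-wide total, not at any one family.
    while sum(memstores.values()) < flush_threshold:
        for family, size in write_sizes.items():
            memstores[family] += size
    return memstores


if __name__ == "__main__":
    # Assumed workload: one family with big cells, five with small cells.
    writes = {"big": 1 * MB}
    writes.update({"small%d" % i: 16 * KB for i in range(1, 6)})

    # Assumed region flush threshold of 64 MB.
    for family, size in simulate_flush(writes, 64 * MB).items():
        print("%-7s %8.2f MB" % (family, size / MB))
```

With these made-up numbers the region flushes after 60 rounds, writing one 60 MB file and five files of just under 1 MB each, which is the same shape as the flush pattern described above.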
