On Wed, Jun 9, 2010 at 12:28 PM, Jean-Daniel Cryans <[email protected]> wrote:

> > Am getting back to using Java after a long time, guys.. So, give me a
> little more time to ramp up to '10 :)
>
> Welcome back!
>
> >
> > One more question then: What are the implications of running really large
> regions (like around 4-8 gigs per region)? One implication I can think of is
> coarser grained control over load (since a split will happen less
> frequently).. But with a large number of nodes, this isn't that
> coarse-grained I guess?
>
> I don't know anybody who's that high up, here we run at 1GB on our
> table that has a few TBs. But yeah, at scale that stuff won't matter
> as much, but with 8GB you could blow out your memory.
>

How so, JD? I can't think of anything that would cause one 8GB region to take
more RAM than 16x512MB regions.

-Todd


>
> >
> > We are trying to load hundreds of terabytes eventually.. And running even
> 100s of regions per RS seems like a big hit on the memory.
>
> From what I saw in your metrics dump, storing the actual regions was
> costing you almost nothing. But, you will encounter problems when the
> global size of all memstores is getting very big (a true random write
> pattern will always get you there).
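> (For reference, the two knobs being discussed here are the region split
> size and the global memstore ceiling. A sketch of the relevant
> hbase-site.xml entries follows; property names and defaults have varied
> across HBase versions, so treat this as illustrative rather than exact
> for your release:)
>
> ```xml
> <!-- hbase-site.xml (illustrative; check your version's hbase-default.xml) -->
> <configuration>
>   <!-- Store size at which a region is split; larger values mean
>        fewer, bigger regions and therefore less frequent splits -->
>   <property>
>     <name>hbase.hregion.max.filesize</name>
>     <value>1073741824</value> <!-- 1 GB, the size mentioned above -->
>   </property>
>   <!-- Fraction of regionserver heap that all memstores combined may
>        occupy before updates are blocked and flushes are forced -->
>   <property>
>     <name>hbase.regionserver.global.memstore.upperLimit</name>
>     <value>0.4</value>
>   </property>
> </configuration>
> ```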
>
> IMO, the biggest issue with the number of regions served per RS is
> more about the actual data that is stored and retrieved WRT the
> performance each of your node can deliver (capacity planning).
>
> J-D
>



-- 
Todd Lipcon
Software Engineer, Cloudera
