On Wed, Jun 9, 2010 at 12:28 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> Am getting back to using Java after a long time, guys.. So, give me a
>> little more time to ramp up to '10 :)
>
> Welcome back!
>
>> One more question then: What are the implications of running really large
>> regions (like around 4-8 gigs per region)? One implication I can think of
>> is coarser-grained control over load (since a split will happen less
>> frequently). But with a large number of nodes, this isn't that
>> coarse-grained, I guess?
>
> I don't know anybody who runs regions that large; here we run at 1GB on
> our table that has a few TBs. But yeah, at scale that stuff won't matter
> as much, but with 8GB you could blow out your memory.

How so, JD? I can't think of anything that would cause one 8GB region to
take more RAM than 16 x 512MB regions.

-Todd

>> We are trying to load 100s of terabytes eventually. And running even
>> 100s of regions per RS seems like a big hit on the memory.
>
> From what I saw in your metrics dump, storing the actual regions was
> costing you almost nothing. But you will encounter problems when the
> global size of all memstores gets very big (a true random write pattern
> will always get you there).
>
> IMO, the biggest issue with the number of regions served per RS is more
> about the actual data that is stored and retrieved WRT the performance
> each of your nodes can deliver (capacity planning).
>
> J-D

--
Todd Lipcon
Software Engineer, Cloudera
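[For context, the two knobs the thread discusses map to standard hbase-site.xml properties; the following is a sketch, not anyone's actual configuration from the thread, and the property names should be checked against the HBase version in use:]

```xml
<!-- hbase-site.xml sketch of the settings discussed above.
     Values here are illustrative assumptions, not the thread authors' configs. -->
<configuration>
  <!-- Split threshold per region: 1GB, the size J-D mentions running,
       rather than letting regions grow to 4-8GB. -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
  </property>
  <!-- Upper bound on the combined size of all memstores on a region
       server, as a fraction of heap. This is the "global size of all
       memstores" limit J-D warns about under a random write pattern. -->
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value>
  </property>
</configuration>
```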
