OK, I tried it: truncated the table and ran inserts for about a day. Now when I try flushing the table I get a "Region is not online" error, although all the servers are up, no regions are in transition, and as far as I can tell all the regions are online. I can even read rows that are supposedly in the offline region (I'm assuming the region name indicates the first key in the region).
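For reference, the flush and the region checks look roughly like this from the HBase shell ('mytable' stands in for my real table name, and the .META. scan is just one way of confirming the assignments):

  hbase(main):001:0> flush 'mytable'    # => "Region is not online" error
  hbase(main):002:0> status 'detailed'  # all region servers report in
  hbase(main):003:0> scan '.META.', {COLUMNS => ['info:regioninfo', 'info:server']}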
-eran

On Wed, May 4, 2011 at 15:20, Eran Kutner <[email protected]> wrote:
> J-D,
> I'll try what you suggest, but it is worth pointing out that my data set
> has over 300M rows; in my read test, however, I am doing random reads out
> of a subset that contains only 0.5M rows (5000 rows in each of the 100
> key ranges in the table).
>
> -eran
>
>
> On Tue, May 3, 2011 at 23:29, Jean-Daniel Cryans <[email protected]> wrote:
>
>> On Tue, May 3, 2011 at 6:20 AM, Eran Kutner <[email protected]> wrote:
>> > Flushing, at least when I try it now, long after I stopped writing,
>> > doesn't seem to have any effect.
>>
>> Bummer.
>>
>> > In my log I see this:
>> > 2011-05-03 08:57:55,384 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.39 GB, free=897.87 MB, max=4.27 GB, blocks=54637, accesses=89411811, hits=75769916, hitRatio=84.74%%, cachingAccesses=83656318, cachingHits=75714473, cachingHitsRatio=90.50%%, evictions=1135, evicted=7887205, evictedPerRun=6949.0791015625
>> >
>> > and every 30 seconds or so something like this:
>> > 2011-05-03 08:58:07,900 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 436.92 MB of total=3.63 GB
>> > 2011-05-03 08:58:07,947 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=436.95 MB, total=3.2 GB, single=931.65 MB, multi=2.68 GB, memory=3.69 KB
>> >
>> > Now, if the entire working set I'm reading is 100MB in size, why would
>> > it have to evict 436MB just to get it filled back in 30 seconds?
>>
>> I was about to ask the same question... from what I can tell from this
>> log, it seems that your working dataset is much larger than 3GB (the
>> fact that it's evicting means it could be a lot more), and that's only
>> on that region server.
>>
>> The first reason that comes to mind for why it would be so much bigger
>> is that you uploaded your dataset more than once, and since HBase keeps
>> versions of the data, it could accumulate. That doesn't explain how it
>> would grow into GBs, though, since by default a family only keeps 3
>> versions... unless you set that higher than the default, or you uploaded
>> the same data tens of times within 24 hours and the major compactions
>> didn't kick in.
>>
>> In any case, it would be interesting for you to:
>>
>> - truncate the table
>> - re-import the data
>> - force a flush
>> - wait a bit until the flushes are done (should take 2-3 seconds if
>>   your dataset is really 100MB)
>> - do a "hadoop dfs -dus" on the table's directory (should be under /hbase)
>> - if the number is way out of whack, review how you are inserting your
>>   data; either way, please report back
>>
>> > Also, what is a good value for hfile.block.cache.size? I have it at
>> > .35 now, but with 12.5GB of RAM available for the region servers it
>> > seems I should be able to get it much higher.
>>
>> It depends; you also have to account for the MemStores, which by default
>> can use up to 40% of the heap
>> (hbase.regionserver.global.memstore.upperLimit), currently leaving you
>> only 100-40-35=25% of the heap for things like serving requests,
>> compacting, flushing, etc. It's hard to give a good number for what
>> should be left to the rest of HBase, though...
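For reference, this is roughly the sequence I ran for the verification steps you listed (table name changed, and the path assumes the default /hbase root dir; the re-import is my own loader):

  echo "truncate 'mytable'" | hbase shell
  # ... re-import the data here ...
  echo "flush 'mytable'" | hbase shell     # this is the step that now fails with "Region is not online"
  sleep 10                                 # give the flushes a moment to finish
  hadoop dfs -dus /hbase/mytable           # on-disk size; compare to the expected ~100MB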
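(Putting numbers on your last point, with my 12.5GB region server heap the current settings work out to roughly:

  0.35 x 12.5GB = ~4.4GB   block cache (hfile.block.cache.size)
  0.40 x 12.5GB =  5.0GB   memstores (hbase.regionserver.global.memstore.upperLimit)
  remaining 25% = ~3.1GB   everything else: serving requests, compacting, flushing, ...

so I see why raising the block cache much further would squeeze that last slice.)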
