On Fri, Oct 29, 2010 at 6:41 AM, Sean Bigdatafun <[email protected]> wrote:
> I have the same doubt here. Let's say I have a totally random read pattern
> (uniformly distributed).
>
> Now let's assume my total data stored in HBase is 100TB on 10 machines
> (not a big deal considering today's disks), and the total size of my RS'
> memory is 10 * 6GB = 60GB. That translates into a 60 / (100 * 1024) ≈ 0.06%
> cache hit probability. Under a random read pattern, each read is bound to
> experience the "open -> read index -> ... -> read datablock" sequence, which
> would be expensive.
>
> Any comment?
If the reads are totally random then, as per Alvin's suggestion, yes, just turn off block caching since it is doing you no good. But a totally random access pattern is unusual in practice, no?
St.Ack
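For what it's worth, the back-of-envelope estimate in the quoted mail can be written out as a quick sketch (figures taken from the thread; assumes uniformly random access, so expected hit rate is simply aggregate cache size over total data size):

```python
# Estimate block-cache hit rate under uniformly random reads.
# Under a uniform access pattern, P(hit) ~ total cache size / total data size.
# Numbers below are the ones quoted in the thread; this is a sketch, not a benchmark.

TOTAL_DATA_TB = 100      # total data stored in HBase
NUM_REGIONSERVERS = 10   # machines
CACHE_PER_RS_GB = 6      # block cache memory per regionserver

total_cache_gb = NUM_REGIONSERVERS * CACHE_PER_RS_GB  # 60 GB aggregate cache
total_data_gb = TOTAL_DATA_TB * 1024                  # 102,400 GB of data

hit_rate = total_cache_gb / total_data_gb
print(f"expected cache hit rate: {hit_rate:.4%}")  # ~0.0586%, i.e. roughly 0.06%
```

At that hit rate nearly every read misses the cache, which is why disabling the block cache (and saving the churn of caching blocks that will never be re-read) is the sensible move for this workload.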
