> Date: Fri, 29 Oct 2010 10:01:24 -0700
> Subject: Re: HBase random access in HDFS and block indices
> From: [email protected]
> To: [email protected]
>
> On Fri, Oct 29, 2010 at 6:41 AM, Sean Bigdatafun
> <[email protected]> wrote:
> > I have the same doubt here. Let's say I have a totally random read pattern
> > (uniformly distributed).
> >
> > Now let's assume my total data size stored in HBase is 100 TB across 10
> > machines (not a big deal considering today's disks), and the total size of
> > my region servers' memory is 10 * 6 GB = 60 GB. That translates into a
> > 60 / (100 * 1000) = 0.06% cache hit probability. Under a random read
> > pattern, each read is bound to experience the "open -> read index -> ... ->
> > read data block" sequence, which would be expensive.
> >
> > Any comment?
> >
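Sean's back-of-envelope number checks out; a minimal sketch of the same arithmetic (using his decimal convention, 1 TB = 1000 GB):

```python
# Cache-hit estimate for a uniformly random read pattern, using the
# numbers from the thread: 10 region servers x 6 GB of cache each,
# against 100 TB of data.
cache_gb = 10 * 6        # total block cache across the cluster: 60 GB
data_gb = 100 * 1000     # 100 TB expressed in GB
hit_rate = cache_gb / data_gb
print(f"{hit_rate:.2%}")  # -> 0.06%
```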
>
> If totally random, as per Alvin's suggestion, yes, just turn off block
> caching since it is doing you no good.
>
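For reference, block caching is a per-column-family setting, so it can be switched off for just the randomly-read family. A sketch from the HBase shell ('mytable' and 'cf' are placeholder names; tables of this era had to be disabled before altering):

```
hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'cf', BLOCKCACHE => 'false'}
hbase> enable 'mytable'
```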
> But totally random is unusual in practice, no?
>
> St.Ack
Uhm... not exactly.
One of the benefits of HBase is that it should scale in a *near* linear fashion.
So if we don't know how the data is to be accessed, or we know that there are a
couple of access patterns that are orthogonal to each other, putting the data
into the cloud in a 'random' fashion should provide consistent read access
times.
So the design of 'random' stored data shouldn't be that unusual. It just means
you're going to have a couple of different indexes. ;-)
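A toy sketch of the "couple of different indexes" idea, in plain Python rather than the HBase client API; the `user#date` row-key scheme and all names here are invented for illustration. Two orthogonal access patterns (by user, by date) are each served by their own index over the same rows:

```python
# Hypothetical rows keyed as "user#date" -- one natural access path.
rows = {
    "user1#2010-10-29": "event-a",
    "user2#2010-10-28": "event-b",
    "user1#2010-10-27": "event-c",
}

# Index 1: by user (the row-key prefix already serves this pattern).
by_user = {}
for key in rows:
    user, date = key.split("#")
    by_user.setdefault(user, []).append(key)

# Index 2: by date -- in HBase this would typically be a second table
# keyed the other way around, maintained alongside the first.
by_date = {}
for key in rows:
    user, date = key.split("#")
    by_date[date] = key

print(sorted(by_user["user1"]))   # all rows for user1
print(by_date["2010-10-28"])      # row for a given date
```

Each pattern pays for its own index, but reads stay consistent regardless of which dimension the client queries on.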