What Ted and Intae said.

Are you asking out of interest or do you see performance issues?

One "issue" is that the KeyValues (KVs) in a block are not indexed. KVs are 
variable length, so once a block is loaded it has to be searched 
linearly to find the KV (or determine its absence).
It's on my list of things to investigate: noting the start offsets of all KVs 
somewhere and hence allowing a binary search of the KVs.

Since blocks are small (64k by default) it might not make a difference, but we 
should check.
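
To make the offsets idea concrete, here is a minimal sketch in Python (for brevity, not HBase code). The record layout (4-byte key length, 4-byte value length, key, value) and all function names are assumptions made up for illustration; this is not the actual HFile block encoding. It contrasts a linear scan of a block with a binary search over a side list of KV start offsets.

```python
import struct

# Hypothetical record layout, NOT the real HFile format:
# [key_len: 4 bytes][val_len: 4 bytes][key bytes][value bytes]

def encode_block(kvs):
    """Pack key-sorted (key, value) byte pairs; also return each record's start offset."""
    buf = bytearray()
    offsets = []
    for key, value in kvs:
        offsets.append(len(buf))
        buf += struct.pack(">II", len(key), len(value)) + key + value
    return bytes(buf), offsets

def read_kv(block, pos):
    """Decode the record at pos; return (key, value, position of the next record)."""
    key_len, val_len = struct.unpack_from(">II", block, pos)
    key_start = pos + 8
    val_start = key_start + key_len
    return block[key_start:val_start], block[val_start:val_start + val_len], val_start + val_len

def linear_search(block, target):
    """Walk records from the start of the block: O(n) in the number of KVs."""
    pos = 0
    while pos < len(block):
        key, value, pos = read_kv(block, pos)
        if key == target:
            return value
        if key > target:  # keys are sorted, so the target cannot appear later
            return None
    return None

def binary_search(block, offsets, target):
    """Use the recorded start offsets to jump around the block: O(log n)."""
    lo, hi = 0, len(offsets) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key, value, _ = read_kv(block, offsets[mid])
        if key == target:
            return value
        if key < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

The offsets list costs extra space per KV, which is part of why it would need measuring against the plain scan for small blocks.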

Another issue is that we cache only blocks. So for workloads with random reads 
where the working set of blocks does not fit into the aggregate block cache, 
HBase would need to load an entire block for each KV it wants to read. For 
those workloads we might want to consider a KV cache. (See also Vladimir's 
BigBase - https://github.com/VladRodionov/bigbase).
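
As a rough illustration of what a KV cache means here, the following Python sketch puts a small LRU cache of individual KVs in front of a block loader. The `KVCache` class and the `load_block` callback are hypothetical names invented for this example; this is neither HBase's block cache nor BigBase's design. On a miss the whole block is still read, but only the requested KV is retained, so the same memory budget covers far more distinct keys than caching whole blocks would.

```python
from collections import OrderedDict

class KVCache:
    """Toy LRU cache of single KVs (hypothetical, for illustration only)."""

    def __init__(self, capacity, load_block):
        self.capacity = capacity          # maximum number of cached KVs
        self.load_block = load_block      # assumed callback: key -> dict of that block's KVs
        self.entries = OrderedDict()      # key -> value, least recently used first
        self.block_loads = 0              # counts how often a whole block had to be read

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        # Miss: read the entire block, but keep only the KV that was asked for.
        self.block_loads += 1
        value = self.load_block(key).get(key)
        if value is not None:
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used KV
        return value
```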


-- Lars



________________________________
 From: Ted Yu <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Friday, July 4, 2014 7:39 AM
Subject: Re: How Hbase achieves efficient random access?
 

For description of HFile v2, see http://hbase.apache.org/book.html#hfilev2

For block cache, see http://hbase.apache.org/book.html#block.cache

In "HBase In Action", starting page 28, there is description for read path.

Cheers



On Fri, Jul 4, 2014 at 2:02 AM, Intae Kim <[email protected]> wrote:

> Except memstore, blockcache, hfile count etc..
>
> Simply stated, data are sorted in file called HFile (composed of  blocks)
> when client try to access data, hbase search proper block in file and load
> block to check if the block has the data.
>
> See HFile Format in more details, (meta index, data index ...)
>
> Good Luck!!
>
>
> 2014-07-04 17:30 GMT+09:00 Ted Yu <[email protected]>:
>
> > Please take a look at http://hbase.apache.org/book/perf.reading.html
> >
> > Cheers
> >
> > On Jul 4, 2014, at 12:22 AM, yl wu <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > HBase has sorted and indexed Hfile format, which enables fast lookup.
> > > I am wondering is there any other feature help Hbase achieve efficient
> > > random access?
> > > I want to know the whole story, but I can't find any article talks about
> > > random access in HBase in high level.
> > >
> > > Can anyone help me resolve my confusion in this?
> > >
> > > Best,
> > > Yanglin
> >
>
