Re: HBase random access in HDFS and block indices

Ryan Rawson Mon, 18 Oct 2010 19:59:16 -0700

On Mon, Oct 18, 2010 at 7:49 PM, William Kang <[email protected]> wrote:
> Hi,
> Recently I have spent some effort trying to understand the mechanisms
> of HBase in order to explore possible performance tuning options. Many
> thanks to the folks in this community who helped with my questions; I
> have sent a report. But there are still a few questions left.
>
> 1. If an HFile block contains more than one keyvalue pair, will the
> block index in the HFile point out the offset of every keyvalue pair in
> that block? Or does the block index just point out the key range of
> that block, so you have to traverse the block until you meet the key
> you are looking for?


The block index contains the first key of every block.  It therefore
defines the range of each block as a half-open interval [a,b): from that
block's first key up to, but not including, the next block's first key.
Once a block has been selected to read from, it is read into memory and
then iterated over until the key in question (or the closest match) has
been found.
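
As a rough sketch of that lookup (not HBase's actual code): the blocks,
keys and function names below are made up for illustration, and "closest
match" is assumed here to mean the first key at or after the one requested.

```python
import bisect

# Hypothetical in-memory stand-in for an HFile: each block is a sorted
# list of (key, value) pairs.
blocks = [
    [(b"a", 1), (b"c", 2)],
    [(b"f", 3), (b"h", 4)],
    [(b"m", 5), (b"q", 6)],
]

# The block index holds only the first key of each block.
block_index = [blk[0][0] for blk in blocks]


def find_block(index, key):
    """Return the position of the block whose range [first, next_first)
    covers `key`, or -1 if `key` sorts before the first block."""
    return bisect.bisect_right(index, key) - 1


def seek(key):
    """Return the first (key, value) pair at or after `key`, or None."""
    i = max(find_block(block_index, key), 0)
    # Scan the selected block; fall through to later blocks when the key
    # sorts after the last entry of the selected block.
    for blk in blocks[i:]:
        for k, v in blk:
            if k >= key:
                return (k, v)
    return None
```

The index search is a binary search over first keys only; the scan inside
the block is linear, which is why smaller blocks mean shorter scans but a
larger index.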

> 2. When HBase reads a block to fetch data or traverse it, is
> the block read into memory?

Yes, the entire block is read at once, in a single read operation.
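
To illustrate what "a single read operation" looks like, here is a minimal
sketch using a positional read on a local file as a stand-in for HDFS
(Unix only, since it relies on os.pread). The 64k block size is taken from
the question; the file layout and the read_block helper are assumptions
made up for the example.

```python
import os
import tempfile

BLOCK_SIZE = 64 * 1024  # assumed block size, per the 64k figure in the question


def read_block(fd, offset, size):
    """Read `size` bytes at `offset` with one positional read.
    The file's current position is not moved."""
    data = os.pread(fd, size, offset)
    if len(data) != size:
        raise IOError("short read")
    return data


# Build a local stand-in file of 4 blocks, each filled with its block number.
with tempfile.NamedTemporaryFile(delete=False) as f:
    for n in range(4):
        f.write(bytes([n]) * BLOCK_SIZE)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Fetch the third block (index 2) in one call, given its offset.
    block2 = read_block(fd, 2 * BLOCK_SIZE, BLOCK_SIZE)
finally:
    os.close(fd)
    os.remove(path)
```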

>
> 3. HBase blocks (64k configurable) are inside HDFS blocks (64m
> configurable), so to read the HBase blocks we have to random access the
> HDFS blocks. Even if HBase can use in(p, buf, 0, x) to read a small
> portion of a larger HDFS block, it is still a random access. Would
> this be slow?

Random access reads are not necessarily slow, but they require several things:
- disk seeks to the data in question
- disk seeks to the checksum files in question
- checksum computation and verification

While not particularly slow, this could probably be optimized a bit.
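
The checksum step above can be sketched as follows. This is an
illustration only: the 512-byte chunk size, the use of CRC32, and the
function names are assumptions for the example, and the checksums live in
a plain list rather than in a separate checksum file.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size for the example


def build_checksums(data):
    """Compute one CRC32 per fixed-size chunk of `data`."""
    return [
        zlib.crc32(data[i:i + BYTES_PER_CHECKSUM])
        for i in range(0, len(data), BYTES_PER_CHECKSUM)
    ]


def verified_read(data, checksums, offset, length):
    """Return data[offset:offset+length] after verifying every checksum
    chunk that the requested range touches."""
    if length <= 0:
        return b""
    first = offset // BYTES_PER_CHECKSUM
    last = (offset + length - 1) // BYTES_PER_CHECKSUM
    for c in range(first, last + 1):
        chunk = data[c * BYTES_PER_CHECKSUM:(c + 1) * BYTES_PER_CHECKSUM]
        if zlib.crc32(chunk) != checksums[c]:
            raise ValueError(f"checksum mismatch in chunk {c}")
    return data[offset:offset + length]
```

Note that a read which is not aligned to chunk boundaries still has to
checksum whole chunks, so a small random read can cost more CPU and IO
than its size suggests.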

Most of the issues with random reads in HDFS are about parallelizing the
reads and doing as much io-pushdown/scheduling as possible without
consuming an excess of sockets and threads.  The actual speed can be
excellent, or not, depending on how busy the IO subsystem is.
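
One way to picture "parallel reads without an excess of threads" is a
small fixed-size pool issuing positional reads against a single shared
file descriptor. This is only a sketch on a local file (Unix only), not
how HDFS or HBase does it; read_many, max_workers and the demo file are
invented for the example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_many(path, requests, max_workers=4):
    """Serve a list of (offset, size) requests in parallel.
    The pool size caps the thread count, and all reads share one file
    descriptor; os.pread does not touch the shared file position."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(lambda r: os.pread(fd, r[1], r[0]), requests))
    finally:
        os.close(fd)


# Demo on a 4096-byte temporary file whose bytes cycle through 0..255.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(range(256)) * 16)
    demo_path = f.name
try:
    results = read_many(demo_path, [(0, 4), (256, 4), (1024, 2)])
finally:
    os.remove(demo_path)
```

Results come back in request order, regardless of which read finishes
first.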


>
> Many thanks. I would be grateful for your answers.
>
>
> William
>
