Hi William.  Answers inline.

> -----Original Message-----
> From: William Kang [mailto:[email protected]]
> Sent: Monday, October 18, 2010 7:48 PM
> To: hbase-user
> Subject: HBase random access in HDFS and block indices
> 
> Hi,
> Recently I have spent some effort trying to understand the mechanisms
> of HBase in order to find possible performance tuning options. Many
> thanks to the folks in this community who helped with my questions; I
> have sent a report. But there are still a few questions left.
> 
> 1. If an HFile block contains more than one KeyValue pair, will the
> block index in the HFile point to the offset of every KeyValue pair in
> that block? Or will the block index just point to the key ranges
> inside that block, so you have to traverse inside the block until you
> reach the key you are looking for?

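It is the latter.  The block index holds the start key of each block, so you 
effectively have a key range for each block: from its start key up to the start 
key of the next block.  Within the block you scan until you reach your key.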

A lot of work has gone in recently to seek, reseek, and early-out where 
possible, and to skip unnecessary blocks.
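
To make the lookup concrete, here is a minimal sketch of the idea, not HBase's 
actual code.  The class and method names are hypothetical, and keys are plain 
Strings rather than real KeyValue bytes.  A binary search over the block start 
keys picks the one candidate block, then a linear scan inside that block finds 
the key or gives up early.

```java
import java.util.Arrays;

// Hypothetical illustration of a start-key block index (not an HBase class).
final class BlockIndexSketch {
    // First key of each block, sorted ascending (assumed layout).
    private final String[] blockStartKeys;

    BlockIndexSketch(String[] blockStartKeys) {
        this.blockStartKeys = blockStartKeys;
    }

    // Returns the index of the only block that could hold the key,
    // or -1 if the key sorts before the first block's start key.
    int findBlock(String key) {
        int pos = Arrays.binarySearch(blockStartKeys, key);
        if (pos >= 0) {
            return pos;                 // key is exactly a block start key
        }
        int insertionPoint = -pos - 1;  // index of first start key > key
        return insertionPoint - 1;      // block that starts just before it
    }

    // Linear scan inside one block's sorted keys.
    // Returns the position of the key, or -1 if it is not in the block.
    static int scanBlock(String[] sortedBlockKeys, String key) {
        for (int i = 0; i < sortedBlockKeys.length; i++) {
            int cmp = sortedBlockKeys[i].compareTo(key);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                return -1;              // early out: passed where it would be
            }
        }
        return -1;
    }
}
```

With start keys "apple", "kiwi", "pear", findBlock("banana") returns 0: the 
index alone cannot say whether "banana" exists, only which block to scan.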

> 2. When HBase reads a block to fetch data or traverse it, is
> this block read into memory?

Yes.  And if the block cache is turned on, it will be put into an LRU cache.
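
As an illustration of the LRU idea only, here is a small sketch built on 
java.util.LinkedHashMap.  The class name is hypothetical, and keying blocks by 
file offset and bounding the cache by entry count are assumptions made for 
brevity; this is not how HBase's own block cache is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache of blocks, keyed by block offset (illustration only).
final class LruBlockCacheSketch extends LinkedHashMap<Long, byte[]> {
    private final int maxBlocks;

    LruBlockCacheSketch(int maxBlocks) {
        // accessOrder = true: iteration order goes from least to most
        // recently accessed, which is what LRU eviction needs.
        super(16, 0.75f, true);
        this.maxBlocks = maxBlocks;
    }

    // Called by LinkedHashMap after each put; returning true evicts the
    // least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxBlocks;
    }
}
```

With a capacity of 2, putting blocks A and B, reading A, then putting C evicts 
B, because A was touched more recently.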

> 3. HBase blocks (64 KB, configurable) are inside HDFS blocks (64 MB,
> configurable), so to read the HBase blocks we have to random access the
> HDFS blocks. Even if HBase can use in(p, buf, 0, x) to read a small
> portion of the larger HDFS block, it is still a random access. Would
> this be slow?

Yes, this is still random access.  HBase provides the indexing, retrieval, and 
caching on top of HDFS to make random reads as efficient as possible, and it 
also makes random writes possible.
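
The in(p, buf, 0, x) call in the question looks like a positioned read: fetch x 
bytes starting at offset p without reading what comes before.  Below is a rough 
sketch of the same pattern against a local file using java.nio's FileChannel.  
It is a stand-in for illustration, not the HDFS client API, and the class and 
method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: read one small block out of a much larger file.
final class PositionalReadSketch {
    // Reads up to 'length' bytes starting at 'offset'.  Returns fewer bytes
    // only if the end of the file is reached first.
    static byte[] readBlock(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = offset;
            while (buf.hasRemaining()) {
                // Positional read: does not move the channel's own position.
                int n = channel.read(buf, pos);
                if (n < 0) {
                    break;              // end of file
                }
                pos += n;
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }
}
```

Only the requested bytes are transferred, but each call still pays for a seek, 
which is why the index and the block cache matter.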

JG

> 
> Many thanks. I would be grateful for your answers.
> 
> 
> William
