Re: Help needed! Performance related questions

Jean-Daniel Cryans Thu, 14 Oct 2010 11:51:50 -0700

> 1. About the row lookup mechanism: I understand the part up to where
> HBase locates the region that holds the row, but I am confused about
> what happens after that. Here is my understanding so far; please
> correct me if it's wrong.
> a. The HRegion's Store identifies which HFile the row is in.
> b. A block index in the HFile identifies which block the row resides in.
> c. If the row is smaller than the block size (which means a block holds
> multiple rows), HBase has to traverse that block to locate the row
> matching the key. The traversal is sequential.


More or less.
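
As a rough illustration of steps b and c, here is a minimal Python sketch of a block-index lookup followed by a sequential scan inside the chosen block. It is a toy in-memory model, not HBase code: the function name `find_in_hfile`, the list-based `block_index` and `blocks` structures, and the plain tuple ordering of keys are all assumptions made for the example (real HBase keys, for instance, sort timestamps newest-first).

```python
from bisect import bisect_right

def find_in_hfile(block_index, blocks, key):
    """Look up one full key in a toy model of a single HFile.

    block_index[i] is the first key stored in blocks[i] (hypothetical layout).
    Each block is a sorted list of (key, value) pairs, where a key is a
    (row, family, qualifier, timestamp) tuple.
    """
    # Step b: binary-search the index for the last block whose first key <= key.
    i = bisect_right(block_index, key) - 1
    if i < 0:
        return None  # key sorts before everything in this file

    # Step c: walk the block in order until the key is found or passed.
    for k, v in blocks[i]:
        if k == key:
            return v
        if k > key:
            break
    return None

blocks = [
    [(("r1", "f", "a", 1), "v1"), (("r1", "f", "b", 1), "v2")],
    [(("r2", "f", "a", 1), "v3")],
]
block_index = [block[0][0] for block in blocks]
result = find_in_hfile(block_index, blocks, ("r1", "f", "b", 1))
```

The index narrows the search to one block cheaply; only that block is then scanned linearly.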

>
> 2. And if the row is larger than the block size, what happens?
> Does the block index in the HFile point to multiple blocks that
> contain different cells of that row?

The block index stores full keys (row + family + qualifier + timestamp),
so it does not work in terms of total row size. A single row can span
multiple blocks (across multiple files), with possibly as many entries
in the block index. If a single cell is larger than the block size, then
the size of that block will be the size of that cell.
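
To make the sizing behaviour concrete, here is a small Python sketch of how cells might be packed into blocks. It is a simplified model under stated assumptions, not the actual HFile writer: `pack_cells` is a hypothetical helper, cell size is approximated as key length plus value length, and a new block is started only once the current one has reached the target size. Cells are never split, so an oversized cell makes its block grow to fit it.

```python
def pack_cells(cells, block_size):
    """Group (key, value) cells into blocks of roughly block_size bytes.

    Hypothetical model: a cell is never split across blocks, and the block
    boundary is checked before each append.
    """
    blocks, current, current_size = [], [], 0
    for key, value in cells:
        # Close the current block once it has reached the target size.
        if current and current_size >= block_size:
            blocks.append(current)
            current, current_size = [], 0
        current.append((key, value))
        current_size += len(key) + len(value)
    if current:
        blocks.append(current)
    return blocks

cells = [
    ("a", "123"),     # 4 bytes
    ("b", "123"),     # 4 bytes
    ("c", "123"),     # 4 bytes
    ("d", "x" * 24),  # 25 bytes, larger than the block size below
    ("e", "12"),      # 3 bytes
]
packed = pack_cells(cells, block_size=10)
```

With a target of 10 bytes, the three small cells share one block, the 25-byte cell ends up alone in a 25-byte block, and the last cell starts a new block.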

>
> 3. Does a column family have to reside inside one block, which means a
> column family cannot be larger than a block?

My previous answer covers this.

J-D
