Re: Help needed! Performance related questions

William Kang Thu, 14 Oct 2010 12:45:45 -0700

Hey J-D,
Thanks a lot! That cleared up a lot of my confusion. :) I really
appreciate it.



William


On Thu, Oct 14, 2010 at 2:51 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> 1. About the row lookup mechanism: I understand the part where HBase
>> locates the region that holds the row, but I am confused about what
>> happens after that. Here is what I understand so far; please correct
>> me if it's wrong.
>> a. The HRegion's Store identifies which HFile the row is in.
>> b. A block index in the HFile identifies which block the row resides in.
>> c. If the row is smaller than the block size (which means a block holds
>> multiple rows), HBase has to traverse that block to locate the row
>> matching the key. The traversal is sequential.
>
> More or less.
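
As a rough illustration of steps b and c, here is a minimal Python sketch.
It is not HBase code: `blocks`, `block_index`, `find_block` and `get` are
hypothetical names, the "HFile" is just an in-memory list, and keys are
simplified to (row, family, qualifier) tuples compared in plain tuple order.

```python
from bisect import bisect_right

# Hypothetical stand-in for one HFile: a list of blocks, each a sorted
# list of (key, value) pairs. Keys are (row, family, qualifier) tuples.
blocks = [
    [(("row1", "cf", "a"), "v1"), (("row1", "cf", "b"), "v2")],
    [(("row2", "cf", "a"), "v3"), (("row3", "cf", "a"), "v4")],
    [(("row4", "cf", "a"), "v5")],
]

# Block index: the first key of every block, in file order.
block_index = [block[0][0] for block in blocks]


def find_block(key):
    """Step b: binary-search the index for the one block that may hold key."""
    i = bisect_right(block_index, key) - 1
    return i if i >= 0 else None


def get(key):
    """Step c: scan the chosen block sequentially until the key is found."""
    i = find_block(key)
    if i is None:
        return None  # key sorts before the first block of the file
    for k, v in blocks[i]:
        if k == key:
            return v
        if k > key:
            break  # keys are sorted, so the key cannot appear later
    return None
```

For example, get(("row3", "cf", "a")) picks the second block through the
index and then walks that block until it reaches the matching key.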
>
>>
>> 2. And what happens if the row is larger than the block size? Does the
>> block index in the HFile point to multiple blocks, each containing
>> different cells of that row?
>
> The block index stores full keys (row+family+qualifier+timestamp), so
> it doesn't work in terms of total row size. A single row can span
> multiple blocks (possibly in multiple files), with possibly as many
> entries in the block index. If a single cell is larger than the block
> size, then the size of that block will be the size of that cell.
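
To make this concrete, here is a small Python sketch of how cells might be
packed into blocks. It is an assumption-laden toy, not the real HFile
writer: `write_blocks` and `BLOCK_SIZE` are hypothetical names, only value
bytes are counted toward the block size, and a block is closed once it has
reached the target size.

```python
BLOCK_SIZE = 64  # bytes; hypothetical and tiny on purpose (HBase's default is 64 KB)


def write_blocks(cells, block_size=BLOCK_SIZE):
    """Pack sorted (key, value) cells into blocks; return (blocks, index)."""
    blocks, index = [], []
    current, current_size = [], 0
    for key, value in cells:
        # Close the current block once it has reached the target size.
        if current and current_size >= block_size:
            blocks.append(current)
            current, current_size = [], 0
        if not current:
            # The index entry is the full key of the block's first cell.
            index.append(key)
        current.append((key, value))
        current_size += len(value)
    if current:
        blocks.append(current)
    return blocks, index


# One row whose cells do not fit in a single block.
# Keys are (row, family, qualifier, timestamp).
cells = [
    (("row1", "cf", "q1", 1), b"x" * 40),
    (("row1", "cf", "q2", 1), b"x" * 40),
    (("row1", "cf", "q3", 1), b"x" * 200),  # one cell larger than BLOCK_SIZE
    (("row1", "cf", "q4", 1), b"x" * 10),
]
blocks, index = write_blocks(cells)
```

Here the single row ends up in three blocks with three index entries, and
the block holding q3 contains just that one 200-byte cell.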
>
>>
>> 3. Does a column family have to reside inside one block, meaning a
>> column family cannot be larger than a block?
>
> My previous answer covers this.
>
> J-D
>
