Hey J-D,

Thanks a lot! That has cleared up a lot of my confusion. :) I really appreciate it.

William

On Thu, Oct 14, 2010 at 2:51 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> 1. about the row searching mechanism, I understand the part before the
>> HBase locate where the row resides in which region. I am confused
>> after that. So, I am going to write down what I understand so far,
>> please correct me if it's wrong.
>> a. The HRegion Store identifies where the row is in which HFile.
>> b. There is a block index in HFile identify which block this row resides.
>> c. If the row size is smaller than block size (which mean a block has
>> multiple rows), HBase has to traverse in that block to locate the row
>> matching the key. The traverse is sequence traverse.
>
> More or less.
>
>>
>> 2. And if the row size is larger than the block size, what's going to
>> happen? Does the block index in HFile point to multiple blocks which
>> contains different cells of that row?
>
> The block index stores full keys, row+family+qualifier+timestamp, so
> it's not talking in terms of total row size. A single row can have
> multiple blocks (in multiple files) with possibly as many entries in
> the block index. If a single cell is larger than the block size, then
> the size of that block will be the size of that cell.
>
>>
>> 3. Does a column family has to reside inside one block, which means a
>> column family cannot be larger than a block?
>
> My previous answer covers this.
>
> J-D
>
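
For anyone reading this thread later, here is a rough sketch of the lookup as I understand it from the answers above. It is a toy model in Python, not HBase's actual code: the names pack_blocks, build_block_index and find_cell are made up for illustration, one "HFile" is just a list of blocks, and the packing rule (close the block when the next cell would not fit) is my own simplification. It only tries to show that the index holds full keys (row+family+qualifier+timestamp), that one row can span several blocks, and that a cell bigger than the block size ends up in a block of its own size.

```python
from bisect import bisect_right

# Toy model (assumption): a key is (row, family, qualifier, -timestamp).
# Negating the timestamp makes newer versions of a cell sort first.

def pack_blocks(cells, block_size):
    """Pack sorted (key, value) cells into blocks of about block_size bytes.
    A cell larger than block_size ends up alone in an oversized block."""
    blocks, current, used = [], [], 0
    for key, value in cells:
        # Close the current block if this cell would not fit in it.
        if current and used + len(value) > block_size:
            blocks.append(current)
            current, used = [], 0
        current.append((key, value))
        used += len(value)
    if current:
        blocks.append(current)
    return blocks

def build_block_index(blocks):
    """The index stores the first full key of every block."""
    return [block[0][0] for block in blocks]

def find_cell(blocks, index, key):
    # Binary search for the last block whose first key is <= key.
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None
    # Sequential scan inside that one block.
    for cell_key, value in blocks[pos]:
        if cell_key == key:
            return value
        if cell_key > key:
            break
    return None

cells = [
    (("row1", "cf", "a", -2), b"x" * 4),
    (("row1", "cf", "b", -2), b"y" * 20),  # larger than the block size
    (("row1", "cf", "c", -2), b"z" * 4),
    (("row2", "cf", "a", -1), b"w" * 4),
]
blocks = pack_blocks(cells, block_size=8)
index = build_block_index(blocks)
big = find_cell(blocks, index, ("row1", "cf", "b", -2))
```

With these made-up cells, row1 is spread over three blocks, each with its own index entry, and the 20-byte cell sits alone in the second block.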
