> 1. about the row searching mechanism, I understand the part before the
> HBase locate where the row resides in which region. I am confused
> after that. So, I am going to write down what I understand so far,
> please correct me if it's wrong.
> a. The HRegion Store identifies where the row is in which HFile.
> b. There is a block index in HFile identify which block this row resides.
> c. If the row size is smaller than block size (which mean a block has
> multiple rows), HBase has to traverse in that block to locate the row
> matching the key. The traverse is sequence traverse.
More or less.

>
> 2. And if the row size is larger than the block size, what's going to
> happen? Does the block index in HFile point to multiple blocks which
> contains different cells of that row?

The block index stores full keys (row+family+qualifier+timestamp), so
it works in terms of individual cells, not total row size. A single row
can span multiple blocks (in multiple files), with possibly as many
entries in the block index.

If a single cell is larger than the block size, then the size of that
block will be the size of that cell.

>
> 3. Does a column family has to reside inside one block, which means a
> column family cannot be larger than a block?

My previous answer covers this: blocks are cut per cell, so a column
family can span many blocks.

J-D
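
To make the lookup path above concrete, here is a rough sketch in Java. It is not HBase code: BlockIndexSketch, Cell, findBlock, seek and indexEntriesForRow are made-up names, the "block size" only counts value bytes, and the newest-timestamp-first ordering is a simplification I am assuming for the example. It only shows the idea: the index holds the first full key of each block, a binary search picks the block, and the match inside the block is found by a sequential scan.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical classes, not HBase's actual API.
class BlockIndexSketch {

    // A cell identified by its full key: row + family + qualifier + timestamp.
    static final class Cell implements Comparable<Cell> {
        final String row, family, qualifier;
        final long timestamp;
        final int valueSize; // bytes of value, used only for block sizing

        Cell(String row, String family, String qualifier, long timestamp, int valueSize) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.valueSize = valueSize;
        }

        @Override
        public int compareTo(Cell o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = family.compareTo(o.family);
            if (c == 0) c = qualifier.compareTo(o.qualifier);
            // Assumption for this sketch: newer timestamps sort first.
            if (c == 0) c = Long.compare(o.timestamp, timestamp);
            return c;
        }
    }

    final List<List<Cell>> blocks = new ArrayList<>();
    final List<Cell> index = new ArrayList<>(); // first key of each block

    // Packs already-sorted cells into blocks, closing a block once it
    // reaches blockSize. A lone oversized cell becomes its own block.
    BlockIndexSketch(List<Cell> sortedCells, int blockSize) {
        List<Cell> current = new ArrayList<>();
        int size = 0;
        for (Cell cell : sortedCells) {
            if (current.isEmpty()) index.add(cell);
            current.add(cell);
            size += cell.valueSize;
            if (size >= blockSize) {
                blocks.add(current);
                current = new ArrayList<>();
                size = 0;
            }
        }
        if (!current.isEmpty()) blocks.add(current);
    }

    // Binary search: last block whose first key is <= target, or -1 if none.
    int findBlock(Cell target) {
        int lo = 0, hi = index.size() - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).compareTo(target) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Sequential scan inside the chosen block for an exact key match.
    Cell seek(Cell target) {
        int b = findBlock(target);
        if (b < 0) return null;
        for (Cell cell : blocks.get(b)) {
            if (cell.compareTo(target) == 0) return cell;
        }
        return null;
    }

    // Number of index entries whose first key belongs to the given row.
    int indexEntriesForRow(String row) {
        int n = 0;
        for (Cell first : index) {
            if (first.row.equals(row)) n++;
        }
        return n;
    }
}
```

With a block size of 20 and a row whose first cell is 100 bytes, that cell ends up alone in a 100-byte block and the rest of the row starts a new block, so the same row gets more than one index entry.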
