Hi guys,

Thanks so much for answering my questions. I really appreciate it. Your answers help a lot!
I have a few more follow-up questions, though.

1. About the row lookup mechanism: I understand the part up to where HBase locates which region the row resides in. I am confused about what happens after that, so I am going to write down what I understand so far. Please correct me if it's wrong.
   a. The HRegion Store identifies which HFile the row is in.
   b. A block index in the HFile identifies which block the row resides in.
   c. If the row size is smaller than the block size (which means a block holds multiple rows), HBase has to traverse that block to locate the row matching the key. The traversal is sequential.

2. If the row size is larger than the block size, what happens? Does the block index in the HFile point to multiple blocks, each containing different cells of that row?

3. Does a column family have to reside inside one block, meaning a column family cannot be larger than a block?

Many thanks!

William

On Thu, Oct 14, 2010 at 2:16 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> If you could answer any of these
>> following questions, I would greatly grateful for that.
>
> People usually give me beer in exchange for quick help, let me know if
> that works for you ;)
>
>>
>> 1. For cell size, why it should not be larger than 20m in general?
>
> General answer: it pokes HBase in all the corner cases. You have to
> change a lot of default configs in order to keep some sort of
> efficiency.
>
>>
>> 2. What is the block size if the cell is 20m? Can a cell covers multiple
>> blocks?
>
> No, one HFile block per cell (KeyValue) in this case. It basically
> gives you a perfect index.
>
>>
>> 3. For single cell column family (it has only one cell), does it share
>> the same size limit as cell? In other words, does single column family
>> should be smaller than 20m?
>
> It's the same to me.
>
>>
>> 4. Is there any advantage to put rows close in HBase, if these rows
>> have a high chance to be queried together?
>
> If you do Scans, then you want your rows together right?
>
>>
>> 5. Any general rule for row size?
>
> Try not to go into the MBs, it's currently missing some optimizations
> that would make this use case work perfectly.
>
>>
>> 6. Where does the HReigion host the row keys in HFile or other files?
>
> Block index in HFile, not all the row keys are there if a single block
> fits more than one row.
>
> J-D
>
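
To make steps 1b and 1c of my question concrete, here is a toy Python sketch of how I picture the lookup inside a single HFile. It is only my mental model, not HBase code. I am assuming the block index stores the first key of each block and is searched with a binary search, and I am simplifying keys to plain row-key strings (real HFile keys are KeyValue keys). The names build_block_index and find_row are made up for illustration.

```python
from bisect import bisect_right

# Toy model of one HFile: a list of blocks, each a sorted list of
# (row_key, value) pairs. Assumption: the block index holds only the
# first key of every block, so not every row key appears in the index.

def build_block_index(blocks):
    """Return the first row key of each block (hypothetical helper)."""
    return [block[0][0] for block in blocks]

def find_row(blocks, block_index, row_key):
    """Look up row_key: index search first, then a sequential block scan."""
    # Step b: binary-search the index for the last block whose first key
    # is <= row_key. That is the only block that could contain the row.
    i = bisect_right(block_index, row_key) - 1
    if i < 0:
        return None  # row_key sorts before the first block
    # Step c: walk the chosen block in order until we hit or pass the key.
    for key, value in blocks[i]:
        if key == row_key:
            return value
        if key > row_key:
            break  # keys are sorted, so the row is not in this file
    return None

blocks = [
    [("row-a", 1), ("row-c", 2)],
    [("row-f", 3), ("row-h", 4)],
    [("row-m", 5)],
]
index = build_block_index(blocks)
print(find_row(blocks, index, "row-h"))  # 4
print(find_row(blocks, index, "row-d"))  # None
```

In this sketch "row-h" is not in the index at all; the index only narrows the search to the second block, and the sequential scan finds it there. If that picture is wrong, that is exactly the part I would like corrected.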
