> > I could be wrong. I think the HFile index block (which is located at the
> > end of the HFile) is a binary search tree containing all row-key values of
> > the HFile. Searching for a specific row key in that tree would easily show
> > whether the row key exists (some node in the tree has the same row-key
> > value) or not. Why do we need to load every block to find out if the row
> > exists?
>
> Hmm... It is a multilevel index. Only the root indexes (data, meta, etc.)
> are loaded when a region is opened. The rest of the tree (the intermediate
> and leaf index blocks) is stored at the block level inside the file and is
> read on demand. I am assuming HFile v2 for this discussion. Read this for
> more clarity: http://hbase.apache.org/book/apes03.html

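
To make the multilevel lookup concrete, here is a rough Python sketch of a two-level index. It is a toy model and not the real HFile v2 on-disk format: the block ids, the dictionaries standing in for the file, and the function names are all made up for illustration. The one assumption taken from HFile is that an index entry records the first key of a block rather than every row key, so the index can only say which block might hold the key, and that block still has to be read to confirm.

```python
from bisect import bisect_right

# Toy model only: names and layout are hypothetical, not HBase's actual format.
# These dictionaries stand in for blocks stored in the file ("on disk").
DATA_BLOCKS = {
    "d0": ["apple", "banana"],
    "d1": ["cherry", "grape"],
    "d2": ["kiwi", "lemon"],
    "d3": ["mango", "peach"],
}
# Leaf index block: sorted (first_key_of_data_block, data_block_id) entries.
LEAF_INDEX_BLOCKS = {
    "l0": [("apple", "d0"), ("cherry", "d1")],
    "l1": [("kiwi", "d2"), ("mango", "d3")],
}
# Root index: sorted (first_key_covered_by_leaf, leaf_block_id) entries.
# This is the only part assumed to be in memory once the file is opened.
ROOT_INDEX = [("apple", "l0"), ("kiwi", "l1")]

blocks_read = []  # records every simulated block read


def read_block(store, block_id):
    """Simulate fetching one block from the file."""
    blocks_read.append(block_id)
    return store[block_id]


def find_entry(index, key):
    """Return the target of the last entry whose first key is <= key, else None."""
    pos = bisect_right([first for first, _ in index], key) - 1
    return index[pos][1] if pos >= 0 else None


def row_exists(key):
    """Walk root index -> leaf index block -> data block for one row key."""
    leaf_id = find_entry(ROOT_INDEX, key)
    if leaf_id is None:
        return False  # key sorts before the first key in the file
    leaf = read_block(LEAF_INDEX_BLOCKS, leaf_id)
    data_id = find_entry(leaf, key)
    if data_id is None:
        return False
    # The index only narrowed the search; the data block settles existence.
    return key in read_block(DATA_BLOCKS, data_id)
```

In this sketch, `row_exists("grape")` reads one leaf index block (`l0`) and one data block (`d1`) and returns `True`. A key such as `"fig"` follows the same path and returns `False` only after the data block has been read, which is the cost the index alone cannot avoid.
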
Nice discussion. You made me read a lot of things. :-) Now I will dig into the code and check this out.

./Zahoor
