Thanks, all.

Sent from my iPad with iMstakes
On Aug 22, 2012, at 20:53, "J Mohamed Zahoor" <[email protected]> wrote:

> If you need to search on both row and column qualifier, you can pick the
> row+col bloom to help you skip blocks.
>
> ./Zahoor@iPad
>
> On 22-Aug-2012, at 10:58 PM, "Pamecha, Abhishek" <[email protected]> wrote:
>
>> Great explanation. This may be diverging from the thread's original
>> question, but could you also explain the difference, if any, between
>> searching for a rowkey [that you mentioned below] vs. searching for a
>> specific column qualifier? Are there any optimizations for column
>> qualifier search too, or does one just need to load all blocks that match
>> the rowkey criteria and then scan each of them from start to end?
>>
>> Thanks,
>> Abhishek
>>
>>
>> -----Original Message-----
>> From: Anoop Sam John [mailto:[email protected]]
>> Sent: Wednesday, August 22, 2012 5:35 AM
>> To: [email protected]; J Mohamed Zahoor
>> Subject: RE: Using HBase serving to replace memcached
>>
>>>> I could be wrong. I think HFile index block (which is located at the end
>>>> of HFile) is a binary search tree containing all row-key values (of the
>>>> HFile) in the binary search tree. Searching a specific row-key in the
>>>> binary search tree could easily find whether a row-key exists (some
>>>> node in the tree has the same row-key value) or not. Why do we need to
>>>> load every block to find if the row exists?
>>
>> I think there is some confusion regarding the blooms and the block index.
>> I will try to clarify this point.
>> A block index is present in every HFile. Within an HFile the data is
>> written as multiple blocks, and HBase reads data from the HDFS layer only
>> block by block. The block index contains information about the blocks
>> within that HFile. This information includes the start and end rowkeys
>> that reside in each block, and block details such as the offset of that
>> block and its length.
>> Now when a request comes to get a rowkey 'x', all the HFiles within that
>> region need to be checked [the KV can be present in any of the HFiles].
>> To know which block within an HFile may contain this row, the block index
>> is used. This block index is always in memory. The lookup tells only the
>> possible block in which the row may be present. HBase will load that
>> block and read through it to get the row we are interested in.
>> A bloom has information about each and every row added to that HFile
>> [the block index won't have info about each and every row]. This bloom
>> information is always in memory. So when a read request to get row 'x'
>> from an HFile comes, first the bloom is checked to see whether this row
>> is in this file or not. If it is not there, as per the bloom, no block at
>> all will be fetched. But if the bloom is not enabled, we might find one
>> block whose row range is such that 'x' falls in between, and HBase will
>> load that block. So using blooms can avoid this IO. Hope this is clear
>> for you now.
>>
>> -Anoop-
>> ________________________________________
>> From: Lin Ma [[email protected]]
>> Sent: Wednesday, August 22, 2012 5:41 PM
>> To: J Mohamed Zahoor; [email protected]
>> Subject: Re: Using HBase serving to replace memcached
>>
>> Thanks Zahoor,
>>
>> I read through the document you referred to, and I am confused about
>> what leaf-level index, intermediate-level index and root-level index
>> mean. I would appreciate it if you could give more details on what they
>> are, or point me to the related documents.
>>
>> BTW: the document you pointed me to is very good; however, I am missing
>> some basic background on the 3 terms I mentioned above. :-)
>>
>> regards,
>> Lin
>>
>> On Wed, Aug 22, 2012 at 12:51 PM, J Mohamed Zahoor <[email protected]> wrote:
>>
>>>> I could be wrong.
>>>> I think HFile index block (which is located at the end
>>>> of HFile) is a binary search tree containing all row-key values (of the
>>>> HFile) in the binary search tree. Searching a specific row-key in the
>>>> binary search tree could easily find whether a row-key exists (some
>>>> node in the tree has the same row-key value) or not. Why do we need to
>>>> load every block to find if the row exists?
>>>>
>>>>
>>> Hmm...
>>> It is a multilevel index. Only the root indexes (Data, Meta etc.) are
>>> loaded when a region is opened. The rest of the tree (intermediate and
>>> leaf indexes) is present at the block level.
>>> I am assuming an HFile v2 here for the discussion.
>>> Read this for more clarity: http://hbase.apache.org/book/apes03.html
>>>
>>> Nice discussion. You made me read a lot of things. :-) Now I will dig
>>> into the code and check this out.
>>>
>>> ./Zahoor
>>>
