Re: Using HBase serving to replace memcached

anil gupta Wed, 22 Aug 2012 09:58:12 -0700

Nice explanation, Anoop. This deserves to be part of the HBase wiki.

On Wed, Aug 22, 2012 at 5:34 AM, Anoop Sam John <[email protected]> wrote:


> >> I could be wrong, but I think the HFile index block (located at the end
> >> of the HFile) is a binary search tree containing all row-key values of
> >> that HFile. Searching that tree for a specific row-key would easily tell
> >> whether the row-key exists (some node in the tree has the same row-key
> >> value) or not. Why do we need to load every block to find out whether the
> >> row exists?
>
> I think there is some confusion here about Bloom filters and the block
> index, so I will try to clarify.
>
> Every HFile has a block index. Within an HFile the data is written as
> multiple blocks, and HBase reads data from the HDFS layer one block at a
> time. The block index describes the blocks within that HFile: for each
> block it records the start and end rowkeys it covers, plus block
> information such as its offset and length. When a request comes in to get
> rowkey 'x', all the HFiles within that region need to be checked (the KV
> can be present in any of them). The block index is used to work out which
> block within an HFile could contain the row. The block index is always
> held in memory. The lookup only tells you the one block in which the row
> could be present; HBase then loads that block and reads through it to find
> the row we are interested in.
>
> A Bloom filter, by contrast, has information about each and every row added
> to that HFile (the block index does not have an entry for every row). The
> Bloom information is also always held in memory. So when a read request
> for row 'x' reaches an HFile, the Bloom filter is checked first to see
> whether the row is in this file. If the Bloom filter says it is not, no
> block is fetched at all. If Blooms are not enabled, we might find a block
> whose row range includes 'x', and HBase will load that block even if the
> row turns out not to be in it. So using Blooms can avoid this IO. Hope this
> is clear now.
>
> -Anoop-
> ________________________________________
> From: Lin Ma [[email protected]]
> Sent: Wednesday, August 22, 2012 5:41 PM
> To: J Mohamed Zahoor; [email protected]
> Subject: Re: Using HBase serving to replace memcached
>
> Thanks Zahoor,
>
> I read through the document you referred to, but I am confused about what
> leaf-level index, intermediate-level index and root-level index mean. I
> would appreciate it if you could give more details on what they are, or
> point me to related documents.
>
> BTW: the document you pointed me to is very good, but I am missing some
> basic background on the 3 terms I mentioned above. :-)
>
> regards,
> Lin
>
> On Wed, Aug 22, 2012 at 12:51 PM, J Mohamed Zahoor <[email protected]>
> wrote:
>
> >> I could be wrong, but I think the HFile index block (located at the end
> >> of the HFile) is a binary search tree containing all row-key values of
> >> that HFile. Searching that tree for a specific row-key would easily tell
> >> whether the row-key exists (some node in the tree has the same row-key
> >> value) or not. Why do we need to load every block to find out whether the
> >> row exists?
> >>
> >>
> > Hmm...
> > It is a multilevel index. Only the root indexes (data, meta, etc.) are
> > loaded when a region is opened. The rest of the tree (intermediate and
> > leaf indexes) is present at the block level.
> > I am assuming HFile v2 here for the discussion.
> > Read this for more clarity: http://hbase.apache.org/book/apes03.html
> >
> > Nice discussion. You made me read a lot of things. :-)
> > Now I will dig into the code and check this out.
> >
> > ./Zahoor
> >
>
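
For anyone reading this in the archive, here is a rough Python sketch of the read path Anoop describes above: check the Bloom filter first, then use the in-memory block index to pick the one candidate block, then scan that block. It is only a toy model and not HBase code. The names ToyBloom, ToyHFile, rows_per_block and blocks_loaded are made up for illustration, the hashing scheme is an arbitrary choice, and the index keeps just the start and end rowkey per block (offsets and lengths are left out).

```python
import bisect
import hashlib


class ToyBloom:
    """Toy Bloom filter over row keys; not HBase's actual implementation."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit keeps the code short

    def _positions(self, row_key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{row_key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, row_key):
        for pos in self._positions(row_key):
            self.bits[pos] = 1

    def might_contain(self, row_key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(row_key))


class ToyHFile:
    """Hypothetical stand-in for one HFile: sorted rows split into blocks."""

    def __init__(self, sorted_rows, rows_per_block=2, use_bloom=True):
        self.blocks = [sorted_rows[i:i + rows_per_block]
                       for i in range(0, len(sorted_rows), rows_per_block)]
        # Block index: start and end rowkey per block, kept in memory.
        self.start_keys = [block[0][0] for block in self.blocks]
        self.end_keys = [block[-1][0] for block in self.blocks]
        self.bloom = None
        if use_bloom:
            self.bloom = ToyBloom()
            for row_key, _ in sorted_rows:
                self.bloom.add(row_key)
        self.blocks_loaded = 0  # counts simulated block reads

    def get(self, row_key):
        # Step 1: the Bloom filter can rule the whole file out with no IO.
        if self.bloom is not None and not self.bloom.might_contain(row_key):
            return None
        # Step 2: the block index names the only block that could hold the row.
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        if i < 0 or row_key > self.end_keys[i]:
            return None
        # Step 3: "load" that block and scan it for the row.
        self.blocks_loaded += 1
        for key, value in self.blocks[i]:
            if key == row_key:
                return value
        return None


rows = [("a", 1), ("c", 2), ("e", 3), ("g", 4)]
with_bloom = ToyHFile(rows, use_bloom=True)
without_bloom = ToyHFile(rows, use_bloom=False)
```

In this toy setup the missing row "b" falls inside the range of the first block ("a" to "c"), so without_bloom.get("b") has to read that block before it can answer. With the Bloom filter the same lookup normally returns without reading any block, apart from the occasional false positive.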

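On the multilevel index Zahoor mentions above, a similarly rough sketch may help with the root/leaf terminology. It assumes a two-level layout only (the intermediate level is left out), with a root index held in memory and leaf index blocks read on demand. TwoLevelIndex, find_slot, fanout and leaf_reads are hypothetical names, and indexing each block by its first key is a simplification of mine, not something taken from the HFile v2 format.

```python
import bisect


def find_slot(first_keys, row_key):
    """Position of the last entry whose first key is <= row_key, or -1."""
    return bisect.bisect_right(first_keys, row_key) - 1


class TwoLevelIndex:
    """Toy root + leaf index over the first keys of sorted data blocks."""

    def __init__(self, data_block_first_keys, fanout=2):
        # Each leaf index block covers `fanout` consecutive data blocks and
        # stores (first_key, data_block_number) pairs.
        self.leaves = []
        for start in range(0, len(data_block_first_keys), fanout):
            chunk = data_block_first_keys[start:start + fanout]
            self.leaves.append([(key, start + j) for j, key in enumerate(chunk)])
        # Root index: first key of each leaf index block, always in memory.
        self.root_first_keys = [leaf[0][0] for leaf in self.leaves]
        self.leaf_reads = 0  # counts simulated leaf index block reads

    def locate(self, row_key):
        # The root lookup is a pure in-memory binary search.
        leaf_no = find_slot(self.root_first_keys, row_key)
        if leaf_no < 0:
            return None  # row_key sorts before the first data block
        # Only now is one leaf index block read.
        self.leaf_reads += 1
        leaf = self.leaves[leaf_no]
        slot = find_slot([key for key, _ in leaf], row_key)
        return leaf[slot][1]  # number of the candidate data block


index = TwoLevelIndex(["a", "e", "k", "p", "t"], fanout=2)
```

Here index.locate("m") goes through the root to the second leaf index block and from there to data block 2, reading one leaf index block along the way. The data block itself would still need to be loaded and scanned to see whether the row is really there.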


-- 
Thanks & Regards,
Anil Gupta
