Re: Using HBase serving to replace memcached

Pamecha, Abhishek Wed, 22 Aug 2012 22:06:15 -0700

Thanks, all.

i Sent from my iPad with iMstakes


On Aug 22, 2012, at 20:53, "J Mohamed Zahoor" <[email protected]> wrote:

> If you need to search by row and column qualifier, you can pick a row+col
> bloom to help you skip blocks.
> 
> ./Zahoor@iPad
> 
> On 22-Aug-2012, at 10:58 PM, "Pamecha, Abhishek" <[email protected]> wrote:
> 
>> Great explanation. This may be diverging from the thread's original
>> question, but could you also explain the difference, if any, between
>> searching for a rowkey [that you mentioned below] and searching for a
>> specific column qualifier? Are there any optimizations for column qualifier
>> search too, or does one just need to load all blocks that match the rowkey
>> criteria and then scan each of them from start to end?
>> 
>> Thanks,
>> Abhishek
>> 
>> 
>> -----Original Message-----
>> From: Anoop Sam John [mailto:[email protected]] 
>> Sent: Wednesday, August 22, 2012 5:35 AM
>> To: [email protected]; J Mohamed Zahoor
>> Subject: RE: Using HBase serving to replace memcached
>> 
>>> I could be wrong. I think the HFile index block (which is located at the
>>> end of the HFile) is a binary search tree containing all row-key values
>>> (of the HFile). Searching for a specific row-key in the binary search
>>> tree could easily find whether a row-key exists (some node in the tree
>>> has the same row-key value) or not. Why do we need to load every block to
>>> find out if the row exists?
>> 
>> I think there is some confusion about Bloom filters and the block index, so
>> I will try to clarify.
>> 
>> Every HFile has a block index. Within an HFile the data is written as
>> multiple blocks, and HBase reads data from the HDFS layer one block at a
>> time. The block index holds information about the blocks in that HFile: the
>> start and end rowkeys that reside in each block, plus block details such as
>> its offset and length. When a request comes in to get rowkey 'x', all the
>> HFiles within that region need to be checked [the KV can be present in any
>> of the HFiles]. The block index is used to find out which block within an
>> HFile could contain the row. The block index is always in memory. This
>> lookup only tells us the one block in which the row may be present; HBase
>> then loads that block and reads through it to find the row we are
>> interested in.
>> 
>> A Bloom filter, on the other hand, has information about each and every row
>> added to that HFile [the block index won't have info about each and every
>> row]. The Bloom information is also always in memory. So when a read
>> request for row 'x' reaches an HFile, the Bloom filter is checked first to
>> see whether the row is in this file. If the Bloom filter says it is not, no
>> block is fetched at all. If Bloom filters are not enabled, we might find a
>> block whose row range covers 'x', and HBase will load that block even
>> though the row may not be in it. So using Bloom filters can avoid this IO.
>> Hope this is clear now.
>> 
>> -Anoop-
>> ________________________________________
>> From: Lin Ma [[email protected]]
>> Sent: Wednesday, August 22, 2012 5:41 PM
>> To: J Mohamed Zahoor; [email protected]
>> Subject: Re: Using HBase serving to replace memcached
>> 
>> Thanks Zahoor,
>> 
>> I read through the document you referred to, but I am confused about what
>> leaf-level index, intermediate-level index and root-level index mean. I
>> would appreciate it if you could give more details on what they are, or
>> point me to the related documents.
>> 
>> BTW: the document you pointed me to is very good, but I am missing some
>> basic background on the 3 terms I mentioned above. :-)
>> 
>> regards,
>> Lin
>> 
>> On Wed, Aug 22, 2012 at 12:51 PM, J Mohamed Zahoor <[email protected]> wrote:
>> 
>>>> I could be wrong. I think the HFile index block (which is located at the
>>>> end of the HFile) is a binary search tree containing all row-key values
>>>> (of the HFile). Searching for a specific row-key in the binary search
>>>> tree could easily find whether a row-key exists (some node in the tree
>>>> has the same row-key value) or not. Why do we need to load every block
>>>> to find out if the row exists?
>>>> 
>>>> 
>>> Hmm...
>>> It is a multilevel index. Only the root indexes (data, meta, etc.) are
>>> loaded when a region is opened. The rest of the tree (the intermediate and
>>> leaf indexes) is stored at the block level.
>>> I am assuming an HFile v2 here for the discussion.
>>> Read this for more clarity http://hbase.apache.org/book/apes03.html
>>> 
>>> Nice discussion. You made me read a lot of things. :-) Now I will dig into
>>> the code and check this out.
>>> 
>>> ./Zahoor
>>>
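
As a rough illustration of the read path discussed in the thread above (Bloom check first, then a binary search of the in-memory block index, then a scan of a single block), here is a toy model in Python. Everything in it (ToyBloom, ToyHFile, the bloom_type switch, the block size) is hypothetical and only mimics the idea; it is not HBase code and does not use the HBase API. As a simplification, the toy index keeps only the first key of each block, which is enough for the binary search. The "ROWCOL" mode mirrors the row+col bloom suggestion by hashing row and qualifier together.

```python
import bisect
import hashlib


class ToyBloom:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ToyHFile:
    """Sorted (row, qualifier, value) cells split into small blocks."""

    def __init__(self, cells, cells_per_block=2, bloom_type="ROW"):
        cells = sorted(cells)
        self.blocks = [cells[i:i + cells_per_block]
                       for i in range(0, len(cells), cells_per_block)]
        # Toy block index: first (row, qualifier) key of each block.
        self.index = [block[0][:2] for block in self.blocks]
        self.bloom_type = bloom_type  # "ROW" or "ROWCOL" (assumed names)
        self.bloom = ToyBloom()
        for row, qual, _ in cells:
            self.bloom.add(self._bloom_key(row, qual))
        self.blocks_loaded = 0  # counts simulated block reads

    def _bloom_key(self, row, qual):
        return row if self.bloom_type == "ROW" else f"{row}/{qual}"

    def get(self, row, qual):
        # 1. Bloom check: a "no" means no block is fetched at all.
        if not self.bloom.might_contain(self._bloom_key(row, qual)):
            return None
        # 2. Binary search: last block whose first key is <= the target key.
        i = bisect.bisect_right(self.index, (row, qual)) - 1
        if i < 0:
            return None
        # 3. "Load" that one block and scan it from start to end.
        self.blocks_loaded += 1
        for r, q, value in self.blocks[i]:
            if (r, q) == (row, qual):
                return value
        return None


if __name__ == "__main__":
    cells = [("row1", "a", "1"), ("row1", "b", "2"), ("row2", "a", "3"),
             ("row3", "a", "4"), ("row3", "c", "5")]
    for bloom_type in ("ROW", "ROWCOL"):
        hfile = ToyHFile(cells, bloom_type=bloom_type)
        result = hfile.get("row2", "b")  # row exists, qualifier does not
        print(bloom_type, result, "blocks loaded:", hfile.blocks_loaded)
```

With a ROW Bloom, a lookup for an existing row but a missing qualifier still loads one block, because the filter only knows that the row exists. With ROWCOL the filter will usually reject that lookup before any block is read, at the cost of more entries in the filter.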
