Re: How Hbase achieves efficient random access?

Andrew Purtell Mon, 07 Jul 2014 12:57:38 -0700

Out of curiosity Vladimir, did you feel like a fork of HBase was necessary 
because of something about the Apache HBase project's process or community? Or 
was it more of a licensing thing (noting you're not using ASL 2)?



On Jul 6, 2014, at 11:26 PM, Vladimir Rodionov <[email protected]> wrote:

>>> 
>>> Another issue is that we cache only blocks. So for workloads with random 
>>> reads where the working set of blocks does not fit into the aggregate block 
>>> cache HBase would need to load an entire block for each KV it wants to 
>>> read. For those workloads we might want to consider a KV cache. (See also 
>>> Vladimir's BigBase - https://github.com/VladRodionov/bigbase).
> 
> Yes, the upcoming first release of BigBase (later this month) will support 
> an SSD cache for both the row (KV) cache and the block cache. You will be 
> able to make efficient use of both all of the server's RAM and the 
> available SSD disks (especially useful for those who run HBase on AWS EC2: 
> all new instances come, by default, with local SSD disks.)
> 
> Best regards,
> Vladimir Rodionov
> 
> http://www.bigbase.org
> ________________________________________
> From: lars hofhansl [[email protected]]
> Sent: Saturday, July 05, 2014 5:23 AM
> To: [email protected]
> Subject: Re: How Hbase achieves efficient random access?
> 
> What Ted and Intae said.
> 
> Are you asking out of interest or do you see performance issues?
> 
> One "issue" is that the KeyValues (KVs) in a block are not indexed. KVs are 
> variable length, and hence once a block is loaded it needs to be searched 
> linearly in order to find the KV (or determine its absence).
> It's on my list of things to investigate: noting the start offsets of all 
> KVs somewhere and hence allowing a binary search of the KVs.
> 
> Since blocks are small (64k by default) it might not make a difference, but 
> we should check.
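
The linear-versus-binary search point above can be illustrated with a small sketch. This is not HBase code: the record layout (two 4-byte big-endian lengths followed by the key and value bytes) and the names `encode_block`, `read_kv`, `linear_search`, and `binary_search` are assumptions made purely for illustration. It shows how recording the start offset of every variable-length KV lets a lookup jump straight to the middle record instead of walking the block from the beginning.

```python
import struct

def encode_block(sorted_kvs):
    """Pack sorted (key, value) byte pairs as length-prefixed records.

    Returns the block bytes and the start offset of each record
    (the hypothetical per-block offset table).
    """
    buf = bytearray()
    offsets = []
    for key, value in sorted_kvs:
        offsets.append(len(buf))
        buf += struct.pack(">II", len(key), len(value)) + key + value
    return bytes(buf), offsets

def read_kv(block, pos):
    """Decode one record at pos; return (key, value, position of next record)."""
    klen, vlen = struct.unpack_from(">II", block, pos)
    start = pos + 8
    key = block[start:start + klen]
    value = block[start + klen:start + klen + vlen]
    return key, value, start + klen + vlen

def linear_search(block, target):
    """Without offsets: records must be decoded one after another."""
    pos = 0
    while pos < len(block):
        key, value, pos = read_kv(block, pos)
        if key == target:
            return value
        if key > target:  # keys are sorted, so the target is absent
            return None
    return None

def binary_search(block, offsets, target):
    """With offsets: any record can be decoded directly, so halve the range."""
    lo, hi = 0, len(offsets)
    while lo < hi:
        mid = (lo + hi) // 2
        key, value, _ = read_kv(block, offsets[mid])
        if key == target:
            return value
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    return None
```

For a block holding n KVs, the first lookup decodes up to n records and the second about log2(n). Whether that matters for small blocks is exactly the open question raised above.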
> 
> Another issue is that we cache only blocks. So for workloads with random 
> reads where the working set of blocks does not fit into the aggregate block 
> cache HBase would need to load an entire block for each KV it wants to read. 
> For those workloads we might want to consider a KV cache. (See also 
> Vladimir's BigBase - https://github.com/VladRodionov/bigbase).
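
The block-cache versus KV-cache trade-off described above can be sketched with a toy simulation. Everything here is an assumption for illustration and none of it is HBase or BigBase code: the `LRUCache` class, the 1 MiB budget, the 100-byte KV size, and the workload in which each of 100 hot rows lives in a different 64 KB block.

```python
from collections import OrderedDict

class LRUCache:
    """Byte-budgeted LRU cache (hypothetical; a stand-in for either cache type)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # key -> (value, size in bytes)

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key][0]

    def put(self, key, value, size):
        if key in self.items:
            self.used -= self.items.pop(key)[1]
        self.items[key] = (value, size)
        self.used += size
        # Evict least recently used entries until the budget is respected.
        while self.used > self.capacity and self.items:
            _, (_, evicted_size) = self.items.popitem(last=False)
            self.used -= evicted_size

def count_misses(cache, accesses, entry_size):
    """Replay accesses; every miss stands for one block load from disk."""
    misses = 0
    for key in accesses:
        if cache.get(key) is None:
            misses += 1
            cache.put(key, True, entry_size)
    return misses

BUDGET = 1024 * 1024      # assumed cache budget: 1 MiB
BLOCK_SIZE = 64 * 1024    # default block size mentioned in the thread
KV_SIZE = 100             # assumed size of one cached KV

# 100 hot rows, each in its own block, read twice in the same order.
accesses = list(range(100)) * 2

block_cache_misses = count_misses(LRUCache(BUDGET), accesses, BLOCK_SIZE)
kv_cache_misses = count_misses(LRUCache(BUDGET), accesses, KV_SIZE)
```

Under these assumptions the block cache holds only 16 blocks, so the second pass finds nothing cached, while the KV cache keeps all 100 hot entries. Real workloads with locality within a block would favor the block cache more than this deliberately adversarial pattern does.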
> 
> 
> -- Lars
> 
> 
> 
> ________________________________
> From: Ted Yu <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Friday, July 4, 2014 7:39 AM
> Subject: Re: How Hbase achieves efficient random access?
> 
> 
> For a description of HFile v2, see http://hbase.apache.org/book.html#hfilev2
> 
> For the block cache, see http://hbase.apache.org/book.html#block.cache
> 
> In "HBase in Action", starting on page 28, there is a description of the 
> read path.
> 
> Cheers
> 
> 
> 
>> On Fri, Jul 4, 2014 at 2:02 AM, Intae Kim <[email protected]> wrote:
>> 
>> Setting aside the memstore, block cache, HFile count, etc.:
>> 
>> Simply stated, data are sorted in a file called an HFile (composed of
>> blocks). When a client tries to access data, HBase finds the proper block
>> in the file and loads that block to check whether it has the data.
>> 
>> See the HFile format for more details (meta index, data index, ...).
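
The lookup described above (find the proper block, then check inside it) can be sketched as follows. This is an illustrative model only, not the real HFile format: the in-memory lists, the names `build_blocks` and `get`, and the fixed number of KVs per block are all assumptions. The index stores the first key of each block, so a binary search over the index picks the one block that could contain the key.

```python
import bisect

def build_blocks(sorted_kvs, kvs_per_block):
    """Split sorted (key, value) pairs into blocks and index each block's first key."""
    blocks = [sorted_kvs[i:i + kvs_per_block]
              for i in range(0, len(sorted_kvs), kvs_per_block)]
    index = [block[0][0] for block in blocks]
    return blocks, index

def get(blocks, index, key):
    """Locate the candidate block via the index, then scan only that block."""
    i = bisect.bisect_right(index, key) - 1  # last block whose first key <= key
    if i < 0:
        return None  # key sorts before the first block
    for k, v in blocks[i]:  # stands in for loading and searching one block
        if k == key:
            return v
    return None

kvs = [("row%03d" % n, n) for n in range(10)]
blocks, index = build_blocks(kvs, 4)
value = get(blocks, index, "row005")
```

Only one block is touched per lookup, which is why the file being sorted and indexed matters for random reads.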
>> 
>> Good Luck!!
>> 
>> 
>> 2014-07-04 17:30 GMT+09:00 Ted Yu <[email protected]>:
>> 
>>> Please take a look at http://hbase.apache.org/book/perf.reading.html
>>> 
>>> Cheers
>>> 
>>>> On Jul 4, 2014, at 12:22 AM, yl wu <[email protected]> wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> HBase has a sorted and indexed HFile format, which enables fast lookup.
>>>> I am wondering whether any other features help HBase achieve efficient
>>>> random access.
>>>> I want to know the whole story, but I can't find any article that talks
>>>> about random access in HBase at a high level.
>>>> 
>>>> Can anyone help me resolve my confusion about this?
>>>> 
>>>> Best,
>>>> Yanglin