Out of curiosity, Vladimir, did you feel a fork of HBase was necessary because of something about the Apache HBase project's process or community? Or was it more of a licensing thing (noting you're not using ASL 2)?
On Jul 6, 2014, at 11:26 PM, Vladimir Rodionov <[email protected]> wrote:

>>>
>>> Another issue is that we cache only blocks. So for workloads with random
>>> reads where the working set of blocks does not fit into the aggregate block
>>> cache, HBase would need to load an entire block for each KV it wants to
>>> read. For those workloads we might want to consider a KV cache. (See also
>>> Vladimir's BigBase - https://github.com/VladRodionov/bigbase).
>
> Yes, the upcoming first release of BigBase (later this month) will support
> SSD caching in both the row (KV) cache and the block cache. You will be able
> to use both all of the server's RAM and the available SSD disks efficiently
> (especially useful for those who run HBase on AWS EC2: all new instances come
> with local SSD disks by default).
>
> Best regards,
> Vladimir Rodionov
>
> http://www.bigbase.org
> ________________________________________
> From: lars hofhansl [[email protected]]
> Sent: Saturday, July 05, 2014 5:23 AM
> To: [email protected]
> Subject: Re: How Hbase achieves efficient random access?
>
> What Ted and Intae said.
>
> Are you asking out of interest or do you see performance issues?
>
> One "issue" is that the KeyValues (KVs) in the blocks are not indexed. KVs are
> variable length, and hence once a block is loaded it needs to be searched
> linearly in order to find the KV (or determine its absence).
> It's on my list of things to investigate: noting the start offsets of all KVs
> somewhere, which would allow a binary search of the KVs.
>
> Since blocks are small (64k by default) it might not make a difference, but
> we should check.
>
> Another issue is that we cache only blocks. So for workloads with random
> reads where the working set of blocks does not fit into the aggregate block
> cache, HBase would need to load an entire block for each KV it wants to read.
> For those workloads we might want to consider a KV cache. (See also Vladimir's
> BigBase - https://github.com/VladRodionov/bigbase).
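To illustrate Lars's point about unindexed, variable-length KVs, here is a minimal Java sketch. It assumes a hypothetical, much-simplified block layout (each KV stored as `[int keyLen][int valLen][key][value]`, keys sorted as unsigned bytes) that is not the real HFile encoding, and all class and method names are made up. `linearGet` walks the block from the start, as Lars describes HBase doing; `binaryGet` uses a recorded array of KV start offsets, the idea he mentions investigating, to binary-search instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified block layout (NOT the real HFile encoding):
// each KV is stored as [int keyLen][int valLen][key bytes][value bytes],
// and KVs appear in the block sorted by key.
class BlockSearchSketch {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic byte comparison of keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Serializes sorted {key, value} pairs into one block and records each KV's start offset.
    static byte[] buildBlock(String[][] sortedKvs, int[] offsetsOut) {
        int size = 0;
        for (String[] kv : sortedKvs) size += 8 + utf8(kv[0]).length + utf8(kv[1]).length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < sortedKvs.length; i++) {
            offsetsOut[i] = buf.position();
            byte[] k = utf8(sortedKvs[i][0]);
            byte[] v = utf8(sortedKvs[i][1]);
            buf.putInt(k.length).putInt(v.length).put(k).put(v);
        }
        return buf.array();
    }

    static byte[] keyAt(ByteBuffer block, int off) {
        byte[] key = new byte[block.getInt(off)];
        for (int i = 0; i < key.length; i++) key[i] = block.get(off + 8 + i);
        return key;
    }

    static String valueAt(ByteBuffer block, int off) {
        int keyLen = block.getInt(off);
        byte[] val = new byte[block.getInt(off + 4)];
        for (int i = 0; i < val.length; i++) val[i] = block.get(off + 8 + keyLen + i);
        return new String(val, StandardCharsets.UTF_8);
    }

    // Without an offset index: KVs are variable length, so walk them one by one.
    static String linearGet(ByteBuffer block, byte[] target) {
        int off = 0;
        while (off < block.limit()) {
            int cmp = compare(keyAt(block, off), target);
            if (cmp == 0) return valueAt(block, off);
            if (cmp > 0) return null; // sorted: we passed the spot where target would be
            off += 8 + block.getInt(off) + block.getInt(off + 4);
        }
        return null;
    }

    // With recorded start offsets: binary search over KV positions.
    static String binaryGet(ByteBuffer block, int[] offsets, byte[] target) {
        int lo = 0, hi = offsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(keyAt(block, offsets[mid]), target);
            if (cmp == 0) return valueAt(block, offsets[mid]);
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }

    public static void main(String[] args) {
        String[][] kvs = {{"row1", "a"}, {"row2", "bb"}, {"row3", "ccc"}, {"row5", "e"}};
        int[] offsets = new int[kvs.length];
        ByteBuffer block = ByteBuffer.wrap(buildBlock(kvs, offsets));
        System.out.println(linearGet(block, utf8("row3")));          // ccc
        System.out.println(binaryGet(block, offsets, utf8("row3"))); // ccc
        System.out.println(binaryGet(block, offsets, utf8("row4"))); // null (absent)
    }
}
```

The trade-off is one extra offset (4 bytes in this sketch) per KV in exchange for O(log n) rather than O(n) key comparisons per lookup; with small blocks, as Lars notes, the difference may not matter in practice.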
>
> -- Lars
>
> ________________________________
> From: Ted Yu <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Friday, July 4, 2014 7:39 AM
> Subject: Re: How Hbase achieves efficient random access?
>
> For a description of HFile v2, see http://hbase.apache.org/book.html#hfilev2
>
> For the block cache, see http://hbase.apache.org/book.html#block.cache
>
> In "HBase in Action", starting on page 28, there is a description of the read path.
>
> Cheers
>
>> On Fri, Jul 4, 2014 at 2:02 AM, Intae Kim <[email protected]> wrote:
>>
>> Apart from the memstore, block cache, HFile count, etc.:
>>
>> Simply stated, data are sorted in a file called an HFile (composed of blocks).
>> When a client tries to access data, HBase searches for the proper block in the
>> file and loads that block to check whether it contains the data.
>>
>> See the HFile format for more details (meta index, data index, ...).
>>
>> Good Luck!!
>>
>>
>> 2014-07-04 17:30 GMT+09:00 Ted Yu <[email protected]>:
>>
>>> Please take a look at http://hbase.apache.org/book/perf.reading.html
>>>
>>> Cheers
>>>
>>>> On Jul 4, 2014, at 12:22 AM, yl wu <[email protected]> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> HBase has a sorted and indexed HFile format, which enables fast lookup.
>>>> I am wondering whether any other features help HBase achieve efficient
>>>> random access.
>>>> I want to know the whole story, but I can't find any article that talks
>>>> about random access in HBase at a high level.
>>>>
>>>> Can anyone help me resolve my confusion?
>>>>
>>>> Best,
>>>> Yanglin
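Intae's description of the read path (find the one block in the HFile that could hold the key, load that block, then look inside it), together with Lars's remark that HBase caches only whole blocks, can be sketched as a toy Java model. Everything here is an assumption for illustration: String keys, an in-memory array standing in for blocks on disk, a single-level index holding each block's first key (searched with `Arrays.binarySearch`), and an LRU cache built on `LinkedHashMap`. Real HFile v2 uses multi-level indexes and byte[] keys, and HBase has its own block cache implementations; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (hypothetical names): a single-level block index of each block's
// first key, plus an LRU block cache. Assumes every block is non-empty and
// blocks (and keys within them) are sorted.
class BlockIndexSketch {

    // LRU cache: LinkedHashMap in access order, evicting the eldest entry past capacity.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: iterates least recently used first
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    private final String[][] blocksOnDisk; // stand-in for the blocks stored in an HFile
    private final String[] firstKeys;      // block index: first key of each block, sorted
    private final LruCache<Integer, String[]> cache;
    int diskReads = 0;                     // counts simulated whole-block loads

    BlockIndexSketch(String[][] sortedBlocks, int cacheCapacity) {
        blocksOnDisk = sortedBlocks;
        firstKeys = new String[sortedBlocks.length];
        for (int i = 0; i < sortedBlocks.length; i++) firstKeys[i] = sortedBlocks[i][0];
        cache = new LruCache<>(cacheCapacity);
    }

    // The only block that can hold key is the last one whose first key <= key.
    int blockFor(String key) {
        int pos = Arrays.binarySearch(firstKeys, key);
        return pos >= 0 ? pos : -pos - 2; // -1 means key sorts before every block
    }

    boolean contains(String key) {
        int b = blockFor(key);
        if (b < 0) return false;
        String[] block = cache.get(b);
        if (block == null) {     // cache miss: load the entire block
            diskReads++;
            block = blocksOnDisk[b];
            cache.put(b, block);
        }
        for (String k : block) { // then search inside the block linearly
            if (k.equals(key)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[][] blocks = {{"a", "c", "e"}, {"g", "i", "k"}, {"m", "o", "q"}};
        BlockIndexSketch reader = new BlockIndexSketch(blocks, 2); // cache holds 2 blocks
        System.out.println(reader.contains("i") + " " + reader.diskReads); // true 1 (block 1 loaded)
        System.out.println(reader.contains("k") + " " + reader.diskReads); // true 1 (block 1 cached)
        System.out.println(reader.contains("b") + " " + reader.diskReads); // false 2 (block 0 loaded anyway)
        System.out.println(reader.contains("o") + " " + reader.diskReads); // true 3 (evicts block 1)
        System.out.println(reader.contains("i") + " " + reader.diskReads); // true 4 (block 1 reloaded)
    }
}
```

The last two lookups show the situation Lars describes: once the working set of blocks exceeds the cache, a random read pays for loading a whole block even when only one KV is wanted, which is the case a KV cache would target.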
