Hi Ted, Intae and Lars,

Thank you very much for your replies.

@Ted, I've read the materials you mentioned. Some of them are more about performance tuning.

@Lars, for now I am asking out of interest. I've read many details about HBase, but I still don't have a clear idea of why we can say "HBase has efficient random access". Is it just the HFile format and the block cache mechanism?

Maybe my question is not well posed, but I am just confused.

Best,
Yanglin

2014-07-05 20:23 GMT+08:00 lars hofhansl <[email protected]>:

> What Ted and Intea said.
>
> Are you asking out of interest or do you see performance issues?
>
> One "issue" is that the KeyValues (KVs) in the blocks is not indexed. KVs
> are variable length and hence once a block is loaded it needs to be
> searched linearly in order to find the KV (or determine its absence).
> It's on my list of things to investigate noting the start offsets of all
> KVs somewhere and hence allow a binary search the KVs.
>
> Since blocks are small (64k by default) it might not make a difference,
> but we should check.
>
> Another issue is that we cache only blocks. So for workloads with random
> reads where the working set of blocks does not fit into the aggregate block
> cache HBase would need to load an entire block for each KV it wants to
> read. For those workloads we might want to consider a KV cache. (See also
> Vladimirs BigBase - https://github.com/VladRodionov/bigbase).
>
>
> -- Lars
>
>
>
> ________________________________
> From: Ted Yu <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Friday, July 4, 2014 7:39 AM
> Subject: Re: How Hbase achieves efficient random access?
>
>
> For description of HFile v2, see http://hbase.apache.org/book.html#hfilev2
>
> For block cache, see http://hbase.apache.org/book.html#block.cache
>
> In "HBase In Action", starting page 28, there is description for read path.
>
> Cheers
>
>
>
> On Fri, Jul 4, 2014 at 2:02 AM, Intae Kim <[email protected]> wrote:
>
> > Except memstore, blockcache, hfile count etc..
> >
> > Simply stated, data are sorted in file called HFile (composed of blocks)
> > when client try to access data, hbase search proper block in file and load
> > block to check if the block has the data.
> >
> > See HFile Format in more details, (meta index, data index ...)
> >
> > Good Luck!!
> >
> >
> >
> > 2014-07-04 17:30 GMT+09:00 Ted Yu <[email protected]>:
> >
> > > Please take a look at http://hbase.apache.org/book/perf.reading.html
> > >
> > > Cheers
> > >
> > > On Jul 4, 2014, at 12:22 AM, yl wu <[email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > HBase has sorted and indexed Hfile format, which enables fast lookup.
> > > > I am wondering is there any other feature help Hbase achieve efficient
> > > > random access?
> > > > I want to know the whole story, but I can't find any article talks about
> > > > random access in HBase in high level.
> > > >
> > > > Can anyone help me resolve my confusion in this?
> > > >
> > > > Best,
> > > > Yanglin
> > >
> >
>
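
As a rough illustration of the read path discussed in this thread (a sorted file split into blocks, a block index, a block cache, and a linear scan inside the loaded block), here is a minimal Python sketch. It is a toy model and not HBase code: `ToyHFile`, `ToyBlockCache`, `get`, and the `kvs_per_block` parameter are hypothetical names chosen for illustration, and real HFile blocks are sized in bytes (64k by default) rather than by KV count.

```python
import bisect
from collections import OrderedDict


class ToyHFile:
    """Toy model of a sorted, block-indexed file (not HBase's real HFile code)."""

    def __init__(self, sorted_kvs, kvs_per_block=4):
        # Split the sorted (key, value) pairs into fixed-size blocks.
        self.blocks = [sorted_kvs[i:i + kvs_per_block]
                       for i in range(0, len(sorted_kvs), kvs_per_block)]
        # Block index: the first key of every block, kept in memory.
        self.index = [block[0][0] for block in self.blocks]
        self.block_loads = 0  # counts simulated disk reads

    def load_block(self, block_no):
        self.block_loads += 1
        return self.blocks[block_no]


class ToyBlockCache:
    """Tiny LRU cache keyed by block number (assumes a single file)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get_block(self, hfile, block_no):
        if block_no in self.cache:
            self.cache.move_to_end(block_no)   # mark as recently used
            return self.cache[block_no]
        block = hfile.load_block(block_no)     # cache miss: read the whole block
        self.cache[block_no] = block
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the least recently used block
        return block


def get(hfile, cache, key):
    # 1. Binary search the block index for the last block whose first key <= key.
    block_no = bisect.bisect_right(hfile.index, key) - 1
    if block_no < 0:
        return None                            # key sorts before the first block
    # 2. Fetch that block, from the cache if possible.
    block = cache.get_block(hfile, block_no)
    # 3. Linear scan inside the block, as described above for variable-length KVs.
    for k, v in block:
        if k == key:
            return v
        if k > key:
            break                              # block is sorted, so the key is absent
    return None


kvs = [("row%02d" % i, "value%d" % i) for i in range(20)]
hfile = ToyHFile(kvs)
cache = ToyBlockCache(capacity=2)
```

In this model, `get(hfile, cache, "row07")` loads one block, and a following `get(hfile, cache, "row06")` is served from the cache because both keys fall in the same block. If the working set of blocks exceeds `capacity`, every random read triggers a full block load, which is the situation Lars describes as motivating a KV cache.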
