Yeah, thanks a lot. That sums it up perfectly.
--Chandra

On Sunday, 30 March 2014, lars hofhansl <[email protected]> wrote:
> Fundamentally HBase performs a merge sort. There is a clearly defined sort
> order between KeyValues:
> 1. row key
> 2. column family
> 3. column qualifier
> 4. timestamp
> (there are a few rules and wrinkles about MVCC visibility of changes)
>
> Row key, column family, and column qualifier are compared
> lexicographically; the timestamps are long values and are sorted in reverse
> chronological order (so the newest sorts first).
>
> The memstore is a sorted data structure (a skip list set), and HFiles are
> sorted as well.
> So HBase will "simply" perform a merge sort between all sources (memstore,
> HFiles, etc.), and return the KeyValues in order.
> The block cache does not enter this discussion from a correctness
> viewpoint; it is just a means to access data in HFiles more efficiently.
>
> Does this answer your question?
>
> -- Lars
>
> ________________________________
> From: chandra kant <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Sunday, March 30, 2014 12:12 AM
> Subject: Re: Cache invalidation in Blockcache
>
> I am using HBase 0.94. Just one clarification: if I am requesting
> just a single row which is still in the memstore, then the read operation
> will simply send this result back to the client. This latest version of the
> row won't be cached in the Blockcache. Block caching will only happen if
> data is read from store files (HFiles).
> What if the latest version of my row is in the memstore and the other 2
> versions are in an HFile and I want all 3 versions? In this case, will the
> cached block with that row key be evicted from the Blockcache?
>
> Thanks
> Chandra
>
> On Sunday, 30 March 2014, Anoop John <[email protected]> wrote:
> >
> > > Also, if row is changed by some write, then it will be reloaded in
> > > Blockcache along with the Hfile it belongs to, if Blockcache is
> > > enabled on table
> >
> > That statement is not quite correct, because there is no row-wise
> > caching. It is just blocks of KVs being cached. So a write will not deal
> > with the block cache as such. The write will go to the memstore. During
> > a read, yes, mostly the version in the memstore will come out (as it is
> > the most recent). If max versions for that table's CF is > 1 and the Scan
> > is requesting more than one version, multiple versions of a cell can
> > come out. Which version are you using?
> >
> > -Anoop-
> >
> > On Sun, Mar 30, 2014 at 12:04 PM, chandra kant <[email protected]> wrote:
> >
> > > Thanks Anoop.
> > > Here is my understanding:
> > > basically, memstores will be scanned no matter whether the requested
> > > row is already present in the Blockcache. Also, if a row is changed by
> > > some write, then it will be reloaded into the Blockcache along with
> > > the HFile it belongs to, if Blockcache is enabled on the table.
> > >
> > > Thanks
> > > Chandra
> > >
> > > On Sunday, 30 March 2014, Anoop John <[email protected]> wrote:
> > >
> > > > In the block cache, data is not cached as rows. As you know, when
> > > > writing HFiles, one HFile is logically split into blocks (with a
> > > > default size of 64K). During reads, data is read from files as
> > > > blocks. Even if you do a single-row get, HBase has to read at least
> > > > one block from the file. The block cache caches these blocks. So
> > > > during a read, if we find the requested block in the cache, we won't
> > > > read it again from HDFS. This is how the block cache helps.
> > > >
> > > > So the answer to the 1st question is no.
> > > >
> > > > During reading, it is not a matter of checking the memstore 1st and
> > > > then the block cache. There is a heap of scanners on the memstore
> > > > and on all HFiles. KVs come out of this heap in the order given by
> > > > the KV comparator. It compares row, CF, qualifier, TS and finally a
> > > > memstoreTS (which increases on every write). So a KV from the
> > > > memstore will normally come out before those from files. But during
> > > > writes one can always specify the TS. If someone writes explicitly
> > > > with a TS, first writing a cell with a future TS that gets flushed
> > > > to a file, and later writes a KV with a past TS that stays in the
> > > > memstore, the normal case described above may not apply. Hope that
> > > > makes it clear. Again, when you read from files, they are read block
> > > > by block, and each block is first looked up in the cache. If that
> > > > block of the file has already been read into the cache, there won't
> > > > be any IO.
> > > >
> > > > -Anoop-
> > > >
> > > > On Sun, Mar 30, 2014 at 11:44 AM, chandra kant <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > > I have Blockcache enabled on my table. So, I read a row and it's
> > > > > stored in the Blockcache. Next, I do a write on that row and I
> > > > > read it again.
> > > > > My question is: does writing that row invalidate the entry for
> > > > > that row in the Blockcache?
> > > > > Also, while reading, does RegionScanner first check the memstore
> > > > > for any updates regarding that row, or the Blockcache?
> > > > > It's quite confusing from what I have read.
> > > > > Thanks
> > > > > Chandra
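
For anyone reading this thread later, here is a rough sketch of the read path described above: every source (memstore, HFiles) is already sorted, and a read is a merge over all of them using the KeyValue sort order. This is a toy model in Python, not HBase code. The `KeyValue` tuple, `sort_key`, `merged_scan` and `get_versions` are made-up names for illustration, and the sketch leaves out memstoreTS/MVCC, delete markers and the block cache (which, as noted in the thread, only affects IO, not the result).

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class KeyValue(NamedTuple):
    """Toy stand-in for an HBase KeyValue (hypothetical, not the real class)."""
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def sort_key(kv: KeyValue):
    # Row, family and qualifier compare lexicographically (Python compares
    # bytes that way); the timestamp is negated so the newest sorts first.
    return (kv.row, kv.family, kv.qualifier, -kv.timestamp)


def merged_scan(sources: Iterable[List[KeyValue]]) -> Iterator[KeyValue]:
    # Each source is already sorted by sort_key, so a heap-based k-way
    # merge yields every KeyValue in global sort order.
    return heapq.merge(*sources, key=sort_key)


def get_versions(sources, row, family, qualifier, max_versions):
    # Collect up to max_versions versions of one cell, newest first,
    # regardless of which source each version lives in.
    out = []
    for kv in merged_scan(sources):
        if (kv.row, kv.family, kv.qualifier) == (row, family, qualifier):
            out.append(kv)
            if len(out) == max_versions:
                break
    return out


# Latest version still in the memstore, two older versions in HFiles.
memstore = [KeyValue(b"row1", b"cf", b"q", 300, b"v3")]
hfile_a = [KeyValue(b"row1", b"cf", b"q", 200, b"v2"),
           KeyValue(b"row2", b"cf", b"q", 250, b"x")]
hfile_b = [KeyValue(b"row1", b"cf", b"q", 100, b"v1")]

versions = get_versions([memstore, hfile_a, hfile_b], b"row1", b"cf", b"q", 3)
print([kv.value for kv in versions])

# A cell written with an explicit future timestamp and already flushed to a
# file sorts ahead of a newer write that is still in the memstore.
hfile_future = [KeyValue(b"row1", b"cf", b"q", 900, b"future")]
newest = get_versions([memstore, hfile_future], b"row1", b"cf", b"q", 1)
print(newest[0].value)
```

The first call returns all three versions, newest first, with one coming from the memstore and two from files. The second shows the future-timestamp case from the thread, where the KeyValue from the file comes out ahead of the one in the memstore.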
