Varun,

There is no need to open a new JIRA - there are two already:
https://issues.apache.org/jira/browse/HBASE-9769
https://issues.apache.org/jira/browse/HBASE-9778
Both have patches; you can grab and test them.

-Vladimir

On Mon, Jan 27, 2014 at 9:36 PM, Varun Sharma <[email protected]> wrote:

> Hi lars,
>
> Thanks for the background. It seems that for our case, we will have to
> consider some solution like the Facebook one, since the next column is
> always the next one - this can be a simple flag. I am going to raise a JIRA
> and we can discuss there.
>
> Thanks
> Varun
>
>
> On Sun, Jan 26, 2014 at 3:43 PM, lars hofhansl <[email protected]> wrote:
>
> > This is somewhat of a known issue, and I'm sure Vladimir will chime in
> > soon. :)
> >
> > Reseek is expensive compared to next if next would get us the KV we're
> > looking for. However, HBase does not know that ahead of time. There might
> > be 1000 versions of the previous KV to be skipped first.
> > HBase seeks in three situations:
> > 1. Seek to the next column (there might be a lot of versions to skip)
> > 2. Seek to the next row (there might be a lot of versions and other
> > columns to skip)
> > 3. Seek to a row via a hint
> >
> > #3 is definitely useful, with that one can implement very efficient "skip
> > scans" (see the FuzzyRowFilter and what Phoenix is doing).
> > #2 is helpful if there are many columns and one only "selects" a few (and
> > of course also if there are many versions of columns)
> > #1 is only helpful when we expect there to be many versions. Or if the
> > size of a typical KV approaches the block size, since then we'd need to
> > seek to find the next block anyway.
> >
> > You might well be a victim of #1. Are your rows 10-20 columns or is that
> > just the number of columns you return?
> >
> > Vladimir and myself have suggested a SMALL_ROW hint, where we instruct the
> > scanner to not seek to the next column or the next row, but just issue
> > next()'s until the KV is found.
> > Another suggested approach (I think by the Facebook guys) was to issue
> > next() opportunistically a few times, and only when that did not get us
> > to the requested KV issue a reseek.
> > I have also thought of a near/far designation of seeks. For near seeks
> > we'd do a configurable number of next()'s first, then seek.
> > "near" seeks would be those of category #1 (and maybe #2) above.
> >
> > See: HBASE-9769, HBASE-9778, HBASE-9000 (and maybe HBASE-9915)
> >
> > I'll look at the trace a bit closer.
> > So far my scan profiling has been focused on data in the blockcache since
> > in the normal case the vast majority of all data is found there and only
> > recent changes are in the memstore.
> >
> > -- Lars
> >
> >
> > ________________________________
> > From: Varun Sharma <[email protected]>
> > To: "[email protected]" <[email protected]>; "[email protected]"
> > <[email protected]>
> > Sent: Sunday, January 26, 2014 1:14 PM
> > Subject: Sporadic memstore slowness for Read Heavy workloads
> >
> >
> > Hi,
> >
> > We are seeing some unfortunately low performance in the memstore - we have
> > researched some of the previous JIRA(s) and seen some inefficiencies in the
> > ConcurrentSkipListMap. The symptom is a RegionServer hitting 100% CPU at
> > weird points in time - the bug is hard to reproduce and there isn't a
> > huge # of extra reads going to that region server or any substantial
> > hotspot happening. The region server recovers the moment we flush the
> > memstores or restart the region server. Our queries retrieve wide rows
> > which are up to 10-20 columns. A stack trace shows two things:
> >
> > 1) Time spent inside MemstoreScanner.reseek() and inside the
> > ConcurrentSkipListMap
> > 2) The reseek() is being called at the "SEEK_NEXT" column inside
> > StoreScanner - this is understandable since the rows contain many columns
> > and StoreScanner iterates one KeyValue at a time.
> >
> > So, I was looking at the code and it seems that every single time there is
> > a reseek call on the same memstore scanner, we make a fresh call to build
> > an iterator() on the skip list set - this means we do an additional skip
> > list lookup for every column retrieved. SkipList lookups are O(log n) and
> > not O(1).
> >
> > Related JIRA HBASE-3855 made the reseek() scan some KVs and if that number
> > is exceeded, do a lookup. However, it seems this behaviour was reverted by
> > HBASE-4195 and every next row/next column is now a reseek() and a skip list
> > lookup rather than being an iterator.
> >
> > Are there any strong reasons against having the previous behaviour of
> > scanning a small # of keys before degenerating to a skip list lookup?
> > Seems like it would really help for sequential memstore scans and for
> > memstore gets with wide tables (even 10-20 columns).
> >
> > Thanks
> > Varun
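
For readers following the thread, the "try a few next() calls, then fall back to a seek" idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not HBase code: the class NearSeekScanner, the maxNexts knob, the skipListLookups counter and the use of plain String keys are all hypothetical, and the sketch ignores KeyValue comparators, versions and concurrent writers. The only real API it relies on is java.util.concurrent.ConcurrentSkipListSet.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch of a "near seek"; not HBase's MemStoreScanner. */
class NearSeekScanner {
    private final ConcurrentSkipListSet<String> keys;
    private final int maxNexts;      // hypothetical knob: next() calls to try before a lookup
    private Iterator<String> it;     // iterator kept across reseek() calls
    private String current;          // key the scanner is positioned on, or null if exhausted
    int skipListLookups = 0;         // counts fallback lookups, for illustration only

    NearSeekScanner(ConcurrentSkipListSet<String> keys, int maxNexts) {
        this.keys = keys;
        this.maxNexts = maxNexts;
        this.it = keys.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    String peek() {
        return current;
    }

    /** Forward-only: position on the first key >= target; false if there is none. */
    boolean reseek(String target) {
        // Cheap path: walk the existing iterator for at most maxNexts steps.
        for (int i = 0; i <= maxNexts; i++) {
            if (current == null) {
                return false;                       // ran off the end of the set
            }
            if (current.compareTo(target) >= 0) {
                return true;                        // reached the target without a lookup
            }
            if (i < maxNexts) {
                current = it.hasNext() ? it.next() : null;
            }
        }
        // Expensive path: a fresh skip list lookup, expected O(log n).
        skipListLookups++;
        it = keys.tailSet(target, true).iterator();
        current = it.hasNext() ? it.next() : null;
        return current != null;
    }
}
```

With rows of 10-20 columns and a small maxNexts, a "next column" reseek would usually be served by the cheap path, while a jump to a distant row would still pay for one lookup. Whether that trade-off helps in practice depends on how many versions sit between columns, as Lars notes above.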
