Re: Sporadic memstore slowness for Read Heavy workloads

Vladimir Rodionov Mon, 27 Jan 2014 22:35:58 -0800

Varun,

There is no need to open a new JIRA - there are two already:
https://issues.apache.org/jira/browse/HBASE-9769
https://issues.apache.org/jira/browse/HBASE-9778


Both have patches; you can grab and test them.

-Vladimir


On Mon, Jan 27, 2014 at 9:36 PM, Varun Sharma <[email protected]> wrote:

> Hi lars,
>
> Thanks for the background. It seems that for our case, we will have to
> consider some solution like the Facebook one, since the next column is
> always the next one - this can be a simple flag. I am going to raise a JIRA
> and we can discuss there.
>
> Thanks
> Varun
>
>
> On Sun, Jan 26, 2014 at 3:43 PM, lars hofhansl <[email protected]> wrote:
>
> > This is somewhat of a known issue, and I'm sure Vladimir will chime in
> > soon. :)
> >
> > Reseek is expensive compared to next if next would get us the KV we're
> > looking for. However, HBase does not know that ahead of time. There might
> > be 1000 versions of the previous KV to be skipped first.
> > HBase seeks in three situations:
> > 1. Seek to the next column (there might be a lot of versions to skip)
> > 2. Seek to the next row (there might be a lot of versions and other
> > columns to skip)
> > 3. Seek to a row via a hint
> >
> > #3 is definitely useful, with that one can implement very efficient "skip
> > scans" (see the FuzzyRowFilter and what Phoenix is doing).
> > #2 is helpful if there are many columns and one only "selects" a few (and
> > of course also if there are many versions of columns)
> > #1 is only helpful when we expect there to be many versions. Or if the
> > size of a typical KV approaches the block size, since then we'd need to
> > seek to find the next block anyway.
> >
> > You might well be a victim of #1. Are your rows 10-20 columns or is that
> > just the number of columns you return?
> >
> > Vladimir and myself have suggested a SMALL_ROW hint, where we instruct the
> > scanner to not seek to the next column or the next row, but just issue
> > next()'s until the KV is found. Another suggested approach (I think by the
> > Facebook guys) was to issue next() opportunistically a few times, and only
> > when that did not get us to the requested KV issue a reseek.
> > I have also thought of a near/far designation of seeks. For near seeks
> > we'd do a configurable number of next()'s first, then seek.
> > "near" seeks would be those of category #1 (and maybe #2) above.
> >
> > See: HBASE-9769, HBASE-9778, HBASE-9000 (, and HBASE-9915, maybe)
> >
> > I'll look at the trace a bit closer.
> > So far my scan profiling has been focused on data in the blockcache since
> > in the normal case the vast majority of all data is found there and only
> > recent changes are in the memstore.
> >
> > -- Lars
> >
> >
> >
> >
> > ________________________________
> >  From: Varun Sharma <[email protected]>
> > To: "[email protected]" <[email protected]>; "[email protected]" <[email protected]>
> > Sent: Sunday, January 26, 2014 1:14 PM
> > Subject: Sporadic memstore slowness for Read Heavy workloads
> >
> >
> > Hi,
> >
> > We are seeing some unfortunately low performance in the memstore - we have
> > researched some of the previous JIRA(s) and seen some inefficiencies in the
> > ConcurrentSkipListMap. The symptom is a RegionServer hitting 100% CPU at
> > weird points in time - the bug is hard to reproduce and there isn't a
> > huge # of extra reads going to that region server or any substantial
> > hotspot happening. The region server recovers the moment we flush the
> > memstores or restart the region server. Our queries retrieve wide rows
> > which are up to 10-20 columns. A stack trace shows two things:
> >
> > 1) Time spent inside MemstoreScanner.reseek() and inside the
> > ConcurrentSkipListMap
> > 2) The reseek() is being called at the "SEEK_NEXT" column inside
> > StoreScanner - this is understandable since the rows contain many columns
> > and StoreScanner iterates one KeyValue at a time.
> >
> > So, I was looking at the code and it seems that every single time there is
> > a reseek call on the same memstore scanner, we make a fresh call to build
> > an iterator() on the skip list set - this means we do an additional skip
> > list lookup for every column retrieved. SkipList lookups are O(log n) and
> > not O(1).
> >
> > Related JIRA HBASE-3855 made the reseek() scan some KVs and if that number
> > is exceeded, do a lookup. However, it seems this behaviour was reverted by
> > HBASE-4195 and every next row/next column is now a reseek() and a skip list
> > lookup rather than advancing an iterator.
> >
> > Are there any strong reasons against having the previous behaviour of
> > scanning a small # of keys before degenerating to a skip list lookup ?
> > Seems like it would really help for sequential memstore scans and for
> > memstore gets with wide tables (even 10-20 columns).
> >
> > Thanks
> > Varun
> >
>
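
The "issue next() a few times, and only then reseek" idea discussed in this
thread can be sketched in plain Java. This is a minimal illustration under
stated assumptions, not HBase code and not the patches attached to the JIRAs:
the class NearSeekScanner, its maxNexts budget, and the skipListLookups
counter are hypothetical names, and String keys stand in for KeyValues. Only
java.util.concurrent.ConcurrentSkipListSet and its tailSet(E, boolean) method
are real APIs.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch of a forward-only scanner over a skip list set. */
class NearSeekScanner {
    private final ConcurrentSkipListSet<String> set;
    private final int maxNexts;   // assumed tunable: next() calls tried before a lookup
    private Iterator<String> it;
    private String current;       // key the scanner is positioned on, or null at the end
    int skipListLookups = 0;      // counts fallbacks to a fresh skip-list lookup

    NearSeekScanner(ConcurrentSkipListSet<String> set, int maxNexts) {
        this.set = set;
        this.maxNexts = maxNexts;
        this.it = set.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    /**
     * Positions the scanner on the first key >= target and returns it
     * (null if there is none). Assumes target is not behind the current key.
     */
    String reseek(String target) {
        int nexts = 0;
        while (current != null && current.compareTo(target) < 0) {
            if (nexts == maxNexts) {
                // Budget used up: pay for one skip-list lookup via tailSet().
                skipListLookups++;
                it = set.tailSet(target, true).iterator();
                current = it.hasNext() ? it.next() : null;
                return current;
            }
            // Cheap path: step the existing iterator forward by one entry.
            current = it.hasNext() ? it.next() : null;
            nexts++;
        }
        return current;
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<String> keys = new ConcurrentSkipListSet<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(String.format("k%03d", i));
        }
        NearSeekScanner scanner = new NearSeekScanner(keys, 4);
        System.out.println(scanner.reseek("k002") + " lookups=" + scanner.skipListLookups);
        System.out.println(scanner.reseek("k500") + " lookups=" + scanner.skipListLookups);
    }
}
```

In this sketch a target that is within maxNexts entries of the current
position is reached by stepping the iterator alone, while a distant target
costs at most maxNexts wasted steps plus one lookup. Choosing the budget is
the trade-off raised in the thread: too small and narrow rows still pay for
lookups, too large and rows with many versions are walked entry by entry.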
