Hi Lars,

Thanks for the background. It seems that for our case we will have to consider a solution like the Facebook one, since for us the next column is always the very next KV - this can be a simple flag. I am going to raise a JIRA and we can discuss there.
Thanks
Varun

On Sun, Jan 26, 2014 at 3:43 PM, lars hofhansl <[email protected]> wrote:

> This is somewhat of a known issue, and I'm sure Vladimir will chime in soon. :)
>
> Reseek is expensive compared to next if next would get us the KV we're looking for. However, HBase does not know that ahead of time. There might be a 1000 versions of the previous KV to be skipped first.
> HBase seeks in three situations:
> 1. Seek to the next column (there might be a lot of versions to skip)
> 2. Seek to the next row (there might be a lot of versions and other columns to skip)
> 3. Seek to a row via a hint
>
> #3 is definitely useful; with that one can implement very efficient "skip scans" (see the FuzzyRowFilter and what Phoenix is doing).
> #2 is helpful if there are many columns and one only "selects" a few (and of course also if there are many versions of columns).
> #1 is only helpful when we expect there to be many versions, or if the size of a typical KV approaches the block size, since then we'd need to seek to find the next block anyway.
>
> You might well be a victim of #1. Are your rows 10-20 columns, or is that just the number of columns you return?
>
> Vladimir and I have suggested a SMALL_ROW hint, where we instruct the scanner to not seek to the next column or the next row, but just issue next()'s until the KV is found. Another suggested approach (I think by the Facebook guys) was to issue next() opportunistically a few times, and only when that did not get us to the requested KV issue a reseek.
> I have also thought of a near/far designation of seeks. For near seeks we'd do a configurable number of next()'s first, then seek.
> "Near" seeks would be those of category #1 (and maybe #2) above.
>
> See: HBASE-9769, HBASE-9778, HBASE-9000 (and maybe HBASE-9915)
>
> I'll look at the trace a bit closer.
> So far my scan profiling has been focused on data in the blockcache, since in the normal case the vast majority of all data is found there and only recent changes are in the memstore.
>
> -- Lars
>
> ________________________________
> From: Varun Sharma <[email protected]>
> To: "[email protected]" <[email protected]>; "[email protected]" <[email protected]>
> Sent: Sunday, January 26, 2014 1:14 PM
> Subject: Sporadic memstore slowness for Read Heavy workloads
>
> Hi,
>
> We are seeing some unfortunately low performance in the memstore - we have researched some of the previous JIRAs and seen some inefficiencies in the ConcurrentSkipListMap. The symptom is a RegionServer hitting 100% CPU at weird points in time - the bug is hard to reproduce and there isn't a huge # of extra reads going to that region server or any substantial hotspot happening. The region server recovers the moment we flush the memstores or restart the region server. Our queries retrieve wide rows which are up to 10-20 columns. A stack trace shows two things:
>
> 1) Time spent inside MemstoreScanner.reseek() and inside the ConcurrentSkipListMap
> 2) The reseek() is being called at the "SEEK_NEXT" column inside StoreScanner - this is understandable since the rows contain many columns and StoreScanner iterates one KeyValue at a time.
>
> So, I was looking at the code and it seems that every single time there is a reseek call on the same memstore scanner, we make a fresh call to build an iterator() on the skip list set - this means we do an additional skip list lookup for every column retrieved. Skip list lookups are O(log n), not O(1).
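To make the cost concrete, here is a tiny standalone sketch - a plain java.util.concurrent.ConcurrentSkipListMap with string keys, not the actual MemStoreScanner code - contrasting the two access patterns: rebuilding a tailMap iterator for every column (one O(log n) lookup each time) versus plain next() calls on a single iterator when the wanted KV really is the next one.

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Standalone illustration, not HBase code.
public class SeekVsNextSketch {

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
        // Simulate one wide row: row1/col000 .. row1/col019, one version each.
        for (int c = 0; c < 20; c++) {
            memstore.put(String.format("row1/col%03d", c), "v");
        }

        // Pattern A: a fresh tailMap().iterator() per column, i.e. one
        // O(log n) skip list lookup for every column retrieved.
        for (int c = 0; c < 20; c++) {
            String seekKey = String.format("row1/col%03d", c);
            Iterator<Map.Entry<String, String>> fresh =
                memstore.tailMap(seekKey, true).entrySet().iterator();
            System.out.println("reseek -> " + fresh.next().getKey());
        }

        // Pattern B: one lookup, then plain next() calls on the same
        // iterator; works here because the next wanted KV is always the
        // immediately following entry.
        Iterator<Map.Entry<String, String>> it =
            memstore.tailMap("row1/col000", true).entrySet().iterator();
        while (it.hasNext()) {
            System.out.println("next   -> " + it.next().getKey());
        }
    }
}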
> Related JIRA HBASE-3855 made the reseek() scan some KVs and, if that number is exceeded, do a lookup. However, it seems this behaviour was reverted by HBASE-4195, and every next row/next column is now a reseek() and a skip list lookup rather than an iterator next().
>
> Are there any strong reasons against having the previous behaviour of scanning a small # of keys before degenerating to a skip list lookup? Seems like it would really help for sequential memstore scans and for memstore gets with wide tables (even 10-20 columns).
>
> Thanks
> Varun
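For the JIRA, this is roughly the shape of what I have in mind - a sketch only, with made-up names (BoundedNextSketch, MAX_NEXTS_BEFORE_SEEK) and plain string keys instead of KeyValues, not HBase code: try a bounded number of next() calls on the existing iterator and only fall back to a real skip list lookup when the target key is not reached.

import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustration only: bounded next() before falling back to a skip list lookup.
public class BoundedNextSketch {
    // Hypothetical limit for illustration; not necessarily the value HBASE-3855 used.
    private static final int MAX_NEXTS_BEFORE_SEEK = 8;

    private final ConcurrentSkipListMap<String, String> store;
    private Iterator<String> iter;
    private String current;

    BoundedNextSketch(ConcurrentSkipListMap<String, String> store) {
        this.store = store;
        this.iter = store.keySet().iterator();
        this.current = iter.hasNext() ? iter.next() : null;
    }

    // Position at the first key >= target, preferring cheap next() calls.
    String reseek(String target) {
        for (int i = 0; i < MAX_NEXTS_BEFORE_SEEK; i++) {
            if (current == null || current.compareTo(target) >= 0) {
                return current; // reached (or passed) the target via next()
            }
            current = iter.hasNext() ? iter.next() : null;
        }
        // Target is far away: do one real O(log n) lookup and rebuild the iterator.
        iter = store.tailMap(target, true).keySet().iterator();
        current = iter.hasNext() ? iter.next() : null;
        return current;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> store = new ConcurrentSkipListMap<>();
        for (int c = 0; c < 20; c++) {
            store.put(String.format("row1/col%03d", c), "v");
        }
        BoundedNextSketch scanner = new BoundedNextSketch(store);
        // Each "reseek" to the next column is satisfied by a single next().
        for (int c = 0; c < 20; c++) {
            System.out.println(scanner.reseek(String.format("row1/col%03d", c)));
        }
    }
}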
