Re: Extremely high CPU usage after upgrading to Hbase 1.4.4

Srinidhi Muppalla Mon, 10 Sep 2018 12:09:02 -0700

Hi Ted, 

The highest number of filters used is 10, but the average is generally close to 
1. Is it possible the CPU usage spike has to do with Hbase internal maintenance 
operations? It looks like post-upgrade the spike isn’t correlated with the 
frequency of reads/writes we are making, because the high CPU usage persisted 
when the number of operations went down.


Thank you, 
Srinidhi

On 9/8/18, 9:44 AM, "Ted Yu" <[email protected]> wrote:

    Srinidhi :
    Do you know the average / highest number of ColumnPrefixFilters in the
    FilterList ?
    
    Thanks
    
    On Fri, Sep 7, 2018 at 10:00 PM Ted Yu <[email protected]> wrote:
    
    > Thanks for the detailed background information.
    >
    > I assume your code has done de-dup for the filters contained in
    > FilterListWithOR.
    >
    > I took a look at JIRAs which touched
    > hbase-client/src/main/java/org/apache/hadoop/hbase/filter in branch-1.4.
    > There were a few patches (some were very big) since the release of 1.3.0,
    > so it is not obvious at first glance which one(s) might be related.
    >
    > I noticed ColumnPrefixFilter.getNextCellHint (and
    > KeyValueUtil.createFirstOnRow) appearing many times in the stack trace.
    >
    > I plan to dig more in this area.
    >
    > Cheers
    >
    > On Fri, Sep 7, 2018 at 11:30 AM Srinidhi Muppalla <[email protected]> wrote:
    >
    >> Sure thing. For our table schema, each row represents one user and the
    >> row key is that user’s unique id in our system. We currently only use one
    >> column family in the table. The column qualifiers represent an item that
    >> has been surfaced to that user as well as additional information to
    >> differentiate the way the item has been surfaced to the user. Without
    >> getting into too many specifics, the qualifier follows the rough format of:
    >>
    >> “Channel-itemId-distinguisher”.
    >>
    >> The channel here is the channel through which the item was previously
    >> surfaced to the user. The itemId is the unique id of the item that has
    >> been surfaced to the user. A distinguisher is some attribute about how
    >> that item was surfaced to the user.
    >>
    >> When we run a scan, we currently only ever run it on one row at a time.
    >> It was chosen over ‘get’ because (from our understanding) the performance
    >> difference is negligible, and down the road using scan would allow us some
    >> more flexibility.
    >>
    >> The filter list that is constructed with scan works by using a
    >> ColumnPrefixFilter as you mentioned. When a user is being communicated to
    >> on a particular channel, we have a list of items that we want to
    >> potentially surface for that user. So, we construct a prefix list with the
    >> channel and each of the item ids in the form of: “channel-itemId”. Then we
    >> run a scan on that row with that filter list using “WithOr” to get all of
    >> the matching channel-itemId combinations currently in that row/column
    >> family in the table. This way we can then know which of the items we want
    >> to surface to that user on that channel have already been surfaced on that
    >> channel. The reason we query using a prefix filter is so that we don’t need
    >> to know the ‘distinguisher’ part of the record when writing the actual
    >> query, because the distinguisher is only relevant in certain circumstances.
    >>
    >> Let me know if this is the information about our query pattern that you
    >> were looking for and if there is anything I can clarify or add.
    >>
    >> Thanks,
    >> Srinidhi
    >>
    >> On 9/6/18, 12:24 PM, "Ted Yu" <[email protected]> wrote:
    >>
    >>     From the stack trace, ColumnPrefixFilter is used during scan.
    >>
    >>     Can you illustrate how various filters are formed thru FilterListWithOR ?
    >>     It would be easier for other people to reproduce the problem given your
    >>     query pattern.
    >>
    >>     Cheers
    >>
    >>     On Thu, Sep 6, 2018 at 11:43 AM Srinidhi Muppalla <[email protected]> wrote:
    >>
    >>     > Hi Vlad,
    >>     >
    >>     > Thank you for the suggestion. I recreated the issue and attached the stack
    >>     > traces I took. Let me know if there’s any other info I can provide. We
    >>     > narrowed the issue down to occurring when upgrading from 1.3.0 to any 1.4.x
    >>     > version.
    >>     >
    >>     > Thanks,
    >>     > Srinidhi
    >>     >
    >>     > On 9/4/18, 8:19 PM, "Vladimir Rodionov" <[email protected]> wrote:
    >>     >
    >>     >     Hi, Srinidhi
    >>     >
    >>     >     Next time you see this issue, take a jstack of an RS several times in a
    >>     >     row. W/o stack traces it is hard to tell what was going on with your
    >>     >     cluster after the upgrade.
    >>     >
    >>     >     -Vlad
    >>     >
    >>     >
    >>     >
    >>     >     On Tue, Sep 4, 2018 at 3:50 PM Srinidhi Muppalla <[email protected]> wrote:
    >>     >
    >>     >     > Hello all,
    >>     >     >
    >>     >     > We are currently running Hbase 1.3.0 on an EMR cluster running EMR 5.5.0.
    >>     >     > Recently, we attempted to upgrade our cluster to use Hbase 1.4.4 (along
    >>     >     > with upgrading our EMR cluster to 5.16). After upgrading, the CPU usage for
    >>     >     > all of our region servers spiked up to 90%. The load_one for all of our
    >>     >     > servers spiked from roughly 1-2 to 10 threads. After upgrading, the number
    >>     >     > of operations to the cluster hasn’t increased. After giving the cluster a
    >>     >     > few hours, we had to revert the upgrade. From the logs, we are unable to
    >>     >     > tell what is occupying the CPU resources. Is this a known issue with 1.4.4?
    >>     >     > Any guidance or ideas for debugging the cause would be greatly
    >>     >     > appreciated. What are the best steps for debugging CPU usage?
    >>     >     >
    >>     >     > Thank you,
    >>     >     > Srinidhi
    >>     >     >
    >>     >
    >>     >
    >>     >
    >>
    >>
    >>
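
For anyone trying to reproduce the query pattern described in the thread, here
is a minimal sketch in plain Java (no HBase dependency) of how a de-duplicated
"channel-itemId" prefix list might be built, and of what OR-ing the prefix
matches over one row's qualifiers would return. The class name, method names,
and sample qualifiers are hypothetical and not taken from the thread. In HBase
itself, each prefix would presumably be wrapped in a ColumnPrefixFilter and the
filters combined in a FilterList with Operator.MUST_PASS_ONE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PrefixScanSketch {

    // Build one "channel-itemId" prefix per candidate item, dropping
    // duplicates while keeping the original order (the de-dup step
    // that is assumed in the thread).
    public static List<String> buildPrefixes(String channel, List<String> itemIds) {
        Set<String> unique = new LinkedHashSet<>();
        for (String itemId : itemIds) {
            unique.add(channel + "-" + itemId);
        }
        return new ArrayList<>(unique);
    }

    // Stand-in for OR-ed prefix filters: keep a qualifier if ANY prefix
    // matches it. With HBase on the classpath, each prefix would instead go
    // into new ColumnPrefixFilter(Bytes.toBytes(prefix)), added to
    // new FilterList(FilterList.Operator.MUST_PASS_ONE).
    public static List<String> matchQualifiers(List<String> qualifiers, List<String> prefixes) {
        List<String> matched = new ArrayList<>();
        for (String qualifier : qualifiers) {
            for (String prefix : prefixes) {
                if (qualifier.startsWith(prefix)) {
                    matched.add(qualifier);
                    break; // one matching prefix is enough under OR semantics
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: item "42" appears twice in the candidates.
        List<String> prefixes = buildPrefixes("email", Arrays.asList("42", "7", "42"));
        List<String> rowQualifiers = Arrays.asList(
                "email-42-banner", "email-99-banner", "push-7-digest", "email-7-digest");
        System.out.println(prefixes);
        System.out.println(matchQualifiers(rowQualifiers, prefixes));
    }
}
```

Note that a bare "channel-itemId" prefix would also match longer item ids that
begin with the same characters (for example "email-4" matches "email-42-..."),
so whether a trailing delimiter is needed depends on the id format, which the
thread does not specify.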
