Hi Srinidhi,

We are also facing a similar problem on HBase 1.4.8. We're curious to know what you did after this to resolve the issue. Did you revert to a previous version? I've seen a JIRA regarding this issue:
https://issues.apache.org/jira/browse/HBASE-21620
@Ted Yu <yuzhih...@gmail.com> any update on this?

On Tue, Sep 11, 2018 at 4:11 AM Srinidhi Muppalla <srinid...@trulia.com> wrote:

> It is during a period when the number of client operations was relatively
> low. It wasn't zero, but it was definitely off-peak hours.
>
> On 9/10/18, 12:16 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>
> > In the previous stack trace you sent, the shortCompactions and
> > longCompactions threads were not active.
> >
> > Was the stack trace captured during a period when the number of client
> > operations was low?
> >
> > If not, can you capture a stack trace during off-peak hours?
> >
> > Cheers
> >
> > On Mon, Sep 10, 2018 at 12:08 PM Srinidhi Muppalla <srinid...@trulia.com> wrote:
> >
> > > Hi Ted,
> > >
> > > The highest number of filters used is 10, but the average is generally
> > > close to 1. Is it possible the CPU usage spike has to do with HBase's
> > > internal maintenance operations? It looks like, post-upgrade, the
> > > spike isn't correlated with the frequency of reads/writes we are
> > > making, because the high CPU usage persisted when the number of
> > > operations went down.
> > >
> > > Thank you,
> > > Srinidhi
> > >
> > > On 9/8/18, 9:44 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> > >
> > > > Srinidhi:
> > > > Do you know the average / highest number of ColumnPrefixFilters in
> > > > the FilterList?
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Sep 7, 2018 at 10:00 PM Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > Thanks for the detailed background information.
> > > > >
> > > > > I assume your code has de-duplicated the filters contained in
> > > > > FilterListWithOR.
> > > > >
> > > > > I took a look at the JIRAs which touched
> > > > > hbase-client/src/main/java/org/apache/hadoop/hbase/filter in
> > > > > branch-1.4. There were a few patches (some very big) since the
> > > > > release of 1.3.0, so it is not obvious at first glance which
> > > > > one(s) might be related.
> > > > >
> > > > > I noticed ColumnPrefixFilter.getNextCellHint (and
> > > > > KeyValueUtil.createFirstOnRow) appearing many times in the stack
> > > > > trace.
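To see why that code path can get hot, here is a rough, JDK-only sketch; this is not HBase's implementation, and the method names below only mimic the ones in the stack trace. The idea: when a cell misses, each ColumnPrefixFilter in a FilterListWithOR builds a candidate "first cell on row with this prefix" key, and the OR list seeks to the smallest candidate, so every seek costs roughly one key construction per filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the seek-hint path seen in the stack traces (names mimic
// ColumnPrefixFilter.getNextCellHint / KeyValueUtil.createFirstOnRow, but
// this is an illustrative simplification, not HBase code).
public class SeekHintSketch {
    // Stand-in for KeyValueUtil.createFirstOnRow: row + qualifier prefix -> seek key.
    static String createFirstOnRow(String row, String prefix) {
        return row + "/" + prefix;
    }

    // Stand-in for FilterListWithOR's hint merge: each filter whose prefix still
    // lies ahead of the current qualifier contributes a candidate key, and the
    // scanner seeks to the smallest one. One candidate allocation per filter.
    static String nextCellHint(String row, String currentQualifier, List<String> prefixes) {
        String best = null;
        for (String prefix : prefixes) {
            if (prefix.compareTo(currentQualifier) <= 0) continue; // already passed
            String hint = createFirstOnRow(row, prefix);
            if (best == null || hint.compareTo(best) < 0) best = hint;
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> prefixes = new ArrayList<>(
                Arrays.asList("email-123", "email-456", "push-123"));
        // The current cell sorts before "email-456", so the scanner is told to
        // seek straight to that prefix instead of reading every cell in between.
        System.out.println(nextCellHint("user42", "email-200", prefixes)); // prints user42/email-456
    }
}
```

With ~10 filters in the OR list (the maximum Srinidhi reports), this hint computation runs for each filter on each seek, which is consistent with those two frames dominating the traces.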
> > > > > I plan to dig more in this area.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Fri, Sep 7, 2018 at 11:30 AM Srinidhi Muppalla <srinid...@trulia.com> wrote:
> > > > >
> > > > > > Sure thing. For our table schema, each row represents one user,
> > > > > > and the row key is that user's unique id in our system. We
> > > > > > currently use only one column family in the table. The column
> > > > > > qualifiers represent an item that has been surfaced to that user,
> > > > > > as well as additional information to differentiate the way the
> > > > > > item has been surfaced to the user. Without getting into too many
> > > > > > specifics, the qualifier follows the rough format of:
> > > > > >
> > > > > > "Channel-itemId-distinguisher"
> > > > > >
> > > > > > The channel here is the channel through which the item was
> > > > > > previously surfaced to the user. The itemId is the unique id of
> > > > > > the item that has been surfaced to the user. A distinguisher is
> > > > > > some attribute about how that item was surfaced to the user.
> > > > > >
> > > > > > When we run a scan, we currently only ever run it on one row at a
> > > > > > time. It was chosen over 'get' because (from our understanding)
> > > > > > the performance difference is negligible, and down the road using
> > > > > > scan would allow us some more flexibility.
> > > > > >
> > > > > > The filter list that is constructed with the scan works by using
> > > > > > a ColumnPrefixFilter, as you mentioned. When a user is being
> > > > > > communicated to on a particular channel, we have a list of items
> > > > > > that we want to potentially surface for that user. So, we
> > > > > > construct a prefix list with the channel and each of the item ids
> > > > > > in the form of "channel-itemId". Then we run a scan on that row
> > > > > > with that filter list using "WithOr" to get all of the matching
> > > > > > channel-itemId combinations currently in that row/column family
> > > > > > in the table.
> > > > > > This way we can then know which of the items we want to surface
> > > > > > to that user on that channel have already been surfaced on that
> > > > > > channel. The reason we query using a prefix filter is so that we
> > > > > > don't need to know the 'distinguisher' part of the record when
> > > > > > writing the actual query, because the distinguisher is only
> > > > > > relevant in certain circumstances.
> > > > > >
> > > > > > Let me know if this is the information about our query pattern
> > > > > > that you were looking for, and if there is anything I can clarify
> > > > > > or add.
> > > > > >
> > > > > > Thanks,
> > > > > > Srinidhi
> > > > > >
> > > > > > On 9/6/18, 12:24 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> > > > > >
> > > > > > > From the stack trace, ColumnPrefixFilter is used during the scan.
> > > > > > >
> > > > > > > Can you illustrate how the various filters are formed through
> > > > > > > FilterListWithOR? It would be easier for other people to
> > > > > > > reproduce the problem given your query pattern.
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Thu, Sep 6, 2018 at 11:43 AM Srinidhi Muppalla <srinid...@trulia.com> wrote:
> > > > > > >
> > > > > > > > Hi Vlad,
> > > > > > > >
> > > > > > > > Thank you for the suggestion. I recreated the issue and
> > > > > > > > attached the stack traces I took. Let me know if there's any
> > > > > > > > other info I can provide. We narrowed the issue down to
> > > > > > > > occurring when upgrading from 1.3.0 to any 1.4.x version.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Srinidhi
> > > > > > > >
> > > > > > > > On 9/4/18, 8:19 PM, "Vladimir Rodionov" <vladrodio...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi, Srinidhi
> > > > > > > > >
> > > > > > > > > Next time you see this issue, take a jstack of a RS several
> > > > > > > > > times in a row. Without stack traces it is hard to tell
> > > > > > > > > what was going on with your cluster after the upgrade.
> > >> > > > >> > -Vlad > > >> > > > >> > > > >> > > > >> > On Tue, Sep 4, 2018 at 3:50 PM Srinidhi Muppalla < > > >> srinid...@trulia.com > > >> > > > > >> > wrote: > > >> > > > >> > > Hello all, > > >> > > > > >> > > We are currently running Hbase 1.3.0 on an EMR > cluster > > >> running EMR > > >> > 5.5.0. > > >> > > Recently, we attempted to upgrade our cluster to > using > > Hbase > > >> 1.4.4 > > >> > (along > > >> > > with upgrading our EMR cluster to 5.16). After > > upgrading, the > > >> CPU > > >> > usage for > > >> > > all of our region servers spiked up to 90%. The > > load_one for > > >> all of > > >> > our > > >> > > servers spiked from roughly 1-2 to 10 threads. > After > > >> upgrading, the > > >> > number > > >> > > of operations to the cluster hasn’t increased. > After > > giving > > >> the > > >> > cluster a > > >> > > few hours, we had to revert the upgrade. From the > logs, > > we are > > >> > unable to > > >> > > tell what is occupying the CPU resources. Is this > a > > known > > >> issue with > > >> > 1.4.4? > > >> > > Any guidance or ideas for debugging the cause > would be > > greatly > > >> > > appreciated. What are the best steps for > debugging CPU > > usage? > > >> > > > > >> > > Thank you, > > >> > > Srinidhi > > >> > > > > >> > > > >> > > > >> > > > >> > > >> > > >> > > > > > > > > > -- <https://about.me/karthick.r?promo=email_sig&utm_source=product&utm_medium=email_sig&utm_campaign=gmail_api&utm_content=thumb> Karthick R about.me/karthick.r <https://about.me/karthick.r?promo=email_sig&utm_source=product&utm_medium=email_sig&utm_campaign=gmail_api&utm_content=thumb>