Srinidhi: Do you know the average and the highest number of ColumnPrefixFilters in the FilterList?
Thanks

On Fri, Sep 7, 2018 at 10:00 PM Ted Yu <yuzhih...@gmail.com> wrote:

> Thanks for detailed background information.
>
> I assume your code has done de-dup for the filters contained in
> FilterListWithOR.
>
> I took a look at JIRAs which touched
> hbase-client/src/main/java/org/apache/hadoop/hbase/filter in branch-1.4
> There were a few patches (some were very big) since the release of 1.3.0
> So it is not obvious at first glance which one(s) might be related.
>
> I noticed ColumnPrefixFilter.getNextCellHint (and
> KeyValueUtil.createFirstOnRow) appearing many times in the stack trace.
>
> I plan to dig more in this area.
>
> Cheers
>
> On Fri, Sep 7, 2018 at 11:30 AM Srinidhi Muppalla <srinid...@trulia.com>
> wrote:
>
>> Sure thing. For our table schema, each row represents one user and the
>> row key is that user’s unique id in our system. We currently only use one
>> column family in the table. The column qualifiers represent an item that
>> has been surfaced to that user as well as additional information to
>> differentiate the way the item has been surfaced to the user. Without
>> getting into too many specifics, the qualifier follows the rough format of:
>>
>> “Channel-itemId-distinguisher”.
>>
>> The channel here is the channel through the item was previously surfaced
>> to the user. The itemid is the unique id of the item that has been surfaced
>> to the user. A distinguisher is some attribute about how that item was
>> surfaced to the user.
>>
>> When we run a scan, we currently only ever run it on one row at a time.
>> It was chosen over ‘get’ because (from our understanding) the performance
>> difference is negligible, and down the road using scan would allow us some
>> more flexibility.
>>
>> The filter list that is constructed with scan works by using a
>> ColumnPrefixFilter as you mentioned. When a user is being communicated to
>> on a particular channel, we have a list of items that we want to
>> potentially surface for that user. So, we construct a prefix list with the
>> channel and each of the item ids in the form of: “channel-itemId”. Then we
>> run a scan on that row with that filter list using “WithOr” to get all of
>> the matching channel-itemId combinations currently in that row/column
>> family in the table. This way we can then know which of the items we want
>> to surface to that user on that channel have already been surfaced on that
>> channel. The reason we query using a prefix filter is so that we don’t need
>> to know the ‘distinguisher’ part of the record when writing the actual
>> query, because the distinguisher is only relevant in certain circumstances.
>>
>> Let me know if this is the information about our query pattern that you
>> were looking for and if there is anything I can clarify or add.
>>
>> Thanks,
>> Srinidhi
>>
>> On 9/6/18, 12:24 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>
>>     From the stack trace, ColumnPrefixFilter is used during scan.
>>
>>     Can you illustrate how various filters are formed thru
>>     FilterListWithOR ?
>>     It would be easier for other people to reproduce the problem given
>>     your query pattern.
>>
>>     Cheers
>>
>>     On Thu, Sep 6, 2018 at 11:43 AM Srinidhi Muppalla
>>     <srinid...@trulia.com> wrote:
>>
>>     > Hi Vlad,
>>     >
>>     > Thank you for the suggestion. I recreated the issue and attached
>>     > the stack traces I took. Let me know if there’s any other info I
>>     > can provide. We narrowed the issue down to occurring when upgrading
>>     > from 1.3.0 to any 1.4.x version.
>>     >
>>     > Thanks,
>>     > Srinidhi
>>     >
>>     > On 9/4/18, 8:19 PM, "Vladimir Rodionov" <vladrodio...@gmail.com>
>>     > wrote:
>>     >
>>     >     Hi, Srinidhi
>>     >
>>     >     Next time you will see this issue, take jstack of a RS several
>>     >     times in a row. W/o stack traces it is hard to tell what was
>>     >     going on with your cluster after upgrade.
>>     >
>>     >     -Vlad
>>     >
>>     >     On Tue, Sep 4, 2018 at 3:50 PM Srinidhi Muppalla
>>     >     <srinid...@trulia.com> wrote:
>>     >
>>     >     > Hello all,
>>     >     >
>>     >     > We are currently running Hbase 1.3.0 on an EMR cluster running
>>     >     > EMR 5.5.0. Recently, we attempted to upgrade our cluster to
>>     >     > using Hbase 1.4.4 (along with upgrading our EMR cluster to
>>     >     > 5.16). After upgrading, the CPU usage for all of our region
>>     >     > servers spiked up to 90%. The load_one for all of our servers
>>     >     > spiked from roughly 1-2 to 10 threads. After upgrading, the
>>     >     > number of operations to the cluster hasn’t increased. After
>>     >     > giving the cluster a few hours, we had to revert the upgrade.
>>     >     > From the logs, we are unable to tell what is occupying the CPU
>>     >     > resources. Is this a known issue with 1.4.4? Any guidance or
>>     >     > ideas for debugging the cause would be greatly appreciated.
>>     >     > What are the best steps for debugging CPU usage?
>>     >     >
>>     >     > Thank you,
>>     >     > Srinidhi
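
For anyone trying to reproduce the query pattern described in this thread, the prefix construction might look roughly like the sketch below. It is not the poster's code: the class name, method names, channel value "email", and item ids are all hypothetical. It only builds the de-duplicated "channel-itemId" prefixes and reports how many there are; the HBase step (wrapping each prefix in a ColumnPrefixFilter and combining them in a FilterList with Operator.MUST_PASS_ONE) is left as a comment so the sketch runs without the HBase client on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not taken from the thread.
class PrefixListSketch {

    // Build one "channel-itemId" prefix per candidate item, dropping
    // duplicates while keeping first-seen order.
    static List<String> buildPrefixes(String channel, List<String> itemIds) {
        Set<String> unique = new LinkedHashSet<>();
        for (String itemId : itemIds) {
            unique.add(channel + "-" + itemId);
        }
        return new ArrayList<>(unique);
    }

    // Encode the prefixes as UTF-8 bytes, the form in which a column
    // prefix is compared against qualifier bytes.
    static List<byte[]> toBytes(List<String> prefixes) {
        List<byte[]> out = new ArrayList<>();
        for (String prefix : prefixes) {
            out.add(prefix.getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample input; "1001" appears twice on purpose.
        List<String> prefixes =
                buildPrefixes("email", Arrays.asList("1001", "1002", "1001"));
        List<byte[]> encoded = toBytes(prefixes);

        // In HBase, each byte[] would be wrapped in a ColumnPrefixFilter and
        // the filters combined in a FilterList using MUST_PASS_ONE (OR).
        // That step is omitted here.
        System.out.println(encoded.size() + " unique prefixes: " + prefixes);
    }
}
```

Logging the size of the de-duplicated list for each scan is one way to answer the question about the average and highest number of filters per FilterList.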