Sure thing. For our table schema, each row represents one user, and the row key
is that user’s unique id in our system. We currently use only one column family
in the table. Each column qualifier identifies an item that has been surfaced to
that user, plus additional information that distinguishes how the item was
surfaced. Without getting into too many specifics, the qualifier follows the
rough format of:
“channel-itemId-distinguisher”.
The channel here is the channel through which the item was previously surfaced
to the user. The itemId is the unique id of the item that has been surfaced to
the user. The distinguisher is some attribute of how that item was surfaced to
the user.
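
To make the format concrete, here is a minimal sketch in plain Java. The class
name, method names, and sample values ("email", "item42", "digest") are
hypothetical and are not taken from the actual code base; the hyphen delimiter
is assumed from the rough format given above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper illustrating the "channel-itemId-distinguisher" layout. */
final class QualifierFormat {
    private QualifierFormat() {}

    /** Full qualifier, as it would be stored in the row (assumed '-' delimiter). */
    static String qualifier(String channel, String itemId, String distinguisher) {
        return channel + "-" + itemId + "-" + distinguisher;
    }

    /** Prefix that leaves out the distinguisher: "channel-itemId". */
    static String prefix(String channel, String itemId) {
        return channel + "-" + itemId;
    }

    /** HBase stores qualifiers as bytes; UTF-8 is an assumption here. */
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration only.
        String q = qualifier("email", "item42", "digest");
        String p = prefix("email", "item42");
        System.out.println(q + " starts with " + p + ": " + q.startsWith(p));
    }
}
```

Every full qualifier starts with the prefix built from the same channel and
item id, which is the property a prefix-based lookup relies on.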
When we run a scan, we currently only ever run it on one row at a time. We
chose scan over ‘get’ because (from our understanding) the performance
difference is negligible, and down the road scan would give us more
flexibility.
The filter list for the scan is built from ColumnPrefixFilters, as you
mentioned. When we are about to contact a user on a particular channel, we
have a list of items that we might surface to that user. So, we construct a
list of prefixes from the channel and each of the item ids, in the form
“channel-itemId”. Then we run a scan on that user’s row with that filter list
using “WithOr” to get all of the matching channel-itemId combinations
currently in that row/column family in the table. This tells us which of the
items we want to surface to that user on that channel have already been
surfaced on that channel. We query with a prefix filter so that we don’t need
to know the ‘distinguisher’ part of the record when writing the actual query,
because the distinguisher is only relevant in certain circumstances.
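
The matching described above can be modeled in plain Java as follows. This is
only a sketch of the OR-of-prefixes semantics and does not call the HBase
client; in the HBase API the equivalent would be one ColumnPrefixFilter per
prefix, combined in a FilterList with Operator.MUST_PASS_ONE. The class name,
method names, and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of "match any of these column prefixes" on a single row. */
final class PrefixOrMatch {
    private PrefixOrMatch() {}

    /** Builds one "channel-itemId" prefix per candidate item. */
    static List<String> buildPrefixes(String channel, List<String> itemIds) {
        List<String> prefixes = new ArrayList<>();
        for (String itemId : itemIds) {
            prefixes.add(channel + "-" + itemId);
        }
        return prefixes;
    }

    /** Returns the qualifiers in the row that start with at least one prefix. */
    static Set<String> matchAny(List<String> prefixes, List<String> rowQualifiers) {
        Set<String> matched = new LinkedHashSet<>();
        for (String qualifier : rowQualifiers) {
            for (String prefix : prefixes) {
                if (qualifier.startsWith(prefix)) {
                    matched.add(qualifier);
                    break; // one matching prefix is enough (OR semantics)
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Invented sample row: two items surfaced by email, one by push.
        List<String> row = Arrays.asList("email-a1-x", "email-b2-y", "push-c3-z");
        List<String> prefixes = buildPrefixes("email", Arrays.asList("a1", "c3", "d4"));
        System.out.println(matchAny(prefixes, row)); // prints [email-a1-x]
    }
}
```

One thing the model makes visible: a prefix without a trailing delimiter also
matches any item id that merely begins with the same characters (for example,
"email-a1" matches "email-a12-x"), so whether that matters depends on how the
item ids are formed.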
Let me know if this is the information about our query pattern that you were
looking for and if there is anything I can clarify or add.
Thanks,
Srinidhi
On 9/6/18, 12:24 PM, "Ted Yu" <[email protected]> wrote:
From the stack trace, ColumnPrefixFilter is used during scan.
Can you illustrate how various filters are formed thru FilterListWithOR ?
It would be easier for other people to reproduce the problem given your
query pattern.
Cheers
On Thu, Sep 6, 2018 at 11:43 AM Srinidhi Muppalla <[email protected]>
wrote:
> Hi Vlad,
>
> Thank you for the suggestion. I recreated the issue and attached the stack
> traces I took. Let me know if there’s any other info I can provide. We
> narrowed the issue down to occurring when upgrading from 1.3.0 to any 1.4.x
> version.
>
> Thanks,
> Srinidhi
>
> On 9/4/18, 8:19 PM, "Vladimir Rodionov" <[email protected]> wrote:
>
> Hi, Srinidhi
>
> Next time you will see this issue, take jstack of a RS several times in a
> row. W/o stack traces it is hard to tell what was going on with your
> cluster after upgrade.
>
> -Vlad
>
>
>
> On Tue, Sep 4, 2018 at 3:50 PM Srinidhi Muppalla <[email protected]>
> wrote:
>
> > Hello all,
> >
> > We are currently running Hbase 1.3.0 on an EMR cluster running EMR 5.5.0.
> > Recently, we attempted to upgrade our cluster to using Hbase 1.4.4 (along
> > with upgrading our EMR cluster to 5.16). After upgrading, the CPU usage for
> > all of our region servers spiked up to 90%. The load_one for all of our
> > servers spiked from roughly 1-2 to 10 threads. After upgrading, the number
> > of operations to the cluster hasn’t increased. After giving the cluster a
> > few hours, we had to revert the upgrade. From the logs, we are unable to
> > tell what is occupying the CPU resources. Is this a known issue with 1.4.4?
> > Any guidance or ideas for debugging the cause would be greatly
> > appreciated. What are the best steps for debugging CPU usage?
> >
> > Thank you,
> > Srinidhi
> >
>
>
>