Re: Schema design question - Hot Key concerns

Sam Seigal Fri, 18 Nov 2011 10:03:20 -0800

One of the concerns I see with this schema is what happens if one of the
shows becomes hot. Since you are maintaining your bookings at the column
level, a hot "row" cannot be partitioned across regions: a row always
lives in exactly one region. HBase is atomic at the row level. Therefore,
different clients updating the same SHOW_ID will compete with each other,
and the write throughput on that single row is limited because updates to
it are serialized.
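
To make the contention concrete, here is a rough sketch in plain Java. It is
not the HBase client API and says nothing about how HBase implements its row
lock internally; it only models "one lock per row". The names HotRowSketch,
ShowRow, book and run are made up for illustration. Every client that books
a seat for the same show has to pass through the same lock, no matter which
BOOKING_<#> column it writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class HotRowSketch {

    // Stand-in for a single row keyed by SHOW_ID (assumption: one lock per row).
    static class ShowRow {
        private final ReentrantLock rowLock = new ReentrantLock();
        private final Map<String, Integer> columns = new HashMap<>();

        ShowRow(int seats) {
            columns.put("SEATS_AVAILABLE", seats);
        }

        // Adds BOOKING_<id> and decrements SEATS_AVAILABLE as one atomic step.
        boolean book(String bookingId, int seats) {
            rowLock.lock(); // every writer to this row queues up here
            try {
                int available = columns.get("SEATS_AVAILABLE");
                if (seats > available) {
                    return false; // not enough seats left
                }
                columns.put("BOOKING_" + bookingId, seats);
                columns.put("SEATS_AVAILABLE", available - seats);
                return true;
            } finally {
                rowLock.unlock();
            }
        }

        int seatsAvailable() {
            rowLock.lock();
            try {
                return columns.get("SEATS_AVAILABLE");
            } finally {
                rowLock.unlock();
            }
        }

        // Number of BOOKING_<#> columns currently in the row.
        int bookingCount() {
            rowLock.lock();
            try {
                return columns.size() - 1;
            } finally {
                rowLock.unlock();
            }
        }
    }

    // Runs `clients` concurrent one-seat bookings against the same row.
    static ShowRow run(int initialSeats, int clients) {
        ShowRow row = new ShowRow(initialSeats);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < clients; i++) {
            final String id = String.valueOf(i);
            pool.execute(() -> row.book(id, 1));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return row;
    }

    public static void main(String[] args) {
        ShowRow row = run(100, 250);
        System.out.println(row.seatsAvailable() + " seats left, "
                + row.bookingCount() + " bookings");
    }
}
```

With 100 seats and 250 one-seat requests, exactly 100 bookings succeed and
SEATS_AVAILABLE ends at 0, so the row stays consistent. The price is that
all 250 requests were handled one at a time.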


See this discussion on Quora:

http://www.quora.com/Is-there-a-limit-to-the-number-of-columns-in-an-HBase-row

I will let the experts comment further.


On Fri, Nov 18, 2011 at 9:33 AM, Suraj Varma <[email protected]> wrote:
> I have an HBase schema design question that I wanted to discuss with the list.
>
> Let's say we have a "wide" table design that has a table with one
> column family containing "show bookings", say.
>
> RowKey: SHOW_ID
> Columns: SEATS_AVAILABLE, BOOKING_<#1>, BOOKING_<#2>, BOOKING_<#3>, etc
> Values: <remaining available seats>, <seats booked>, <seats booked>,
> <seats booked>, etc
>
> Each "SHOW_ID" will have variable number of columns.
>
> Usage Pattern:
> 1) Multiple clients / threads are constantly
> creating/updating/deleting "bookings" and this results in a column
> being added/updated/deleted in the row.
> 2) The SEATS_AVAILABLE column needs to be atomically updated whenever
> a corresponding BOOKING_<#> is added, updated or deleted.
> 3) Clients update their own unique BOOKING columns (i.e. clients
> update their own mutually exclusive BOOKING_<#> columns).
> 4) Clients can concurrently update the SEATS_AVAILABLE column.
> 5) Some SHOW_IDs will be hit harder than others.
> 6) A TTL on the BOOKING columns will be set to expire them after some
> set time.
> 7) We want to leverage the atomic update at "row level" that HBase
> provides for atomically updating the related columns.
>
> So - we are visualizing this as sort of an "equalizer" graphic on a
> stereo where each row is constantly varying in terms of columns added
> & removed. The SEATS_AVAILABLE value goes up & down correspondingly.
>
> Questions / Notes:
> 1) Could this lead to a hot key / hot row scenario? The columns being
> updated are mutually exclusive except for the SEATS_AVAILABLE. Or
> would this be very low overhead given that only one column is really
> being "updated" by multiple client threads?
>
> 2) The alternative we had explored was a tall table where each BOOKING
> is a separate row (SHOW_ID-BOOKING-<#> would be the key) ... however,
> in this case, we won't be able to atomically update the
> SEATS_AVAILABLE column at the same time.
>
> 3) In terms of "row locking", what is the granularity? i.e. when is
> the row level lock engaged to make it atomic? Are the column updates
> made on the side and "swapped" in with the row level lock, or is the
> row level lock held for the full duration of the update?
>
> 4) I think the concern is whether this design is scalable as the
> number of clients keeps increasing over time ...
>
> 5) Any other suggestions on how a hot row key scenario (if real) can
> be sidestepped?
>
> Thanks,
> --Suraj
>
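
On question 5 above: one option that is not proposed anywhere in this thread,
so treat it purely as an assumption to evaluate, is to split a hot show's
seat inventory across several bucket rows (hypothetical keys such as
SHOW_ID-0, SHOW_ID-1, ...) so that writers contend on different rows. The
sketch below models this in plain Java, with one atomic counter standing in
for each bucket row. BucketedSeatsSketch and its methods are made-up names,
not HBase API.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class BucketedSeatsSketch {

    // One counter per hypothetical bucket row SHOW_ID-0 .. SHOW_ID-(n-1).
    private final AtomicIntegerArray buckets;

    BucketedSeatsSketch(int totalSeats, int bucketCount) {
        buckets = new AtomicIntegerArray(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            // Spread seats evenly; the first (totalSeats % bucketCount)
            // buckets receive one extra seat.
            int extra = i < totalSeats % bucketCount ? 1 : 0;
            buckets.set(i, totalSeats / bucketCount + extra);
        }
    }

    // Tries the client's home bucket first, then the remaining buckets in
    // order. Returns the bucket index used, or -1 if no single bucket can
    // satisfy the request.
    int book(int clientId, int seats) {
        int n = buckets.length();
        int home = Math.floorMod(clientId, n);
        for (int step = 0; step < n; step++) {
            int b = (home + step) % n;
            while (true) {
                int available = buckets.get(b);
                if (available < seats) {
                    break; // this bucket cannot serve the request
                }
                // Atomic only within this one bucket, like a single-row update.
                if (buckets.compareAndSet(b, available, available - seats)) {
                    return b;
                }
            }
        }
        return -1;
    }

    // Sum over all buckets; not an atomic snapshot across buckets.
    int seatsAvailable() {
        int total = 0;
        for (int i = 0; i < buckets.length(); i++) {
            total += buckets.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        BucketedSeatsSketch show = new BucketedSeatsSketch(10, 4);
        System.out.println(show.book(0, 3));       // bucket 0
        System.out.println(show.book(0, 3));       // bucket 1
        System.out.println(show.book(0, 3));       // -1
        System.out.println(show.seatsAvailable()); // 4
    }
}
```

The trade-off shows up in the example run: with 10 seats in 4 buckets
(3, 3, 2, 2), the third 3-seat booking fails even though 4 seats remain,
because no single bucket holds 3. Atomicity is kept per bucket only, so the
total is no longer one atomically maintained SEATS_AVAILABLE value.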
