If multiple tables have the same key distribution and count, then they will have similar split points for their regions, but the locations of the regions will be randomized.
I wouldn't worry about this until you see evidence it is a problem.

On Mon, Jan 10, 2011 at 8:20 AM, Weishung Chung <[email protected]> wrote:

> Thank you for the replies.
> Most of the queries, (70%) will be for scanning a range of consecutive
> times, with some single timestamp query (30%)
> But there are multiple tables with the same range of timestamps, will all
> these same range of timestamps from multiple tables be stored on the same
> region server and if so, could it affect the performance of map reduce jobs
> (operated on those tables with the same range of time periods) ? Would
> hotspotting defeat the purpose of map reduce?
>
> On Mon, Jan 10, 2011 at 10:08 AM, Matt Corgan <[email protected]> wrote:
>
> > You can also add a random (or hashed) prefix to the beginning of the key.
> > If your prefix were one byte with values 0-63, you've divided the hot spot
> > into 64 smaller ones, which is better for writing. The downside is that if
> > you want to read a range of values, you will have to query all 64 "shards"
> > and merge the sorted values. You can choose whatever prefix size is best
> > for your scenario.
> >
> >
> > On Mon, Jan 10, 2011 at 11:05 AM, Chirstopher Tarnas <[email protected]>
> > wrote:
> >
> > > Some options that I am aware of:
> > >
> > > reverse the byte order of the timestamp
> > > use UUIDs rather than a timestamp
> > > use hashing, this working really depends on your requirements
> > >
> > > On Mon, Jan 10, 2011 at 9:33 AM, Weishung Chung <[email protected]>
> > > wrote:
> > >
> > > > What is the good way to randomize the primary key which is a timestamp
> > > > in HBase to avoid hotspotting?
> > > > Thank you so much :)
> > > >
> > >
> >
>
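
For reference, the hashed-prefix idea from the quoted thread can be sketched roughly as follows. This is only an illustration in plain Python, not HBase client code: the sorted list stands in for a table's row keys, and the names `salted_key`, `scan_range` and `NUM_BUCKETS`, the MD5-based bucket choice, and the 8-byte big-endian timestamp layout are all assumptions made for the example. It shows both sides of the trade-off: consecutive timestamps land in different buckets, and a range read has to visit every bucket and merge the sorted results.

```python
import bisect
import hashlib
import heapq
import struct

NUM_BUCKETS = 64  # one-byte prefix with values 0-63 (assumed for this sketch)


def salted_key(timestamp_ms: int) -> bytes:
    """Prefix an 8-byte big-endian timestamp with a one-byte hash bucket."""
    ts_bytes = struct.pack(">Q", timestamp_ms)
    # Hypothetical bucket choice: first byte of the MD5 digest, modulo 64.
    bucket = hashlib.md5(ts_bytes).digest()[0] % NUM_BUCKETS
    return bytes([bucket]) + ts_bytes


def scan_range(sorted_keys, start_ms, stop_ms):
    """Return timestamps in [start_ms, stop_ms) by scanning all buckets and merging."""
    per_bucket = []
    for bucket in range(NUM_BUCKETS):
        prefix = bytes([bucket])
        lo = prefix + struct.pack(">Q", start_ms)
        hi = prefix + struct.pack(">Q", stop_ms)
        # Binary search plays the role of one range scan inside one bucket.
        i = bisect.bisect_left(sorted_keys, lo)
        j = bisect.bisect_left(sorted_keys, hi)
        per_bucket.append([struct.unpack(">Q", k[1:])[0] for k in sorted_keys[i:j]])
    # Each bucket is already sorted by timestamp, so a k-way merge is enough.
    return list(heapq.merge(*per_bucket))


# In-memory stand-in for a table: row keys kept in sorted byte order.
timestamps = range(1294675200000, 1294675200000 + 1000)
table = sorted(salted_key(t) for t in timestamps)
result = scan_range(table, 1294675200100, 1294675200200)
```

With a real table each of the 64 bucket lookups would be a separate scan, so the prefix size is a balance between write spreading and read fan-out.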
