How about the SecureRandom class? You can derive the key from a seed.
See http://download.oracle.com/javase/6/docs/api/java/security/SecureRandom.html

2011/1/11 Weishung Chung wrote:
> Thanks a lot, this will get me started :D

On Mon, Jan 10, 2011 at 11:04 AM, Matt Corgan wrote:
> You could have prefix = timestamp % 64. Then for a single key lookup, you could calculate the prefix and query just one shard. For a scan, you have to query all shards and merge the results.

On Mon, Jan 10, 2011 at 11:56 AM, Weishung Chung wrote:
> Thank you for your prompt response. I am a bit confused about the prefix. If I were to use a prefix for the timestamp key, how should I specify the row key to search for at query time? How do I know which prefix was used for a given timestamp and needs to be prepended to it for querying?

On Mon, Jan 10, 2011 at 10:41 AM, Matt Corgan wrote:
> You can put them all in the same table. If you prefix the keys when they are written, use a prefix filter when querying. I would choose a prefix window that's about 4 times the number of nodes.

On Mon, Jan 10, 2011 at 11:30 AM, Ted Dunning wrote:
> If multiple tables have the same key distribution and count, then they will have similar split points for their regions, but the locations of the regions will be randomized.
>
> I wouldn't worry about this until you see evidence it is a problem.

On Mon, Jan 10, 2011 at 8:20 AM, Weishung Chung wrote:
> Thank you for the replies.
> Most of the queries (70%) will scan a range of consecutive times, with some single-timestamp queries (30%).
> But there are multiple tables with the same range of timestamps. Will the same timestamp ranges from all of these tables be stored on the same region server, and if so, could that affect the performance of MapReduce jobs operating on those tables over the same time periods? Would hotspotting defeat the purpose of MapReduce?

On Mon, Jan 10, 2011 at 10:08 AM, Matt Corgan wrote:
> You can also add a random (or hashed) prefix to the beginning of the key. If your prefix were one byte with values 0-63, you would divide the hot spot into 64 smaller ones, which is better for writing. The downside is that if you want to read a range of values, you will have to query all 64 "shards" and merge the sorted values. You can choose whatever prefix size is best for your scenario.
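A minimal Java sketch of the salting scheme in the message above. It uses plain JDK classes rather than the HBase client so it runs standalone. It assumes the one-byte prefix is computed as timestamp % 64, as Matt suggests in his later reply, so that a single-key lookup can recompute the prefix. `SaltedKeys` and its methods are hypothetical names, and each shard's scan result is modelled as an already-sorted list of timestamps.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SaltedKeys {
    // Number of salt buckets ("shards"): one prefix byte with values 0-63.
    static final int BUCKETS = 64;

    // Assumed salt: prefix = timestamp % 64 (floorMod keeps it in 0..63),
    // so a single-key lookup can recompute it and query just one shard.
    static byte salt(long timestamp) {
        return (byte) Math.floorMod(timestamp, (long) BUCKETS);
    }

    // Row key = 1-byte salt followed by the 8-byte big-endian timestamp.
    static byte[] rowKey(byte salt, long timestamp) {
        return ByteBuffer.allocate(9).put(salt).putLong(timestamp).array();
    }

    static byte[] saltedKey(long timestamp) {
        return rowKey(salt(timestamp), timestamp);
    }

    // A time-range read [startTs, stopTs) needs one start/stop row pair per shard.
    static List<byte[][]> shardRanges(long startTs, long stopTs) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            ranges.add(new byte[][] {rowKey((byte) b, startTs), rowKey((byte) b, stopTs)});
        }
        return ranges;
    }

    // Each shard returns rows sorted by timestamp; a k-way merge over a heap
    // of {shardIndex, position} cursors restores one globally sorted stream.
    static List<Long> mergeShards(List<List<Long>> shards) {
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Long.compare(
                shards.get(a[0]).get(a[1]), shards.get(b[0]).get(b[1])));
        for (int i = 0; i < shards.size(); i++) {
            if (!shards.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<Long> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<Long> shard = shards.get(top[0]);
            merged.add(shard.get(top[1]));
            if (top[1] + 1 < shard.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return merged;
    }

    public static void main(String[] args) {
        // Simulate writes: consecutive timestamps land round-robin in 64 shards.
        List<List<Long>> shards = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) shards.add(new ArrayList<>());
        for (long ts = 1000; ts < 1200; ts++) shards.get(salt(ts)).add(ts);

        List<Long> merged = mergeShards(shards);
        System.out.println(merged.size() + " rows, first=" + merged.get(0)
                + ", last=" + merged.get(merged.size() - 1));
        System.out.println("scan ranges: " + shardRanges(1000, 1200).size());
    }
}
```

Running `main` prints `200 rows, first=1000, last=1199` and `scan ranges: 64`. In a real table, each pair from `shardRanges` would become the start and stop row of one scan. The heap holds one cursor per shard, so merging n rows costs O(n log 64).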
On Mon, Jan 10, 2011 at 11:05 AM, Chirstopher Tarnas wrote:
> Some options that I am aware of:
> reverse the byte order of the timestamp
> use UUIDs rather than a timestamp
> use hashing (whether this works really depends on your requirements)

On Mon, Jan 10, 2011 at 9:33 AM, Weishung Chung wrote:
> What is a good way to randomize a primary key that is a timestamp in HBase to avoid hotspotting?
> Thank you so much :)
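The three options Christopher lists can be sketched with the JDK alone. This is a minimal illustration, not a recommended scheme. The class and method names are hypothetical, and SHA-256 is an arbitrary choice of hash because the thread does not name one.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.UUID;

public class TimestampKeys {
    // Option 1: reverse the byte order of the timestamp. The fast-changing
    // low-order byte now comes first, so consecutive writes spread across
    // the key space, but keys no longer sort in time order.
    static byte[] reversedKey(long timestamp) {
        return ByteBuffer.allocate(8).putLong(Long.reverseBytes(timestamp)).array();
    }

    // Option 2: a random UUID as the row key. Time ordering is lost entirely,
    // so the timestamp itself would have to be stored in a column.
    static byte[] uuidKey() {
        UUID id = UUID.randomUUID();
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Option 3: prefix the timestamp with the first bytes of its hash.
    // The prefix is deterministic, so a point lookup can recompute it.
    static byte[] hashedKey(long timestamp, int prefixBytes) {
        byte[] ts = ByteBuffer.allocate(8).putLong(timestamp).array();
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256").digest(ts);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on Java SE platforms.
            throw new IllegalStateException(e);
        }
        return ByteBuffer.allocate(prefixBytes + 8)
                .put(digest, 0, prefixBytes)
                .put(ts)
                .array();
    }

    public static void main(String[] args) {
        long t = 1294675200000L; // arbitrary example timestamp in milliseconds
        System.out.println("reversed:    " + Arrays.toString(reversedKey(t)));
        System.out.println("reversed+1:  " + Arrays.toString(reversedKey(t + 1)));
        System.out.println("uuid bytes:  " + uuidKey().length);
        System.out.println("hashed (1B): " + Arrays.toString(hashedKey(t, 1)));
    }
}
```

Reversed and hashed keys can both be recomputed from a timestamp, so single-timestamp lookups stay cheap. UUID keys cannot be recomputed this way. None of the three preserves time order in the key. A time-range scan therefore has to cover every prefix, or the whole table, and that cost matters for the 70% range-scan workload described in the thread.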
