Re: how to randomize the primary key which is a timestamp

Tost Mon, 10 Jan 2011 16:19:01 -0800

How about the SecureRandom class?

You can get the key from a seed.


see
http://download.oracle.com/javase/6/docs/api/java/security/SecureRandom.html
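
A minimal sketch of what a random prefix might look like, assuming a one-byte
prefix in the range 0-63 (the example used in the quoted thread below) placed
in front of an 8-byte big-endian timestamp. The class name RandomPrefixKey,
the method rowKey, and the SHARDS constant are made up for illustration; no
HBase API is used here. Note that with a purely random prefix a reader cannot
recompute the prefix for a given timestamp, so a single-key lookup would have
to check every shard.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;

public class RandomPrefixKey {
    // Assumption: 64 shards, taken from the 0-63 example in the thread.
    static final int SHARDS = 64;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds a 9-byte row key: 1 random prefix byte + 8-byte big-endian timestamp.
    static byte[] rowKey(long timestamp) {
        byte prefix = (byte) RANDOM.nextInt(SHARDS);
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(prefix)
                .putLong(timestamp)
                .array();
    }

    public static void main(String[] args) {
        byte[] key = rowKey(System.currentTimeMillis());
        System.out.println("prefix=" + key[0] + " length=" + key.length);
    }
}
```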

2011/1/11 Weishung Chung <[email protected]>

> Thanks a lot, this will get me started :D
>
> On Mon, Jan 10, 2011 at 11:04 AM, Matt Corgan <[email protected]> wrote:
>
> > You could have prefix = timestamp % 64.  Then for a single key lookup, you
> > could calculate the prefix and query just one shard.  For a scan, you have
> > to query all shards and merge the results.
> >
> >
> > On Mon, Jan 10, 2011 at 11:56 AM, Weishung Chung <[email protected]>
> > wrote:
> >
> > > Thank you for your prompt response. I am a bit confused about the prefix.
> > > If I were to use a prefix for the timestamp key, how should I specify the
> > > row key to search for at query time? How do I know which prefix was used
> > > for a certain timestamp and needs to be prepended to the timestamp for
> > > querying?
> > >
> > > On Mon, Jan 10, 2011 at 10:41 AM, Matt Corgan <[email protected]>
> > wrote:
> > >
> > > > You can put them all in the same table.  If you prefix the keys when
> > > > written, use a prefix filter when querying.  I would choose a prefix
> > > > window that's about 4 times the number of nodes.
> > > >
> > > >
> > > > On Mon, Jan 10, 2011 at 11:30 AM, Ted Dunning <[email protected]>
> > > > wrote:
> > > >
> > > > > If multiple tables have the same key distribution and count, then they
> > > > > will have similar split points for their regions, but the locations of
> > > > > the regions will be randomized.
> > > > >
> > > > > I wouldn't worry about this until you see evidence it is a problem.
> > > > >
> > > > > On Mon, Jan 10, 2011 at 8:20 AM, Weishung Chung <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Thank you for the replies.
> > > > > > Most of the queries (70%) will scan a range of consecutive times,
> > > > > > with some single-timestamp queries (30%).
> > > > > > But there are multiple tables with the same range of timestamps. Will
> > > > > > the same timestamp ranges from multiple tables all be stored on the
> > > > > > same region server, and if so, could it affect the performance of
> > > > > > MapReduce jobs (operating on those tables over the same time
> > > > > > periods)? Would hotspotting defeat the purpose of MapReduce?
> > > > > >
> > > > > > On Mon, Jan 10, 2011 at 10:08 AM, Matt Corgan <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > You can also add a random (or hashed) prefix to the beginning of
> > > > > > > the key.  If your prefix were one byte with values 0-63, you've
> > > > > > > divided the hot spot into 64 smaller ones, which is better for
> > > > > > > writing.  The downside is that if you want to read a range of
> > > > > > > values, you will have to query all 64 "shards" and merge the
> > > > > > > sorted values.  You can choose whatever prefix size is best for
> > > > > > > your scenario.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jan 10, 2011 at 11:05 AM, Chirstopher Tarnas <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Some options that I am aware of:
> > > > > > > >
> > > > > > > > reverse the byte order of the timestamp
> > > > > > > > use UUIDs rather than a timestamp
> > > > > > > > use hashing; whether this works really depends on your requirements
> > > > > > > >
> > > > > > > > On Mon, Jan 10, 2011 at 9:33 AM, Weishung Chung <[email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > What is a good way to randomize the primary key, which is a
> > > > > > > > > timestamp, in HBase to avoid hotspotting?
> > > > > > > > > Thank you so much :)
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
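
The deterministic scheme quoted above (prefix = timestamp % 64) can be
sketched roughly as follows. This is only an illustration under stated
assumptions: the key layout (one prefix byte followed by an 8-byte big-endian
timestamp), the class name SaltedTimestampKey, and all method names are
hypothetical, and no HBase client calls are shown. It computes the key for a
single lookup and the per-shard [start, stop) key pairs that a range scan
would need to cover.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class SaltedTimestampKey {
    // Assumption: 64 shards; the thread suggests about 4x the number of nodes.
    static final int SHARDS = 64;

    // prefix = timestamp % 64; floorMod keeps the result in 0..63.
    static byte prefixFor(long timestamp) {
        return (byte) Math.floorMod(timestamp, (long) SHARDS);
    }

    // 9-byte key: prefix byte + big-endian timestamp. Big-endian bytes sort in
    // timestamp order within a shard as long as timestamps are non-negative.
    static byte[] rowKey(byte prefix, long timestamp) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(prefix)
                .putLong(timestamp)
                .array();
    }

    // Single-key lookup: the prefix is recomputed, so only one shard is queried.
    static byte[] lookupKey(long timestamp) {
        return rowKey(prefixFor(timestamp), timestamp);
    }

    // Range scan: one {startKey, stopKey} pair per shard; the caller still has
    // to run all of them and merge the results by timestamp.
    static List<byte[][]> scanRanges(long startTs, long stopTs) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int p = 0; p < SHARDS; p++) {
            ranges.add(new byte[][] {
                rowKey((byte) p, startTs),
                rowKey((byte) p, stopTs)
            });
        }
        return ranges;
    }

    public static void main(String[] args) {
        long ts = System.currentTimeMillis();
        System.out.println("shard for lookup: " + lookupKey(ts)[0]);
        System.out.println("scans needed for a range: " + scanRanges(ts - 1000, ts).size());
    }
}
```

Because the prefix is a function of the timestamp, consecutive timestamps
land in different shards, which spreads writes but means the merge step for
scans cannot be avoided.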
