OK, now I understand: pre-splitting across the full binary key space does not make sense when the keys only occupy a limited range of byte values. I do all my splits in the base-64 character space or let HBase split the regions organically. Thanks for the explanation.

-chris
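For anyone reading the archive later, here is a minimal sketch of what splitting "in the base-64 character space" could look like. It assumes the standard base-64 alphabet (A-Z, a-z, 0-9, "+", "/") and single-character split keys. The helper name base64_split_points is made up for illustration and is not part of HBase. The resulting byte strings are the kind of split keys you would hand to HBase at table creation.

```python
import string
from typing import List

# Standard base-64 alphabet, sorted by byte value because HBase orders
# row keys as raw bytes: '+', '/', '0'-'9', 'A'-'Z', 'a'-'z'.
ALPHABET = sorted(string.ascii_letters + string.digits + "+/")


def base64_split_points(num_regions: int) -> List[bytes]:
    """Hypothetical helper: return num_regions - 1 single-character split
    keys that cut the sorted alphabet into contiguous, near-equal groups."""
    if not 1 <= num_regions <= len(ALPHABET):
        raise ValueError("num_regions must be between 1 and 64")
    return [
        ALPHABET[i * len(ALPHABET) // num_regions].encode("ascii")
        for i in range(1, num_regions)
    ]


# Eight regions, each covering eight of the 64 possible leading characters.
splits = base64_split_points(8)
print(splits)
```

If the keys are base-64 encoded hashes, the leading character is close to uniform over the alphabet, so each of these regions should see a similar share of the traffic.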
On Mar 17, 2011, at 11:32 AM, Ted Dunning wrote:

> Just that base-64 is not uniformly distributed relative to a binary
> representation. This is simply because it is all printable characters. If
> you do a 256 way pre-split based on a binary interpretation of the key, 64
> regions will get traffic and 192 will get none. Among other things, this can
> seriously mess up benchmarking. The situation is even worse with decimal
> integer representations.
>
> On Thu, Mar 17, 2011 at 11:19 AM, Chris Tarnas <c...@email.com> wrote:
> I'm not sure I am clear, are you saying 64 bit chunks of a MD5 keys are not
> uniformly distributed? Or that a base-64 encoding is not evenly distributed?
>
> thanks,
> -chris
>
> On Mar 17, 2011, at 10:23 AM, Ted Dunning wrote:
>
>>
>> There can be some odd effects with this because the keys are not uniformly
>> distributed. Beware if you are using pre-split tables because the region
>> traffic can be pretty unbalanced if you do a naive split.
>>
>> On Thu, Mar 17, 2011 at 9:20 AM, Chris Tarnas <c...@email.com> wrote:
>> I've been using base-64 encoding when I use my hashes as rowkeys - makes
>> them printable while still being fairly dense, IIRC a 64bit key should be
>> only 11 characters.
>>
>
>
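The numbers in this thread are easy to check with a small simulation. The sketch below is only an illustration under stated assumptions: the key scheme (first 64 bits of an MD5 digest, base-64 encoded with the padding stripped) and the rowkey helper are hypothetical, and the 256-way binary pre-split is modelled as one region per possible value of the first key byte.

```python
import base64
import hashlib
from collections import Counter


def rowkey(value: str) -> bytes:
    """Hypothetical key scheme: the first 64 bits of an MD5 digest,
    base-64 encoded, with the trailing '=' padding removed."""
    digest = hashlib.md5(value.encode("utf-8")).digest()[:8]
    return base64.b64encode(digest).rstrip(b"=")


keys = [rowkey(f"row-{i}") for i in range(100_000)]

# Model of a 256-way binary pre-split on the first byte: split points at
# 0x01..0xFF, so the region index equals the value of the first key byte.
regions_hit = Counter(key[0] for key in keys)

key_length = len(keys[0])   # characters needed for a 64-bit key
busy = len(regions_hit)     # regions that receive at least one key
idle = 256 - busy           # regions that never receive a key
print(key_length, busy, idle)
```

Only the byte values that are valid base-64 characters can ever appear as the first byte, which is why at most 64 of the 256 regions can receive traffic no matter how many keys are written.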