OK, now I understand: pre-splitting over the full binary key space does not
make sense when the keys only use a limited range of byte values. I do all my
splits in the base-64 character space, or let HBase split organically.
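
For illustration, a minimal Python sketch of what splitting in the base-64
character space could look like. The key scheme (first 64 bits of an MD5
digest, base-64 encoded with the padding stripped) and the helper names
rowkey and base64_split_points are assumptions made up for this example, not
anything provided by HBase; the resulting split points would still have to be
passed to whatever table-creation call you use.

```python
import base64
import hashlib

# The 64 symbols of the standard base-64 alphabet, sorted by byte value.
# HBase orders row keys lexicographically by byte, so this is the order
# in which keys starting with these symbols are laid out across regions.
ALPHABET = sorted(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def rowkey(value):
    """Hypothetical key scheme: first 64 bits of the MD5 digest, base-64 encoded."""
    digest = hashlib.md5(value).digest()[:8]
    return base64.b64encode(digest).rstrip(b"=")


def base64_split_points(regions):
    """Single-byte split points evenly spaced over the base-64 alphabet."""
    step = len(ALPHABET) / regions
    return [bytes([ALPHABET[int(i * step)]]) for i in range(1, regions)]


keys = [rowkey(str(i).encode()) for i in range(10000)]
first_bytes = {k[0] for k in keys}

print(len(keys[0]))            # length of one encoded 64-bit key
print(len(first_bytes))        # distinct leading bytes seen, out of 256 possible
print(base64_split_points(4))  # split points for a 4-region table
```

With 4 regions this gives the split points E, U and k, so each region covers
16 of the 64 possible leading characters.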
Thanks for the explanation.
-chris


On Mar 17, 2011, at 11:32 AM, Ted Dunning wrote:

> Just that base-64 is not uniformly distributed relative to a binary
> representation.  This is simply because it is all printable characters, so
> only 64 of the 256 possible byte values ever occur.  If you do a 256-way
> pre-split based on a binary interpretation of the key, at most 64 regions
> will get traffic and at least 192 will get none.  Among other things, this
> can seriously mess up benchmarking.  The situation is even worse with
> decimal integer representations.
> 
> On Thu, Mar 17, 2011 at 11:19 AM, Chris Tarnas <c...@email.com> wrote:
> I'm not sure I follow. Are you saying that 64-bit chunks of MD5 keys are not
> uniformly distributed? Or that a base-64 encoding is not evenly distributed?
> 
> thanks,
> -chris
> 
> On Mar 17, 2011, at 10:23 AM, Ted Dunning wrote:
> 
>> 
>> There can be some odd effects with this because the keys are not uniformly 
>> distributed.  Beware if you are using pre-split tables because the region 
>> traffic can be pretty unbalanced if you do a naive split.
>> 
>> On Thu, Mar 17, 2011 at 9:20 AM, Chris Tarnas <c...@email.com> wrote:
>> I've been using base-64 encoding when I use my hashes as rowkeys. It makes
>> them printable while still being fairly dense; IIRC a 64-bit key should be
>> only 11 characters.
>> 
> 
> 
