No, this isn't on EC2 and yes, it's (supposed to be) production. Please
elaborate on your inferred sigh of despair...

On 12 June 2012 15:48, Michael Segel <[email protected]> wrote:

> Ok...
>
> Please tell me that this isn't a production system.
>
> Is this on EC2?
>
> On Jun 12, 2012, at 6:55 AM, Simon Kelly wrote:
>
> > Thanks Michael
> >
> > I'm 100% sure it's not the UUID distribution that's causing the problem.
> > I'm going to try using the API to create the table and see if that
> > changes things.
> >
> > The reason I want to pre-split the table is that HBase doesn't handle
> > the initial load well when it all goes to a single regionserver, and I
> > can't start the system off slowly and allow a few splits to happen
> > before fully loading it. It's 100% or nothing. I'm also stuck with only
> > 8 GB of RAM per server and only 5 servers, so I need to try and get as
> > much as I can from the get-go.
> >
> > Simon
> >
> > On 12 June 2012 13:37, Michael Segel <[email protected]> wrote:
> >
> >> Ok,
> >> Now that I'm awake, and am drinking my first cup of joe...
> >>
> >> If you just generate UUIDs you are not going to have an even
> >> distribution. Nor are they going to be truly random, due to how the
> >> machines generate their random numbers.
> >> But this is not important in solving your problem...
> >>
> >> There is a technique where UUIDs are hashed and then truncated back
> >> down to a 128-bit string.
> >> You can generate the UUID, take a hash (SHA-1 or MD5) and then
> >> truncate it to 128 bits.
> >> This would give you a more uniform distribution across your splits.
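The hash-and-truncate idea above could be sketched as follows. This is a minimal sketch, not code from the thread; MD5 is used because its digest is already 128 bits, whereas a SHA-1 digest would need truncating from 20 bytes down to 16:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class HashedKey {
    // Hash the UUID's 16 raw bytes and keep the 128-bit digest as the row key.
    // MD5 output is exactly 16 bytes; with SHA-1 you would instead copy the
    // first 16 bytes of the 20-byte digest.
    public static byte[] hashedKey(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        try {
            return MessageDigest.getInstance("MD5").digest(buf.array());
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 ships with every JDK
        }
    }
}
```

Because the hash is deterministic, the same UUID always maps to the same region, while the digest's first byte is uniform over 0x00-0xFF, which is what makes evenly spaced splits work.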
> >>
> >> I'm also a bit curious about why you're pre-splitting in the first
> >> place.
> >> I mean, I understand why people do it, but it's a short-term fix and I
> >> wonder how much pain you feel.
> >>
> >> Of course YMMV based on your use case.
> >>
> >> Hash your key and you'll be ok.
> >>
> >>
> >>
> >> On Jun 12, 2012, at 4:41 AM, Simon Kelly wrote:
> >>
> >>> Yes, I'm aware that UUIDs are designed to be unique and not evenly
> >>> distributed, but I wouldn't expect a big gap in their distribution
> >>> either.
> >>>
> >>> The other thing that is really confusing me is that the region splits
> >>> aren't lexicographically sorted. Perhaps there is a problem with the
> >>> way I'm specifying the splits in the split file. I haven't been able
> >>> to find any docs on what format the split keys should be in, so I've
> >>> used what's produced by Bytes.toStringBinary. Is that correct?
> >>>
> >>> Simon
> >>>
> >>> On 12 June 2012 10:23, Michael Segel <[email protected]> wrote:
> >>>
> >>>> UUIDs are unique but not necessarily random, and even in random
> >>>> samplings you may not see an even distribution except over time.
> >>>>
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>> On Jun 12, 2012, at 3:18 AM, "Simon Kelly" <[email protected]> wrote:
> >>>>
> >>>>> Hi
> >>>>>
> >>>>> I'm getting some unexpected results with a pre-split table where
> >>>>> some of the regions are not getting any data.
> >>>>>
> >>>>> The table keys are UUIDs (generated using Java's UUID.randomUUID()),
> >>>>> which I'm storing as a byte[16]:
> >>>>>
> >>>>>  key[0-7] = uuid most significant bits
> >>>>>  key[8-15] = uuid least significant bits
> >>>>>
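The byte[16] layout described above can be written directly with a ByteBuffer. This is a sketch of what the poster describes, not his actual code:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidKey {
    // byte[16] row key: bytes 0-7 hold the most significant bits,
    // bytes 8-15 the least significant bits (big-endian, ByteBuffer's default).
    public static byte[] toRowKey(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }
}
```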
> >>>>> The table is created via the shell as follows:
> >>>>>
> >>>>>  create 'table', {NAME => 'cf'}, {SPLITS_FILE => 'splits.txt'}
> >>>>>
> >>>>> The splits.txt is generated using the code here:
> >>>>> http://pastebin.com/DAExXMDz which generates 32 regions split between
> >>>>> x00 and xFF. I have also tried with 16-byte region keys (x00x00... to
> >>>>> xFFxFF...).
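If the row keys' first byte really is uniform, the split points just divide the single-byte prefix space evenly. A sketch (a hypothetical helper, not the pastebin code) that emits the 31 boundaries for 32 regions in the \xNN escape form produced by Bytes.toStringBinary:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitsFile {
    // For `regions` regions you need `regions - 1` split points; with 32
    // regions over the 0x00..0xFF first-byte space the boundaries fall at
    // 0x08, 0x10, ..., 0xF8.
    public static List<String> splitPoints(int regions) {
        List<String> splits = new ArrayList<>();
        int step = 256 / regions; // assumes regions divides 256 evenly
        for (int i = 1; i < regions; i++) {
            splits.add(String.format("\\x%02X", i * step)); // e.g. \x08
        }
        return splits;
    }
}
```

Printing one boundary per line gives a splits.txt in the same escaped form the shell example above consumes.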
> >>>>>
> >>>>> As far as I understand, this should distribute the rows evenly across
> >>>>> the regions, but I'm getting a bunch of regions with no rows. I'm also
> >>>>> confused as to the ordering of the regions, since it seems the start
> >>>>> and end keys aren't really matching up correctly. You can see the
> >>>>> regions and the requests they are getting here:
> >>>>> http://pastebin.com/B4771g5X
> >>>>>
> >>>>> Thanks in advance for the help.
> >>>>> Simon
> >>>>
> >>
> >>
>
>