No, this isn't on EC2 and yes, it's (supposed to be) production. Please elaborate on your inferred sigh of despair...
On 12 June 2012 15:48, Michael Segel <[email protected]> wrote:

> Ok...
>
> Please tell me that this isn't a production system.
>
> Is this on EC2?
>
> On Jun 12, 2012, at 6:55 AM, Simon Kelly wrote:
>
> > Thanks Michael
> >
> > I'm 100% sure it's not the UUID distribution that's causing the
> > problem. I'm going to try using the API to create the table and see
> > if that changes things.
> >
> > The reason I want to pre-split the table is that HBase doesn't handle
> > the initial load to a single regionserver and I can't start the
> > system off slowly and allow a few splits to happen before fully
> > loading it. It's 100% or nothing. I'm also stuck with only 8 GB of
> > RAM per server and only 5 servers, so I need to try and get as much
> > as I can from the get-go.
> >
> > Simon
> >
> > On 12 June 2012 13:37, Michael Segel <[email protected]> wrote:
> >
> >> Ok,
> >> Now that I'm awake, and am drinking my first cup of joe...
> >>
> >> If you just generate UUIDs you are not going to have an even
> >> distribution. Nor are they going to be truly random due to how the
> >> machines are generating their random numbers.
> >> But this is not important in solving your problem...
> >>
> >> There is a set of UUIDs which are hashed and then truncated back
> >> down to a 128-bit string.
> >> You can generate the UUID, take a hash (SHA-1 or MD5) and then
> >> truncate it to 128 bits.
> >> This would generate a more random distribution across your splits.
> >>
> >> I'm also a bit curious about why you're pre-splitting in the first
> >> place. I mean, I understand why people do it, but it's a short-term
> >> fix and I wonder how much pain you feel.
> >>
> >> Of course YMMV based on your use case.
> >>
> >> Hash your key and you'll be ok.
> >>
> >> On Jun 12, 2012, at 4:41 AM, Simon Kelly wrote:
> >>
> >>> Yes, I'm aware that UUIDs are designed to be unique and not evenly
> >>> distributed, but I wouldn't expect a big gap in their distribution
> >>> either.
> >>>
> >>> The other thing that is really confusing me is that the region
> >>> splits aren't lexicographically sorted. Perhaps there is a problem
> >>> with the way I'm specifying the splits in the split file. I haven't
> >>> been able to find any docs on what format the split keys should be
> >>> in, so I've used what's produced by Bytes.toStringBinary. Is that
> >>> correct?
> >>>
> >>> Simon
> >>>
> >>> On 12 June 2012 10:23, Michael Segel <[email protected]> wrote:
> >>>
> >>>> UUIDs are unique but not necessarily random, and even in random
> >>>> samplings you may not see an even distribution except over time.
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>> On Jun 12, 2012, at 3:18 AM, "Simon Kelly" <[email protected]> wrote:
> >>>>
> >>>>> Hi
> >>>>>
> >>>>> I'm getting some unexpected results with a pre-split table where
> >>>>> some of the regions are not getting any data.
> >>>>>
> >>>>> The table keys are UUIDs (generated using Java's UUID.randomUUID())
> >>>>> which I'm storing as a byte[16]:
> >>>>>
> >>>>> key[0-7] = uuid most significant bits
> >>>>> key[8-15] = uuid least significant bits
> >>>>>
> >>>>> The table is created via the shell as follows:
> >>>>>
> >>>>> create 'table', {NAME => 'cf'}, {SPLITS_FILE => 'splits.txt'}
> >>>>>
> >>>>> The splits.txt is generated using the code here:
> >>>>> http://pastebin.com/DAExXMDz which generates 32 regions split
> >>>>> between \x00 and \xFF. I have also tried with 16-byte region keys
> >>>>> (\x00\x00... to \xFF\xFF...).
> >>>>>
> >>>>> As far as I understand, this should distribute the rows evenly
> >>>>> across the regions, but I'm getting a bunch of regions with no
> >>>>> rows. I'm also confused as to the ordering of the regions, since
> >>>>> it seems the start and end keys aren't really matching up
> >>>>> correctly. You can see the regions and the requests they are
> >>>>> getting here: http://pastebin.com/B4771g5X
> >>>>>
> >>>>> Thanks in advance for the help.
> >>>>> Simon
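
For reference, a minimal sketch of the hash-and-truncate approach Michael
describes above, using only the JDK plus HBase's Bytes.toStringBinary for
printing. The class and method names are illustrative, not from the thread:

    import java.nio.ByteBuffer;
    import java.security.MessageDigest;
    import java.util.Arrays;
    import java.util.UUID;

    // Illustrative sketch of the hashed-key idea from the thread above.
    public class HashedUuidKey {

        // Pack the UUID into the byte[16] layout from the original mail:
        // key[0-7] = most significant bits, key[8-15] = least significant bits.
        static byte[] uuidToBytes(UUID uuid) {
            return ByteBuffer.allocate(16)
                    .putLong(uuid.getMostSignificantBits())
                    .putLong(uuid.getLeastSignificantBits())
                    .array();
        }

        // Hash the packed key and truncate the digest back to 128 bits.
        // An MD5 digest is already exactly 16 bytes; a SHA-1 digest is
        // 20 bytes and would be truncated by the same copyOf call.
        static byte[] hashedKey(UUID uuid) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return Arrays.copyOf(md5.digest(uuidToBytes(uuid)), 16);
        }

        public static void main(String[] args) throws Exception {
            byte[] key = hashedKey(UUID.randomUUID());
            System.out.println(org.apache.hadoop.hbase.util.Bytes.toStringBinary(key));
        }
    }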

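Since Simon mentions trying the API instead of the shell plus splits file,
here is one possible sketch against the 0.92/0.94-era client API. The table
and column family names are taken from the thread; the split values are an
assumption (32 regions evenly spaced over the first key byte). Passing the
split keys as byte[][] sidesteps the splits-file format question entirely:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    // Illustrative sketch; not code from the thread.
    public class CreatePresplitTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor("table");
            desc.addFamily(new HColumnDescriptor("cf"));

            // 31 split points give 32 regions, evenly spaced over the
            // first key byte: 0x08, 0x10, ..., 0xF8.
            byte[][] splits = new byte[31][];
            for (int i = 1; i <= 31; i++) {
                splits[i - 1] = new byte[] { (byte) (i * 8) };
            }
            admin.createTable(desc, splits);

            // Alternatively, let HBase compute the split points itself:
            // admin.createTable(desc, new byte[] {0x00}, new byte[] {(byte) 0xFF}, 32);

            admin.close();
        }
    }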