Isn't it a better strategy to create the HBase keys as Key = hash(MySQL_key) + MySQL_key?
That way you'll know your key distribution and can add new machines seamlessly. I'm assuming that your rows don't overlap between any 2 machines. If they do, you could append the MACHINE_ID to the key (not prepend it). I don't think you want the machine # as the first dimension on your rows, because you want the data from new machines to be evenly spread out across the existing regions.

On 10/24/11 9:07 AM, "Stack" <[email protected]> wrote:

>On Mon, Oct 24, 2011 at 1:27 AM, Sam Seigal <[email protected]> wrote:
>> According to the HBase book, pre splitting tables and doing manual
>> splits is a better long term strategy than letting HBase handle it.
>>
>
>It's good for getting a table off the ground, yes.
>
>
>> Since I do not know what the keys from the prod system are going to
>> look like, I am adding a machine number prefix to the row keys
>> and pre splitting the tables based on the prefix (prefix 0 goes to
>> machine A, prefix 1 goes to machine b etc).
>>
>
>You don't need to do in-order scan of the data? What's the rest of your
>row key look like?
>
>
>> Once I decide to add more machines, I can always do a rolling split
>> and add more prefixes.
>>
>
>Yes.
>
>> Is this a good strategy for pre splitting the tables ?
>>
>
>So, you'll start out with one region per server?
>
>What do you think the rate of splitting will be like? Are you using
>default region size or have you bumped this up?
>
>St.Ack
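
To make the hash(MySQL_key) + MySQL_key layout from the top of this message concrete, here is a rough Python sketch. Everything specific in it is my own assumption rather than anything from the thread: MD5 as the hash, a 2-byte prefix, the "|" separator before the machine ID, and the helper names make_row_key and split_points. It only builds the key bytes and evenly spaced split points; it does not talk to HBase.

```python
import hashlib
from typing import List, Optional

# Assumption: a 2-byte hash prefix (65536 buckets) is enough to spread keys.
PREFIX_BYTES = 2


def make_row_key(mysql_key: str, machine_id: Optional[str] = None) -> bytes:
    """Build hash(MySQL_key) + MySQL_key, optionally with the machine ID appended."""
    raw = mysql_key.encode("utf-8")
    # MD5 is used only to distribute keys, not for security (an assumption).
    prefix = hashlib.md5(raw).digest()[:PREFIX_BYTES]
    key = prefix + raw
    if machine_id is not None:
        # Appended, not prepended, so the hash stays the leading dimension.
        key += b"|" + machine_id.encode("utf-8")
    return key


def split_points(num_regions: int) -> List[bytes]:
    """Evenly spaced boundaries over the hash-prefix space for pre-splitting."""
    space = 256 ** PREFIX_BYTES
    return [
        (i * space // num_regions).to_bytes(PREFIX_BYTES, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    print(make_row_key("12345", machine_id="A"))
    print(split_points(4))
```

Because the prefix is roughly uniform, the split points can be chosen up front without knowing what the production keys look like, and rows from a newly added machine land across all existing regions. The trade-off, as Stack's question hints, is that you lose in-order scans over the original MySQL key.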
