There's the "split" command in the shell. HBaseAdmin has a method of the same name.
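For illustration, the shell command mentioned above looks like this ('mytable' is a hypothetical table name; the command asks the master to split every region of that table):

```
hbase(main):001:0> split 'mytable'
```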
On the table's page in the master's web UI, there's a "split" button. Finally, when creating a table, you can pre-specify all the split keys with this method: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])

J-D

On Thu, Feb 10, 2011 at 8:48 AM, Geoff Hendrey <ghend...@decarta.com> wrote:
> I hunted around for some info on how to force a table to split, but I
> didn't find what I was looking for. Is there a command I can issue from
> the HBase shell that would force every existing region to divide in
> half? That would be quite useful. If not, what's the next best way to
> force splits?
>
> thanks!
> -g
>
> -----Original Message-----
> From: Michael Segel [mailto:michael_se...@hotmail.com]
> Sent: Thursday, February 10, 2011 8:15 AM
> To: user@hbase.apache.org
> Cc: hbase-u...@hadoop.apache.org
> Subject: RE: getSplits question
>
> Ryan,
>
> Just to point out the obvious...
>
> On smaller tables where you don't get enough parallelism, you can
> manually force the table's regions to be split.
> My understanding is that if/when the table grows, it will then go back
> to splitting normally.
>
> This way, if you have a 'small' lookup table that is relatively static,
> you can manually split it to the 'right' size for your cloud.
> If you are seeding a system, you can do the splits to get good
> parallelism and avoid overloading a single region with inserts, then
> let the table go back to its normal growth pattern and splits.
>
> This would solve the OP's issue and, as you point out, avoid worrying
> about getSplits().
>
> Does this make sense, or am I missing something?
>
> -Mike
>
>> Date: Wed, 9 Feb 2011 23:54:19 -0800
>> Subject: Re: getSplits question
>> From: ryano...@gmail.com
>> To: user@hbase.apache.org
>> CC: hbase-u...@hadoop.apache.org
>>
>> By default each map gets the contents of one region. A region is by
>> default a maximum of 256 MB.
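The 256 MB default Ryan mentions is the hbase.hregion.max.filesize setting. A sketch of an hbase-site.xml fragment for shrinking it (the 128 MB value is just an example):

```xml
<!-- hbase-site.xml (illustrative): regions split once a store file passes
     this size, so a smaller value yields more regions and more map-task
     parallelism, with the costs Ryan describes below. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>134217728</value> <!-- 128 MB instead of the 256 MB default -->
</property>
```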
>> There is no trivial way to bisect a region in half by row count,
>> knowing only what we know (the start and end keys).
>>
>> For very large tables that have > 100 regions, this algorithm works
>> really well and you get good parallelism. If you want to see a lot of
>> parallelism out of one region, you might have to work a lot harder, or
>> reduce your region size and have more regions. Be warned, though, that
>> more regions carry performance costs in other areas (specifically
>> server startup/shutdown/assignment times), so you probably don't want
>> 50,000 32 MB regions.
>>
>> -ryan
>>
>> On Wed, Feb 9, 2011 at 11:46 PM, Geoff Hendrey <ghend...@decarta.com> wrote:
>> > Oh, I definitely don't *need* my own to run MapReduce. However, if I
>> > want to control the number of records handled by each mapper
>> > (splitsize) and the startrow and endrow, then I thought I had to
>> > write my own getSplits(). Is there another way to accomplish this? I
>> > do need the combination of controlled splitsize and start/endrow.
>> >
>> > -geoff
>> >
>> > -----Original Message-----
>> > From: Ryan Rawson [mailto:ryano...@gmail.com]
>> > Sent: Wednesday, February 09, 2011 11:43 PM
>> > To: user@hbase.apache.org
>> > Cc: hbase-u...@hadoop.apache.org
>> > Subject: Re: getSplits question
>> >
>> > You shouldn't need to write your own getSplits() method to run a
>> > MapReduce job; I never did, at least...
>> >
>> > -ryan
>> >
>> > On Wed, Feb 9, 2011 at 11:36 PM, Geoff Hendrey <ghend...@decarta.com> wrote:
>> >> Are endrows inclusive or exclusive? The docs say exclusive, but then
>> >> the question arises of how to form the last split for getSplits().
>> >> The code below runs fine, but I believe it is omitting some rows,
>> >> perhaps because of the exclusive end row. For the final split,
>> >> should the endrow be null? I tried that, and got what appeared to be
>> >> a final split without an endrow at all.
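Since stop rows are exclusive, one common workaround (a sketch, not something given in the thread; the helper name is made up) is to use as the stop row the smallest key that sorts strictly after the last row you want included, i.e. the same bytes with a single 0x00 byte appended:

```java
import java.util.Arrays;

public class InclusiveStop {
    // Smallest byte[] that sorts strictly after 'row' in unsigned
    // lexicographic (HBase row-key) order: the same bytes plus one
    // trailing 0x00 byte. Using this as an exclusive stop row makes
    // 'row' itself fall inside the split.
    static byte[] inclusiveStopRow(byte[] row) {
        // Arrays.copyOf zero-fills the extra final byte.
        return Arrays.copyOf(row, row.length + 1);
    }

    public static void main(String[] args) {
        byte[] stop = inclusiveStopRow("row-42".getBytes());
        System.out.println(stop.length); // one byte longer than the input
    }
}
```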
>> >> I would appreciate a pointer to a correct implementation of
>> >> getSplits in which I can provide a startrow, endrow, and splitsize.
>> >> Apparently this isn't it :)
>> >>
>> >> int splitSize = context.getConfiguration().getInt("splitsize", 1000);
>> >> byte[] splitStop = null;
>> >> String hostname = null;
>> >>
>> >> while ((results = resultScanner.next(splitSize)).length > 0) {
>> >>     byte[] splitStart = results[0].getRow();
>> >>     // I think this is a problem... we don't actually include this
>> >>     // row in the split since it's exclusive.. revisit this and correct
>> >>     splitStop = results[results.length - 1].getRow();
>> >>
>> >>     HRegionLocation location = table.getRegionLocation(splitStart);
>> >>     hostname = location.getServerAddress().getHostname();
>> >>
>> >>     InputSplit split = new TableSplit(table.getTableName(),
>> >>         splitStart, splitStop, hostname);
>> >>     splits.add(split);
>> >>     System.out.println("initializing splits: " + split.toString());
>> >> }
>> >>
>> >> resultScanner.close();
>> >>
>> >> -g
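J-D's pre-splitting suggestion at the top of the thread takes a byte[][] of boundary keys for createTable. A minimal sketch of computing evenly spaced single-byte boundaries (evenSplitKeys is a hypothetical helper; the createTable call itself is omitted since it needs a live cluster, and real keys would be derived from the table's actual row-key distribution):

```java
public class SplitKeys {
    // Returns numRegions - 1 boundary keys, evenly spaced over the
    // unsigned single-byte range 0x00..0xFF. Region i covers
    // [key[i-1], key[i]), with the first and last regions open-ended.
    static byte[][] evenSplitKeys(int numRegions) {
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            keys[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return keys;
    }

    public static void main(String[] args) {
        // Four regions -> three boundaries: 0x40, 0x80, 0xc0.
        for (byte[] key : evenSplitKeys(4)) {
            System.out.printf("%02x%n", key[0] & 0xFF);
        }
    }
}
```

The resulting array would then be handed to HBaseAdmin.createTable(descriptor, splitKeys), per the javadoc link above.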