The initial assignment when creating a table with multiple regions is random across the available RSs.
It'd probably make the most sense to pick a random starting node and then assign round-robin to get the best distribution of the table. Feel free to file a JIRA.

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Thursday, December 02, 2010 9:05 PM
> To: [email protected]
> Subject: Re: region, regionserver questions
>
> On Thu, Dec 2, 2010 at 7:44 PM, Ted Yu <[email protected]> wrote:
> > When a table is created with N regions, is it possible to distribute them
> > (almost) equally among the region servers?
> >
>
> Yes. In 0.90 or TRUNK, create the table and then wait for the balancer,
> which runs every 5 minutes, or run it manually from the shell:
>
> hbase> balancer
>
> St.Ack
>
> >
> > Thanks
> >
> > On Thu, Dec 2, 2010 at 3:10 PM, Jonathan Gray <[email protected]> wrote:
> >
> >> Yeah, I'd recommend just using the normal TableInputFormat (TIF), which will
> >> have a map task per region, attempt to schedule each task on that region's node,
> >> and have each task talk to only one (hopefully local) server.
> >>
> >> As for assignment, the story has changed significantly between previous
> >> versions and the upcoming 0.90 release.
> >>
> >> In 0.90, there are two modes of startup assignment. The new default is
> >> 'retain assignment', where the master will attempt to reuse whatever the last
> >> set of assignments was on the previous run of the cluster. The other
> >> option, if you turn off retain assignment, is round-robin. This round-robin
> >> assignment would give you what you want (an approximately equal number of
> >> regions of each table on each server).
> >>
> >> What I've done to get good distribution of the tables is start up with
> >> round-robin, then from then on use retain assignment.
> >>
> >> JG
> >>
> >> > -----Original Message-----
> >> > From: Sean Sechrist [mailto:[email protected]]
> >> > Sent: Thursday, December 02, 2010 2:50 PM
> >> > To: [email protected]
> >> > Subject: Re: region, regionserver questions
> >> >
> >> > Hey Albert,
> >> >
> >> > If you use TableInputFormat, it will create one map task per region in that
> >> > table. So, each mapper should just talk to one regionserver.
> >> >
> >> > -Sean
> >> >
> >> > On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <[email protected]> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I'm doing a distributed scan of an hbase table using map-reduce by taking
> >> > > all the regions belonging to a regionserver, and then assigning those
> >> > > regions to a mapper (so there's 1 mapper per regionserver, and each mapper
> >> > > only talks to one regionserver). However, doing it this way I'm getting
> >> > > some data skew. For example, I have 2 tables U and T. Each regionserver
> >> > > may have 30 regions, but one regionserver might have 10 regions from table U
> >> > > while another regionserver might have 25 regions from table U. Is there a
> >> > > way to balance regions per table per regionserver (so that each regionserver
> >> > > has 15 regions from table U for example)? Or should I just not worry about
> >> > > trying to have each individual mapper only talk to one regionserver?
> >> > >
> >> > > Also, how do regions get assigned to regionservers? Is it based on data
> >> > > locality? Region start/end keys? Randomly?
> >> > >
> >> > > Thanks,
> >> > > Albert
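The round-robin idea discussed in this thread (pick a random starting server, then give each server one region in turn) can be sketched next to the purely random placement that table creation is said to use. This is a hypothetical model, not HBase's assignment or balancer code: regions and servers are plain strings, and `random_assign`, `round_robin_assign`, and `per_server_counts` are illustrative names.

```python
import random
from collections import Counter


def random_assign(regions, servers, rng):
    """Hypothetical model: each region goes to an independently chosen random server."""
    return {region: rng.choice(servers) for region in regions}


def round_robin_assign(regions, servers, rng):
    """Hypothetical model: round-robin placement starting at a random server.

    Every server ends up with floor(N/S) or ceil(N/S) of the regions passed in,
    so calling this once per table spreads that table almost evenly.
    """
    start = rng.randrange(len(servers))
    return {
        region: servers[(start + i) % len(servers)]
        for i, region in enumerate(regions)
    }


def per_server_counts(assignment, servers):
    """Count how many of the given regions landed on each server (zeros included)."""
    counts = Counter(assignment.values())
    return {server: counts.get(server, 0) for server in servers}


if __name__ == "__main__":
    rng = random.Random(42)
    servers = [f"rs{i}" for i in range(4)]          # hypothetical server names
    regions = [f"U,{i:03d}" for i in range(30)]     # hypothetical regions of table U
    print("random:     ", per_server_counts(random_assign(regions, servers, rng), servers))
    print("round-robin:", per_server_counts(round_robin_assign(regions, servers, rng), servers))
```

For 30 regions on 4 servers, round-robin always yields two servers with 8 regions and two with 7, while random placement can leave some servers with noticeably more than others. Whether a cluster-wide round-robin pass keeps every individual table balanced depends on the order in which it walks the regions, which this sketch does not model.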

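A small model can show why one split per region (what Sean and Jonathan describe TableInputFormat doing) tolerates uneven placement better than one split per region server. It is a sketch under two assumptions: every region costs about the same to scan, and a split is just a list of region names plus a preferred server. `splits_per_region`, `splits_per_server`, and `largest_split` are hypothetical helpers, not the TableInputFormat API.

```python
from collections import defaultdict


def splits_per_region(region_locations):
    """One split per region, tagged with its hosting server as the preferred location."""
    return [([region], server) for region, server in sorted(region_locations.items())]


def splits_per_server(region_locations):
    """One split per region server holding all of that server's regions
    (the approach described in the original question)."""
    grouped = defaultdict(list)
    for region, server in sorted(region_locations.items()):
        grouped[server].append(region)
    return [(regions, server) for server, regions in sorted(grouped.items())]


def largest_split(splits):
    """Regions in the biggest split: a rough proxy for the slowest mapper,
    assuming every region costs about the same to scan."""
    return max(len(regions) for regions, _ in splits)


if __name__ == "__main__":
    # Hypothetical layout from the question: table U has 10 regions on rs1 and 25 on rs2.
    locations = {f"U,{i:03d}": ("rs1" if i < 10 else "rs2") for i in range(35)}
    print(largest_split(splits_per_server(locations)))  # 25
    print(largest_split(splits_per_region(locations)))  # 1
```

With the per-server layout, one mapper scans 25 regions while another scans 10, so the job waits on the slower one. With per-region splits, the largest unit of work is a single region, and each task still reads from only one region server.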