I logged https://issues.apache.org/jira/browse/HBASE-3305
Thanks

On Thu, Dec 2, 2010 at 9:10 PM, Jonathan Gray <[email protected]> wrote:

> The initial assignment if creating a table with multiple regions is random
> across available RSs.
>
> It'd probably make the most sense to pick a random starting node and do it
> round-robin to get the best distribution of the table. Feel free to file a
> JIRA.
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> > Sent: Thursday, December 02, 2010 9:05 PM
> > To: [email protected]
> > Subject: Re: region, regionserver questions
> >
> > On Thu, Dec 2, 2010 at 7:44 PM, Ted Yu <[email protected]> wrote:
> > > When a table is created with N regions, is it possible to distribute
> > > them (almost) equally among the region servers ?
> > >
> >
> > Yes. In 0.90 or TRUNK, create table and then wait on the balancer
> > which runs every 5 minutes, or you can run it manually from the shell:
> >
> > hbase> balancer
> >
> > St.Ack
> >
> > > Thanks
> > >
> > > On Thu, Dec 2, 2010 at 3:10 PM, Jonathan Gray <[email protected]> wrote:
> > >
> > >> Yeah, I'd recommend just using the normal TIF which will have a map task
> > >> per region, attempts to schedule it on that node, and each task would talk
> > >> to only one (hopefully local) server.
> > >>
> > >> As for assignment, the story has changed significantly between previous
> > >> versions and the upcoming 0.90 release.
> > >>
> > >> In 0.90, there are two modes of startup assignment. The new default is
> > >> 'retain assignment' where the master will attempt to reuse whatever the last
> > >> set of assignments were on the previous run of the cluster. The other
> > >> option, if you turn off retain assignment, is round-robin. This round-robin
> > >> assignment would give you what you want (an approximately equal number of
> > >> regions of each table on each server).
> > >>
> > >> What I've done to get good distribution of the tables is startup with
> > >> round-robin, then from then on use retain assignment.
> > >>
> > >> JG
> > >>
> > >> > -----Original Message-----
> > >> > From: Sean Sechrist [mailto:[email protected]]
> > >> > Sent: Thursday, December 02, 2010 2:50 PM
> > >> > To: [email protected]
> > >> > Subject: Re: region, regionserver questions
> > >> >
> > >> > Hey Albert,
> > >> >
> > >> > If you use TableInputFormat, it will create one map task per region in that
> > >> > table. So, each mapper should just talk to one regionserver.
> > >> >
> > >> > -Sean
> > >> >
> > >> > On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <[email protected]> wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > I'm doing a distributed scan of an hbase table using map-reduce by taking
> > >> > > all the regions belonging to a regionserver, and then assigning those
> > >> > > regions to a mapper (so there's 1 mapper per regionserver, and each mapper
> > >> > > only talks to one regionserver). However, doing it this way I'm getting
> > >> > > some data skew. For example, I have 2 tables U and T. Each regionserver
> > >> > > may have 30 regions, but one regionserver might have 10 regions from table U
> > >> > > while another regionserver might have 25 regions from table U. Is there a
> > >> > > way to balance regions per table per regionserver (so that each regionserver
> > >> > > has 15 regions from table U for example)? Or should I just not worry about
> > >> > > trying to have each individual mapper only talk to one regionserver?
> > >> > >
> > >> > > Also, how do regions get assigned to regionservers? Is it based on data
> > >> > > locality? Region start/end keys? Randomly?
> > >> > >
> > >> > > Thanks,
> > >> > > Albert
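
For reference, here is a rough sketch of the idea Jonathan describes in the quoted thread: pick a random starting server, then hand out a new table's regions round-robin. This is only an illustration and not HBase's actual assignment code. The class name RoundRobinSketch, the method assign, and the use of plain strings for regions and servers are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only; none of these names come from HBase.
final class RoundRobinSketch {

    // Spread the regions of one table across the servers, starting at a
    // randomly chosen server and then walking the server list in order.
    static Map<String, List<String>> assign(List<String> regions,
                                            List<String> servers,
                                            Random rng) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no region servers available");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
        }
        // A random start keeps the same server from always receiving the
        // extra region when the region count is not a multiple of the
        // server count.
        int start = rng.nextInt(servers.size());
        for (int i = 0; i < regions.size(); i++) {
            String server = servers.get((start + i) % servers.size());
            plan.get(server).add(regions.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            regions.add("U,region-" + i);
        }
        List<String> servers = List.of("rs1", "rs2", "rs3");
        Map<String, List<String>> plan = assign(regions, servers, new Random());
        for (Map.Entry<String, List<String>> e : plan.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " regions");
        }
    }
}
```

With this scheme the per-server region counts for a single table differ by at most one, which is the "approximately equal number of regions of each table on each server" property mentioned in the thread.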
