On Thu, Dec 2, 2010 at 7:44 PM, Ted Yu <[email protected]> wrote:
> When a table is created with N regions, is it possible to distribute them
> (almost) equally among the region servers ?
>
Yes. In 0.90 or TRUNK, create table and then wait on the balancer which
runs every 5 minutes, or you can run it manually from the shell:

hbase> balancer

St.Ack

> Thanks
>
> On Thu, Dec 2, 2010 at 3:10 PM, Jonathan Gray <[email protected]> wrote:
>
>> Yeah, I'd recommend just using the normal TIF which will have a map task
>> per region, attempts to schedule it on that node, and each task would talk
>> to only one (hopefully local) server.
>>
>> As for assignment, the story has changed significantly between previous
>> versions and the upcoming 0.90 release.
>>
>> In 0.90, there are two modes of startup assignment. The new default is
>> 'retain assignment' where the master will attempt to reuse whatever the last
>> set of assignments were on the previous run of the cluster. The other
>> option, if you turn off retain assignment, is round-robin. This round-robin
>> assignment would give you what you want (an approximately equal number of
>> regions of each table on each server).
>>
>> What I've done to get good distribution of the tables is startup with
>> round-robin, then from then on use retain assignment.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: Sean Sechrist [mailto:[email protected]]
>> > Sent: Thursday, December 02, 2010 2:50 PM
>> > To: [email protected]
>> > Subject: Re: region, regionserver questions
>> >
>> > Hey Albert,
>> >
>> > If you use TableInputFormat, it will create one map task per region in that
>> > table. So, each mapper should just talk to one regionserver.
>> >
>> > -Sean
>> >
>> > On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I'm doing a distributed scan of an hbase table using map-reduce by taking
>> > > all the regions belonging to a regionserver, and then assigning those
>> > > regions to a mapper (so there's 1 mapper per regionserver, and each mapper
>> > > only talks to one regionserver). However, doing it this way I'm getting
>> > > some data skew. For example, I have 2 tables U and T. Each regionserver
>> > > may have 30 regions, but one regionserver might have 10 regions from table U
>> > > while another regionserver might have 25 regions from table U. Is there a
>> > > way to balance regions per table per regionserver (so that each regionserver
>> > > has 15 regions from table U for example)? Or should I just not worry about
>> > > trying to have each individual mapper only talk to one regionserver?
>> > >
>> > > Also, how do regions get assigned to regionservers? Is it based on data
>> > > locality? Region start/end keys? Randomly?
>> > >
>> > > Thanks,
>> > > Albert
>> > >
>> >
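
The round-robin startup assignment Jonathan describes can be illustrated with a small standalone sketch. This is not HBase's assignment code: the class and method names (RoundRobinSketch, assign, countForTable), the server names, and the "table,index" region naming are all hypothetical, and it assumes the regions are listed grouped by table before being dealt out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinSketch {

    // Deal regions to servers in turn: region i goes to server (i mod serverCount).
    // Illustration only; the real master's assignment logic is not reproduced here.
    public static Map<String, List<String>> assign(List<String> regions, List<String> servers) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            String server = servers.get(i % servers.size());
            plan.get(server).add(regions.get(i));
        }
        return plan;
    }

    // Count the regions in 'held' that belong to 'table', using the
    // hypothetical "table,index" naming assumed by this sketch.
    public static long countForTable(List<String> held, String table) {
        return held.stream().filter(r -> r.startsWith(table + ",")).count();
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3", "rs4");

        // Two tables, U and T, with 60 regions each, listed grouped by table.
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < 60; i++) regions.add("U," + i);
        for (int i = 0; i < 60; i++) regions.add("T," + i);

        Map<String, List<String>> plan = assign(regions, servers);
        for (Map.Entry<String, List<String>> e : plan.entrySet()) {
            System.out.println(e.getKey()
                + " U=" + countForTable(e.getValue(), "U")
                + " T=" + countForTable(e.getValue(), "T"));
        }
    }
}
```

With 4 servers and 60 regions per table, every server ends up with 30 regions, 15 from U and 15 from T, which is the per-table balance Albert asks about. Because each table's regions form a contiguous run in the input list, dealing them in turn leaves the per-table counts on any two servers differing by at most one.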
