RE: region, regionserver questions

Jonathan Gray Thu, 02 Dec 2010 21:11:17 -0800

When you create a table with multiple regions, the initial assignment is 
random across the available RSs.


It'd probably make the most sense to pick a random starting node and do it 
round-robin to get the best distribution of the table.  Feel free to file a 
JIRA.
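
A minimal sketch of that idea in Python, for illustration only. The function 
name round_robin_assign and the region/server names are made up here; none of 
this is an HBase API, and it is not how the master actually assigns regions.

```python
import random


def round_robin_assign(regions, servers, rng=random):
    """Spread regions over servers round-robin from a random starting server.

    Hypothetical helper for illustration; not part of HBase.
    """
    if not servers:
        raise ValueError("need at least one region server")
    # Random starting node, so repeated table creations do not all
    # pile their first regions onto the same server.
    start = rng.randrange(len(servers))
    assignment = {server: [] for server in servers}
    for i, region in enumerate(regions):
        server = servers[(start + i) % len(servers)]
        assignment[server].append(region)
    return assignment


# Example: 10 regions of a table over 3 region servers (names are made up).
regions = ["U,region-%02d" % i for i in range(10)]
servers = ["rs1", "rs2", "rs3"]
plan = round_robin_assign(regions, servers)
counts = sorted(len(assigned) for assigned in plan.values())
# Whatever the starting server, the counts differ by at most one: [3, 3, 4]
```

Only the starting server is random; after that the walk is deterministic, so 
the per-server counts for one table never differ by more than one.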

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Thursday, December 02, 2010 9:05 PM
> To: [email protected]
> Subject: Re: region, regionserver questions
> 
> On Thu, Dec 2, 2010 at 7:44 PM, Ted Yu <[email protected]> wrote:
> > When a table is created with N regions, is it possible to distribute them
> > (almost) equally among the region servers ?
> >
> 
> Yes.  In 0.90 or TRUNK, create table and then wait on the balancer
> which runs every 5 minutes, or you can run it manually from the shell:
> 
> hbase> balancer
> 
> St.Ack
> 
> 
> > Thanks
> >
> > On Thu, Dec 2, 2010 at 3:10 PM, Jonathan Gray <[email protected]> wrote:
> >
> >> Yeah, I'd recommend just using the normal TIF which will have a map task
> >> per region, attempts to schedule it on that node, and each task would talk
> >> to only one (hopefully local) server.
> >>
> >> As for assignment, the story has changed significantly between previous
> >> versions and the upcoming 0.90 release.
> >>
> >> In 0.90, there are two modes of startup assignment.  The new default is
> >> 'retain assignment' where the master will attempt to reuse whatever the last
> >> set of assignments were on the previous run of the cluster.  The other
> >> option, if you turn off retain assignment, is round-robin.  This round-robin
> >> assignment would give you what you want (an approximately equal number of
> >> regions of each table on each server).
> >>
> >> What I've done to get good distribution of the tables is start up with
> >> round-robin, then from then on use retain assignment.
> >>
> >> JG
> >>
> >> > -----Original Message-----
> >> > From: Sean Sechrist [mailto:[email protected]]
> >> > Sent: Thursday, December 02, 2010 2:50 PM
> >> > To: [email protected]
> >> > Subject: Re: region, regionserver questions
> >> >
> >> > Hey Albert,
> >> >
> >> > If you use TableInputFormat, it will create one map task per region in that
> >> > table. So, each mapper should just talk to one regionserver.
> >> >
> >> > -Sean
> >> >
> >> > On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <[email protected]> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I'm doing a distributed scan of an hbase table using map-reduce by taking
> >> > > all the regions belonging to a regionserver, and then assigning those
> >> > > regions to a mapper (so there's 1 mapper per regionserver, and each mapper
> >> > > only talks to one regionserver).  However, doing it this way I'm getting
> >> > > some data skew.  For example, I have 2 tables U and T.  Each regionserver
> >> > > may have 30 regions, but one regionserver might have 10 regions from table U
> >> > > while another regionserver might have 25 regions from table U.  Is there a
> >> > > way to balance regions per table per regionserver (so that each regionserver
> >> > > has 15 regions from table U for example)?  Or should I just not worry about
> >> > > trying to have each individual mapper only talk to one regionserver?
> >> > >
> >> > > Also, how do regions get assigned to regionservers?  Is it based on data
> >> > > locality?  Region start/end keys?  Randomly?
> >> > >
> >> > > Thanks,
> >> > > Albert
> >> > >
> >>
> >
