Re: region, regionserver questions

Ted Yu Thu, 02 Dec 2010 19:45:23 -0800

When a table is created with N regions, is it possible to distribute them
(almost) equally among the region servers?


Thanks

On Thu, Dec 2, 2010 at 3:10 PM, Jonathan Gray <[email protected]> wrote:

> Yeah, I'd recommend just using the normal TIF (TableInputFormat), which
> creates one map task per region and attempts to schedule each task on the
> node hosting that region, so each task talks to only one (hopefully local)
> server.
>
> As for assignment, the story has changed significantly between previous
> versions and the upcoming 0.90 release.
>
> In 0.90, there are two modes of startup assignment.  The new default is
> 'retain assignment', where the master attempts to reuse the assignments
> from the previous run of the cluster.  The other option, if you turn off
> retain assignment, is round-robin.  Round-robin assignment would give you
> what you want (an approximately equal number of regions of each table on
> each server).
>
> What I've done to get a good distribution of the tables is to start up
> with round-robin, and from then on use retain assignment.
>
> JG
>
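For illustration only, here is a toy Java model of the two startup modes described in Jonathan's reply above. It is not HBase's master code: regions and servers are plain strings, region names are assumed to look like "table,index", and regions are assumed to be listed table by table, which is what lets round-robin spread each table's regions evenly. The retain mode keeps a region on its previous server if that server is still live and falls back to round-robin otherwise.

```java
import java.util.*;

/**
 * Toy model (not HBase code) of round-robin vs. retain startup assignment.
 * Assumption: region names look like "table,index" and are listed table by table.
 */
public class AssignmentSketch {

    /** Round-robin: deal regions out to servers in turn, like dealing cards. */
    static Map<String, String> roundRobin(List<String> regions, List<String> servers) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < regions.size(); i++) {
            plan.put(regions.get(i), servers.get(i % servers.size()));
        }
        return plan;
    }

    /**
     * Retain: keep each region on the server it had last time if that server is
     * still live; regions without a usable previous home fall back to round-robin.
     */
    static Map<String, String> retain(List<String> regions, List<String> servers,
                                      Map<String, String> previous) {
        Map<String, String> plan = new LinkedHashMap<>();
        List<String> orphans = new ArrayList<>();
        Set<String> live = new HashSet<>(servers);
        for (String r : regions) {
            String last = previous.get(r);
            if (last != null && live.contains(last)) {
                plan.put(r, last);
            } else {
                orphans.add(r);
            }
        }
        plan.putAll(roundRobin(orphans, servers));
        return plan;
    }

    /** Count how many regions of one table each server holds. */
    static Map<String, Integer> perServer(Map<String, String> plan, String table) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> e : plan.entrySet()) {
            if (e.getKey().startsWith(table + ",")) {
                counts.merge(e.getValue(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> servers = List.of("rs1", "rs2", "rs3");
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < 6; i++) regions.add("U," + i);
        for (int i = 0; i < 6; i++) regions.add("T," + i);

        // First start: round-robin gives every server the same share of table U.
        Map<String, String> first = roundRobin(regions, servers);
        System.out.println(perServer(first, "U"));  // {rs1=2, rs2=2, rs3=2}

        // Later restarts: retain reproduces the balanced plan unchanged.
        Map<String, String> second = retain(regions, servers, first);
        System.out.println(second.equals(first));    // true
    }
}
```

Feeding the round-robin plan back in as the "previous" assignment reproduces it unchanged, which mirrors the round-robin-then-retain routine Jonathan describes.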
> > -----Original Message-----
> > From: Sean Sechrist [mailto:[email protected]]
> > Sent: Thursday, December 02, 2010 2:50 PM
> > To: [email protected]
> > Subject: Re: region, regionserver questions
> >
> > Hey Albert,
> >
> > If you use TableInputFormat, it will create one map task per region in
> > that table.  So, each mapper should just talk to one regionserver.
> >
> > -Sean
> >
> > On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'm doing a distributed scan of an HBase table using MapReduce by taking
> > > all the regions belonging to a regionserver and assigning those regions
> > > to a mapper (so there's 1 mapper per regionserver, and each mapper only
> > > talks to one regionserver).  However, doing it this way I'm getting some
> > > data skew.  For example, I have 2 tables, U and T.  Each regionserver may
> > > have 30 regions, but one regionserver might have 10 regions from table U
> > > while another regionserver might have 25 regions from table U.  Is there a
> > > way to balance regions per table per regionserver (so that each
> > > regionserver has 15 regions from table U, for example)?  Or should I just
> > > not worry about trying to have each individual mapper talk to only one
> > > regionserver?
> > >
> > > Also, how do regions get assigned to regionservers?  Is it based on data
> > > locality?  Region start/end keys?  Randomly?
> > >
> > > Thanks,
> > > Albert
> > >
>
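To make the skew Albert describes concrete, the toy Java model below compares his one-mapper-per-regionserver split with the one-task-per-region split that Sean and Jonathan describe for TableInputFormat. It is not Hadoop or HBase code, and it rests on stated assumptions: every region of table U costs the same to scan, the cluster has exactly two map slots, and tasks are handed greedily to the least-loaded slot. The 10/25 region counts are Albert's example numbers.

```java
import java.util.*;

/**
 * Toy comparison (not Hadoop/HBase code) of two ways to split a scan of table U.
 * Assumptions: equal-cost regions, a fixed number of map slots, greedy scheduling.
 */
public class SplitSketch {

    /** One mapper per regionserver: a mapper's work equals that server's region count. */
    static List<Integer> perServerTasks(Map<String, Integer> regionsOfU) {
        return new ArrayList<>(regionsOfU.values());
    }

    /** One map task per region (the TableInputFormat style): each task scans one region. */
    static List<Integer> perRegionTasks(Map<String, Integer> regionsOfU) {
        List<Integer> tasks = new ArrayList<>();
        for (int n : regionsOfU.values()) {
            for (int i = 0; i < n; i++) tasks.add(1);
        }
        return tasks;
    }

    /** Load of the busiest slot when tasks (largest first) go to the least-loaded slot. */
    static int makespan(List<Integer> tasks, int slots) {
        PriorityQueue<Integer> load = new PriorityQueue<>();
        for (int i = 0; i < slots; i++) load.add(0);
        List<Integer> sorted = new ArrayList<>(tasks);
        sorted.sort(Collections.reverseOrder());
        for (int t : sorted) load.add(load.poll() + t);
        int max = 0;
        for (int l : load) max = Math.max(max, l);
        return max;
    }

    public static void main(String[] args) {
        Map<String, Integer> regionsOfU = new LinkedHashMap<>();
        regionsOfU.put("rsA", 10);  // hypothetical server names
        regionsOfU.put("rsB", 25);

        System.out.println(makespan(perServerTasks(regionsOfU), 2));  // 25
        System.out.println(makespan(perRegionTasks(regionsOfU), 2));  // 18
    }
}
```

The per-region split evens out the work because 35 equal tasks can be packed onto two slots almost evenly, whereas a single 25-region mapper cannot be split. The trade-off is locality: TableInputFormat only attempts to run each task on the node hosting its region, so some tasks may end up reading remotely.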
