Re: region, regionserver questions

Ted Yu Thu, 02 Dec 2010 21:36:55 -0800

I logged https://issues.apache.org/jira/browse/HBASE-3305


Thanks

On Thu, Dec 2, 2010 at 9:10 PM, Jonathan Gray <[email protected]> wrote:

> The initial assignment when creating a table with multiple regions is random
> across available RSs.
>
> It'd probably make the most sense to pick a random starting node and do it
> round-robin to get the best distribution of the table.  Feel free to file a
> JIRA.
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> > Sent: Thursday, December 02, 2010 9:05 PM
> > To: [email protected]
> > Subject: Re: region, regionserver questions
> >
> > On Thu, Dec 2, 2010 at 7:44 PM, Ted Yu <[email protected]> wrote:
> > > > When a table is created with N regions, is it possible to distribute them
> > > > (almost) equally among the region servers?
> > >
> >
> > > Yes.  In 0.90 or TRUNK, create the table and then wait for the balancer,
> > > which runs every 5 minutes, or you can run it manually from the shell:
> >
> > hbase> balancer
> >
> > St.Ack
> >
> >
> > > Thanks
> > >
> > > On Thu, Dec 2, 2010 at 3:10 PM, Jonathan Gray <[email protected]> wrote:
> > >
> > > >> Yeah, I'd recommend just using the normal TIF (TableInputFormat), which will
> > > >> have a map task per region and attempts to schedule it on that node, so each
> > > >> task would talk to only one (hopefully local) server.
> > >>
> > > >> As for assignment, the story has changed significantly between previous
> > > >> versions and the upcoming 0.90 release.
> > >>
> > > >> In 0.90, there are two modes of startup assignment.  The new default is
> > > >> 'retain assignment', where the master will attempt to reuse whatever the last
> > > >> set of assignments were on the previous run of the cluster.  The other
> > > >> option, if you turn off retain assignment, is round-robin.  This round-robin
> > > >> assignment would give you what you want (an approximately equal number of
> > > >> regions of each table on each server).
> > >>
> > > >> What I've done to get good distribution of the tables is start up with
> > > >> round-robin, and from then on use retain assignment.
> > >>
> > >> JG
> > >>
> > >> > -----Original Message-----
> > >> > From: Sean Sechrist [mailto:[email protected]]
> > >> > Sent: Thursday, December 02, 2010 2:50 PM
> > >> > To: [email protected]
> > >> > Subject: Re: region, regionserver questions
> > >> >
> > >> > Hey Albert,
> > >> >
> > > >> > If you use TableInputFormat, it will create one map task per region in that
> > > >> > table. So, each mapper should just talk to one regionserver.
> > >> >
> > >> > -Sean
> > >> >
> > > >> > On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <[email protected]> wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > > >> > > I'm doing a distributed scan of an hbase table using map-reduce by taking
> > > >> > > all the regions belonging to a regionserver, and then assigning those
> > > >> > > regions to a mapper (so there's 1 mapper per regionserver, and each mapper
> > > >> > > only talks to one regionserver).  However, doing it this way I'm getting
> > > >> > > some data skew.  For example, I have 2 tables U and T.  Each regionserver
> > > >> > > may have 30 regions, but one regionserver might have 10 regions from table U
> > > >> > > while another regionserver might have 25 regions from table U.  Is there a
> > > >> > > way to balance regions per table per regionserver (so that each regionserver
> > > >> > > has 15 regions from table U for example)?  Or should I just not worry about
> > > >> > > trying to have each individual mapper only talk to one regionserver?
> > > >> > > Also, how do regions get assigned to regionservers?  Is it based on data
> > > >> > > locality?  Region start/end keys?  Randomly?
> > >> > >
> > >> > > Thanks,
> > >> > > Albert
> > >> > >
> > >>
> > >
>
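
For readers following the thread, the idea Jonathan suggests (pick a random
starting node, then hand out a new table's regions round-robin) can be sketched
roughly as below. This is only an illustration of the technique, not HBase code
and not what HBASE-3305 implements; the function name round_robin_assign, the
region names and the server names are all hypothetical.

```python
import random
from collections import Counter


def round_robin_assign(regions, servers, rng=None):
    """Assign each region to a server, round-robin from a random start.

    `regions` and `servers` are plain lists of names (hypothetical stand-ins
    for a table's regions and the live region servers). Returns a dict that
    maps region -> server.
    """
    if not servers:
        raise ValueError("need at least one region server")
    rng = rng or random.Random()
    # Random starting node, so new tables do not all pile onto the first server.
    start = rng.randrange(len(servers))
    return {
        region: servers[(start + i) % len(servers)]
        for i, region in enumerate(regions)
    }


def regions_per_server(plan):
    """Count how many regions each server received in an assignment plan."""
    return Counter(plan.values())
```

With 10 regions and 3 servers, every server ends up with 3 or 4 regions of the
table, whichever server is picked first. Purely random placement gives no such
bound, which is the skew Albert describes.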
