On Thu, Dec 2, 2010 at 7:44 PM, Ted Yu <[email protected]> wrote:
> When a table is created with N regions, is it possible to distribute them
> (almost) equally among the region servers ?
>

Yes.  In 0.90 or TRUNK, create the table and then wait for the balancer,
which runs every 5 minutes, or run it manually from the shell:

hbase> balancer
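
For a sense of what "(almost) equally" means, here is a small Python sketch
that deals N regions out to the servers round-robin. It is only an
illustration, not HBase's balancer or assignment code; the function name,
region names, and server names are all made up.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(regions, servers):
    """Deal regions out to servers one at a time, in order (illustrative only)."""
    if not servers:
        raise ValueError("need at least one server")
    # cycle() repeats the server list, so region i lands on server i % len(servers).
    return {region: server for region, server in zip(regions, cycle(servers))}


# Hypothetical table with 10 regions on a hypothetical 3-server cluster.
regions = [f"T,region-{i:02d}" for i in range(10)]
servers = ["rs1", "rs2", "rs3"]

assignment = round_robin_assign(regions, servers)
per_server = Counter(assignment.values())
print(dict(per_server))  # {'rs1': 4, 'rs2': 3, 'rs3': 3}
```

With this kind of dealing, the per-server counts for a table never differ by
more than one.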

St.Ack


> Thanks
>
> On Thu, Dec 2, 2010 at 3:10 PM, Jonathan Gray <[email protected]> wrote:
>
>> Yeah, I'd recommend just using the normal TIF, which creates one map task
>> per region and attempts to schedule it on the node hosting that region, so
>> each task talks to only one (hopefully local) server.
>>
>> As for assignment, the story has changed significantly between previous
>> versions and the upcoming 0.90 release.
>>
>> In 0.90, there are two modes of startup assignment.  The new default is
>> 'retain assignment' where the master will attempt to reuse whatever the last
>> set of assignments were on the previous run of the cluster.  The other
>> option, if you turn off retain assignment, is round-robin.  This round-robin
>> assignment would give you what you want (an approximately equal number of
>> regions of each table on each server).
>>
>> What I've done to get good distribution of the tables is start up with
>> round-robin, then use retain assignment from then on.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: Sean Sechrist [mailto:[email protected]]
>> > Sent: Thursday, December 02, 2010 2:50 PM
>> > To: [email protected]
>> > Subject: Re: region, regionserver questions
>> >
>> > Hey Albert,
>> >
>> > If you use TableInputFormat, it will create one map task per region in
>> > that table. So, each mapper should just talk to one regionserver.
>> >
>> > -Sean
>> >
>> > On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I'm doing a distributed scan of an hbase table using map-reduce by
>> > > taking all the regions belonging to a regionserver, and then assigning
>> > > those regions to a mapper (so there's 1 mapper per regionserver, and
>> > > each mapper only talks to one regionserver).  However, doing it this
>> > > way I'm getting some data skew.  For example, I have 2 tables U and T.
>> > > Each regionserver may have 30 regions, but one regionserver might have
>> > > 10 regions from table U while another regionserver might have 25
>> > > regions from table U.  Is there a way to balance regions per table per
>> > > regionserver (so that each regionserver has 15 regions from table U for
>> > > example)?  Or should I just not worry about trying to have each
>> > > individual mapper only talk to one regionserver?
>> > >
>> > > Also, how do regions get assigned to regionservers?  Is it based on
>> > > data locality?  Region start/end keys?  Randomly?
>> > >
>> > > Thanks,
>> > > Albert
>> > >
>>
>
