Re: region, regionserver questions

Sean Sechrist Thu, 02 Dec 2010 14:50:51 -0800

Hey Albert,

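If you use TableInputFormat, it will create one map task per region in that
table. A region is served by exactly one regionserver at a time, so each
mapper should just talk to one regionserver, and the work units stay the same
size no matter how the regions are spread across regionservers.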
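
To make the difference concrete, here is a rough plain-Java toy model (no
HBase dependency, so it is only a sketch). It takes the numbers from your
example, 10 regions of table U on one regionserver and 25 on another, and
compares the size of the map tasks under the two schemes. The class name
SplitSkew, the server names rs1/rs2, and all method names are made up for
illustration, and it assumes regions are roughly equal in size.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitSkew {

    // Hypothetical layout taken from the example in the question:
    // regionserver name -> number of table U regions it hosts.
    static Map<String, Integer> exampleLayout() {
        Map<String, Integer> layout = new LinkedHashMap<>();
        layout.put("rs1", 10);
        layout.put("rs2", 25);
        return layout;
    }

    // One map task per regionserver: each task scans every region on its server.
    static List<Integer> perServerTasks(Map<String, Integer> layout) {
        return new ArrayList<>(layout.values());
    }

    // One map task per region: each task scans exactly one region.
    static List<Integer> perRegionTasks(Map<String, Integer> layout) {
        List<Integer> tasks = new ArrayList<>();
        for (int regions : layout.values()) {
            for (int i = 0; i < regions; i++) {
                tasks.add(1);
            }
        }
        return tasks;
    }

    // Size of the biggest task, measured in regions scanned.
    static int largest(List<Integer> tasks) {
        int max = 0;
        for (int t : tasks) {
            max = Math.max(max, t);
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, Integer> layout = exampleLayout();
        List<Integer> byServer = perServerTasks(layout);
        List<Integer> byRegion = perRegionTasks(layout);
        System.out.println("per regionserver: " + byServer.size()
                + " tasks, largest scans " + largest(byServer) + " regions");
        System.out.println("per region: " + byRegion.size()
                + " tasks, largest scans " + largest(byRegion) + " regions");
    }
}
```

With one task per regionserver, the job is only as fast as the task stuck
with the most regions. With one task per region, every task is the same
size and the MapReduce scheduler spreads them over the available slots.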


-Sean

On Thu, Dec 2, 2010 at 5:26 PM, Albert Shau <[email protected]> wrote:

> Hi,
>
> I'm doing a distributed scan of an HBase table using map-reduce by taking
> all the regions belonging to a regionserver, and then assigning those
> regions to a mapper (so there's 1 mapper per regionserver, and each mapper
> only talks to one regionserver).  However, doing it this way I'm getting
> some data skew.  For example, I have 2 tables U and T.  Each regionserver
> may have 30 regions, but one regionserver might have 10 regions from table U
> while another regionserver might have 25 regions from table U.  Is there a
> way to balance regions per table per regionserver (so that each regionserver
> has 15 regions from table U for example)?  Or should I just not worry about
> trying to have each individual mapper only talk to one regionserver?
>
> Also, how do regions get assigned to regionservers?  Is it based on data
> locality?  Region start/end keys?  Randomly?
>
> Thanks,
> Albert
>
