Great! That is actually what I am thinking about too. What is the best practice for choosing the HFile size? What is the penalty for making it very large?
Thanks,
Oleg.

On Mon, Sep 10, 2012 at 4:24 AM, Harsh J <[email protected]> wrote:
> Hi Oleg,
>
> If the root issue is a growing number of regions, why not control that
> instead of looking for a way to control the Reducer count? You could, for
> example, raise the split-point sizes for HFiles so that tables do not
> split as often, and hence have larger but fewer regions?
>
> Given that you have 10 machines, I'd go this way rather than ending up
> with a lot of regions causing issues with load.
>
> On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets <[email protected]> wrote:
> > Hi,
> > I am using bulk loading to write my data to HBase.
> >
> > It works fine, but the number of regions is growing very rapidly.
> > Entering ONE WEEK of data I got 200 regions (and I am going to save
> > years of data).
> > As a result, the job which writes data to HBase has a REDUCER count
> > equal to the REGION count.
> > So entering only one WEEK of data I already have 200 reducers.
> >
> > Question:
> > How do I resolve the problem of a constantly growing reducer count when
> > using bulk loading and TotalOrderPartitioner?
> > I have a 10-machine cluster and I think I should have ~30 reducers.
> >
> > Thanks in advance.
> > Oleg.
>
> --
> Harsh J
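[Editorial note for context: the "split-point size" Harsh refers to is governed by the HBase property `hbase.hregion.max.filesize`, which sets the store-file size at which a region splits. Below is a minimal sketch of raising it in `hbase-site.xml`; the 10 GB value is illustrative, and the appropriate number depends on your write volume and cluster size. In HBase versions of this era (0.92/0.94) the default was considerably smaller, so raising it directly reduces how often regions split.]

```xml
<!-- hbase-site.xml: raise the region split threshold so regions grow
     larger before splitting, keeping the total region count lower.
     The value is in bytes; 10737418240 = 10 GB (illustrative choice). -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>
```

Note the trade-off implied by the question at the top of the thread: very large regions mean fewer reducers during bulk load and less region-management overhead, but each region takes longer to compact and to redistribute after a server failure, and read/write load is spread across fewer units, which can create hotspots.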
