Great! That is actually what I am thinking about too. What is the best practice for choosing the HFile size? What is the penalty for making it very large?
Thanks,
Oleg.

On Mon, Sep 10, 2012 at 4:24 AM, Harsh J <[email protected]> wrote:
> Hi Oleg,
>
> If the root issue is a growing number of regions, why not control that
> instead of looking for a way to control the Reducer count? You could, for
> example, raise the split-point sizes for HFiles so that tables do not
> split as often, and hence have larger but fewer regions?
>
> Given that you have 10 machines, I'd go this way rather than ending up
> with a lot of regions causing issues with load.
>
> On Mon, Sep 10, 2012 at 1:49 PM, Oleg Ruchovets <[email protected]> wrote:
> > Hi,
> > I am using bulk loading to write my data to HBase.
> >
> > It works fine, but the number of regions is growing very rapidly.
> > Entering ONE WEEK of data I got 200 regions (and I am going to save
> > years of data).
> > As a result, the job which writes data to HBase has a REDUCER count
> > equal to the REGION count.
> > So entering only one WEEK of data I already have 200 reducers.
> >
> > Question:
> > How do I resolve the problem of a constantly growing reducer count when
> > using bulk loading and TotalOrderPartitioner?
> > I have a 10-machine cluster and I think I should have ~30 reducers.
> >
> > Thanks in advance.
> > Oleg.
>
> --
> Harsh J
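[Editorial note for context: the "split-point size" Harsh refers to is governed by the HBase property `hbase.hregion.max.filesize`, which sets the store-file size at which a region splits. Below is a minimal sketch of raising it in `hbase-site.xml`; the 10 GB value is illustrative, and the appropriate number depends on your write volume and cluster size. In HBase versions of this era (0.92/0.94) the default was considerably smaller, so raising it directly reduces how often regions split.]

```xml
<!-- hbase-site.xml: raise the region split threshold so regions grow
     larger before splitting, keeping the total region count lower.
     The value is in bytes; 10737418240 = 10 GB (illustrative choice). -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>
```

Note the trade-off implied by the question at the top of the thread: very large regions mean fewer reducers during bulk load and less region-management overhead, but each region takes longer to compact and to redistribute after a server failure, and read/write load is spread across fewer units, which can create hotspots.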
