So your answer is that it is better to aim for the best possible load
balancing during the reduce phase than to make sure each HFile fits within a
single region, because the splitting done by LoadIncrementalHFiles is
fairly fast?
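
To make sure we mean the same thing by "fits within a single region", here is
a rough sketch of the check, not HBase code. The function names and the
example region start keys are made up for illustration, and it assumes row
keys compare as unsigned bytes in lexicographic order:

```python
from bisect import bisect_right

def regions_spanned(region_start_keys, first_key, last_key):
    """Return the indices of the regions touched by [first_key, last_key].

    region_start_keys is a sorted list of region start keys (hypothetical
    input); the first entry is b"" for the region at the start of the table.
    A region covers the keys from its start key up to the next start key.
    """
    # Index of the last region whose start key is <= the given key.
    lo = bisect_right(region_start_keys, first_key) - 1
    hi = bisect_right(region_start_keys, last_key) - 1
    return list(range(lo, hi + 1))

def fits_single_region(region_start_keys, first_key, last_key):
    """True if a file with this key range would not need to be split."""
    return len(regions_spanned(region_start_keys, first_key, last_key)) == 1

# Hypothetical table with four regions.
starts = [b"", b"g", b"n", b"t"]
print(fits_single_region(starts, b"a", b"f"))
print(regions_spanned(starts, b"h", b"p"))
```

With balanced partitioning most reducer outputs would span two or more
regions in this sense, and each such file would have to be split at load time.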

> Date: Wed, 25 May 2011 09:20:10 -0700
> Subject: Re: HFiles that fit within a single region VS better load balancing 
> at reduce phase
> From: [email protected]
> To: [email protected]
> 
> LoadIncrementalHFiles splits an HFile if it doesn't fit within a single
> region.
> 
> Please refer to the following JIRAs, which speed up LoadIncrementalHFiles:
> https://issues.apache.org/jira/browse/HBASE-3871
> https://issues.apache.org/jira/browse/HBASE-3721
> 
> Note: LoadIncrementalHFiles splits HFiles in parallel, but all of that
> work is done on a single machine.
> 
> Thanks
> 
> 2011/5/25 Panayotis Antonopoulos <[email protected]>
> 
> >
> > Hello,
> > I am currently working on an MR job that will output HFiles to be
> > bulk loaded into an HBase table.
> > According to the HBase site, for bulk loading to be efficient,
> > each HFile produced by the MR job should fit within a single region.
> > To achieve that, I use the TotalOrderPartitioner so that each
> > reducer gets Key/Value pairs from a single region.
> > However, this prevents partitioning the mappers' output into equal
> > splits, so I cannot get the best possible load balancing during the
> > reduce phase.
> >
> > So I would like to ask how important it is to create HFiles that fit
> > within a single region.
> > If it makes bulk loading much faster, it is probably better to sacrifice
> > load balancing.
> > But is this the case?
> > Has anyone tried both choices?
> >
> > Thank you in advance!
> > Panagiotis.
> >
                                          
