RE: HFiles that fit within a single region VS better load balancing at reduce phase

Panayotis Antonopoulos Wed, 25 May 2011 10:24:25 -0700

So your answer would be that it is better to have the best possible load
balancing during the reduce phase instead of taking care to output HFiles that
fit within a single region, because the splitting done by LoadIncrementalHFiles
is rather fast?
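The question depends on how LoadIncrementalHFiles handles an HFile whose key range spans several regions. As a rough illustration only, the Java sketch below cuts a sorted run of rows at every region start key, so each piece falls inside exactly one region. The class HFileSplitSketch, its methods, and the example keys are hypothetical, and row keys are modelled as Strings rather than HBase's byte[] keys. The real tool does this by rewriting HFiles on disk, and that extra work is the cost being weighed against reducer load balancing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical illustration (not HBase code): one sorted "HFile", modelled as a
 * sorted list of row keys, is cut at region boundaries so that every piece lies
 * within a single region. This loosely mirrors the net effect of
 * LoadIncrementalHFiles splitting a file that spans several regions.
 */
public class HFileSplitSketch {

    // regionStartKeys must be sorted, and the first region starts at "" (empty key).
    static List<List<String>> splitAtRegionBoundaries(List<String> sortedRows,
                                                      List<String> regionStartKeys) {
        List<List<String>> pieces = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int region = -1;
        for (String row : sortedRows) {
            int r = regionFor(row, regionStartKeys);
            // Entering a new region closes the piece collected so far.
            if (r != region && !current.isEmpty()) {
                pieces.add(current);
                current = new ArrayList<>();
            }
            region = r;
            current.add(row);
        }
        if (!current.isEmpty()) {
            pieces.add(current);
        }
        return pieces;
    }

    // Index of the last region whose start key is <= row.
    static int regionFor(String row, List<String> regionStartKeys) {
        int idx = Collections.binarySearch(regionStartKeys, row);
        // On a miss, binarySearch returns -(insertionPoint) - 1.
        return idx >= 0 ? idx : -idx - 2;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("", "g", "p");
        List<String> hfile = Arrays.asList("a", "c", "h", "k", "q", "z");
        System.out.println(splitAtRegionBoundaries(hfile, regions));
    }
}
```

With region start keys "", "g" and "p", main prints [[a, c], [h, k], [q, z]]: one piece per region the file overlaps.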


> Date: Wed, 25 May 2011 09:20:10 -0700
> Subject: Re: HFiles that fit within a single region VS better load balancing 
> at reduce phase
> From: [email protected]
> To: [email protected]
> 
> LoadIncrementalHFiles would split an HFile if it doesn't fit within a single
> region.
> 
> Please refer to the following JIRAs, which speed up LoadIncrementalHFiles:
> https://issues.apache.org/jira/browse/HBASE-3871
> https://issues.apache.org/jira/browse/HBASE-3721
> 
> Note: the parallel splitting of HFile(s) by LoadIncrementalHFiles still
> runs on a single machine.
> 
> Thanks
> 
> 2011/5/25 Panayotis Antonopoulos <[email protected]>
> 
> >
> > Hello,
> > I am currently working on a MR job that will output HFiles that will be
> > bulk loaded in an HBase Table.
> > According to the HBase site, in order for bulk loading to be efficient,
> > each HFile produced by the MR job should fit within a single region.
> > To achieve that, I use the TotalOrderPartitioner so that each
> > reducer gets key/value pairs from a single region.
> > However, this prevents me from partitioning the mappers' output into equal
> > splits, which would give the best possible load balancing during the reduce
> > phase.
> >
> > So I would like to ask you how important it is to create HFiles that fit
> > within a single region.
> > If it makes bulk loading much faster, it is probably better to sacrifice
> > load balancing.
> > But is this the case?
> > Has anyone tried both choices?
> >
> > Thank you in advance!
> > Panagiotis.
> >
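The trade-off in the original question can be made concrete with a small, hypothetical Java model. PartitionBalanceSketch, its methods, the region start keys "", "g" and "p", and the skewed key set are all invented for illustration, and keys are Strings rather than HBase byte[] row keys. partitionFor follows the TotalOrderPartitioner rule that a key goes to the partition after the last split point it is greater than or equal to. Using the table's region start keys as split points keeps each reducer inside one region; to my understanding, this is how HFileOutputFormat.configureIncrementalLoad sets up TotalOrderPartitioner. Split points taken at equal quantiles of the keys balance the reducers instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical comparison (not Hadoop or HBase code) of two ways to choose
 * TotalOrderPartitioner-style split points for a skewed set of row keys.
 */
public class PartitionBalanceSketch {

    // Reducer index for a key: the number of split points <= key.
    static int partitionFor(String key, List<String> splitPoints) {
        int idx = Collections.binarySearch(splitPoints, key);
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    // How many keys each of the (splitPoints.size() + 1) reducers receives.
    static int[] reducerLoads(List<String> keys, List<String> splitPoints) {
        int[] loads = new int[splitPoints.size() + 1];
        for (String k : keys) {
            loads[partitionFor(k, splitPoints)]++;
        }
        return loads;
    }

    // Equal-size split points picked from sorted keys (a stand-in for a sample).
    static List<String> quantileSplits(List<String> sortedKeys, int reducers) {
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < reducers; i++) {
            splits.add(sortedKeys.get(i * sortedKeys.size() / reducers));
        }
        return splits;
    }

    static void addKeys(List<String> keys, String prefix, int n) {
        for (int i = 0; i < n; i++) {
            keys.add(String.format("%s%02d", prefix, i));
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        addKeys(keys, "a", 80); // skew: most rows fall into the first region
        addKeys(keys, "h", 10);
        addKeys(keys, "q", 10);
        Collections.sort(keys);

        // Hypothetical regions start at "", "g" and "p"; the split points
        // are the non-empty region start keys.
        List<String> regionSplits = Arrays.asList("g", "p");
        System.out.println("region-aligned: "
                + Arrays.toString(reducerLoads(keys, regionSplits)));
        System.out.println("equal-size:     "
                + Arrays.toString(reducerLoads(keys, quantileSplits(keys, 3))));
    }
}
```

Running main prints [80, 10, 10] for the region-aligned split points and [33, 33, 34] for the equal-size ones. The balanced third reducer, however, receives keys "a66" through "q09", so its HFile spans all three regions and LoadIncrementalHFiles would have to split it at load time.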
