Re: HFiles that fit within a single region VS better load balancing at reduce phase

Ted Yu Wed, 25 May 2011 09:20:42 -0700

LoadIncrementalHFiles splits an HFile if it doesn't fit within a single
region.
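
To make "fits within a single region" concrete, here is a minimal sketch of
the check, assuming a region covers the row range [startKey, nextStartKey)
and that rows compare as unsigned bytes in lexicographic order. This is not
the actual LoadIncrementalHFiles code: the class RegionFitCheck and its
methods are hypothetical names used only for illustration, and it needs
Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: decides whether an HFile whose
// rows span [firstRow, lastRow] falls entirely inside one region.
public class RegionFitCheck {

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Returns the index of the region that holds `row`.
    // `startKeys` must be sorted ascending; the first region's start key
    // is the empty byte array, so every row lands in some region.
    static int regionIndexFor(byte[][] startKeys, byte[] row) {
        int idx = 0;
        for (int i = 0; i < startKeys.length; i++) {
            // Unsigned lexicographic comparison of the two keys.
            if (Arrays.compareUnsigned(startKeys[i], row) <= 0) {
                idx = i;
            } else {
                break;
            }
        }
        return idx;
    }

    // True when the first and last row of the file land in the same region,
    // i.e. the file would not need to be split before loading.
    static boolean fitsInSingleRegion(byte[][] startKeys,
                                      byte[] firstRow, byte[] lastRow) {
        return regionIndexFor(startKeys, firstRow)
            == regionIndexFor(startKeys, lastRow);
    }

    public static void main(String[] args) {
        // Three made-up regions: ["", "g"), ["g", "p"), ["p", end of table)
        byte[][] startKeys = { bytes(""), bytes("g"), bytes("p") };

        // File covering rows "a".."f" stays inside the first region.
        System.out.println(
            fitsInSingleRegion(startKeys, bytes("a"), bytes("f")));
        // File covering rows "e".."h" crosses the boundary at "g".
        System.out.println(
            fitsInSingleRegion(startKeys, bytes("e"), bytes("h")));
    }
}
```

A file for which this check fails is the case where a split is needed
before the load.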


Please refer to the following JIRAs, which speed up LoadIncrementalHFiles:
https://issues.apache.org/jira/browse/HBASE-3871
https://issues.apache.org/jira/browse/HBASE-3721

Note: even when parallelized, the splitting of HFile(s) by
LoadIncrementalHFiles is done on a single machine.

Thanks

2011/5/25 Panayotis Antonopoulos <[email protected]>

>
> Hello,
> I am currently working on a MR job that will output HFiles that will be
> bulk loaded in an HBase Table.
> According to the HBase site in order for the bulk loading to be efficient
> each HFile of the MR job should fit within a single region.
> In order to achieve that I use the TotalOrderPartitioner so that each
> reducer gets Key/Value pairs from a single region.
> However this prevents partitioning Mapper's output in equal splits so that
> I have the best possible load balancing during the reduce phase.
>
> So I would like to ask how important it is to create HFiles that fit
> within a single region.
> If it makes bulk loading much faster, it is probably better to sacrifice
> load balancing.
> But is this the case?
> Has anyone tried both choices?
>
> Thank you in advance!
> Panagiotis.
>
