HBASE-3721 was integrated into trunk, not 0.90.x; HBASE-3871 is under review. So I would interpret my answer as tilting toward outputting HFiles that fit within a single region.
If, after your effort, some HFiles still don't fit, you can try my patches.

Thanks

2011/5/25 Panayotis Antonopoulos <[email protected]>
>
> So your answer would be that it is better to have the best possible load
> balancing during the reduce phase instead of taking care to output HFiles
> that fit within a single region, because the splitting done by Incremental Load
> is rather fast?
>
> > Date: Wed, 25 May 2011 09:20:10 -0700
> > Subject: Re: HFiles that fit within a single region VS better load balancing at reduce phase
> > From: [email protected]
> > To: [email protected]
> >
> > LoadIncrementalHFiles would split an HFile if it doesn't fit within a single
> > region.
> >
> > Please refer to the following JIRAs, which speed up LoadIncrementalHFiles:
> > https://issues.apache.org/jira/browse/HBASE-3871
> > https://issues.apache.org/jira/browse/HBASE-3721
> >
> > Note: the parallelized splitting of HFile(s) by LoadIncrementalHFiles is done
> > on a single machine.
> >
> > Thanks
> >
> > 2011/5/25 Panayotis Antonopoulos <[email protected]>
> >
> > > Hello,
> > > I am currently working on an MR job that will output HFiles that will be
> > > bulk loaded into an HBase table.
> > > According to the HBase site, for bulk loading to be efficient, each HFile
> > > of the MR job should fit within a single region.
> > > To achieve that I use the TotalOrderPartitioner so that each
> > > reducer gets Key/Value pairs from a single region.
> > > However, this prevents partitioning the Mapper's output into equal splits,
> > > which would give me the best possible load balancing during the reduce phase.
> > >
> > > So I would like to ask you how important it is to create HFiles that fit
> > > within a single region.
> > > If it makes bulk loading much faster, it is probably better to sacrifice
> > > load balancing.
> > > But is this the case?
> > > Has anyone tried both choices?
> > >
> > > Thank you in advance!
> > > Panagiotis.
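The trade-off discussed above can be made concrete with a small simulation. This is a simplified model under stated assumptions, not HBase or Hadoop code: the region layout is a made-up list of start keys, `balanced_cut_keys` stands in for equal-size sampling, and `region_split_points` only counts where a reducer's output would cross a region boundary (the situation in which LoadIncrementalHFiles has to split a file). All function names and key values here are hypothetical; nothing calls `TotalOrderPartitioner` or `LoadIncrementalHFiles`.

```python
from bisect import bisect_right

# Hypothetical region layout, given as sorted region start keys. As in HBase,
# the first region starts at the empty key, so every row key has an owner.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]


def region_index(row_key: bytes, start_keys: list[bytes]) -> int:
    """Index of the region whose [start, next_start) range contains row_key."""
    # bisect_right counts the start keys <= row_key; the last of them owns the key.
    return bisect_right(start_keys, row_key) - 1


def balanced_cut_keys(sorted_sample: list[bytes], num_reducers: int) -> list[bytes]:
    """Hypothetical cut keys giving each reducer roughly equal sampled keys."""
    n = len(sorted_sample)
    return [sorted_sample[(i * n) // num_reducers] for i in range(1, num_reducers)]


def partition_by_cuts(keys: list[bytes], cut_keys: list[bytes]) -> list[list[bytes]]:
    """Group sorted keys by reducer; reducer i gets keys in [cut[i-1], cut[i])."""
    parts: list[list[bytes]] = [[] for _ in range(len(cut_keys) + 1)]
    for key in keys:
        parts[bisect_right(cut_keys, key)].append(key)
    return parts


def region_split_points(first_key: bytes, last_key: bytes,
                        start_keys: list[bytes]) -> list[bytes]:
    """Region boundaries inside a file covering [first_key, last_key].

    The file would have to be cut at each returned key before every
    piece fits within a single region.
    """
    return [k for k in start_keys if first_key < k <= last_key]


if __name__ == "__main__":
    # Made-up, skewed key set: most rows land in the first region.
    keys = sorted([b"a1", b"a2", b"a3", b"a4", b"a5", b"a6", b"h1", b"o1"])

    strategies = {
        # Cuts on region boundaries: each reducer's output fits one region.
        "region-aligned": REGION_START_KEYS[1:],
        # Equal-size cuts: best reducer balance, ignores region boundaries.
        "balanced": balanced_cut_keys(keys, 4),
    }
    for name, cuts in strategies.items():
        parts = partition_by_cuts(keys, cuts)
        sizes = [len(p) for p in parts]
        splits = sum(len(region_split_points(p[0], p[-1], REGION_START_KEYS))
                     for p in parts if p)
        print(f"{name}: reducer sizes={sizes}, splits needed at load time={splits}")
```

With this skewed input, region-aligned partitioning gives reducer sizes `[6, 1, 1, 0]` and needs no splits, while balanced partitioning gives `[2, 2, 2, 2]` but produces one file spanning the `b"n"` boundary that would need splitting during the load. How much that split costs in practice depends on the HBase version and on patches such as the JIRAs above.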
