LoadIncrementalHFiles will split an HFile if it doesn't fit within a single region.
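
To make the "fits within a single region" condition concrete, here is a minimal conceptual sketch in Python. It is not HBase's code: the function names and the sample region start keys are hypothetical, and it only models the idea that an HFile whose first and last row keys land in different regions has to be cut at the region boundaries in between.

```python
from bisect import bisect_right

def region_index(region_start_keys, row_key):
    """Return the index of the region that holds row_key.

    region_start_keys is a sorted list of region start keys; the first
    region is assumed to start at the empty key b"".
    """
    return bisect_right(region_start_keys, row_key) - 1

def fits_in_single_region(region_start_keys, first_key, last_key):
    """True if an HFile covering [first_key, last_key] lies in one region."""
    return (region_index(region_start_keys, first_key)
            == region_index(region_start_keys, last_key))

def split_points(region_start_keys, first_key, last_key):
    """Region start keys at which such an HFile would have to be cut."""
    lo = region_index(region_start_keys, first_key)
    hi = region_index(region_start_keys, last_key)
    return region_start_keys[lo + 1:hi + 1]

# Hypothetical table with four regions: ["", g), [g, n), [n, t), [t, ...)
starts = [b"", b"g", b"n", b"t"]
print(fits_in_single_region(starts, b"a", b"f"))  # HFile inside the first region
print(split_points(starts, b"e", b"p"))           # HFile spanning three regions
```

An HFile for which `split_points` returns an empty list can be handed to its region as is; any other HFile costs extra work at load time, which is the cost the original question is asking about.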
Please refer to the following JIRAs, which speed up LoadIncrementalHFiles:
https://issues.apache.org/jira/browse/HBASE-3871
https://issues.apache.org/jira/browse/HBASE-3721

Note: even when parallelized, the splitting of HFile(s) by LoadIncrementalHFiles runs on a single machine.

Thanks

2011/5/25 Panayotis Antonopoulos <[email protected]>

> Hello,
> I am currently working on a MR job that will output HFiles that will be
> bulk loaded in an HBase Table.
> According to the HBase site in order for the bulk loading to be efficient
> each HFile of the MR job should fit within a single region.
> In order to achieve that I use the TotalOrderPartitioner so that each
> reducer gets Key/Value pairs from a single region.
> However this prevents partitioning Mapper's output in equal splits so that
> I have the best possible load balancing during the reduce phase.
>
> So I would like to ask you how important is to create HFiles that fit
> within a single region.
> If it makes bulk loading much faster probably it is better to sacrifice
> load balancing.
> But is this the case?
> Has anyone tried both choices?
>
> Thank you in advance!
> Panagiotis.
