I have about 1 billion values that I am trying to load into a new HBase table (one column family with a single column), but I am running into some issues. I am currently using MapReduce to import them: the job first writes the data out as HFiles, and I then load those with LoadIncrementalHFiles.doBulkLoad(). The MR job is set up with HFileOutputFormat.configureIncrementalLoad(). My code is essentially the same as this example: https://github.com/Paschalis/HBase-Bulk-Load-Example/blob/master/src/cy/ac/ucy/paschalis/hbase/bulkimport/Driver.java
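
To make the reducer setup concrete, here is a minimal plain-Java sketch of how I understand configureIncrementalLoad() to size and partition the job. It is not HBase code: the class RegionReducerModel, its two methods, and the sample start keys are all hypothetical. It assumes that the job gets one reduce task per region and that each row key is routed to the region with the largest start key that is not greater than the row key.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the region/reducer relationship (not the HBase API).
class RegionReducerModel {

    // Assumption: one reduce task per region, i.e. one per region start key.
    static int numReducers(List<String> sortedStartKeys) {
        return sortedStartKeys.size();
    }

    // Index of the region (and therefore reducer) that receives rowKey:
    // the last region whose start key is <= rowKey.
    // The first region is expected to have the empty start key "".
    static int partitionFor(String rowKey, List<String> sortedStartKeys) {
        int pos = Collections.binarySearch(sortedStartKeys, rowKey);
        // Exact match: pos is the region index.
        // No match: binarySearch returns -(insertionPoint) - 1,
        // and the owning region sits just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    public static void main(String[] args) {
        // A brand-new table that was never split: one region with an empty start key.
        List<String> newTable = Arrays.asList("");
        // A hypothetical table with five regions.
        List<String> fiveRegions = Arrays.asList("", "2", "4", "6", "8");

        System.out.println("new table reducers: " + numReducers(newTable));
        System.out.println("five-region reducers: " + numReducers(fiveRegions));
        System.out.println("row key 512 goes to reducer " + partitionFor("512", fiveRegions));
    }
}
```

In this model a single start key (the empty key of an unsplit table) yields exactly one reducer, and every row key maps to it.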
The problem is that configureIncrementalLoad() creates only 1 reducer, and the node it runs on does not have enough space to handle all of the data. As I understand it, configureIncrementalLoad() starts one reducer for every region the table has, so apparently the table has only 1 region, maybe because it is empty and brand new (my understanding of how regions work is not crystal clear). The cluster has 5 region servers, so I'd like at least that many reducers to handle the load.

On a side note, I also tried the command-line tool completebulkload, but ran into other issues with it (timeouts, possible heap issues). This is probably because only one server is assigned the task of inserting all the records: when I look at the region servers' logs, only one server has log entries and the rest are idle.

Any help is appreciated.

-Dolan Antenucci
