See this: https://issues.apache.org/jira/browse/HBASE-3727 And see this thread: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/21724
You may need to rebase the code to your specific version of HBase, though.

--Suraj

On Thu, Jun 26, 2014 at 10:28 AM, Kevin <[email protected]> wrote:
> I am reading data off of HDFS that doesn't all get loaded into a single
> table. With the current way of bulk loading I can load to the table where
> most of the data will end up, and I can use the client API (i.e., Put)
> to load the other data from the file into the other tables.
>
> The current bulk loading process involves creating the same number of
> reducers as there are regions in the specified table. I understand that
> once the appropriate region servers adopt the HFiles, minor compactions
> will merge them into the regions' store files.
>
> It seems like you could set the number of reducers to the total number of
> regions across all the tables considered. Then you write the partitions
> file as key-values where the key is the destination table and the value is
> a region start key (instead of the key being the start key and the value
> being NullWritable). Mappers could then prefix rows with their destination
> table before doing a context.write(). The TotalOrderPartitioner would need
> to be modified to account for all these changes. I have a feeling this is
> an overly complicated approach, and I'm not sure it would even work.
>
> Maybe you could do it without all those changes and just use
> MultipleOutputs?
>
> Has anyone else thought about or done bulk loading with multiple tables?
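To make the composite-key idea above concrete, here is a minimal, hypothetical sketch of the routing logic a modified partitioner would need: one reducer per (table, region start key) pair, mappers prefixing each row key with its destination table, and the partitioner doing a floor lookup over the combined boundaries. This is plain-Java illustration only (the class and method names are invented, not HBase or Hadoop API), and a real implementation would work with byte arrays and the TotalOrderPartitioner machinery rather than Strings:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the multi-table partitioning idea from the thread.
// Boundaries are (table, region start key) pairs instead of plain start keys.
public class MultiTablePartitionerSketch {

    // Sorted map of composite boundary key -> reducer (partition) index.
    private final TreeMap<String, Integer> boundaries = new TreeMap<>();

    // Register one reducer per region, mirroring "number of reducers =
    // total regions across all tables". Regions must be added in sorted order.
    public void addRegion(String table, String regionStartKey) {
        boundaries.put(compositeKey(table, regionStartKey), boundaries.size());
    }

    // Mappers would emit keys in this table-prefixed form before context.write().
    public static String compositeKey(String table, String rowKey) {
        return table + "\u0000" + rowKey; // '\0' separator keeps sorting clean
    }

    // Route a row to the reducer owning the region whose start key is the
    // greatest one <= the row key, within that row's destination table.
    public int getPartition(String table, String rowKey) {
        Map.Entry<String, Integer> e = boundaries.floorEntry(compositeKey(table, rowKey));
        if (e == null) {
            throw new IllegalArgumentException("no region covers row " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        MultiTablePartitionerSketch p = new MultiTablePartitionerSketch();
        // Two tables, each with two regions split at row "m".
        p.addRegion("tableA", "");   // partition 0
        p.addRegion("tableA", "m");  // partition 1
        p.addRegion("tableB", "");   // partition 2
        p.addRegion("tableB", "m");  // partition 3

        System.out.println(p.getPartition("tableA", "apple")); // 0
        System.out.println(p.getPartition("tableA", "zebra")); // 1
        System.out.println(p.getPartition("tableB", "apple")); // 2
        System.out.println(p.getPartition("tableB", "zebra")); // 3
    }
}
```

Because total order is preserved within each table's slice of the composite key space, each reducer still receives its region's rows in sorted order, which is what HFile writing requires.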
