See this: https://issues.apache.org/jira/browse/HBASE-3727 And see this thread: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/21724
You may need to rebase the code to your specific version of HBase, though.

--Suraj

On Thu, Jun 26, 2014 at 10:28 AM, Kevin <[email protected]> wrote:
> I am reading data off of HDFS that doesn't all get loaded into a single
> table. With the current way of bulk loading I can load to the table where
> most of the data will end up, and I can use the client API (i.e., Put)
> to load the other data from the file into the other tables.
>
> The current bulk loading process involves creating the same number of
> reducers as there are regions in the specified table. I understand that
> once the appropriate region servers adopt the HFiles, minor compactions
> will merge them into the regions' store files.
>
> It seems like you could set the number of reducers to the total number of
> regions across all the tables considered. Then you write the partitions
> file as key-values where the key is the destination table and the value is
> a region start key (instead of the key being the start key and the value
> being NullWritable). Mappers could then prefix rows with their destination
> table before doing a context.write(). The TotalOrderPartitioner would need
> to be modified to account for all these changes. I have a feeling this is
> an overly complicated approach, and I'm not sure it would even work.
>
> Maybe you could do it without all those changes and just use
> MultipleOutputs?
>
> Has anyone else thought about or done bulk loading with multiple tables?
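To make the composite-key idea above concrete, here is a minimal, hypothetical sketch of the routing logic a modified partitioner would need: one reducer per (table, region start key) pair, mappers prefixing each row key with its destination table, and the partitioner doing a floor lookup over the combined boundaries. This is plain-Java illustration only (the class and method names are invented, not HBase or Hadoop API), and a real implementation would work with byte arrays and the TotalOrderPartitioner machinery rather than Strings:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the multi-table partitioning idea from the thread.
// Boundaries are (table, region start key) pairs instead of plain start keys.
public class MultiTablePartitionerSketch {

    // Sorted map of composite boundary key -> reducer (partition) index.
    private final TreeMap<String, Integer> boundaries = new TreeMap<>();

    // Register one reducer per region, mirroring "number of reducers =
    // total regions across all tables". Regions must be added in sorted order.
    public void addRegion(String table, String regionStartKey) {
        boundaries.put(compositeKey(table, regionStartKey), boundaries.size());
    }

    // Mappers would emit keys in this table-prefixed form before context.write().
    public static String compositeKey(String table, String rowKey) {
        return table + "\u0000" + rowKey; // '\0' separator keeps sorting clean
    }

    // Route a row to the reducer owning the region whose start key is the
    // greatest one <= the row key, within that row's destination table.
    public int getPartition(String table, String rowKey) {
        Map.Entry<String, Integer> e = boundaries.floorEntry(compositeKey(table, rowKey));
        if (e == null) {
            throw new IllegalArgumentException("no region covers row " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        MultiTablePartitionerSketch p = new MultiTablePartitionerSketch();
        // Two tables, each with two regions split at row "m".
        p.addRegion("tableA", "");   // partition 0
        p.addRegion("tableA", "m");  // partition 1
        p.addRegion("tableB", "");   // partition 2
        p.addRegion("tableB", "m");  // partition 3

        System.out.println(p.getPartition("tableA", "apple")); // 0
        System.out.println(p.getPartition("tableA", "zebra")); // 1
        System.out.println(p.getPartition("tableB", "apple")); // 2
        System.out.println(p.getPartition("tableB", "zebra")); // 3
    }
}
```

Because total order is preserved within each table's slice of the composite key space, each reducer still receives its region's rows in sorted order, which is what HFile writing requires.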
