Bulk Loading - LoadIncrementalHFiles

Amit Sela Thu, 01 Nov 2012 10:04:07 -0700

Hi everyone,

I'm using MR to bulk load into HBase: I set up the job with
HFileOutputFormat.configureIncrementalLoad, and after the job
completes I call loadIncrementalHFiles.doBulkLoad.
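For context, the configureIncrementalLoad Javadoc says it configures a total order partitioner from the table's region boundaries and sets the number of reduce tasks to the current number of regions, so each reducer covers one region's key range. Below is a minimal Python model of that partitioning step (not HBase code). `region_for_row` and the byte-string start keys are hypothetical illustrations.

```python
from bisect import bisect_right


def region_for_row(row: bytes, start_keys: list) -> int:
    """Index of the region whose [start key, next start key) range holds row.

    Simplified model: start_keys is sorted and its first entry is b"",
    since the first region of an HBase table has an empty start key.
    """
    # bisect_right returns the position of the first start key strictly
    # greater than row; the region just before that position contains row.
    return bisect_right(start_keys, row) - 1


# Hypothetical table pre-split into 3 regions: [b"", b"g"), [b"g", b"p"), [b"p", end).
start_keys = [b"", b"g", b"p"]
num_reducers = len(start_keys)  # one reduce task per region in this model

print(region_for_row(b"apple", start_keys))  # 0
print(region_for_row(b"g", start_keys))      # 1: a start key belongs to its own region
print(region_for_row(b"zebra", start_keys))  # 2
```

In this model the number of partitions is fixed by the regions that exist when the job is configured, so a table with a single region would get a single reducer.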


From what I see, the MR job outputs one file for each CF written, and as I
understand it these files are loaded into a region as store files.
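As far as I know, HFileOutputFormat writes a separate HFile for each column family, in a subdirectory named after the family. The Python model below (not HBase code; `files_per_family` and the tuple cells are hypothetical) shows what a single reducer, i.e. a single region's key range, would produce under that assumption.

```python
from collections import defaultdict

# Hypothetical cell representation: (row, family, qualifier, value).


def files_per_family(cells):
    """Model one reducer's output: a separate sorted 'file' per column family.

    A family only gets a file if at least one cell for it reaches this reducer.
    """
    files = defaultdict(list)
    # Sort by (row, family, qualifier) so every per-family list stays in key order.
    for cell in sorted(cells, key=lambda c: c[:3]):
        files[cell[1]].append(cell)
    return dict(files)


cells = [
    (b"r1", b"cf2", b"q", b"x"),
    (b"r2", b"cf1", b"q", b"z"),
    (b"r1", b"cf1", b"q", b"y"),
]
out = files_per_family(cells)
print(sorted(out))  # [b'cf1', b'cf2']
```

If this model holds, 3 CFs and N regions give up to 3 × N files, each confined to one region's key range.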

What I don't understand is *how many regions will be opened*, and *how that
is determined*.
If I have 3 CFs and a lot of data to load, does that mean 3 large store
files will be loaded into 1 region (or more?), and that this region will split
on major compaction?

Can I pre-create regions and tell the bulk load to split the data between
them during the load?
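If pre-splitting is the way to go, the split keys might be computed along these lines. This is a minimal Python sketch assuming row keys start with a uniformly distributed hex prefix (e.g. a hash of the natural key); `hex_split_keys` is a hypothetical helper, and the resulting keys would still have to be passed to table creation (for example the `splitKeys` argument of `HBaseAdmin.createTable`) before the MR job is configured.

```python
def hex_split_keys(num_regions: int, width: int = 2) -> list:
    """Evenly spaced split keys for row keys that begin with `width` hex digits.

    Assumption: the hex prefix is uniformly distributed. Returns
    num_regions - 1 split keys; each key is the start key of a new region.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be >= 1")
    space = 16 ** width  # number of distinct prefixes
    if num_regions > space:
        raise ValueError("more regions than distinct prefixes")
    return [
        format(i * space // num_regions, f"0{width}x").encode()
        for i in range(1, num_regions)
    ]


print(hex_split_keys(4))  # [b'40', b'80', b'c0']
```

Evenly spaced keys only balance the regions if the prefix really is uniform; with skewed keys, split points would have to come from a sample of the data instead.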

In general, if someone could elaborate on LoadIncrementalHFiles, it would
save me a lot of time digging into it.
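A simplified model of the load step, assuming LoadIncrementalHFiles behaves roughly as the "splitting" in its log output suggests: it does not create regions, but maps each HFile to the existing region that contains the file's first key, and if the file's last key reaches past that region's end key, it splits the file at the boundary and retries both halves. `assign_hfiles` and its key-list stand-ins for HFiles are hypothetical; the real tool works on HFile metadata and region server calls.

```python
from bisect import bisect_left, bisect_right
from collections import deque


def assign_hfiles(hfiles, start_keys):
    """Model how a bulk load might place HFiles onto existing regions.

    hfiles: dict of name -> sorted, non-empty list of row keys (an HFile stand-in).
    start_keys: sorted region start keys whose first entry is b"".
    Returns dict of region index -> list of (name, keys) pieces to load there.
    """
    queue = deque(hfiles.items())
    assigned = {}
    while queue:
        name, keys = queue.popleft()
        # The region containing the file's first key.
        idx = bisect_right(start_keys, keys[0]) - 1
        # That region's end key is the next start key (None for the last region).
        end = start_keys[idx + 1] if idx + 1 < len(start_keys) else None
        if end is not None and keys[-1] >= end:
            # The file straddles the region boundary: split it there and
            # requeue both halves, which are each non-empty here.
            cut = bisect_left(keys, end)
            queue.append((name + ".bottom", keys[:cut]))
            queue.append((name + ".top", keys[cut:]))
        else:
            assigned.setdefault(idx, []).append((name, keys))
    return assigned


start_keys = [b"", b"g", b"p"]
result = assign_hfiles({"cf1-0": [b"a", b"h", b"q"]}, start_keys)
for region in sorted(result):
    print(region, result[region])
```

In this model the regions that receive data are decided entirely by the table's existing boundaries; the bulk load itself never adds regions.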


Another question is about overwriting values: is it possible to load an
updated value, or more generally to update columns and values for an
existing key?
I'd expect that to work, but when I run the same bulk load twice (MR and
then load) with the same data, the second run fails.
Right after
mapreduce.LoadIncrementalHFiles: Trying to load hfile=........
I get:
ERROR mapreduce.LoadIncrementalHFiles: Unexpected execution exception during splitting...
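In principle an update should be expressible as a new cell version: HBase keeps values per (row, family, qualifier, timestamp), and a read returns the newest version. The Python sketch below models only that read-side rule. `read_latest` is hypothetical, it assumes the update is written with a strictly larger timestamp, and it does not explain the splitting error above.

```python
def read_latest(cells, row, family, qualifier):
    """Return the newest value for one column, as a read would in this model.

    cells: iterable of (row, family, qualifier, timestamp, value) tuples,
    possibly coming from several bulk-loaded files.
    """
    versions = [c for c in cells if c[:3] == (row, family, qualifier)]
    if not versions:
        return None
    # Highest timestamp wins; on equal timestamps max() keeps the first seen.
    return max(versions, key=lambda c: c[3])[4]


first_load = [(b"r1", b"cf", b"q", 100, b"old")]
second_load = [(b"r1", b"cf", b"q", 200, b"new")]
print(read_latest(first_load + second_load, b"r1", b"cf", b"q"))  # b'new'
```

With identical timestamps (e.g. reloading exactly the same data) this model cannot say which copy HBase would return; it simply keeps the first one.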


Thanks!
