Hi everyone,

I'm using MapReduce to bulk load into HBase with HFileOutputFormat.configureIncrementalLoad, and after the job completes I call LoadIncrementalHFiles.doBulkLoad.

From what I see, the MR job outputs a file for each CF written, and to my understanding these files are loaded as store files into a region.

What I don't understand is *how many regions will open* and *how that is determined*. If I have 3 CFs and a lot of data to load, does that mean 3 large store files will be loaded into 1 region (or more?), and that this region will then split on major compaction? Can I pre-create regions and tell the bulk load to split the data between them during the load? In general, if someone could elaborate on LoadIncrementalHFiles, it would save me a lot of time diving into it.

Another question is about overwriting values: is it possible to load an updated value, or more generally to update columns and values for an existing key? I'd think there's no problem, but when I run the same bulk load twice (MR and then load) with the same data, the second run fails. Right after

mapreduce.LoadIncrementalHFiles: Trying to load hfile=........

I get:

ERROR mapreduce.LoadIncrementalHFiles: Unexpected execution exception during splitting...

Thanks!
