Hey,

I am trying to use Crunch to bulk load data into HBase.
With plain MapReduce, you can have HBase set the number of reducers (one HFile
per region) with the following call:
HFileOutputFormat.configureIncrementalLoad(job, table);
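For context: configureIncrementalLoad reads the target table's region start keys, sets the number of reduce tasks to the number of regions, and installs a TotalOrderPartitioner over those boundaries. Each reducer therefore writes HFiles for exactly one region. The stdlib-only sketch below shows just that routing step. RegionPartitionSketch, regionFor and utf8 are hypothetical names, not HBase API. Keys are compared as unsigned bytes, which is how HBase orders row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of region-aligned partitioning; this is not HBase code.
final class RegionPartitionSketch {

    // Lexicographic comparison treating bytes as unsigned, the order HBase uses for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Index of the region (and therefore the reducer) that owns rowKey.
    // startKeys is sorted and startKeys.get(0) is the empty start key of the first region.
    static int regionFor(byte[] rowKey, List<byte[]> startKeys) {
        int lo = 0;
        int hi = startKeys.size() - 1;
        // Binary search for the last start key that is <= rowKey.
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compareUnsigned(startKeys.get(mid), rowKey) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three regions: [empty, "g"), ["g", "p"), ["p", end of table)
        List<byte[]> startKeys = Arrays.asList(new byte[0], utf8("g"), utf8("p"));
        System.out.println(regionFor(utf8("apple"), startKeys)); // 0
        System.out.println(regionFor(utf8("g"), startKeys));     // 1
        System.out.println(regionFor(utf8("zebra"), startKeys)); // 2
    }
}
```

Because every reducer receives one region's key range in sorted order, the HFiles it writes already line up with the table's regions.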

However, I could not find anything equivalent among the Crunch classes
(HFileTarget).
Without it, LoadIncrementalHFiles.doBulkLoad(new Path(hBasePath), hTable);
takes a long time because the HFiles have to be re-split…
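The re-splitting comes, roughly, from how LoadIncrementalHFiles handles each HFile. It checks the file's first and last row against the table's region boundaries. If the file crosses a region's end key, the loader splits it into two halves at that key and queues the halves again. The stdlib sketch below covers only that boundary check. SplitCheckSketch, splitPointFor and utf8 are hypothetical names, and the real loader also deals with column families, retries and the physical file split.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the region-boundary check; not the LoadIncrementalHFiles source.
final class SplitCheckSketch {

    // Lexicographic comparison treating bytes as unsigned, the order HBase uses for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns the region end key at which a file covering [firstRow, lastRow]
    // would have to be split, or null if the whole file fits in one region.
    // startKeys is sorted, starts with the empty key, and the last region is unbounded.
    static byte[] splitPointFor(byte[] firstRow, byte[] lastRow, List<byte[]> startKeys) {
        // Find the region that contains the file's first row.
        int region = 0;
        for (int i = 1; i < startKeys.size(); i++) {
            if (compareUnsigned(startKeys.get(i), firstRow) <= 0) {
                region = i;
            }
        }
        if (region == startKeys.size() - 1) {
            return null; // The last region has no end key, so everything fits.
        }
        byte[] endKey = startKeys.get(region + 1);
        // End keys are exclusive: a last row >= endKey spills into the next region.
        return compareUnsigned(lastRow, endKey) >= 0 ? endKey : null;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<byte[]> startKeys = Arrays.asList(new byte[0], utf8("g"), utf8("p"));
        // Fits inside region [empty, "g"): no split needed.
        System.out.println(splitPointFor(utf8("a"), utf8("f"), startKeys) == null); // true
        // Straddles the "g" boundary: would be split at "g".
        byte[] split = splitPointFor(utf8("a"), utf8("k"), startKeys);
        System.out.println(new String(split, StandardCharsets.UTF_8)); // g
    }
}
```

Under these assumptions, output files that are not aligned to region boundaries would take this split path once per boundary they straddle. Region-aligned output, as configureIncrementalLoad produces, avoids it.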
How can I make Crunch use the same strategy that plain MR uses with
HFileOutputFormat.configureIncrementalLoad?
Is that possible at all?

Here is the sample code that I use to write data into HFiles:
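import static org.apache.crunch.io.hbase.HBaseTypes.cells;

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pair;
import org.apache.crunch.Target.WriteMode;
import org.apache.crunch.io.hbase.HFileTarget;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

// users is a PCollection<Pair<String, Integer>> defined elsewhere in the pipeline.
PCollection<Cell> cellsUsers = users.parallelDo(new DoFn<Pair<String, Integer>, Cell>() {
    @Override
    public void process(Pair<String, Integer> input, Emitter<Cell> emitter) {
        // Bytes.toBytes encodes as UTF-8 regardless of the platform default charset.
        byte[] row = Bytes.toBytes(input.first());
        byte[] value = Bytes.toBytes(String.valueOf(input.second()));
        byte[] family = Bytes.toBytes("cf1");
        byte[] qualifier = Bytes.toBytes("q1");
        long timestamp = System.currentTimeMillis();

        Cell cell = CellUtil.createCell(row, family, qualifier, timestamp,
                KeyValue.Type.Put.getCode(), value);
        emitter.emit(cell);
    }
}, cells());

cellsUsers.write(new HFileTarget(hBaseFullPath), WriteMode.OVERWRITE);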

Thanks in advance,
Dmitry.
