Dmitry,
You could override the public void configure(Configuration conf) method in
your DoFn to set the number of reducers manually. You could also split the
table manually through the HBase API after writing out.
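
To make the "1 HFile per region" idea concrete, here is a rough sketch of the
partitioning that configureIncrementalLoad arranges in plain MR, as far as I
understand it: one reducer per region, with each row key routed to the region
whose start key is the greatest one not exceeding it. This is plain Java with
no Crunch or HBase classes; RegionPartitionSketch, partitionFor, startKeys and
the sample keys are made-up names for illustration only.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: mimics how row keys map to regions, and therefore to reducers.
final class RegionPartitionSketch {

    // Unsigned lexicographic comparison of byte arrays (the ordering used for row keys).
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    // Returns the index of the region (reducer) for a row: the last start key <= row.
    // Assumes startKeys is sorted ascending and startKeys[0] is the empty key.
    static int partitionFor(byte[] row, byte[][] startKeys) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compareRows(startKeys[mid], row) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Helper that turns strings into UTF-8 encoded keys.
    static byte[][] startKeys(String... keys) {
        byte[][] out = new byte[keys.length][];
        for (int i = 0; i < keys.length; i++) {
            out[i] = keys[i].getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical table with three regions: ["", "g"), ["g", "p"), ["p", end).
        byte[][] regions = startKeys("", "g", "p");
        for (String row : new String[] {"alice", "gary", "zed"}) {
            int reducer = partitionFor(row.getBytes(StandardCharsets.UTF_8), regions);
            System.out.println(row + " -> reducer " + reducer);
        }
    }
}
```

With three regions you would set three reducers, and each reducer's output
then falls inside a single region, so the bulk load has nothing to re-split.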
Dave
On Wed, Jan 11, 2017 at 10:47 AM Dmitry Gorbatsevich <
[email protected]> wrote:
> Hey,
>
> I am trying to use Crunch to bulk load data into HBase.
> With plain MR you can have HBase control the number of reducers
> (1 HFile per region) using the following code:
>
> HFileOutputFormat.configureIncrementalLoad(job, table);
>
>
> However, I did not manage to find anything similar in the Crunch classes
> (HFileTarget).
> Without this, LoadIncrementalHFiles.doBulkLoad(new Path(hBasePath),
> hTable); takes a lot of time because of re-splitting the files…
> I am wondering how I can get Crunch to use the same strategy that pure MR
> uses with HFileOutputFormat.configureIncrementalLoad.
> Is it possible?
>
> Here is the sample code that I use to write data into HFiles:
>
> PCollection<Cell> cellsUsers = users.parallelDo(new DoFn<Pair<String,
>     Integer>, Cell>() {
>
>   @Override
>   public void process(Pair<String, Integer> input, Emitter<Cell> emitter) {
>     byte[] row = input.first().getBytes();
>     byte[] value = String.valueOf(input.second()).getBytes();
>     byte[] family = "cf1".getBytes();
>     byte[] qualifier = "q1".getBytes();
>     long timestamp = System.currentTimeMillis();
>
>     Cell cell = CellUtil.createCell(row, family, qualifier, timestamp,
>         KeyValue.Type.Put.getCode(), value);
>
>     emitter.emit(cell);
>   }
> }, cells());
>
> cellsUsers.write(new HFileTarget(hBaseFullPath), WriteMode.OVERWRITE);
>
> Thanks in advance,
> Dmitry.
>