Bulk load fails to identify pre-split regions

Amit Sela Tue, 19 Nov 2013 05:41:56 -0800

Hi all,
I'm using HBase 0.94.2 (and Hadoop 1.0.4).
I've been using bulk load on a daily basis for over a year with no problems.
I recently moved to an OSGi client, and that required some changes.
One of the changes I made is a fix to what seems like a bug that I
described in https://issues.apache.org/jira/browse/HBASE-9682
While running some tests I executed a bulk load (with pre-splitting) a few
times, and on one occasion the bulk load apparently didn't identify the
pre-split regions and loaded the HFiles into 2 new regions (instead of the 19
pre-split ones). What's even worse, it scrambled the lexicographical
order of the start/end keys of those regions.


For example, if the pre-split regions' start/end keys were:
Start                 End
                      1
1                     2
2                     3
3

They turned into:
Start                 End
                      new1
1                     2
new1
2                     3
3

As a result, even scanning over those regions is impossible.
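
To make the broken ordering concrete, here is a minimal stand-alone sketch of a consistency check over a list of (start key, end key) pairs. It does not use any HBase API; the function name check_region_chain and the literal key "new1" (taken from the placeholder in the table above) are assumptions made purely for illustration. It assumes keys compare as raw bytes and that an empty key marks the open start of the first region or the open end of the last one.

```python
def check_region_chain(regions):
    """Return a list of problems found in (start_key, end_key) pairs.

    Keys are bytes. b"" is treated as the open start of the first
    region and the open end of the last region (an assumption of this
    sketch, mirroring the blank cells in the tables above).
    """
    if not regions:
        return ["no regions"]

    problems = []
    # Order regions by start key; bytes compare lexicographically.
    ordered = sorted(regions, key=lambda r: r[0])

    if ordered[0][0] != b"":
        problems.append("first region does not start at the empty key")
    if ordered[-1][1] != b"":
        problems.append("last region does not end at the empty key")

    # Each region must end exactly where the next one starts.
    for (_, end), (next_start, _) in zip(ordered, ordered[1:]):
        if end == b"" or end > next_start:
            # An open end before the last region covers everything after it.
            problems.append(
                f"overlap: region ending at {end!r} runs past {next_start!r}"
            )
        elif end < next_start:
            problems.append(f"hole between {end!r} and {next_start!r}")
    return problems


if __name__ == "__main__":
    before = [(b"", b"1"), (b"1", b"2"), (b"2", b"3"), (b"3", b"")]
    after = [(b"", b"new1"), (b"1", b"2"), (b"new1", b""),
             (b"2", b"3"), (b"3", b"")]
    for name, layout in (("before", before), ("after", after)):
        print(name, check_region_chain(layout))
```

With the first layout every region ends where the next begins, so nothing is reported. With the second layout the region ending at "new1" overlaps the one starting at "1", and the region starting at "new1" sorts after "3" while the region starting at "3" is already open-ended.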

I'm having a hard time recreating this behavior, so I'm not sure it's caused
by the fix I made (also described in the Jira comments).

Any ideas?

Thanks,

Amit
