On 4/1/2013 9:42 AM, Stack wrote:
That is an interesting (disturbing) find Alan. Hopefully the fallback is
rare. Did you have a technique for making the compare fallback to pure
java compare?
Thank you,
St.Ack
I agree its disturbing! I based my findings on reading the source code
for 0.92.1 (the CDH4.1.2 distro).
It seems to me that, from org.apache.hadoop.hbase.KeyValue$KVComparator
the KeyComparator calls KeyComparator.compareRows which in turn calls
Bytes.compareTo(left, loffset, llength, righ, roffset, rlength) which in
turn calls Bytes.compareTo which calls
LexicographicalCompareHolder.BEST_COMPARER
which appears to be implemented thus:
static class LexicographicalComparerHolder {
static final String UNSAFE_COMPARER_NAME =
LexicographicalComparerHolder.class.getName() + "$UnsafeComparer";
static final Comparer<byte[]> BEST_COMPARER = getBestComparer();
/**
* Returns the Unsafe-using Comparer, or falls back to the pure-Java
* implementation if unable to do so.
*/
static Comparer<byte[]> getBestComparer() {
try {
Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
...
}
enum PureJavaComparer implements Comparer<byte[]> {
INSTANCE;
@Override
public int compareTo(byte[] buffer1, int offset1, int length1,
...
}
}
So, it looks like to me that Unsafe is the default. However, its not
really very easy to debug this, except by invoking the
KeyValue.KeyComparator and seeing what you get, which is what I did.
Either I'm doing something very stupid (extremely plausible) or there is
a bit of an issue here. I was hoping that someone would point out my error!
I've got some unit tests that appear to show the difference.
Thanks
Alan
On Mon, Apr 1, 2013 at 7:54 AM, Alan Chaney <[email protected]> wrote:
Hi
I need to write some code that sorts row keys identically to HBase.
I looked at the KeyValue.KeyComparator code, and it seems that, by
default, HBase elects to use the 'Unsafe' comparator as the basis of its
comparison, with a fall-back to to the PureJavaComparer should Unsafe not
be available (for example, in tests.)
However, I'm finding that the sort order from a call to
KeyValue.KeyComparator appears to be inconsistent between the two forms.
As an example, comparing:
(first param) (second param)
0000000000000000ffffffffffffff**ffffffffffffffffff616c1b to
0000000000000000ffffffffffffff**ffffffffffffffffff61741b
gives 1 for the default (presumably, Unsafe) call, and -1 using the
PureJavaComparator.
I would actually expect it to be a -ve number, based on the difference of
6c to 74 in the 3rd from last byte above.
Similarly
000000000000000000000000000000**000000000000000000616c1b to
000000000000000000000000000000**0000000000000000061741b
gives > 0 instead of < 0. The PureJavaComparator does a byte-by-byte
comparison by
Is this expected? From the definition of lexicographical compare that I
found, I don't think so. There's no issue of signed comparison here,
because 0x6c and 0x74 are still +ve byte values.
Regards
Alan