Re: Inconsistencies in comparisons using KeyComparator

Alan Chaney Mon, 01 Apr 2013 10:04:59 -0700


On 4/1/2013 9:42 AM, Stack wrote:

That is an interesting (disturbing) find Alan.  Hopefully the fallback is
rare.  Did you have a technique for making the compare fall back to pure
java compare?


Thank you,
St.Ack

I agree it's disturbing! I based my findings on reading the source code for 0.92.1 (the CDH4.1.2 distro).

It seems to me that, in org.apache.hadoop.hbase.KeyValue$KVComparator, the KeyComparator calls KeyComparator.compareRows, which in turn calls

Bytes.compareTo(left, loffset, llength, right, roffset, rlength), which in turn calls Bytes.compareTo, which calls LexicographicalComparerHolder.BEST_COMPARER


which appears to be implemented thus:

  static class LexicographicalComparerHolder {
    static final String UNSAFE_COMPARER_NAME =
        LexicographicalComparerHolder.class.getName() + "$UnsafeComparer";

    static final Comparer<byte[]> BEST_COMPARER = getBestComparer();
    /**
     * Returns the Unsafe-using Comparer, or falls back to the pure-Java
     * implementation if unable to do so.
     */
    static Comparer<byte[]> getBestComparer() {
      try {
        Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
...
    }

    enum PureJavaComparer implements Comparer<byte[]> {
      INSTANCE;

      @Override
      public int compareTo(byte[] buffer1, int offset1, int length1,
   ...
      }
    }

So, it looks to me like Unsafe is the default. However, it's not really very easy to debug this, except by invoking the KeyValue.KeyComparator and seeing what you get, which is what I did. Either I'm doing something very stupid (extremely plausible) or there is a bit of an issue here. I was hoping that someone would point out my error!
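
To make the "which comparer did I actually get?" question easier to poke at, here is a minimal, self-contained sketch of the load-by-name-with-fallback pattern described in the Javadoc above. It is not the HBase source: the names ComparerHolderSketch, Comparer, PureJava and load are hypothetical, and the class name passed in main is deliberately one that does not exist, so the fallback path is the one exercised.

```java
class ComparerHolderSketch {
    // Hypothetical stand-in for a byte[] comparer interface.
    interface Comparer {
        int compare(byte[] a, byte[] b);
    }

    // Pure-Java fallback: unsigned, byte-by-byte lexicographic comparison.
    enum PureJava implements Comparer {
        INSTANCE;

        @Override
        public int compare(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int x = a[i] & 0xff; // mask so bytes compare as 0..255
                int y = b[i] & 0xff;
                if (x != y) {
                    return x - y;
                }
            }
            return a.length - b.length;
        }
    }

    // Try to load an enum-based comparer by class name; on any failure
    // (class missing, not an enum, wrong type) return the pure-Java one.
    static Comparer load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Object[] constants = cls.getEnumConstants();
            if (constants != null && constants.length > 0
                    && constants[0] instanceof Comparer) {
                return (Comparer) constants[0];
            }
        } catch (Throwable t) {
            // Fall through to the pure-Java implementation.
        }
        return PureJava.INSTANCE;
    }

    public static void main(String[] args) {
        // "no.such.UnsafeComparer" is a made-up name, so loading fails.
        Comparer chosen = load("no.such.UnsafeComparer");
        System.out.println(chosen == PureJava.INSTANCE); // prints true
    }
}
```

Printing or logging the class of the selected comparer in a test like this is one way to tell which implementation a given environment ends up with.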


I've got some unit tests that appear to show the difference.

Thanks

Alan



On Mon, Apr 1, 2013 at 7:54 AM, Alan Chaney <[email protected]> wrote:

Hi

I need to write some code that sorts row keys identically to HBase.

I looked at the KeyValue.KeyComparator code, and it seems that, by
default, HBase elects to use the 'Unsafe' comparator as the basis of its
comparison, with a fall-back to the PureJavaComparer should Unsafe not
be available (for example, in tests.)

However, I'm finding that the sort order from a call to
KeyValue.KeyComparator appears to be inconsistent between the two forms.

As an example, comparing:

(first param) (second param)
0000000000000000ffffffffffffff**ffffffffffffffffff616c1b to
0000000000000000ffffffffffffff**ffffffffffffffffff61741b

gives 1 for the default (presumably, Unsafe) call, and -1 using the
PureJavaComparer.

I would actually expect it to be a -ve number, based on the difference of
6c to 74 in the 3rd from last byte above.

Similarly

000000000000000000000000000000**000000000000000000616c1b to
000000000000000000000000000000**0000000000000000061741b

gives > 0 instead of < 0. The PureJavaComparer does a byte-by-byte
comparison by

Is this expected? From the definition of lexicographical compare that I
found, I don't think so. There's no issue of signed comparison here,
because 0x6c and 0x74 are still +ve byte values.
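
For reference, this is a minimal sketch of the lexicographical compare I have in mind: an unsigned, byte-by-byte comparison over (array, offset, length) ranges. The class name ReferenceCompare is my own, and the inputs are only the last three bytes of the first pair of keys above (61 6c 1b and 61 74 1b), not the full keys. It is intended as a baseline to check other comparators against, not as HBase's implementation.

```java
class ReferenceCompare {
    // Unsigned, byte-by-byte lexicographic comparison of two byte ranges.
    // Returns a negative number, zero, or a positive number.
    static int compareTo(byte[] a, int aOff, int aLen,
                         byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff; // treat each byte as 0..255
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return aLen - bLen;
    }

    public static void main(String[] args) {
        byte[] left = {0x61, 0x6c, 0x1b};
        byte[] right = {0x61, 0x74, 0x1b};
        int result = compareTo(left, 0, left.length, right, 0, right.length);
        System.out.println(Integer.signum(result)); // prints -1
    }
}
```

The first differing byte decides the result, so 0x6c against 0x74 gives a negative number regardless of what follows.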

Regards

Alan
