ValueIndexer for Multi-Byte Numeric Types Broken

Terry Rosenbaum 19 Apr 2003 22:36:27 -0000

Hi,

I've discovered that defining a ValueIndexer with a multi-byte
numeric type (e.g. long, int, short) does not work correctly
when resolving relational comparisons other than equals:
queries using greater than, less than, etc. can produce
incorrect results.

The reason is that all data is handled as anonymous byte
arrays using the Value class and the Value class has no
type associated with the data. Thus, the Value.compareTo method
is not able to properly perform comparisons for such data. It
performs byte-by-byte comparisons of the data.

As an example, ValueIndexer encodes the value
1049903940000 as the bytes 0, 0, 0, -12, 115, 38, -63, -96
and encodes the value 1050687000291 as the bytes
0, 0, 0, -12, -95, -45, 78, -29. The first byte that differs
is the fifth one (115 vs. -95), and the two bytes are compared
as signed values. Thus, when comparing a Value representing
1049903940000 to a Value representing 1050687000291
(in pseudocode:
Value(1049903940000).compareTo(Value(1050687000291))),
the result is 5, indicating that 1049903940000 is greater
than 1050687000291, which is incorrect.
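
To make the example above easy to reproduce, here is a minimal
sketch in plain Java. It is not the Xindice Value class: the
class SignedByteCompareDemo and its methods are hypothetical
stand-ins, and the 8-byte big-endian encoding is assumed because
it matches the byte values quoted above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for the behaviour described above; not Xindice code.
class SignedByteCompareDemo {
    // Big-endian 8-byte encoding of a long (ByteBuffer's default byte order).
    static byte[] encode(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    // Byte-by-byte comparison that treats each byte as a signed value.
    // Returns +(i+1) or -(i+1) for the first differing index i.
    // The handling of equal prefixes is an assumption for this sketch.
    static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return a[i] > b[i] ? (i + 1) : -(i + 1);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] smaller = encode(1049903940000L);
        byte[] larger = encode(1050687000291L);
        System.out.println(Arrays.toString(smaller)); // [0, 0, 0, -12, 115, 38, -63, -96]
        System.out.println(Arrays.toString(larger));  // [0, 0, 0, -12, -95, -45, 78, -29]
        // Prints 5: the smaller number is reported as the greater one.
        System.out.println(compareSigned(smaller, larger));
    }
}
```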

The problem can be encountered during XPath comparisons
if a ValueIndexer is defined as a numeric indexer for an
attribute or element.

For example, if you define an index on the Time attribute as type long:

<Bug Time="1050687000291" />

<index name="BugIndex" class="org.apache.xindice.core.indexer.ValueIndexer"
   pattern="[EMAIL PROTECTED]" type="long" />

And if you then use an XPath like /Bug[(1049903940000 < @Time) and (1050687000291 > @Time)], you may see incorrect query results.

Part of the problem is in the code of Value.compareTo(Value):

          short s1 = (short)(b1 >>> 0);
          short s2 = (short)(b2 >>> 0);
          return s1 > s2 ? (i+1)
                          : -(i+1);

This code attempts to shed the sign bits via ">>> 0", which
has no effect: b1 and b2 are already sign-extended to int
before the shift is applied. The result is that s1 and s2 are
sign-extended versions of b1 and b2, and the comparison becomes
a signed comparison rather than the desired unsigned comparison.
This can be fixed by using:

           int i1 = (int)b1 & 0xff;
           int i2 = (int)b2 & 0xff;
           return i1 > i2 ? (i+1)
                          : -(i+1);

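However, this only fixes the problem when the values being
compared are zero or positive. It does not work for the entire
domain of numeric values: a negative number has its high bit
set, so an unsigned byte-by-byte comparison orders it after
every positive number.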
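
The limitation can be checked with a small sketch. Again this is
not Xindice code: UnsignedByteCompareDemo is a hypothetical
class, and the 8-byte big-endian encoding is an assumption that
matches the bytes quoted earlier in this message.

```java
import java.nio.ByteBuffer;

// Hypothetical demo of the masked (unsigned) byte comparison; not Xindice code.
class UnsignedByteCompareDemo {
    // Big-endian 8-byte encoding of a long.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    // Same idea as the suggested fix: mask each byte to 0..255 before comparing.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int i1 = a[i] & 0xff;
            int i2 = b[i] & 0xff;
            if (i1 != i2) {
                return i1 > i2 ? (i + 1) : -(i + 1);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Non-negative values are now ordered correctly: prints -5 (less than).
        System.out.println(compareUnsigned(encode(1049903940000L), encode(1050687000291L)));
        // -1 encodes as eight 0xff bytes, so it compares as greater than 1: prints 1.
        System.out.println(compareUnsigned(encode(-1L), encode(1L)));
    }
}
```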

One possible way to fix this would be to add a type field
to the Value object and perform conversion based on type
prior to performing comparisons. A drawback of this approach
would be that existing persisted Value objects (e.g. in existing
databases) would be rendered incompatible.
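
A rough sketch of what the type-field idea might look like is
below. Everything in it is hypothetical (TypedValueDemo,
TypedValue, the Type enum and the in-memory layout); it only
illustrates decoding by type before comparing and does not
address the persistence compatibility problem. Unlike the
existing compareTo, it returns only the sign of the comparison,
not the index of the first differing byte.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a typed value; names and layout are not Xindice's.
class TypedValueDemo {
    enum Type { BYTES, SHORT, INT, LONG }

    static final class TypedValue implements Comparable<TypedValue> {
        final Type type;
        final byte[] data;

        TypedValue(Type type, byte[] data) {
            this.type = type;
            this.data = data;
        }

        static TypedValue ofLong(long v) {
            return new TypedValue(Type.LONG, ByteBuffer.allocate(8).putLong(v).array());
        }

        // Decode the big-endian bytes according to the declared numeric type.
        private long asLong() {
            ByteBuffer buf = ByteBuffer.wrap(data);
            switch (type) {
                case SHORT: return buf.getShort();
                case INT:   return buf.getInt();
                case LONG:  return buf.getLong();
                default:    throw new IllegalStateException("not numeric: " + type);
            }
        }

        @Override
        public int compareTo(TypedValue other) {
            // Numeric types: convert first, then compare as numbers.
            if (type != Type.BYTES && other.type != Type.BYTES) {
                return Long.compare(asLong(), other.asLong());
            }
            // Untyped data: fall back to an unsigned byte-by-byte comparison.
            int n = Math.min(data.length, other.data.length);
            for (int i = 0; i < n; i++) {
                int cmp = Integer.compare(data[i] & 0xff, other.data[i] & 0xff);
                if (cmp != 0) {
                    return cmp;
                }
            }
            return Integer.compare(data.length, other.data.length);
        }
    }

    public static void main(String[] args) {
        TypedValue a = TypedValue.ofLong(1049903940000L);
        TypedValue b = TypedValue.ofLong(1050687000291L);
        System.out.println(a.compareTo(b) < 0);  // true
        System.out.println(TypedValue.ofLong(-1L).compareTo(TypedValue.ofLong(1L)) < 0);  // true
    }
}
```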

Does anyone else have any suggestions as to how to fix this
problem?

-Terry
