Hi,
I've discovered that a ValueIndexer defined with a multi-byte numeric type (e.g. long, int, short) does not work correctly when resolving relational comparisons other than equals: greater than, less than, etc. can produce incorrect results.
The reason is that all data is handled as anonymous byte arrays using the Value class and the Value class has no type associated with the data. Thus, the Value.compareTo method is not able to properly perform comparisons for such data. It performs byte-by-byte comparisons of the data.
As an example, ValueIndexer encodes the value 1049903940000 as the bytes 0, 0, 0, -12, 115, 38, -63, -96 and encodes the value 1050687000291 as the bytes 0, 0, 0, -12, -95, -45, 78, -29. Thus, when comparing a Value representing 1049903940000 to a Value representing 1050687000291 (in pseudocode: Value(1049903940000).compareTo(1050687000291)), the result 5 is returned, indicating that 1049903940000 is greater than 1050687000291, which is wrong. The first bytes that differ are at index 4 (115 vs. -95), and they are compared as signed values.
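A minimal sketch that reproduces the example above outside of Xindice. It assumes the value is stored as a big-endian 8-byte long (which matches the bytes listed above); the class and the encode/compareSigned helpers are made up for illustration and are not the actual Value or ValueIndexer code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class SignedByteCompareDemo {
    // Hypothetical helper: big-endian 8-byte encoding of a long
    // (ByteBuffer uses big-endian order by default).
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Hypothetical stand-in for a signed byte-by-byte comparison.
    // Returns +(i+1) or -(i+1) for the first differing index i, 0 if equal.
    static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                // Java bytes are signed, so 115 > -95 here.
                return a[i] > b[i] ? (i + 1) : -(i + 1);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] smaller = encode(1049903940000L);
        byte[] larger = encode(1050687000291L);
        System.out.println(Arrays.toString(smaller)); // [0, 0, 0, -12, 115, 38, -63, -96]
        System.out.println(Arrays.toString(larger));  // [0, 0, 0, -12, -95, -45, 78, -29]
        System.out.println(compareSigned(smaller, larger)); // 5, i.e. "smaller > larger"
    }
}
```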
The problem can be encountered during XPath comparisons if a ValueIndexer is defined as a numeric indexer for an attribute or element.
For example, if you have the following document and define an index on its Time attribute with type long:
<Bug Time="1050687000291" />
<index name="BugIndex" class="org.apache.xindice.core.indexer.ValueIndexer" pattern="[EMAIL PROTECTED]" type="long" />
And use an XPath like /Bug[(1049903940000 < @Time) and (1050687000291 > @Time)]
you may see incorrect query results.
Part of the problem is in the code of Value.compareTo(Value):
    short s1 = (short)(b1 >>> 0);
    short s2 = (short)(b2 >>> 0);
    return s1 > s2 ? (i+1) : -(i+1);
This code attempts to shed the sign bits via ">>> 0" which has no effect. The result is that s1 and s2 are sign-extended versions of b1 and b2 and the comparison becomes a signed comparison rather than the desired unsigned comparison. This can be fixed by using:
    int i1 = (int)b1 & 0xff;
    int i2 = (int)b2 & 0xff;
    return i1 > i2 ? (i+1) : -(i+1);
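But this only fixes the problem when the values being compared are 0 or positive. It does not work for the entire domain of numeric values: assuming negative values are stored in two's-complement form, their leading byte has the high bit set, so an unsigned byte-by-byte comparison ranks every negative value above every positive one.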
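To illustrate both points, here is a small sketch of the masked (unsigned) comparison. As before, it assumes big-endian two's-complement encoding of a long, and the class and helper names are hypothetical rather than taken from the Xindice source.

```java
import java.nio.ByteBuffer;

final class UnsignedByteCompareDemo {
    // Hypothetical helper: big-endian 8-byte encoding of a long.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Byte-by-byte comparison that treats each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int i1 = a[i] & 0xff; // mask off the sign extension
            int i2 = b[i] & 0xff;
            if (i1 != i2) {
                return i1 > i2 ? (i + 1) : -(i + 1);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Positive values now order correctly: 115 < 161 at index 4.
        System.out.println(compareUnsigned(encode(1049903940000L), encode(1050687000291L))); // -5

        // Negative values still do not: -1 is 0xFF..FF, so it ranks above 1.
        System.out.println(compareUnsigned(encode(-1L), encode(1L))); // 1, i.e. "-1 > 1"
    }
}
```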
One possible way to fix this would be to add a type field to the Value object and convert the data based on that type before performing comparisons. A drawback of this approach is that existing persisted Value objects (e.g. in existing databases) would become incompatible.
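A rough sketch of what a type-aware comparison could look like. Everything here is hypothetical (the NumericType enum, the helper names, and the assumption that the bytes are big-endian two's complement); it only shows the idea of decoding before comparing and says nothing about how the type would actually be stored with a Value.

```java
import java.nio.ByteBuffer;

final class TypedCompareDemo {
    // Hypothetical type tag that a Value would have to carry.
    enum NumericType { SHORT, INT, LONG }

    // Hypothetical helper: big-endian 8-byte encoding of a long.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Decode the raw bytes according to the declared type.
    static long decode(byte[] data, NumericType type) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        switch (type) {
            case SHORT: return buf.getShort();
            case INT:   return buf.getInt();
            case LONG:  return buf.getLong();
            default:    throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    // Compare numerically after decoding, instead of byte by byte.
    static int compareTyped(byte[] a, byte[] b, NumericType type) {
        return Long.compare(decode(a, type), decode(b, type));
    }

    public static void main(String[] args) {
        // Both results are negative, i.e. the first argument is the smaller one.
        System.out.println(compareTyped(encodeLong(1049903940000L), encodeLong(1050687000291L), NumericType.LONG) < 0);
        System.out.println(compareTyped(encodeLong(-1L), encodeLong(1L), NumericType.LONG) < 0);
    }
}
```

Unlike the byte-wise version, the return value here only carries a sign, not the index of the first differing byte, so any caller relying on that index would need to be checked.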
Does anyone else have any suggestions as to how to fix this problem?
-Terry