Thanks for the explanation Mike. It's not a big issue; it's just a test case
where I needed to ensure ordering for the test, so I'll just use a valid
high UTF-16 character. It just seemed odd that the field was showing strangely
in Luke. Your explanation gives the reason, thanks.
Antony
You're right, Lucene changed wrt the 0x character: 2.3 now uses
this character internally as an "end of term" marker when storing term
text.
This was done as part of LUCENE-843 (speeding up indexing).
Technically that character is an invalid UTF-16 character (for
interchange), but it looks lik
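For what it's worth, the general idea of an end-of-term sentinel can be sketched like this. This is a language-neutral Python sketch, not Lucene's actual Java implementation; the helper names are made up, and U+FFFF is used only as an example of a Unicode noncharacter:

```python
# Illustrative sketch only -- not Lucene's code. Several term strings are
# packed into one shared character buffer, with a sentinel character
# marking the end of each term. U+FFFF is a Unicode noncharacter, so it
# "should" never occur in real term text.
END_OF_TERM = "\uffff"

def pack_terms(terms):
    # Concatenate the terms, terminating each one with the sentinel.
    return "".join(t + END_OF_TERM for t in terms)

def unpack_terms(buffer):
    # Split on the sentinel; the trailing empty piece is dropped.
    return buffer.split(END_OF_TERM)[:-1]

terms = ["first", "last"]
assert unpack_terms(pack_terms(terms)) == terms

# But if a field value itself contains the sentinel, the round trip
# silently breaks -- which would explain a term displaying strangely
# when inspected with a tool like Luke:
bad = ["\uffff" + "Last"]
assert unpack_terms(pack_terms(bad)) != bad
```

The point of the sketch: any scheme that reserves a character as an internal delimiter will misbehave when user data contains that same character.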
Hi,
I had a test case that added two documents, each with one untokenized field, and
sorted them. The data in each document was:
char(1) + "First"
char(0x) + "Last"
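The expected ordering can be reproduced with plain string comparison, which orders by UTF-16 code unit. A sketch; since the exact high character is cut off above, U+FFFD stands in for it here:

```python
# Sketch of the expected sort: untokenized field values compared as plain
# strings, so a low code unit (char 1) sorts before a high one.
# U+FFFD is a stand-in for the high character elided in the message above.
docs = [chr(0xFFFD) + "Last", chr(1) + "First"]
docs.sort()
assert docs == [chr(1) + "First", chr(0xFFFD) + "Last"]
```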
With Lucene 2.1 the documents are sorted correctly, but with Lucene 2.3.1, they
are not. Looking at the index with Luke sh