Re: Sort difference between 2.1 and 2.3

2008-04-08 Thread Antony Bowesman
Thanks for the explanation, Mike. It's not a big issue; it's just a test case where I needed to ensure ordering, so I'll just use a valid high UTF-16 character. It just seemed odd that the field was showing strangely in Luke. Your explanation gives the reason, thanks. Antony
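Here is a minimal sketch of that workaround. It assumes the test only needs one value to sort before the other under plain UTF-16 code-unit order, which is what `String.compareTo` uses. The choice of U+FFFD (REPLACEMENT CHARACTER) as the "high" prefix is hypothetical, as are the names `SORT_FIRST`, `SORT_LAST` and `sortedValues`. Any assigned character above the low prefix would serve.

```java
import java.util.Arrays;

public class HighCharPrefix {
    // Low prefix taken from the original test case: char(1).
    static final char SORT_FIRST = '\u0001';
    // Hypothetical choice: an assigned, interchangeable character near the
    // top of the Basic Multilingual Plane, unlike a noncharacter.
    static final char SORT_LAST = '\uFFFD';

    // Builds the two field values and sorts them in natural String order,
    // i.e. lexicographically by UTF-16 code unit.
    static String[] sortedValues() {
        String[] values = { SORT_LAST + "Last", SORT_FIRST + "First" };
        Arrays.sort(values);
        return values;
    }

    public static void main(String[] args) {
        // U+FFFD is listed in UnicodeData; the noncharacter U+FFFF is not.
        System.out.println(Character.isDefined(SORT_LAST));   // true
        System.out.println(Character.isDefined('\uFFFF'));    // false
        for (String v : sortedValues()) {
            // Drop the one-char prefix so the output is printable.
            System.out.println(v.substring(1));               // First, then Last
        }
    }
}
```

The `isDefined` check is only a quick guard against accidentally picking an unassigned code point as the prefix. It does not prove the character is safe for every index format.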

Re: Sort difference between 2.1 and 2.3

2008-04-08 Thread Michael McCandless
You're right, Lucene changed with respect to the 0x character: 2.3 now uses this character internally as an "end of term" marker when storing term text. This was done as part of LUCENE-843 (speeding up indexing). Technically that character is an invalid UTF-16 character (for interchange), but it looks lik
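A toy model can make the "end of term" marker point concrete. If term text is packed into a shared char buffer with a marker after each term, then a term that itself contains the marker is cut short when it is read back. This is a hypothetical sketch, not Lucene's actual indexing code. The names `store`, `read` and `END_OF_TERM` are invented. The marker value U+FFFF is also an assumption, because the archived message has lost the exact character.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SentinelDemo {
    // Hypothetical end-of-term marker; U+FFFF is an assumption for illustration.
    static final char END_OF_TERM = '\uFFFF';

    // Appends the term's chars to the shared pool, followed by the marker,
    // and returns the offset where the term starts.
    static int store(StringBuilder pool, String term) {
        int start = pool.length();
        pool.append(term).append(END_OF_TERM);
        return start;
    }

    // Reads chars from 'start' up to (not including) the first marker seen.
    // A marker inside the term itself ends the read early.
    static String read(StringBuilder pool, int start) {
        int end = start;
        while (pool.charAt(end) != END_OF_TERM) {
            end++;
        }
        return pool.substring(start, end);
    }

    public static void main(String[] args) {
        // Field values shaped like the ones in the original test case.
        String first = "\u0001First";
        String last = END_OF_TERM + "Last";

        StringBuilder pool = new StringBuilder();
        int p1 = store(pool, first);
        int p2 = store(pool, last);

        List<String> readBack = new ArrayList<>();
        readBack.add(read(pool, p1));
        readBack.add(read(pool, p2)); // truncated to "" by the leading marker
        Collections.sort(readBack);

        for (String s : readBack) {
            System.out.println(s.isEmpty() ? "(empty term)" : s.substring(1));
        }
        // prints "(empty term)" then "First"
    }
}
```

With data shaped like the original test case, the second value reads back as an empty term. That term sorts ahead of the char(1)-prefixed one, which would account for both the reversed order and the odd field display in Luke.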

Sort difference between 2.1 and 2.3

2008-04-07 Thread Antony Bowesman
Hi, I had a test case that added two documents, each with one untokenized field, and sorted them. The field values in the two documents were char(1) + "First" and char(0x) + "Last". With Lucene 2.1 the documents are sorted correctly, but with Lucene 2.3.1 they are not. Looking at the index with Luke sh