BTW, nice problem statement...

Anyway, I see this too in 3.5. I do NOT see this in 3.6 or trunk, so it looks like a bug that got fixed in the 3.6 timeframe. I don't have the time right now to go back over the JIRAs to see...
Best
Erick

On Thu, Apr 19, 2012 at 3:39 PM, Cat Bieber <cbie...@techtarget.com> wrote:
> I'm trying to use a Solr query to find the next title in alphabetical order
> after a given string. The issue I'm facing is that the sort param seems to
> sort non-alphanumeric characters in a different order from the ordering used
> by a range filter in the q or fq param. I can't filter the non-alphanumeric
> characters out because they're integral to the data and it would not be a
> useful ordering if it were based only on the alphanumeric portion of the
> strings.
>
> I'm running Solr version 3.5.
>
> In my current approach, I have a field that is a unique string for each
> document:
>
> <fieldType name="lowerCaseSort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.TrimFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <field name="uniqueSortString" type="lowerCaseSort" indexed="true" stored="true"/>
>
> I'm passing the value for the current document in a range to query
> everything after the current string, sorted ascending:
>
> /select?fl=uniqueSortString&sort=uniqueSortString+asc&q=uniqueSortString:["$1+ZX+Spectrum+HOBETA+format+file"+TO+*]&wt=xml&rows=5&version=2.2
>
> In theory, I expect the first result to be the current item and the second
> result to be the next one. However, I'm finding that the sort and the range
> filter seem to use different ordering:
>
> <result name="response" numFound="448" start="0">
>   <doc>
>     <str name="uniqueSortString">$1 ZX Spectrum - Emulator</str>
>   </doc>
>   <doc>
>     <str name="uniqueSortString">$1 ZX Spectrum HOBETA format file</str>
>   </doc>
>   <doc>
>     <str name="uniqueSortString">$1 ZX Spectrum Hobetta Picture Format</str>
>   </doc>
>   <doc>
>     <str name="uniqueSortString">$? TR-DOS ZX Spectrum file in HOBETA format</str>
>   </doc>
>   <doc>
>     <str name="uniqueSortString">$A AutoCAD Autosave File ( Autodesk Inc.)</str>
>   </doc>
> </result>
>
> Based on the results ordering, sort believes - precedes H, but the range
> filter should have excluded that first result if it ordered in the same way.
> Digging through the code, I think it looks like sorting uses
> String.compareTo() for ordering on a text/string field. However I haven't
> been able to track down where the range filter code is. If someone can point
> me in the right direction to find that code I'd love to look through it. Or,
> if anyone has suggestions regarding a different approach or changes I can
> make to this query/field, that would be very helpful.
>
> Thanks for your time.
> -Cat Bieber
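
For anyone wanting to reason about the ordering outside Solr, here is a minimal Java sketch of one possible explanation. It is not confirmed anywhere in this thread: it assumes the indexed terms are lowercased by the field's analyzer while the range endpoint in the query is compared as typed, without lowercasing. The class `RangeVsSortDemo` and the helpers `analyze` and `inRange` are hypothetical names made up for illustration and are not Solr or Lucene APIs. `String.compareTo()` stands in for the index's term ordering, which should agree for the ASCII strings used here.

```java
import java.util.Locale;

class RangeVsSortDemo {
    // Rough stand-in for KeywordTokenizer + LowerCaseFilter + TrimFilter:
    // the whole value becomes one trimmed, lowercased term. (Assumption: this
    // ignores HTMLStripCharFilterFactory, which does not matter for these strings.)
    static String analyze(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    // True if the indexed term falls inside [lowerBound TO *] when terms are
    // compared with String.compareTo().
    static boolean inRange(String indexedTerm, String lowerBound) {
        return indexedTerm.compareTo(lowerBound) >= 0;
    }

    public static void main(String[] args) {
        String bound = "$1 ZX Spectrum HOBETA format file";
        String emulator = analyze("$1 ZX Spectrum - Emulator");

        // Endpoint used as typed: 'z' (0x7A) sorts after 'Z' (0x5A), so the
        // lowercased term is "after" the bound and is matched.
        System.out.println(inRange(emulator, bound));          // true

        // Endpoint lowercased like the indexed terms: '-' (0x2D) sorts before
        // 'h' (0x68), so the term is excluded, matching the sort order.
        System.out.println(inRange(emulator, analyze(bound))); // false
    }
}
```

If that assumption holds for a given setup, lowercasing the endpoint on the client before building the range query would be one workaround to try. Whether this is the actual cause of the 3.5 behavior is something the JIRA history would have to confirm.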