BTW, nice problem statement...

Anyway, I see this too in 3.5. I do NOT see
it in 3.6 or trunk, so it looks like a bug that got fixed
in the 3.6 time frame. I don't have time right now
to go back through the JIRAs to find which issue fixed it.

Best
Erick

On Thu, Apr 19, 2012 at 3:39 PM, Cat Bieber <cbie...@techtarget.com> wrote:
> I'm trying to use a Solr query to find the next title in alphabetical order
> after a given string. The issue I'm facing is that the sort param seems to
> sort non-alphanumeric characters in a different order from the ordering used
> by a range filter in the q or fq param. I can't filter the non-alphanumeric
> characters out because they're integral to the data and it would not be a
> useful ordering if it were based only on the alphanumeric portion of the
> strings.
>
> I'm running Solr version 3.5.
>
> In my current approach, I have a field that is a unique string for each
> document:
>
> <fieldType name="lowerCaseSort" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
> <analyzer>
> <charFilter class="solr.HTMLStripCharFilterFactory"/>
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.TrimFilterFactory"/>
> </analyzer>
> </fieldType>
>
> <field name="uniqueSortString" type="lowerCaseSort" indexed="true"
> stored="true"/>
>
> I'm passing the value for the current document in a range to query
> everything after the current string, sorted ascending:
>
> /select?fl=uniqueSortString&sort=uniqueSortString+asc&q=uniqueSortString:["$1+ZX+Spectrum+HOBETA+format+file"+TO+*]&wt=xml&rows=5&version=2.2
>
> In theory, I expect the first result to be the current item and the second
> result to be the next one. However, I'm finding that the sort and the range
> filter seem to use different ordering:
>
> <result name="response" numFound="448" start="0">
> <doc>
> <str name="uniqueSortString">$1 ZX Spectrum - Emulator</str>
> </doc>
> <doc>
> <str name="uniqueSortString">$1 ZX Spectrum HOBETA format file</str>
> </doc>
> <doc>
> <str name="uniqueSortString">$1 ZX Spectrum Hobetta Picture Format</str>
> </doc>
> <doc>
> <str name="uniqueSortString">$? TR-DOS ZX Spectrum file in HOBETA format</str>
> </doc>
> <doc>
> <str name="uniqueSortString">$A AutoCAD Autosave File ( Autodesk Inc.)</str>
> </doc>
> </result>
>
> Based on the ordering of the results, sort believes "-" precedes "H", but the
> range filter should have excluded that first result if it ordered the same way.
> Digging through the code, it looks like sorting uses
> String.compareTo() for ordering on a text/string field. However, I haven't
> been able to track down where the range filter code is. If someone can point
> me in the right direction to find that code I'd love to look through it. Or,
> if anyone has suggestions regarding a different approach or changes I can
> make to this query/field, that would be very helpful.
>
> Thanks for your time.
> -Cat Bieber
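
One possible explanation, sketched below in plain Java: this ASSUMES that in 3.5 the lower bound of the range is compared against the indexed terms without being run through the field's analyzer, so the bound keeps its upper-case letters while the indexed terms are lower-cased. That assumption is not confirmed anywhere in this thread. The class and method names are made up for illustration, the analyze() helper only imitates LowerCaseFilterFactory and TrimFilterFactory (HTML stripping is not modeled), and String.compareTo() stands in for the index term ordering, which agrees with it for ASCII strings like these.

```java
import java.util.Locale;

// Hypothetical illustration only; none of these names come from Solr.
public class RangeVsSortOrder {

    // Rough stand-in for the lowerCaseSort analyzer chain:
    // keyword tokenizer + lower-case + trim (HTML stripping omitted).
    static String analyze(String raw) {
        return raw.toLowerCase(Locale.ROOT).trim();
    }

    // Inclusive lower-bound check, as in field:[bound TO *].
    static boolean inRange(String indexedTerm, String lowerBound) {
        return indexedTerm.compareTo(lowerBound) >= 0;
    }

    public static void main(String[] args) {
        String bound = "$1 ZX Spectrum HOBETA format file";
        String emulator = analyze("$1 ZX Spectrum - Emulator");

        // Sorting compares indexed (lower-cased) terms: '-' sorts before 'h'.
        System.out.println("emulator sorts before hobeta: "
                + (emulator.compareTo(analyze(bound)) < 0));

        // ASSUMPTION: the bound is used as typed. 'z' > 'Z', so the
        // emulator term lands inside the range.
        System.out.println("in range, raw bound: " + inRange(emulator, bound));

        // With the bound lower-cased too, the emulator term falls outside.
        System.out.println("in range, analyzed bound: "
                + inRange(emulator, analyze(bound)));
    }
}
```

If that assumption holds, the sort and the range filter use the same ordering and only the case of the bound differs, so lower-casing the bound on the client before building the query would be a workaround worth trying on 3.5.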
