Hi Yonik,

On Sep 2, 2011, at 7:47 PM, Yonik Seeley wrote:

> On Fri, Sep 2, 2011 at 10:26 PM, Mattmann, Chris A (388J)
> <chris.a.mattm...@jpl.nasa.gov> wrote:
>> I'm left with childrenshospitallosangeles as a single token resultant from
>> the chain.
>> So, when I go to sort the titles in Solr, I use sort=title_sort asc, and I
>> am getting all kinds of weird results when doing
>> a query.
>
> Hmmm, a random guess would be that perhaps your analysis chain is
> actually producing more than one token per document. The lucene
> FieldCache takes the highest for each document (just a non-intended
> side-effect of how the FieldCache entry is populated by enumerating
> terms).
>
> Try adding fsv=true to your request. It's an undocumented feature
> used in distributed search (it stands for field sort values) used to
> collate results from different shards. It should add "sort_values" to
> your response to tell you the sort values for each document.

First off, thanks for the reply. I appreciate it. I tried the fsv=true
parameter and it's great, it revealed what's really going on here:

"sort_values":[
  "title_sort",[null, null, null, null, ....

I've got one of those null values for each returned document. Now I guess
I have to find out what's wrong with my CombiningFilter.

All it does basically is have a static method to call incrementToken()
and then call TermAttribute.term() for each of the tokens in the stream.
It takes these, appends them to a StringBuffer (concats them), and then
returns a new KeywordTokenizer providing a StringReader initialized with
the merged StringBuffer. Yes, I know this probably isn't the most
efficient way and I'm open to suggestions.

I think in spelling this out though, I might have elaborated my problem.
Since the method I call in the constructor for my CombiningFilter is
super(mergeStreamTokens(in)) where mergeStreamTokens is a static method,
I think I might have consumed the input TokenStream by the time it gets
called for the sort.
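
To make that suspicion concrete, here is a minimal sketch in plain Java. It does not use the real Lucene classes: Tokens, Source, EagerCombiner and LazyCombiner are hypothetical stand-ins, and the assumption (not confirmed above) is that the analysis chain gets reused with fresh input after construction. Under that assumption, a combiner that merges in its constructor has nothing left to emit, while one that merges when it is asked for a token still works.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

class CombiningSketch {
    /** Hypothetical stand-in for a token stream: next() returns null when exhausted. */
    interface Tokens {
        String next();
    }

    /** Stand-in for a tokenizer whose input can be swapped when the chain is reused. */
    static final class Source implements Tokens {
        private Iterator<String> it = Collections.emptyIterator();

        void setInput(String... tokens) {
            it = Arrays.asList(tokens).iterator();
        }

        public String next() {
            return it.hasNext() ? it.next() : null;
        }
    }

    /** Concatenates everything left upstream; returns null if nothing was left. */
    static String drain(Tokens in) {
        StringBuilder sb = new StringBuilder();
        for (String t = in.next(); t != null; t = in.next()) {
            sb.append(t);
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    /** Merges once, in the constructor, like super(mergeStreamTokens(in)). */
    static final class EagerCombiner implements Tokens {
        private String merged;

        EagerCombiner(Tokens in) {
            merged = drain(in); // upstream is consumed here and never read again
        }

        public String next() {
            String out = merged;
            merged = null;
            return out;
        }
    }

    /** Merges on demand, so it picks up whatever input upstream currently holds. */
    static final class LazyCombiner implements Tokens {
        private final Tokens in;

        LazyCombiner(Tokens in) {
            this.in = in;
        }

        public String next() {
            return drain(in);
        }
    }

    public static void main(String[] args) {
        Source src = new Source();
        src.setInput("childrens", "hospital", "los", "angeles");
        EagerCombiner eager = new EagerCombiner(src);
        System.out.println(eager.next()); // childrenshospitallosangeles

        src.setInput("another", "title"); // chain reused with new input
        System.out.println(eager.next()); // null: the new input is never read

        LazyCombiner lazy = new LazyCombiner(src);
        System.out.println(lazy.next()); // anothertitle
    }
}
```

If this is what is happening, the equivalent change in a real filter would be to do the merging when the first token is requested rather than in the constructor, but that is a guess until the reuse behavior is confirmed.
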
It works on analysis.jsp probably because the stream isn't re-consumed?
Not sure, something wiggy is going on. I'll keep poking, thanks again.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++