Hi James,

Thanks for responding.

The query we were testing looks like this:
http://localhost:8983/solr/testdata/select?q=theatre&spellcheck.q=theatre

I did some further investigation after discovering that omitting the spellcheck.q parameter stops the error from appearing, and it looks like synonym expansion is playing a part in the problem. The spellcheck field is essentially the same as text_general in the example schema, except that an HTMLStripCharFilterFactory is added ahead of the StandardTokenizerFactory at index time:

<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With synonyms enabled, spellcheck.q=theatre is being expanded to seven tokens - theatre (3 times), theater, playhouse, studio and workshop. If I disable synonyms in the query analyser, "theatre" is used on its own, and the error doesn't happen (this is the same behaviour as when I omit spellcheck.q).

So it looks like the quick solution is to disable synonyms in the query analyser for that field. I'll investigate further tomorrow to see if I can work out why the synonym expansion triggers the problem when neither "theatre" nor "theater" does on its own (I can't imagine the other three variants are going to make "there" appear as a spelling correction).
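One plausible mechanism, offered as a hypothesis rather than a reading of the Solr source: if the collator splices each correction into the original query at the token's start and end offsets, and keeps a running offset for how much earlier replacements grew or shrank the string, then stacked synonym tokens break that bookkeeping, because every expansion of "theatre" carries the same offsets 0-7. The class below is a simplified, hypothetical model (StackedTokenCollation, Token and collate are made-up names, and this is not Solr's SpellCheckCollator code). Replacing one "theatre" with "there" works, but a second correction on the same span starts at 0 + (-2) = -2, the same negative index reported in the stack trace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of offset-based collation.
// This is NOT Solr's SpellCheckCollator; names are illustrative only.
public class StackedTokenCollation {

    // A query token with the character offsets it came from in the original query.
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Splices corrections.get(i) over tokens.get(i), shifting later offsets by
    // the net length change of every earlier replacement.
    static String collate(String query, List<Token> tokens, List<String> corrections) {
        StringBuilder collation = new StringBuilder(query);
        int offset = 0;
        for (int i = 0; i < tokens.size(); i++) {
            Token tok = tokens.get(i);
            String corr = corrections.get(i);
            // Throws StringIndexOutOfBoundsException if tok.start + offset < 0.
            collation.replace(tok.start + offset, tok.end + offset, corr);
            offset += corr.length() - (tok.end - tok.start);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        // A single "theatre" token (offsets 0-7): one shrinking replacement is fine.
        System.out.println(collate("theatre",
                Arrays.asList(new Token("theatre", 0, 7)),
                Arrays.asList("there")));

        // Synonym expansion stacks several tokens on the same 0-7 span.
        // After the first replacement the running offset is 5 - 7 = -2,
        // so the second replacement starts at index -2.
        try {
            collate("theatre",
                    Arrays.asList(new Token("theatre", 0, 7), new Token("theatre", 0, 7)),
                    Arrays.asList("there", "there"));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Collation failed: " + e.getMessage());
        }
    }
}
```

On this model, a lone "theatre" or "theater" never fails because each span is replaced only once; it takes two corrections on the same span, which a synonym expansion containing "theatre" three times could plausibly produce.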

Cheers,

Matt

On 03/12/15 18:53, Dyer, James wrote:
Matt,

Can you give some information about how your spellcheck field is analyzed, and 
also whether you're using a custom query converter?  Also, try placing the bare 
terms you want checked in spellcheck.q (e.g., if your query is q=+movie +theatre, 
then spellcheck.q=movie theatre).  Does it work in this case?  Also, could you 
give the exact query you're using?
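To illustrate the spellcheck.q suggestion, here is a small sketch that strips the leading + and - operators from a simple query and sends the remaining bare terms as spellcheck.q. It assumes a plain space-separated query (no phrases, field prefixes or parentheses); BareSpellcheckQuery, bareTerms, encode and selectUrl are hypothetical names, and the base URL is the local test core used in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a select URL whose spellcheck.q holds only the
// bare terms of a simple query such as "+movie +theatre".
public class BareSpellcheckQuery {

    // Drops leading '+'/'-' operators from each whitespace-separated clause.
    // Assumes no phrases, field prefixes or parentheses.
    static String bareTerms(String q) {
        StringBuilder terms = new StringBuilder();
        for (String clause : q.trim().split("\\s+")) {
            String term = clause.replaceFirst("^[+-]+", "");
            if (term.isEmpty()) {
                continue;
            }
            if (terms.length() > 0) {
                terms.append(' ');
            }
            terms.append(term);
        }
        return terms.toString();
    }

    // URL-encodes a parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Sends the full query as q and only the bare terms as spellcheck.q.
    static String selectUrl(String base, String q) {
        return base + "?q=" + encode(q) + "&spellcheck.q=" + encode(bareTerms(q));
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/testdata/select";
        System.out.println(selectUrl(base, "+movie +theatre"));
    }
}
```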

This is the very same bug as in the 3 tickets you mention.  We clearly haven't 
solved all of the possible ways this bug can be triggered, but we cannot fix 
this unless we can come up with a unit test that reliably reproduces it.  At 
the very least, we should handle these problems better than throwing a 
StringIndexOutOfBoundsException like this.
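As a sketch of what handling this more gracefully might look like (a hypothetical illustration, not a proposed Solr patch), a collator that tracks a running offset can check each adjusted range before calling StringBuilder.replace and drop the collation instead of throwing. GuardedCollation, Correction and collate are made-up names; offsets are character positions in the original query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical defensive collator: validates each adjusted range before
// splicing it in, and gives up on the collation instead of throwing.
// Not Solr code; names are illustrative only.
public class GuardedCollation {

    // A replacement for the original-query span [start, end).
    static final class Correction {
        final int start;
        final int end;
        final String replacement;

        Correction(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    static Optional<String> collate(String query, List<Correction> corrections) {
        StringBuilder collation = new StringBuilder(query);
        int offset = 0; // net length change from earlier replacements
        for (Correction c : corrections) {
            int start = c.start + offset;
            int end = c.end + offset;
            // Reject ranges StringBuilder.replace would refuse or silently clamp.
            if (start < 0 || end < start || end > collation.length()) {
                return Optional.empty();
            }
            collation.replace(start, end, c.replacement);
            offset += c.replacement.length() - (c.end - c.start);
        }
        return Optional.of(collation.toString());
    }

    public static void main(String[] args) {
        // Non-overlapping spans in "movi theatre": prints Optional[movie theater]
        System.out.println(collate("movi theatre", Arrays.asList(
                new Correction(0, 4, "movie"), new Correction(5, 12, "theater"))));
        // Two corrections on the same 0-7 span: prints Optional.empty
        System.out.println(collate("theatre", Arrays.asList(
                new Correction(0, 7, "there"), new Correction(0, 7, "there"))));
    }
}
```

Returning an empty Optional lets the caller skip a bad collation and still return the plain spelling suggestions. It only hides the symptom, though; the offset bookkeeping itself would still need fixing.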

Long term, there is probably a better design we could come up with for how 
terms are identified within queries and how collations are generated.

James Dyer
Ingram Content Group


-----Original Message-----
From: Matt Pearce [mailto:m...@flax.co.uk]
Sent: Thursday, December 03, 2015 10:40 AM
To: solr-user
Subject: Spellcheck error

Hi,

We're using Solr 5.3.1, and we're getting a
StringIndexOutOfBoundsException from the SpellCheckCollator. I've done
some investigation, and it looks like the problem is that the corrected
string is shorter than the original query.

For example, the search term is "theatre", the suggested correction is
"there". The error is being thrown when replacing the original query
with the shorter replacement.

This is the stack trace:
java.lang.StringIndexOutOfBoundsException: String index out of range: -2
      at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
      at java.lang.StringBuilder.replace(StringBuilder.java:262)
      at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
      at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
      at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:237)
      at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:202)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)

The error looks very similar to those described in
https://issues.apache.org/jira/browse/SOLR-4489,
https://issues.apache.org/jira/browse/SOLR-3608 and
https://issues.apache.org/jira/browse/SOLR-2509, most of which are closed.

Any suggestions would be appreciated, or should I open a JIRA ticket?

Thanks,

Matt


--
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk
