Hi James,
Thanks for responding.
The query we were testing looks like this:
http://localhost:8983/solr/testdata/select?q=theatre&spellcheck.q=theatre
I did some further investigation, after discovering that omitting the
spellcheck.q parameter stops the error appearing, and it looks like
synonym expansion is playing a part in the problem. The spellcheck field
is essentially the same as text_general in the example schema, with the
addition of HTMLStripCharFilterFactory in front of the
StandardTokenizerFactory at index time:
<fieldType name="text_html" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
With synonyms enabled, spellcheck.q=theatre is being expanded to seven
tokens - theatre (3 times), theater, playhouse, studio and workshop. If
I disable synonyms in the query analyser, "theatre" is used on its own,
and the error doesn't happen (this is the same behaviour as when I omit
spellcheck.q).
So, it looks like the quick solution is to disable synonyms in the query
analyser for that field. I'll do some further investigation tomorrow to
see if I can figure out why the synonym expansion triggers the problem
while neither "theatre" nor "theater" on their own do (I can't imagine
the other three variants are going to make "there" appear as a spelling
correction).
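
One possible explanation, sketched below in Java, is that the expanded tokens all carry the same start and end offsets as the original word. This is not Solr's code: the Tok class and the collate method are hypothetical, and the sketch assumes the collator replaces each token by offset while keeping a running adjustment for the change in length. Under those assumptions, a single "theatre" -> "there" replacement is fine, but a second token at offsets 0-7 is replaced at start index -2, which matches the -2 in the stack trace.

```java
import java.util.Arrays;
import java.util.List;

public class CollationOffsetSketch {

    /** Hypothetical stand-in for an analysed token: text plus offsets into the query. */
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Replaces every token with the same correction, by offset, tracking the
     * cumulative length change. Assumed to resemble the collator's approach.
     */
    static String collate(String original, List<Tok> tokens, String correction) {
        StringBuilder collation = new StringBuilder(original);
        int offset = 0;
        for (Tok tok : tokens) {
            collation.replace(tok.start + offset, tok.end + offset, correction);
            // A shorter correction makes the adjustment negative.
            offset += correction.length() - (tok.end - tok.start);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        // One token: the shorter replacement works.
        List<Tok> single = Arrays.asList(new Tok("theatre", 0, 7));
        System.out.println(collate("theatre", single, "there"));

        // Two tokens sharing offsets 0-7 (assumed result of synonym expansion):
        // the second replace starts at 0 + (-2) and throws.
        List<Tok> expanded = Arrays.asList(
                new Tok("theatre", 0, 7), new Tok("theater", 0, 7));
        try {
            collate("theatre", expanded, "there");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("second replace failed: " + e.getMessage());
        }
    }
}
```

If that is what is happening, it would also explain why neither "theatre" nor "theater" on its own triggers the error: with only one token at a given offset, the negative adjustment is never applied to a later replacement.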
Cheers,
Matt
On 03/12/15 18:53, Dyer, James wrote:
Matt,
Can you give some information about how your spellcheck field is analyzed, and
whether you're using a custom query converter? Also, try placing the bare
terms you want checked in spellcheck.q (e.g., if your query is q=+movie +theatre,
then spellcheck.q=movie theatre). Does it work in this case? Also, could you
give the exact query you're using?
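
As a rough illustration of the bare-terms suggestion, the Java sketch below strips leading + and - operators from each term and builds a request URL with both q and spellcheck.q. The class and method names are hypothetical, and it only handles leading operators on whitespace-separated terms; field prefixes, quoted phrases and other query syntax are not covered.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class BareSpellcheckQuery {

    /** Strips leading '+' or '-' operators from each whitespace-separated term. */
    static String bareTerms(String q) {
        StringBuilder out = new StringBuilder();
        for (String term : q.trim().split("\\s+")) {
            String bare = term.replaceFirst("^[+-]+", "");
            if (bare.isEmpty()) {
                continue; // term was only operators, or the query was empty
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(bare);
        }
        return out.toString();
    }

    /** Builds a select URL that sends the bare terms as spellcheck.q. */
    static String selectUrl(String baseUrl, String q) throws UnsupportedEncodingException {
        return baseUrl
                + "?q=" + URLEncoder.encode(q, "UTF-8")
                + "&spellcheck.q=" + URLEncoder.encode(bareTerms(q), "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Base URL is an example; substitute your own host and core.
        System.out.println(
                selectUrl("http://localhost:8983/solr/testdata/select", "+movie +theatre"));
    }
}
```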
This is the very same bug as in the 3 tickets you mention. We clearly haven't
solved all of the possible ways this bug can be triggered. But we cannot fix
this unless we can come up with a unit test that reliably reproduces it. At
the very least, we should handle these problems better than throwing SIOOB like
this.
Long term, there is probably a better design we could come up with for how
terms are identified within queries and how collations are generated.
James Dyer
Ingram Content Group
-----Original Message-----
From: Matt Pearce [mailto:m...@flax.co.uk]
Sent: Thursday, December 03, 2015 10:40 AM
To: solr-user
Subject: Spellcheck error
Hi,
We're using Solr 5.3.1, and we're getting a
StringIndexOutOfBoundsException from the SpellCheckCollator. I've done
some investigation, and it looks like the problem is that the corrected
string is shorter than the original query.
For example, the search term is "theatre", the suggested correction is
"there". The error is being thrown when replacing the original query
with the shorter replacement.
This is the stack trace:
java.lang.StringIndexOutOfBoundsException: String index out of range: -2
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
    at java.lang.StringBuilder.replace(StringBuilder.java:262)
    at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
    at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
    at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:237)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:202)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)
The error looks very similar to those described in
https://issues.apache.org/jira/browse/SOLR-4489,
https://issues.apache.org/jira/browse/SOLR-3608 and
https://issues.apache.org/jira/browse/SOLR-2509, most of which are closed.
Any suggestions would be appreciated, or should I open a JIRA ticket?
Thanks,
Matt
--
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk