[ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Oestreicher updated SOLR-606:
------------------------------------
Attachment: handler.component.SpellCheckComponent-collate-patch.txt
I recently ran into this exact issue and I found the problem.
The collation is created by replacing the misspelled tokens with the
suggestions using a StringBuilder:
{noformat}
for (Iterator<Map.Entry<Token, String>> bestIter = best.entrySet().iterator();
     bestIter.hasNext();) {
  Map.Entry<Token, String> entry = bestIter.next();
  Token tok = entry.getKey();
  collation.replace(tok.startOffset(), tok.endOffset(), entry.getValue());
}
{noformat}
As you can see, it just replaces the relevant tokens in the original query.
However, if the length of a suggestion doesn't equal the length of the original
token, all offsets used after that replacement are no longer valid, so the
collation comes out wrong, seemingly at random.
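To make the failure concrete, here is a minimal sketch that reproduces the
first collation from the issue description using only a StringBuilder. The
class name, the method name and the int[] offset pairs are stand-ins I made up
for illustration; the real code works on Lucene Tokens.

```java
public class CollationOffsetDemo {
    /**
     * Applies each suggestion at the offsets computed against the ORIGINAL
     * query, without correcting for earlier length changes (the buggy behaviour).
     * spans[i] is {startOffset, endOffset} of the i-th misspelled token.
     */
    public static String naiveCollate(String query, int[][] spans, String[] suggestions) {
        StringBuilder collation = new StringBuilder(query);
        for (int i = 0; i < spans.length; i++) {
            collation.replace(spans[i][0], spans[i][1], suggestions[i]);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        // "redbull" spans [0, 7) and "show" spans [12, 16) in the original query.
        String result = naiveCollate("redbull air show",
                new int[][] {{0, 7}, {12, 16}},
                new String[] {"redbelly", "shot"});
        // "redbelly" is one character longer than "redbull", so the second
        // replace hits " sho" instead of "show".
        System.out.println(result); // prints "redbelly airshotw"
    }
}
```

The one-character shift explains both symptoms in the report: the space before
"show" is overwritten and the trailing 'w' is left behind.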
I fixed that by keeping track of that difference and adding it to the token
offsets. For this to work I had to change the HashMap to a LinkedHashMap, since
this solution depends on the Tokens being iterated in the order in which they
occur in the string.
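The idea can be sketched roughly as follows. This is not the attached patch,
just a self-contained illustration under some assumptions: Span is a
hypothetical stand-in for Lucene's Token (only the two offsets are kept), and
the class and method names are mine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CollationFixSketch {
    /** Hypothetical stand-in for Lucene's Token; only the offsets matter here. */
    public static final class Span {
        final int start;
        final int end;
        public Span(int start, int end) { this.start = start; this.end = end; }
    }

    /**
     * Replaces each token with its suggestion, shifting later offsets by the
     * accumulated length difference. The map must iterate in the order the
     * tokens occur in the query, hence LinkedHashMap.
     */
    public static String collate(String query, LinkedHashMap<Span, String> best) {
        StringBuilder collation = new StringBuilder(query);
        int offset = 0; // total length change caused by earlier replacements
        for (Map.Entry<Span, String> entry : best.entrySet()) {
            Span tok = entry.getKey();
            String suggestion = entry.getValue();
            collation.replace(tok.start + offset, tok.end + offset, suggestion);
            offset += suggestion.length() - (tok.end - tok.start);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<Span, String> best = new LinkedHashMap<>();
        best.put(new Span(0, 7), "redbelly"); // "redbull"
        best.put(new Span(12, 16), "shot");   // "show"
        System.out.println(collate("redbull air show", best)); // prints "redbelly air shot"
    }
}
```

With a plain HashMap the entries could come back in any order, and the running
offset would then be applied to tokens that precede the replacement that
caused it.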
> spellcheck.collate doesn't handle multiple tokens properly
> ----------------------------------------------------------
>
> Key: SOLR-606
> URL: https://issues.apache.org/jira/browse/SOLR-606
> Project: Solr
> Issue Type: Bug
> Components: spellchecker
> Affects Versions: 1.3
> Environment: tomcat
> Reporter: Geoffrey Young
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: handler.component.SpellCheckComponent-collate-patch.txt,
> SOLR-606.patch
>
>
> originally posted as part of SOLR-572:
>
> https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12608487#action_12608487
> the new spellcheck.collate feature seems to exhibit some strange behaviors
> when handed a query with multiple tokens.
> {noformat}
> {
>   "responseHeader":{
>     "params":{
>       "q":"redbull air show"}},
>   "spellcheck":{
>     "suggestions":[
>       "redbull",[
>         "suggestion",["redbelly"]],
>       "show",[
>         "suggestion",["shot"]],
>       "collation","redbelly airshotw"]}}
> {noformat}
> in this case, note the fields are incorrectly concatenated (no space between
> tokens, left over 'w' from input string)
> {noformat}
> {
>   "responseHeader":{
>     "params":{
>       "q":"redbull air show",
>       "spellcheck.q":"redbull air show"}},
>   "spellcheck":{
>     "suggestions":[
>       "redbull air show",[
>         "suggestion",["redbull singers"]],
>       "collation","redbull singersredbull air show"]}}
> {noformat}
> this is slightly different - the suggestions are still concatenated without a
> space, but the collation is way off.
> --Geoff