[ https://issues.apache.org/jira/browse/SOLR-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592328#action_12592328 ]

Shalin Shekhar Mangar commented on SOLR-507:
--------------------------------------------

I have just finished implementing a SpellCheck library (using Lucene) for a 
project which was not already using Solr. I implemented a few ideas there which 
can be added to Solr.

 - Given a user query consisting of many words, return just one suggestion for
the whole query, e.g. searching for "hybrd sedn" gives "hybrid sedan" as the
suggestion
 - Give suggestions on a per-field basis
 - Never repeat a word in a suggestion, e.g. if my index contains
"Mercedes-Benz" and a user searches for "mercedec bens", they should not get a
suggestion like "Mercedes-Benz Mercedes-Benz"
 - Don't try to give a suggestion for tokens shorter than a given length (my
implementation used 3). For a query like "mercedes e class", this avoids a
suggestion like "mercedes e-class c-class"
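To make the three query-level rules above concrete, here is a minimal, hypothetical sketch in plain Java (it is not the Lucene-based library described in this comment, and it does not use any Lucene or Solr API). It assumes the dictionary is a simple term-to-frequency map standing in for index terms, uses Levenshtein distance to pick each correction, passes tokens shorter than `minLength` through unchanged, and drops repeated words from the collated suggestion. The class `QuerySuggester` and its parameters are illustrative names only.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical whole-query suggester. The dictionary is a plain term -> frequency map
// standing in for terms that a real implementation would read from a Lucene index.
public class QuerySuggester {

    // Classic Levenshtein distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the token itself if it is a known term; otherwise the closest term within
    // maxDistance (ties broken by higher frequency), or the token unchanged if none is close.
    static String bestMatch(String token, Map<String, Integer> dict, int maxDistance) {
        if (dict.containsKey(token)) {
            return token;
        }
        String best = token;
        int bestDist = Integer.MAX_VALUE;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : dict.entrySet()) {
            int d = editDistance(token, e.getKey());
            if (d > maxDistance) {
                continue;
            }
            if (d < bestDist || (d == bestDist && e.getValue() > bestFreq)) {
                best = e.getKey();
                bestDist = d;
                bestFreq = e.getValue();
            }
        }
        return best;
    }

    // Builds one suggestion for the whole query:
    //  - tokens shorter than minLength are passed through without correction;
    //  - each word is emitted at most once (LinkedHashSet keeps first-seen order).
    public static String suggest(String query, Map<String, Integer> dict,
                                 int minLength, int maxDistance) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            words.add(token.length() < minLength ? token : bestMatch(token, dict, maxDistance));
        }
        return String.join(" ", words);
    }
}
```

With a dictionary containing "hybrid" and "sedan", `suggest("hybrd sedn", dict, 3, 2)` collates both corrections into the single string "hybrid sedan". Deduplicating on the output words (rather than on the input tokens) is what prevents two different misspellings from collapsing into a repeated correction.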

I understand that these tweaks are often very specific to the use case, but we
can at least provide the features for people to use as they see fit. To
implement multiple-field support, we can change SpellCheckerRequestHandler to
create a HighFrequencyDictionary for each configured field and add them all to
the spell check index. We can then query with the overloaded suggestSimilar
method (which accepts a field name). If this sounds fine, I can submit a patch
adding these features.
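The per-field idea can be illustrated with a small, hypothetical model in plain Java. It does not call SpellCheckerRequestHandler, HighFrequencyDictionary, or suggestSimilar; instead it assumes each configured field contributes its own term-to-document-frequency map, keeps only terms whose document frequency is at least `threshold * numDocs` (similar in spirit to a high-frequency dictionary), and restricts lookups to the requested field. `PerFieldSpellIndex`, `addField`, and `suggest` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical per-field spelling index: one pruned term dictionary per configured field.
public class PerFieldSpellIndex {
    private final Map<String, Map<String, Integer>> fields = new HashMap<>();
    private final int maxDistance;

    public PerFieldSpellIndex(int maxDistance) {
        this.maxDistance = maxDistance;
    }

    // Registers a field, keeping only terms that occur in at least threshold * numDocs documents.
    public void addField(String field, Map<String, Integer> docFreqs, int numDocs, float threshold) {
        Map<String, Integer> kept = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if ((float) e.getValue() / numDocs >= threshold) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        fields.put(field, kept);
    }

    // Returns the closest term from the requested field only (ties broken by higher
    // document frequency), or empty if the field is unknown or nothing is close enough.
    public Optional<String> suggest(String field, String word) {
        Map<String, Integer> dict = fields.getOrDefault(field, Map.of());
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : dict.entrySet()) {
            int d = editDistance(word, e.getKey());
            if (d > maxDistance) {
                continue;
            }
            if (d < bestDist || (d == bestDist && e.getValue() > bestFreq)) {
                best = e.getKey();
                bestDist = d;
                bestFreq = e.getValue();
            }
        }
        return Optional.ofNullable(best);
    }

    // Levenshtein distance with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Keeping one dictionary per field means a body-style term such as "sedan" is never offered as a correction for a make field, which is the behavior a per-field suggestSimilar query is meant to give. The frequency threshold also keeps rare misspellings that happen to be indexed (here, a hypothetical "mercedez" in 1 of 100 documents) out of the suggestions.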

> Spell Checking Improvements
> ---------------------------
>
>                 Key: SOLR-507
>                 URL: https://issues.apache.org/jira/browse/SOLR-507
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>            Reporter: Jayson Minard
>
> Creating a placeholder issue to track Spell Checking Improvements.  
> Individual issues can later be created and linked for each area of separable 
> concern when they are determined.  
> Areas to discuss include:
> # spell suggestions from within the current query (minus terms being 
> corrected) and filter so that suggestions are always valid
> ** need approaches to merging the spelling list with the current mask of 
> valid records.  Also, is this a better change to Lucene first, or something 
> that belongs in Solr?
> ** need to add spell checking as query component and make available to 
> various query handlers
> ** spell checking to be field specific to support responding correctly with 
> dismax queries
> # spell suggestions from a distributed search (SOLR-303) by augmenting the
> response, or alternatively just federating Spell Checker requests on their
> own and letting the application decide when to use each.
> # spell suggestions as a search component to augment other queries
> What are other typical areas of concern, or suggestions for improvements to
> spell checking, that can be tracked?
> I am willing to look at driving a patch for this area, especially for spell
> checking within the current result set and across distributed search.
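For the distributed case in the quoted description, one possible way to merge per-shard spelling results is to sum each candidate's frequency across shards and re-rank. The sketch below is a hypothetical illustration in plain Java, not SOLR-303 code: it assumes each shard returns a map from candidate correction to document frequency for the same misspelled word, and that summing is meaningful because each document lives on exactly one shard. `ShardSuggestionMerger` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical merger for spelling candidates returned by several shards.
public class ShardSuggestionMerger {

    // Sums per-shard frequencies for each candidate and returns at most `count`
    // candidates, ordered by total frequency (descending), then alphabetically.
    public static List<String> merge(List<Map<String, Integer>> shardSuggestions, int count) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> shard : shardSuggestions) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(totals.keySet());
        ranked.sort((a, b) -> {
            int byFreq = Integer.compare(totals.get(b), totals.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(count, ranked.size())));
    }
}
```

If one shard reports {"hybrid": 5, "hybris": 2} and another {"hybrid": 3, "hydra": 4}, the merged ranking is "hybrid" (8), "hydra" (4), "hybris" (2). Ranking on summed counts keeps a candidate that is moderately common everywhere ahead of one that is popular on a single shard.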

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
