[jira] Commented: (SOLR-572) Spell Checker as a Search Component

Shalin Shekhar Mangar (JIRA) Sun, 18 May 2008 08:09:17 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597814#action_12597814
 ]


Shalin Shekhar Mangar commented on SOLR-572:
--------------------------------------------

Grant - I was trying to implement the onlyMorePopular option and the 
extendedResults response format of SCRH when I realized that such a response 
cannot be supported for text-file-based dictionaries in the current 
implementation. Currently, we use Lucene's PlainTextDictionary to load such 
text files, and we don't maintain any frequency information for them. What do 
you suggest?
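
To make the limitation concrete, here is a minimal sketch in plain Java. It does not use the Lucene or Solr APIs; the class name, the method name, and the sample words and counts are all hypothetical, and the "strictly more frequent" rule is a simplifying assumption. It only shows that a popularity filter needs a frequency per word, which a bare word list cannot supply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the Lucene SpellChecker API.
class PopularityFilterSketch {

    // Keeps only the candidates whose frequency is strictly higher than the
    // frequency of the original word. A word that is missing from the map is
    // treated as having frequency 0 (an assumption made for this sketch).
    static List<String> onlyMorePopular(String original,
                                        List<String> candidates,
                                        Map<String, Integer> freq) {
        int base = freq.getOrDefault(original, 0);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (freq.getOrDefault(candidate, 0) > base) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("solr", "sole", "soil");

        // Field-based dictionary: frequencies are available from the index.
        Map<String, Integer> fromIndex = new HashMap<>();
        fromIndex.put("solar", 3);
        fromIndex.put("solr", 40);
        fromIndex.put("sole", 2);
        fromIndex.put("soil", 7);
        System.out.println(onlyMorePopular("solar", candidates, fromIndex));

        // Text-file dictionary: only a word list, so no frequencies are known
        // and the filter has nothing to compare.
        Map<String, Integer> fromTextFile = new HashMap<>();
        System.out.println(onlyMorePopular("solar", candidates, fromTextFile));
    }
}
```

With frequencies available the filter can rank candidates against the original word; with an empty frequency map every candidate is rejected, which is why the text-file case needs either stored frequencies or a different response.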

Bojan/Otis - The terms loaded from the text files are passed on to Lucene's 
SpellChecker as is. As per Noble's suggestion, I've added support for an 
optional fieldType attribute (this type must be defined in schema.xml). This 
type's query analyzer is used for queries. Wouldn't it be more consistent to 
apply the type's index analyzer at index time as well?

Both of the above problems could be solved by keeping the words loaded from the 
text files in a Lucene index, but I'm not sure we want to go that way.

> Spell Checker as a Search Component
> -----------------------------------
>
>                 Key: SOLR-572
>                 URL: https://issues.apache.org/jira/browse/SOLR-572
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>    Affects Versions: 1.3
>            Reporter: Shalin Shekhar Mangar
>             Fix For: 1.3
>
>         Attachments: SOLR-572.patch, SOLR-572.patch
>
>
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
> following features:
> * Allow creating a spell index on a given field and make it possible to have 
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and 
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as they are
> * Never give duplicate words in a suggestion
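
The two consistency criteria quoted above can be sketched in plain Java as follows. This is not the SOLR-572 patch or any Solr API; the class, the method, and the sample dictionary and candidate lists are hypothetical. The sketch assumes that candidate suggestions per token are already available in preference order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the SOLR-572 patch or any Solr API.
class ConsistentSuggestionSketch {

    // Builds one suggestion for a multi-word query:
    //  - tokens found in the dictionary are kept unchanged;
    //  - each other token is replaced by its first candidate that is not
    //    already present in the suggestion;
    //  - a token with no usable candidate is left as it was.
    static String suggest(List<String> tokens,
                          Set<String> dictionary,
                          Map<String, List<String>> candidates) {
        Set<String> used = new HashSet<>();
        // Reserve the correct words first so no replacement duplicates them.
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                used.add(token);
            }
        }
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (dictionary.contains(token)) {
                out.add(token);
                continue;
            }
            String chosen = token;
            List<String> options =
                candidates.getOrDefault(token, Collections.<String>emptyList());
            for (String option : options) {
                if (!used.contains(option)) {
                    chosen = option;
                    break;
                }
            }
            used.add(chosen);
            out.add(chosen);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        Set<String> dictionary =
            new HashSet<>(Arrays.asList("solr", "solar", "search"));
        Map<String, List<String>> candidates = new HashMap<>();
        candidates.put("sollr", Arrays.asList("solr", "solar"));
        candidates.put("serch", Arrays.asList("search"));
        // "solr" is already in the query, so "sollr" falls through to "solar".
        System.out.println(
            suggest(Arrays.asList("solr", "sollr", "serch"), dictionary, candidates));
    }
}
```

Reserving the correct words before choosing replacements is one possible design; it guarantees that a replacement never duplicates a word the user already typed correctly, at the cost of sometimes skipping the top-ranked candidate.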

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
