[jira] Commented: (SOLR-572) Spell Checker as a Search Component

Grant Ingersoll (JIRA) Tue, 17 Jun 2008 09:14:06 -0700

    [ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605645#action_12605645 ]


Grant Ingersoll commented on SOLR-572:
--------------------------------------

{quote}
Why is a WhitespaceTokenizer being used for tokenizing the value for a 
spellcheck.q parameter? Wouldn't it be more correct to use the query analyzer 
if the index is being built from a Solr field?

The above argument also applies to queryAnalyzerFieldType which is being used 
for QueryConverter
{quote}

My understanding was that the sc.q parameter was already analyzed and ready to 
be checked, so all it needed was a conversion to tokens.  As for the 
queryAnalyzerFieldType, that assumes the implementation is the 
IndexBasedSpellChecker or some other field-based one that the 
SpellCheckComponent doesn't have access to.  That is why I think it needs to 
be handled separately and explicitly, and why it isn't part of the 
spellchecker configuration.
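
To make "a conversion to tokens" concrete, here is a minimal sketch of whitespace-only splitting that keeps character offsets, assuming the incoming value needs no further analysis. It does not use the Lucene or Solr classes discussed here; the class WhitespaceSplitSketch and its Tok holder are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the Lucene tokenizer or the Solr component.
final class WhitespaceSplitSketch {

    /** Hypothetical token holder: the term text plus its offsets in the input. */
    static final class Tok {
        final String text;
        final int start; // inclusive
        final int end;   // exclusive

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits the input on runs of whitespace and records where each term came from. */
    static List<Tok> tokenize(String q) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        int n = q.length();
        while (i < n) {
            // Skip leading whitespace before the next term.
            while (i < n && Character.isWhitespace(q.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the term itself.
            while (i < n && !Character.isWhitespace(q.charAt(i))) {
                i++;
            }
            if (i > start) {
                out.add(new Tok(q.substring(start, i), start, i));
            }
        }
        return out;
    }
}
```

Keeping the offsets would let a caller map a suggestion back onto the original query string, though whether the component needs that is an assumption here.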

{quote}
I see that we can specify our own query converter through the queryConverter 
section in solrconfig.xml. But the SpellCheckComponent uses 
SpellingQueryConverter directly instead of an interface. We should add a 
QueryConverter interface if this needs to be pluggable.
{quote}

I thought about making it an abstract base class, but in my mind it is really 
easy to override the SpellingQueryConverter and the component should know how 
to deal with it.
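
A rough sketch of the override approach, under stated assumptions: BaseConverter below is a hypothetical stand-in for SpellingQueryConverter (its real signature is not shown in this thread), and FieldStrippingConverter is an invented subclass. The only point is that code written against the base type picks up the subclass behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these classes are the actual Solr types.
final class ConverterSketch {

    /** Stand-in for the default converter: splits the raw query on whitespace. */
    static class BaseConverter {
        List<String> convert(String original) {
            List<String> out = new ArrayList<>();
            for (String part : original.trim().split("\\s+")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
            return out;
        }
    }

    /** Invented override: also drops a leading "field:" prefix from each term. */
    static class FieldStrippingConverter extends BaseConverter {
        @Override
        List<String> convert(String original) {
            List<String> out = new ArrayList<>();
            for (String term : super.convert(original)) {
                int colon = term.indexOf(':');
                String bare = colon >= 0 ? term.substring(colon + 1) : term;
                if (!bare.isEmpty()) {
                    out.add(bare);
                }
            }
            return out;
        }
    }

    /** Plays the role of the component: it only knows about the base type. */
    static List<String> componentConvert(BaseConverter converter, String q) {
        return converter.convert(q);
    }
}
```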

{quote}
If name is omitted from two dictionaries in solrconfig.xml then both get named 
as Default from the SolrSpellChecker#init method and they overwrite each other 
in the spellCheckers map
{quote}

Hmm, not good.  I will fix.
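
One possible shape for a fix, sketched here as an assumption rather than what the patch will actually do: fall back to a default name when none is configured, and refuse a second registration under a name that is already taken instead of silently overwriting it. CheckerRegistrySketch, DEFAULT_NAME, and register are hypothetical names, not the SolrSpellChecker API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a name-collision guard; not the Solr implementation.
final class CheckerRegistrySketch {

    /** Assumed fallback name used when a dictionary has no name configured. */
    static final String DEFAULT_NAME = "default";

    private final Map<String, Object> checkers = new LinkedHashMap<>();

    /**
     * Registers a checker under its configured name, or under DEFAULT_NAME if
     * the name is missing. Fails loudly if the name is already in use.
     */
    String register(String configuredName, Object checker) {
        String name = (configuredName == null || configuredName.trim().isEmpty())
                ? DEFAULT_NAME
                : configuredName.trim();
        if (checkers.containsKey(name)) {
            throw new IllegalStateException(
                    "Duplicate spell checker name: '" + name + "'");
        }
        checkers.put(name, checker);
        return name;
    }

    int size() {
        return checkers.size();
    }
}
```

Failing at configuration time is only one option; generating a unique name for each unnamed dictionary would be another.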

{quote}
How about building the index in the inform() method? I understand that the 
users can build the index using spellcheck.build=true and they can also use 
QuerySenderListener to build the index, but this limits the user to 
FSDirectory because if we use RAMDirectory and Solr is restarted, the 
QuerySenderListener never fires and spell checker is left with no index. It's 
not a major inconvenience to use FSDirectory always but then RAMDirectory 
doesn't bring much to the table.
{quote}

I think this gets back to our early discussions about it not working in inform 
because we don't have the reader at that point, or something like that.  I really 
don't know the right answer, but do feel free to try it out.  I do think it 
belongs in inform, but I'm not sure Solr is ready at that point.  As for the 
QuerySenderListener, it seems like it should fire on a restart, but I admit 
I don't know a whole lot about that functionality.


> Spell Checker as a Search Component
> -----------------------------------
>
>                 Key: SOLR-572
>                 URL: https://issues.apache.org/jira/browse/SOLR-572
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>    Affects Versions: 1.3
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>
>
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
> following features:
> * Allow creating a spell index on a given field and make it possible to have 
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and 
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as they are
> * Never give duplicate words in a suggestion

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
