Re: Spell Check Handler

Tristan Vittorio Mon, 09 Jul 2007 03:46:35 -0700

I think there is some confusion regarding how the spell checker actually
uses the termSourceField.  It is suggested that you use a simple field type
such as "string"; however, since this field type does not tokenize or split
words, it is only useful in situations where the whole field is considered a
dictionary "word":


<add>
<doc>
<field name="title">Accountant</field>
<field name="title">Auditor</field>
<field name="title">Solicitor</field>
</doc>
</add>

The following example case will not work with the spell checker, since the
whole field is treated as a single word or string:

<add>
<doc>
<field name="title">Accountant reveals that Accounting is boring</field>
</doc>
</add>

I might suggest that you create an additional field in your schema that
takes advantage of the StandardTokenizer and StandardFilter, which do not
perform a great deal of processing on the field yet should provide decent
results when used with the spell checker:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
   <filter class="solr.StandardFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
 <analyzer type="query">
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
   <filter class="solr.StandardFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
</fieldType>

If you want this field to be automatically populated with the contents of
the title field when a document is added to the index, simply use a
copyField:

<copyField source="title" dest="spell"/>
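
For the copyField to have somewhere to land, the schema also needs a field
named "spell" that uses the field type above, and the handler has to read
its terms from that field rather than from "title". A minimal sketch of both
pieces follows; the stored and multiValued settings are my assumptions and
not something tested against your setup, so adjust them to your schema:

```xml
<!-- schema.xml: hypothetical "spell" field that receives the copyField output.
     stored="false" and multiValued="true" are assumptions; multiValued matters
     when a document can carry more than one title value. -->
<field name="spell" type="spell" indexed="true" stored="false" multiValued="true"/>

<!-- solrconfig.xml: inside the SpellCheckerRequestHandler definition,
     point termSourceField at the new field instead of "title". -->
<str name="termSourceField">spell</str>
```

Since copyField is applied at index time, existing documents would need to
be re-indexed, and the spelling index rebuilt with cmd=rebuild, before the
new field produces any suggestions.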

Hope this helps; let me know if this is still not clear. I will probably add
it to the wiki page soon.

cheers,
Tristan



On 7/9/07, climbingrose <[EMAIL PROTECTED]> wrote:


Thanks for the quick reply. However, I'm still not able to set up the
spellchecker. Solr does create a spell directory under data but doesn't seem
to build the spellchecker index. Here are snippets of my configuration:

<field name="title" type="string" indexed="true" stored="true"/>

<requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler" startup="lazy">
    <!-- default values for query parameters -->
     <lst name="defaults">
       <int name="suggestionCount">1</int>
       <float name="accuracy">0.5</float>
     </lst>

     <!-- Main init params for handler -->

     <!-- The directory where your SpellChecker Index should live.   -->
     <!-- May be absolute, or relative to the Solr "dataDir" directory. -->
     <!-- If this option is not specified, a RAM directory will be used -->
     <str name="spellcheckerIndexDir">spell</str>

     <!-- the field in your schema that you want to be able to build -->
     <!-- your spell index on. This should be a field that uses a very -->
     <!-- simple FieldType without a lot of Analysis (ie: string) -->
     <str name="termSourceField">title</str>

   </requestHandler>

I tried this URL:

http://localhost:8984/solr/select/?q=Accountent&qt=spellchecker&cmd=rebuild

and receive this:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
</lst>
<str name="cmdExecuted">rebuild</str>
<arr name="suggestions"/>
</response>


On 7/9/07, Tristan Vittorio <[EMAIL PROTECTED]> wrote:
>
> The spellchecker should be available in the 1.2 release. Your query is
> incorrect; try the following:
>
> http://localhost:8984/solr/select/?q=java&qt=spellchecker&termSourceField=title_text&cmd=rebuild
>
> The 'q' parameter must contain only the word being checked; you must
> specify the field separately.  You can set "termSourceField" in your
> solrconfig.xml file so that you do not need to set it explicitly each time
> you want to run a spell check query. Also make sure your field isn't
> heavily processed (i.e. with Porter stemmer analyzers), otherwise the
> suggestions will look a bit weird / mangled.  Take a look at the wiki page
> for more info:
>
> http://wiki.apache.org/solr/SpellCheckerRequestHandler
>
> cheers,
> Tristan
>
>
>
> On 7/9/07, climbingrose <[EMAIL PROTECTED]> wrote:
> >
> > Hi Tristan,
> >
> > Is this spellchecker available in the 1.2 release, or do I have to build
> > the trunk? I tried your instructions but Solr returns nothing:
> >
> > http://localhost:8984/solr/select/?q=title_text:java&qt=spellchecker&cmd=rebuild
> >
> > Result:
> >
> > <response>
> > <lst name="responseHeader">
> > <int name="status">0</int>
> > <int name="QTime">3</int>
> > </lst>
> > <str name="cmdExecuted">rebuild</str>
> > <arr name="suggestions"/>
> > </response>
> >
> > Thanks.
> >
> >
> > On 7/8/07, Tristan Vittorio <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi Otis,
> > >
> > > I have written a draft wiki entry for the spell checker:
> > > http://wiki.apache.org/solr/SpellCheckerRequestHandler
> > >
> > > I've learned that my initial observation about the suggestion ordering
> > > was incorrect: it does in fact order the results by popularity (or term
> > > frequency) of the word in the termSourceField. The problem I experienced
> > > was caused by setting termSourceField to a field of type "text", which
> > > heavily stemmed and analyzed the words.  I found that using the
> > > StandardTokenizer and StandardFilter and removing the PorterStemmer and
> > > LowerCaseFilter from the field schema really improved the spell checker
> > > performance.
> > >
> > > I haven't included this info on the wiki page yet; I'll try to update it
> > > soon when I have a bit more time.
> > >
> > > cheers,
> > > Tristan
> > >
> > >
> > >
> > > On 7/8/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Tristan - good summary - want to copy that to the Solr Wiki?
> > > >
> > > > Thanks,
> > > > Otis
> > > >
> > > > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> > > > Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
> > > >
> > > > ----- Original Message ----
> > > > From: Tristan Vittorio <[EMAIL PROTECTED]>
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Saturday, July 7, 2007 1:51:15 AM
> > > > Subject: Re: Spell Check Handler
> > > >
> > > > I couldn't find any documentation on the spell check handler either, but
> > > > found enough information in the solrconfig.xml file; simply search for
> > > > "SpellCheckerRequestHandler" (online version here):
> > > >
> > > > http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml
> > > >
> > > > You can view the original development discussion from JIRA (not sure how
> > > > helpful that will be for you though):
> > > > https://issues.apache.org/jira/browse/SOLR-81
> > > >
> > > > In a nutshell, the configuration parameters available are:
> > > >
> > > > suggestionCount: determines how many spelling suggestions are returned.
> > > > accuracy: a float value between 0.0 and 1.0 that sets how closely the
> > > > suggested words should match the original word being checked.
> > > > spellcheckerIndexDir and termSourceField: check solrconfig.xml for a
> > > > full explanation.
> > > >
> > > > In order to use the spell checking handler for the first time, you need
> > > > to explicitly build the spelling index with a sample query something
> > > > like this:
> > > >
> > > > http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker&cmd=rebuild
> > > >
> > > > Depending on how large your main index is, this rebuild operation could
> > > > take a while.  Subsequent queries can omit '&cmd=rebuild' and will
> > > > return results much faster:
> > > >
> > > > http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker
> > > >
> > > > The order of the suggestions returned seems to be based on the accuracy
> > > > figure (i.e. how closely it matches the original word). It would be
> > > > great to be able to sort these suggested results based on term
> > > > frequency / document frequency of the suggested word in the main index,
> > > > since the most accurate suggestion may not always be the most relevant.
> > > >
> > > > As far as I can tell there is currently no way of doing this using the
> > > > spellchecker handler alone (you could always run separate standard
> > > > queries on each word suggestion and order by numDocs, but that would be
> > > > very inefficient). Has anybody else tried to achieve this?
> > > >
> > > > cheers,
> > > > Tristan
> > > >
> > > >
> > > >
> > > > On 7/7/07, Andrew Nagy <[EMAIL PROTECTED] > wrote:
> > > > >
> > > > > Hello, is there any documentation on how to use the new spell check
> > > > > module?
> > > > >
> > > > > Thanks
> > > > > Andrew
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Cuong Hoang
> >
>



--
Regards,

Cuong Hoang
