[ http://issues.apache.org/jira/browse/NUTCH-48?page=all ]

Andy Liu updated NUTCH-48:
--------------------------

    Attachment: spell-check.patch

Run this command:

bin/nutch org.apache.nutch.spell.NGramSpeller -i [main index] -o [output spelling index] -f content -minThreshold 500

to generate the n-gram spelling index. -minThreshold tells NGramSpeller to
include only terms whose document frequency is higher than the given value
(500 in the command above). Your main index will contain a lot of misspelled
words, most of them rare, so this parameter helps you exclude many of them.
You'll have to experiment to find the value that works best for your index.
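
As a rough illustration of what the generation step does with -minThreshold,
the sketch below keeps only terms whose document frequency exceeds the
threshold and indexes each surviving term under its character n-grams. It is
not NGramSpeller's code: the term/frequency map is a hypothetical stand-in for
the Lucene index, and the gram lengths (3 and 4) and all class and method names
are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; not NGramSpeller's implementation.
class NGramIndexSketch {

    // All character n-grams of length n; empty if the word is shorter than n.
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // Maps each 3-gram and 4-gram to the dictionary terms containing it.
    // docFreq (term -> document frequency) is a hypothetical stand-in for the
    // term statistics of the main index's "content" field.
    static Map<String, Set<String>> build(Map<String, Integer> docFreq, int minThreshold) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() <= minThreshold) {
                continue; // rare term, probably a misspelling: keep it out of the dictionary
            }
            String term = e.getKey();
            for (int n = 3; n <= 4; n++) {
                for (String g : grams(term, n)) {
                    index.computeIfAbsent(g, k -> new TreeSet<>()).add(term);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("search", 1200);
        docFreq.put("serach", 7); // rare misspelling
        docFreq.put("engine", 800);
        Map<String, Set<String>> index = build(docFreq, 500);
        System.out.println(index.get("sea"));         // [search]
        System.out.println(index.containsKey("era")); // false: "serach" was filtered out
    }
}
```

Raising the threshold shrinks the dictionary and drops more misspellings, at the
cost of also dropping legitimate but rare words.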

After you generate the index, you can test using:

bin/nutch org.apache.nutch.spell.SpellCheckerBean [spelling index]
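
To give a feel for what a lookup against such an index involves, here is a
minimal sketch of a common n-gram approach: collect dictionary words that
share trigrams with the input, then rank them by Levenshtein edit distance.
It is a generic illustration under those assumptions, not SpellCheckerBean's
actual API or scoring, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Generic n-gram + edit-distance suggester; illustrative only.
class SuggestSketch {
    private final Set<String> words = new HashSet<>();
    private final Map<String, Set<String>> gramIndex = new HashMap<>();

    SuggestSketch(Collection<String> dictionary) {
        for (String word : dictionary) {
            words.add(word);
            for (String g : trigrams(word)) {
                gramIndex.computeIfAbsent(g, k -> new HashSet<>()).add(word);
            }
        }
    }

    static List<String> trigrams(String w) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 3 <= w.length(); i++) {
            out.add(w.substring(i, i + 3));
        }
        return out;
    }

    // Standard dynamic-programming Levenshtein distance, two rows at a time.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Closest dictionary word sharing at least one trigram with the input,
    // or null if the input is already known or no candidate exists.
    String suggest(String input) {
        if (words.contains(input)) {
            return null;
        }
        Map<String, Integer> shared = new HashMap<>();
        for (String g : trigrams(input)) {
            for (String w : gramIndex.getOrDefault(g, Collections.emptySet())) {
                shared.merge(w, 1, Integer::sum);
            }
        }
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        int bestShared = -1;
        for (Map.Entry<String, Integer> e : shared.entrySet()) {
            String w = e.getKey();
            int d = levenshtein(input, w);
            int s = e.getValue();
            // Prefer smaller edit distance, then more shared trigrams, then alphabetical order.
            if (d < bestDist || (d == bestDist && (s > bestShared
                    || (s == bestShared && w.compareTo(best) < 0)))) {
                best = w;
                bestDist = d;
                bestShared = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SuggestSketch s = new SuggestSketch(List.of("search", "engine", "nutch", "lucene"));
        System.out.println(s.suggest("lucine")); // lucene
        System.out.println(s.suggest("serch"));  // search
    }
}
```

The n-gram lookup keeps the candidate set small, so the comparatively
expensive edit-distance computation only runs on a few words.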

To activate spell checking, uncomment the line in search.jsp that includes
spell-check.jsp. You'll also have to set a config parameter in nutch-site.xml
that points to the location of your spelling index.

There are a number of other parameters you can tweak when generating and using
your n-gram spelling index. See the comments in NGramSpeller (written by David
Spencer) and SpellCheckerBean for details.

> "Did you mean" query enhancement/refinement feature request
> -----------------------------------------------------------
>
>          Key: NUTCH-48
>          URL: http://issues.apache.org/jira/browse/NUTCH-48
>      Project: Nutch
>         Type: New Feature
>   Components: web gui
>  Environment: All platforms
>     Reporter: byron miller
>     Priority: Minor
>  Attachments: spell-check.patch
>
> Looking to implement a "Did you mean" feature for query result pages that
> return <= x results, which would recommend a fixed/related or spell-checked
> query to try.
> Note from Doug to the users list:
> David Spencer has worked on this some.
> http://www.searchmorph.com/weblog/index.php?id=23
> I think the code on his site might be more recent than what's committed
> to the lucene/contrib directory.
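
The triggering rule in the request (offer a suggestion only when a query
returns at most x results) can be sketched as below. The threshold parameter
maxHits, the per-term suggester passed in as a function, and all names are
hypothetical; in Nutch the per-term corrections would come from the spelling
index described above.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Sketch of the "Did you mean" trigger described in the request; illustrative only.
class DidYouMeanSketch {

    // Returns a rewritten query to offer, or null when no suggestion should be shown.
    // suggestTerm returns a correction for one term, or null if it has none.
    static String didYouMean(String query, long hitCount, long maxHits,
                             UnaryOperator<String> suggestTerm) {
        if (hitCount > maxHits) {
            return null; // enough results: don't second-guess the user
        }
        StringBuilder rewritten = new StringBuilder();
        boolean changed = false;
        for (String term : query.trim().split("\\s+")) {
            String fix = suggestTerm.apply(term);
            String use = (fix == null) ? term : fix;
            changed |= !use.equals(term);
            if (rewritten.length() > 0) {
                rewritten.append(' ');
            }
            rewritten.append(use);
        }
        return changed ? rewritten.toString() : null;
    }

    public static void main(String[] args) {
        Map<String, String> fixes = Map.of("serch", "search"); // hypothetical per-term corrections
        System.out.println(didYouMean("nutch serch", 0, 5, fixes::get));   // nutch search
        System.out.println(didYouMean("nutch serch", 100, 5, fixes::get)); // null
    }
}
```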

-- 
This message is automatically generated by JIRA.
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers