[ https://issues.apache.org/jira/browse/NUTCH-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-72:
-------------------------------

Bulk close of legacy issues:
http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open_legacy_issues_in_jira

> Query basic filter with correction feature
> ------------------------------------------
>
>          Key: NUTCH-72
>          URL: https://issues.apache.org/jira/browse/NUTCH-72
>      Project: Nutch
>   Issue Type: New Feature
>   Components: searcher
>  Environment: lucene
>     Reporter: Christophe Noel
>  Attachments: querycorrectionplugin.zip
>
>
> This plugin extends the query-basic plugin with a correction feature.
> Lucene includes a FuzzyQuery feature, which searches not only for
> matching terms but also for very similar terms.
> This plugin should be used instead of query-basic by people looking for an
> easy way to correct users' query requests.
>
> The Correction Query Plugin can be used as follows:
>
> Solution 1: If you want to search for very similar terms, add
> autocorrectionmod as the first term of the query (example: 'nutch engine' ->
> 'autocorrectionmod nutch engine').
>
> Solution 2: Create a new search.jsp page which includes a "correction"
> checkbox (<input type="checkbox" name="autocorrection" value="true">)
> that automatically adds 'autocorrectionmod' as the first term of the query.
>
> FuzzyQuery has one big problem: it is very slow on large indexes!
> So the Correction Query Plugin works as follows:
> - it is not useful for big indexes
> - it only applies to words of 5 characters or more
> - it only looks for words that share the first 2 characters (to improve
>   performance this should be set to 3 or 4)
> - it only accepts suffixes that match at 65% or more (the algorithm is
>   Levenshtein)
>
> Please give your opinion about it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
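
For readers who want a concrete picture of the rules quoted in the issue (marker term, 5-character minimum, 2-character shared prefix, 65% suffix match by Levenshtein distance), here is a minimal stand-alone sketch in Java. It is not the code from querycorrectionplugin.zip and does not use the Lucene or Nutch APIs. The class name CorrectionSketch, its method names, and the similarity formula (1 minus the edit distance divided by the longer suffix length) are assumptions made for illustration only.

```java
// Hypothetical sketch of the matching rules described in NUTCH-72.
// Not the plugin's actual code; names and the similarity formula are assumed.
final class CorrectionSketch {
    static final String MARKER = "autocorrectionmod"; // marker term from the issue
    static final int MIN_WORD_LENGTH = 5;             // "5 characters or more"
    static final int PREFIX_LENGTH = 2;               // "first 2 characters"
    static final double MIN_SIMILARITY = 0.65;        // "65% matching suffixes"

    // True when the query starts with the marker term.
    static boolean correctionRequested(String query) {
        String trimmed = query.trim();
        return trimmed.equals(MARKER) || trimmed.startsWith(MARKER + " ");
    }

    // Removes the leading marker term, if present.
    static String stripMarker(String query) {
        if (!correctionRequested(query)) {
            return query;
        }
        return query.trim().substring(MARKER.length()).trim();
    }

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Applies the three per-word rules from the issue to one indexed term.
    static boolean isCandidate(String queryWord, String indexedTerm) {
        if (queryWord.length() < MIN_WORD_LENGTH) {
            return false; // short words are never corrected
        }
        if (indexedTerm.length() < PREFIX_LENGTH
                || !indexedTerm.startsWith(queryWord.substring(0, PREFIX_LENGTH))) {
            return false; // the prefix must match exactly
        }
        String querySuffix = queryWord.substring(PREFIX_LENGTH);
        String termSuffix = indexedTerm.substring(PREFIX_LENGTH);
        int longest = Math.max(querySuffix.length(), termSuffix.length());
        // Assumed similarity: 1 - distance / longer suffix length.
        double similarity = 1.0 - (double) levenshtein(querySuffix, termSuffix) / longest;
        return similarity >= MIN_SIMILARITY;
    }

    public static void main(String[] args) {
        String query = "autocorrectionmod nutch engine";
        System.out.println(correctionRequested(query));
        System.out.println(stripMarker(query));
        System.out.println(isCandidate("engine", "engime"));
    }
}
```

The exact-prefix check is what keeps the search cheap: only indexed terms that start with the same characters need an edit-distance computation, which is why the issue suggests raising the prefix length to 3 or 4 on larger indexes.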