[jira] Commented: (SOLR-41) PATCH: HyphenatedWordsFilter, Factory and test

Boris Vitez (JIRA) Fri, 28 Jul 2006 11:34:51 -0700

    [ 
http://issues.apache.org/jira/browse/SOLR-41?page=comments#action_12424145 ] 
            
Boris Vitez commented on SOLR-41:
---------------------------------


Thank you for the feedback and suggestion.
I will change the filter to use this new feature of the Token class as soon as
I'm back on Monday.

> PATCH: HyphenatedWordsFilter, Factory and test
> ----------------------------------------------
>
>                 Key: SOLR-41
>                 URL: http://issues.apache.org/jira/browse/SOLR-41
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Boris Vitez
>            Priority: Minor
>         Attachments: HyphenatedWordsFilter.java, hyphenatedwordsfilter.patch, 
> HyphenatedWordsFilterFactory.java, TestHyphenatedWordsFilter.java
>
>
> When plain text is extracted from documents, many words are often
> hyphenated and split across two lines. This is especially common in
> documents with narrow text columns, such as newsletters.
> To make such words findable by search, this filter rejoins hyphenated
> words that were split across two lines.
> This filter has to be used together with WordDelimiterFilter configured
> with catenateWords=1.
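A minimal, standalone sketch of the joining step described above, for illustration only: it is not the code from the attached patch and it does not use Lucene's TokenStream/Token API. It assumes tokens arrive as plain strings (e.g. from whitespace tokenization) and that a hyphen at the very end of a token marks a word split at a line break. The class and method names (HyphenJoinSketch, joinHyphenated) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the idea behind HyphenatedWordsFilter.
 * Operates on plain strings, not on Lucene's TokenStream/Token API.
 */
public class HyphenJoinSketch {

    /**
     * Rejoins a token ending in '-' with the token that follows it,
     * dropping the hyphen: ["news-", "letter"] becomes ["newsletter"].
     * If the input ends right after a fragment, the buffered text is
     * emitted with a trailing hyphen.
     */
    public static List<String> joinHyphenated(List<String> tokens) {
        List<String> out = new ArrayList<>();
        // Word fragments waiting for their continuation on the next line.
        StringBuilder pending = new StringBuilder();
        for (String tok : tokens) {
            if (tok.length() > 1 && tok.endsWith("-")) {
                // Fragment broken at a line end: buffer it without the hyphen.
                pending.append(tok, 0, tok.length() - 1);
            } else if (pending.length() > 0) {
                // Continuation: glue it onto the buffered fragment(s).
                pending.append(tok);
                out.add(pending.toString());
                pending.setLength(0);
            } else {
                // Ordinary token: pass it through unchanged.
                out.add(tok);
            }
        }
        if (pending.length() > 0) {
            // No continuation followed: emit the buffered text with a hyphen.
            out.add(pending.append('-').toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("narrow", "news-", "letter", "col-", "umns", "end-");
        System.out.println(joinHyphenated(tokens));
    }
}
```

Running main prints [narrow, newsletter, columns, end-]. Because fragments are accumulated in a StringBuilder, a word split over more than two lines (e.g. "hy-", "phen-", "ated") is also rejoined into a single token.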

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
