[ http://issues.apache.org/jira/browse/SOLR-41?page=all ]
Yonik Seeley resolved SOLR-41.
------------------------------
Resolution: Fixed
Assignee: Yonik Seeley
Thanks Boris, I just committed this.
> PATCH: HyphenatedWordsFilter, Factory and test
> ----------------------------------------------
>
> Key: SOLR-41
> URL: http://issues.apache.org/jira/browse/SOLR-41
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Boris Vitez
> Assigned To: Yonik Seeley
> Priority: Minor
> Attachments: HyphenatedWordsFilter.java, hyphenatedwordsfilter.patch,
> hyphenatedwordsfilter.patch, HyphenatedWordsFilterFactory.java,
> TestHyphenatedWordsFilter.java
>
>
> When plain text is extracted from documents, many words are often
> hyphenated and split across two lines. This is common in documents that
> use narrow text columns, such as newsletters.
> So that searches can still match such words, this filter rejoins
> hyphenated words that were split across two lines.
> This filter has to be used together with the WordDelimiterFilter
> configured with catenateWords=1.
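
To illustrate the idea described in the issue, here is a minimal sketch in plain Java. It is not the committed HyphenatedWordsFilter and does not use the Lucene/Solr TokenStream API. The class name HyphenJoiner and the method join are hypothetical, and the sketch assumes whitespace-separated tokens where a line-broken word shows up as a token ending in "-" followed by its remainder. It also drops the hyphen itself, which is a simplification: the issue's note about WordDelimiterFilter with catenateWords=1 suggests the real filter may leave part of that work to a later stage, and that stage is not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the class attached to SOLR-41.
class HyphenJoiner {

    // Joins each token that ends in '-' with the token that follows it.
    // Assumption: a trailing hyphen always marks a line-break split.
    static List<String> join(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            boolean hasNext = i + 1 < tokens.size();
            if (tok.length() > 1 && tok.endsWith("-") && hasNext) {
                // Keep the fragment without its hyphen and wait for the rest.
                word.append(tok, 0, tok.length() - 1);
            } else {
                // Complete word (or a final token with nothing to join to).
                word.append(tok);
                out.add(word.toString());
                word.setLength(0);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Text as it might come out of a narrow newsletter column.
        String text = "ecologi-\ncal develop-\nment plan";
        List<String> tokens = Arrays.asList(text.split("\\s+"));
        System.out.println(join(tokens)); // prints [ecological, development, plan]
    }
}
```

Note that a sketch like this would also merge a token that legitimately ends in a hyphen with its successor, so a real filter would need to decide how to treat such cases.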