[ http://issues.apache.org/jira/browse/SOLR-41?page=comments#action_12424145 ]

Boris Vitez commented on SOLR-41:
---------------------------------

Thank you for the feedback and suggestion. I will change the Filter to use this new feature of the Token class as soon as I'm back on Monday.

> PATCH: HyphenatedWordsFilter, Factory and test
> ----------------------------------------------
>
>          Key: SOLR-41
>          URL: http://issues.apache.org/jira/browse/SOLR-41
>      Project: Solr
>   Issue Type: New Feature
>   Components: search
>     Reporter: Boris Vitez
>     Priority: Minor
>  Attachments: HyphenatedWordsFilter.java, hyphenatedwordsfilter.patch,
>               HyphenatedWordsFilterFactory.java, TestHyphenatedWordsFilter.java
>
>
> When plain text is extracted from documents, many words are often
> hyphenated and broken across two lines. This is common in documents
> that use narrow text columns, such as newsletters.
> To increase searching efficiency, this filter reunites hyphenated
> words that were broken across two lines.
> This filter has to be used together with the WordDelimiterFilter with
> catenateWords=1.
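
The joining step described in the issue can be illustrated with a minimal, dependency-free sketch. This is not the code from the attached patch: the class name HyphenJoinSketch and the method joinHyphenated are hypothetical, it works on plain strings rather than Lucene Token objects, and keeping the hyphen in the joined word is an assumption inferred from the requirement that WordDelimiterFilter with catenateWords=1 runs afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the HyphenatedWordsFilter from the patch.
public class HyphenJoinSketch {

    // Joins a token that ends in '-' with the token that follows it.
    // The hyphen is kept (assumption), so a later step can catenate the parts.
    static List<String> joinHyphenated(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder pending = null; // fragment waiting for its second half
        for (String token : tokens) {
            if (pending != null) {
                pending.append(token);
                if (token.endsWith("-")) {
                    continue; // word was broken more than once, keep collecting
                }
                out.add(pending.toString());
                pending = null;
            } else if (token.length() > 1 && token.endsWith("-")) {
                pending = new StringBuilder(token);
            } else {
                out.add(token);
            }
        }
        if (pending != null) {
            out.add(pending.toString()); // trailing fragment with no second half
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("narrow", "col-", "umns", "of", "text");
        System.out.println(joinHyphenated(tokens)); // prints [narrow, col-umns, of, text]
    }
}
```

In this sketch a lone "-" token is passed through unchanged, and a fragment at the end of the stream is emitted as-is rather than dropped; the actual filter may handle both cases differently.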
