[jira] Commented: (SOLR-41) PATCH: HyphenatedWordsFilter, Factory and test

Yonik Seeley (JIRA) Fri, 28 Jul 2006 08:47:40 -0700

    [ http://issues.apache.org/jira/browse/SOLR-41?page=comments#action_12424112 ]
            
Yonik Seeley commented on SOLR-41:
----------------------------------


Thanks Boris!

A common problem when creating new tokens is losing the position 
increments of the existing ones.
I recently changed Lucene's Token class so that it's cloneable and its 
text can be changed with setTermText().

So you may want to just change the text of the first token rather than creating 
a new one; that way its position increment is preserved.
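
To illustrate the idea, here is a minimal, self-contained sketch. It is not the 
attached patch and it does not use Lucene: Tok is a hypothetical stand-in for a 
token that carries only text and a position increment, and HyphenJoinSketch and 
join() are names made up for this example. It assumes a word split across lines 
arrives as a token ending in '-' followed by a token holding the rest of the word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HyphenJoinSketch {

    /** Hypothetical stand-in for a token; NOT Lucene's Token class. */
    static final class Tok implements Cloneable {
        String text;
        int positionIncrement; // gap from the previous token, normally 1

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public Tok clone() {
            try {
                return (Tok) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /**
     * Joins each token ending in '-' with the token that follows it.
     * The first token is reused and only its text is changed, so its
     * position increment carries over to the joined word. The increment
     * of the second half is dropped.
     */
    static List<Tok> join(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        Tok pending = null; // token ending in '-' that waits for its second half
        for (Tok t : in) {
            Tok current;
            if (pending != null) {
                // Strip the trailing hyphen and append the next token's text.
                String head = pending.text.substring(0, pending.text.length() - 1);
                pending.text = head + t.text;
                current = pending;
                pending = null;
            } else {
                current = t.clone(); // copy so the caller's token is left untouched
            }
            if (current.text.length() > 1 && current.text.endsWith("-")) {
                pending = current;
            } else {
                out.add(current);
            }
        }
        if (pending != null) {
            out.add(pending); // trailing hyphenated token with no second half
        }
        return out;
    }

    public static void main(String[] args) {
        // "news-" has increment 2, as if an earlier filter removed a token before it.
        List<Tok> in = Arrays.asList(
                new Tok("the", 1), new Tok("news-", 2),
                new Tok("letter", 1), new Tok("arrived", 1));
        StringBuilder sb = new StringBuilder();
        for (Tok t : join(in)) {
            sb.append(t.text).append('/').append(t.positionIncrement).append(' ');
        }
        System.out.println(sb.toString().trim()); // the/1 newsletter/2 arrived/1
    }
}
```

Had the joined word been built as a brand-new token with default values, the 
increment of 2 on "news-" would have been reset to 1, which is the loss 
described above.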

> PATCH: HyphenatedWordsFilter, Factory and test
> ----------------------------------------------
>
>                 Key: SOLR-41
>                 URL: http://issues.apache.org/jira/browse/SOLR-41
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Boris Vitez
>            Priority: Minor
>         Attachments: HyphenatedWordsFilter.java, hyphenatedwordsfilter.patch, 
> HyphenatedWordsFilterFactory.java, TestHyphenatedWordsFilter.java
>
>
> When plain text is extracted from documents, many words are often 
> hyphenated and split across two lines. This is often the case with 
> documents that use narrow text columns, such as newsletters.
> In order to increase searching efficiency, this filter rejoins hyphenated 
> words that were split across two lines.
> This filter has to be used together with the WordDelimiterFilter having 
> catenateWords=1.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
