[jira] Commented: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter

Yonik Seeley (JIRA) Tue, 02 May 2006 08:36:25 -0700

    [ http://issues.apache.org/jira/browse/SOLR-14?page=comments#action_12377415 ]


Yonik Seeley commented on SOLR-14:
----------------------------------

Thanks for the patch, Trey!

Can you give an example with the resulting token positions (or 
positionIncrements?)

Also, is there an easy way to prevent duplicate tokens from being produced? 
(The preserveOriginal token will often be identical to the catenateWords or 
catenateNumbers token, right?)
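
One possible way to drop such duplicates, shown as a toy Python sketch rather
than as code from the attached patch. It assumes each token is a
(term, position) pair and treats two tokens as duplicates only when both
fields match. The function name and the sample stream are hypothetical, and
the positions are illustrative, not what WordDelimiterFilter actually emits.

```python
def drop_duplicate_tokens(tokens):
    """Keep the first occurrence of each (term, position) pair, in order."""
    seen = set()
    unique = []
    for term, position in tokens:
        key = (term, position)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical stream for an input with no delimiters, where the preserved
# original and the catenated form would be the same term at the same position.
stream = [("12345", 0), ("12345", 0)]
deduplicated = drop_duplicate_tokens(stream)
```

Tokens with the same term at different positions are kept, since they may
matter for phrase queries.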

> Add the ability to preserve the original term when using WordDelimiterFilter
> ----------------------------------------------------------------------------
>
>          Key: SOLR-14
>          URL: http://issues.apache.org/jira/browse/SOLR-14
>      Project: Solr
>         Type: Improvement

>   Components: search
>     Reporter: Richard "Trey" Hyde
>  Attachments: TokenizerFactory.java, WordDelimiterFilter.patch
>
> When doing prefix searching, you need to hang on to the original term; 
> otherwise you'll miss many matches you should be making.
> Data: ABC-12345
> WordDelimiterFilter may change this into
> ABC 12345 ABC12345
> A user may enter a search such as 
>  ABC\-123*
> which will fail to find a match given the above scenario.
> The attached patch adds a "preserveOriginal" option to 
> WordDelimiterFilter, with which the data will analyse as
> ABC 12345 ABC12345  ABC-12345 
> in which case we will get a positive match.
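
The scenario in the description can be reproduced with a toy Python model. It
is a sketch, not Solr or Lucene code: it assumes the prefix query is compared
against the raw indexed terms without being analyzed, and that the backslash
in the query only escapes the hyphen. The function name prefix_matches is
hypothetical; the term lists are copied from the description.

```python
def prefix_matches(indexed_terms, query):
    """Return the indexed terms matched by a trailing-'*' prefix query."""
    if not query.endswith("*"):
        raise ValueError("expected a prefix query ending in '*'")
    # Drop the trailing '*' and the escaping backslashes to get the raw prefix.
    prefix = query[:-1].replace("\\", "")
    return [term for term in indexed_terms if term.startswith(prefix)]


# Terms from the issue description for the data "ABC-12345".
without_original = ["ABC", "12345", "ABC12345"]
with_original = without_original + ["ABC-12345"]

query = "ABC\\-123*"  # the user's search ABC\-123*

print(prefix_matches(without_original, query))  # []
print(prefix_matches(with_original, query))     # ['ABC-12345']
```

None of the split or catenated terms starts with "ABC-123", so only the
preserved original term can satisfy the prefix.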

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
