[ 
http://issues.apache.org/jira/browse/SOLR-14?page=comments#action_12377440 ] 

Hoss Man commented on SOLR-14:
------------------------------

It would probably be good to make sure we have some unit tests of the existing 
WDF behavior prior to applying this patch, and then some tests that use this 
new feature just so it's clear how it works in various situations.

As for duplicates: my initial thought was that this could be handled by the 
proposed Filter in SOLR-11... but then I realized yonik has a point: the common 
case is probably going to be no intra-word delimiters, so a short-circuit check 
that doesn't create two of every token would probably be better.
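
To make the short-circuit idea concrete, here is a rough sketch. It is not the
attached patch and it does not use the real WordDelimiterFilter code: the class
and method names (PreserveOriginalSketch, hasDelimiter, analyze) are made up
for illustration, and it assumes that "delimiter" means any non-alphanumeric
character, ignoring the other splitting rules WDF may apply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not Solr's WordDelimiterFilter.
public class PreserveOriginalSketch {

    // Assumption: any non-alphanumeric character counts as an intra-word delimiter.
    static boolean hasDelimiter(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (!Character.isLetterOrDigit(term.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Emits the sub-words, their catenation, and (optionally) the original term.
    static List<String> analyze(String term, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();

        // Short circuit: with no delimiters the original term is the only
        // token, so emit it once instead of producing a duplicate.
        if (!hasDelimiter(term)) {
            out.add(term);
            return out;
        }

        // Split on delimiter characters, collecting the non-empty parts.
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                parts.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }

        out.addAll(parts);
        // Only add the catenated form when it differs from the single part.
        if (parts.size() > 1) {
            out.add(String.join("", parts));
        }
        if (preserveOriginal) {
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("ABC-12345", true));  // delimiter present
        System.out.println(analyze("ABC12345", true));   // short-circuit path
    }
}
```

With this sketch, "ABC-12345" yields ABC, 12345, ABC12345 and ABC-12345, while
"ABC12345" goes through the short-circuit path and comes out as a single token
rather than two identical ones.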

> Add the ability to preserve the original term when using WordDelimiterFilter
> ----------------------------------------------------------------------------
>
>          Key: SOLR-14
>          URL: http://issues.apache.org/jira/browse/SOLR-14
>      Project: Solr
>         Type: Improvement

>   Components: search
>     Reporter: Richard "Trey" Hyde
>  Attachments: TokenizerFactory.java, WordDelimiterFilter.patch
>
> When doing prefix searching, you need to hang on to the original term 
> otherwise you'll miss many matches you should be making.
> Data: ABC-12345
> WordDelimiterFilter may change this into
> ABC 12345 ABC12345
> A user may enter a search such as 
>  ABC\-123*
> Which will fail to find a match given the above scenario.
> The attached patch will allow the use of the "preserveOriginal" option to 
> WordDelimiterFilter and will analyse as
> ABC 12345 ABC12345  ABC-12345 
> in which case we will get a positive match.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
