[ http://issues.apache.org/jira/browse/SOLR-14?page=comments#action_12377415 ]
Yonik Seeley commented on SOLR-14:
----------------------------------

Thanks for the patch, Trey!
Can you give an example with the resulting token positions (or positionIncrements)?
Also, is there an easy way to prevent duplicate tokens from being produced? (The preserveOriginal version will often be identical to catenateWords or catenateNumbers, right?)

> Add the ability to preserve the original term when using WordDelimiterFilter
> ----------------------------------------------------------------------------
>
>          Key: SOLR-14
>          URL: http://issues.apache.org/jira/browse/SOLR-14
>      Project: Solr
>         Type: Improvement
>   Components: search
>     Reporter: Richard "Trey" Hyde
>  Attachments: TokenizerFactory.java, WordDelimiterFilter.patch
>
> When doing prefix searching, you need to hang on to the original term,
> otherwise you'll miss many matches you should be making.
> Data: ABC-12345
> WordDelimiterFilter may change this into
>   ABC 12345 ABC12345
> A user may enter a search such as
>   ABC\-123*
> which will fail to find a match given the above scenario.
> The attached patch will allow the use of the "preserveOriginal" option to
> WordDelimiterFilter and will analyse as
>   ABC 12345 ABC12345 ABC-12345
> in which case we will get a positive match.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
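
The scenario in the issue description can be sketched in a few lines of Python. This is only a toy model, not Solr's WordDelimiterFilter or the attached patch: the function names are hypothetical, it splits on non-alphanumeric characters only, and it ignores token positions entirely. It shows why the prefix query ABC\-123* (prefix "ABC-123" once unescaped) misses without the original term, and one possible way to skip a preserved original that duplicates a token already emitted.

```python
import re


def word_delimiter_tokens(term, preserve_original=False):
    """Toy approximation of word-delimiter analysis (hypothetical helper).

    Splits on runs of non-alphanumeric characters, adds the catenated form,
    and optionally keeps the original term if it is not already present.
    """
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", term) if p]
    tokens = list(parts)

    # Catenated form, e.g. "ABC" + "12345" -> "ABC12345".
    catenated = "".join(parts)
    if len(parts) > 1 and catenated not in tokens:
        tokens.append(catenated)

    # Keep the original only when it would not duplicate an existing token.
    if preserve_original and term not in tokens:
        tokens.append(term)
    return tokens


def prefix_matches(tokens, prefix):
    """Return True if any indexed token starts with the query prefix."""
    return any(token.startswith(prefix) for token in tokens)


without = word_delimiter_tokens("ABC-12345")
with_original = word_delimiter_tokens("ABC-12345", preserve_original=True)

print(without, prefix_matches(without, "ABC-123"))
print(with_original, prefix_matches(with_original, "ABC-123"))
```

In this sketch a term with no delimiters, such as ABC12345, yields a single token even with preserve_original=True, because the original is identical to a token that was already produced.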