[jira] Commented: (SOLR-1279) ApostropheTokenizer

Robert Muir (JIRA) Tue, 14 Jul 2009 13:57:40 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731125#action_12731125
 ]


Robert Muir commented on SOLR-1279:
-----------------------------------

Sergey, have you looked at SOLR-1266?

By using the new stemEnglishPossessive=0 option, I think you can get the same 
behavior with WordDelimiterFilter if you use preserveOriginal=1 along with 
catenateWords=1.
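
As a rough sketch, the analyzer might look something like the following in 
schema.xml. The field type name "text_apos" and the choice of whitespace 
tokenizer are only assumptions for illustration. Any WordDelimiterFilter 
attributes not shown are left at their defaults and may need tuning, so check 
the output on the analysis page before relying on it.

```xml
<!-- Hypothetical field type; the name and tokenizer are illustrative assumptions. -->
<fieldType name="text_apos" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- stemEnglishPossessive="0": do not strip a trailing 's (option added in SOLR-1266).
         preserveOriginal="1": keep the original token, e.g. McDonald's.
         catenateWords="1": also emit the joined word parts, e.g. McDonalds. -->
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0"
            preserveOriginal="1"
            catenateWords="1"/>
  </analyzer>
</fieldType>
```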


> ApostropheTokenizer
> -------------------
>
>                 Key: SOLR-1279
>                 URL: https://issues.apache.org/jira/browse/SOLR-1279
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Sergey Borisov
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: ApostropheTokenizer.zip
>
>
> ApostropheTokenizer creates extra tokens during the analysis stage for 
> fields containing apostrophes. The reason for adding this is to ensure that 
> documents that differ only by an apostrophe have the same relevancy score. 
> For example, if the document contains the string "McDonald's", it will be 
> tokenized as "McDonald's McDonalds". This way, a search for either 
> "McDonald's" or "McDonalds" will produce a similar score.
> This code handles up to two apostrophes in a token.
> To use this tokenizer, add the following lines to schema.xml:
> <analyzer type="index">
>       <filter class="org.apache.lucene.analysis.ApostropheTokenFactory"/>
> ...
> </analyzer>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
