[ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527795
 ] 

Hoss Man commented on SOLR-319:
-------------------------------

I haven't thought it out all the way, but it should be possible.  We only have 
to remember the name of the fieldtype in SynonymFilterFactory.init ... then in 
the create method we can call schema.getFieldTypes().get(fieldtypename).

Hmmm... except we probably don't have any access to the schema at that point do 
we?

Hmmm....  I'm not sure what the best way to do this would be.  We could just go 
get the schema from the SolrCore -- except we're moving away from it being a 
singleton and we don't have direct access to it either.

Anyone have any other suggestions?
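
The deferred lookup idea above could be sketched roughly as follows. This is only an illustration under loud assumptions: FieldTypeStub, SchemaStub, DeferredLookupFactory and the "fieldType" argument name are hypothetical stand-ins, not Solr classes or attributes, and the sketch simply hands the schema to create(), which is exactly the access that may not be available at that point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a schema field type; NOT Solr's FieldType.
class FieldTypeStub {
    final String name;
    FieldTypeStub(String name) { this.name = name; }
}

// Hypothetical stand-in for the schema; only mimics a name -> type map.
class SchemaStub {
    private final Map<String, FieldTypeStub> fieldTypes = new HashMap<>();
    void register(FieldTypeStub type) { fieldTypes.put(type.name, type); }
    Map<String, FieldTypeStub> getFieldTypes() { return fieldTypes; }
}

// Remembers only the field type NAME at init time and resolves it lazily,
// so the referenced type does not have to exist yet when init runs.
class DeferredLookupFactory {
    private String fieldTypeName;
    private FieldTypeStub resolved;

    // "fieldType" is an assumed argument name, used for illustration only.
    void init(Map<String, String> args) {
        fieldTypeName = args.get("fieldType");
    }

    // Assumption: the caller can supply the schema here.
    FieldTypeStub create(SchemaStub schema) {
        if (resolved == null) {
            resolved = schema.getFieldTypes().get(fieldTypeName);
            if (resolved == null) {
                throw new IllegalStateException("Unknown field type: " + fieldTypeName);
            }
        }
        return resolved;
    }
}
```

The point of the sketch is the ordering: init can run before the named field type has been registered, because nothing is looked up until the first create call.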
 

> changes SynonymFilterFactory to "Analyze" synonyms file
> -------------------------------------------------------
>
>                 Key: SOLR-319
>                 URL: https://issues.apache.org/jira/browse/SOLR-319
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>         Attachments: SOLR-319.patch
>
>
> WHAT:
> Currently, SynonymFilterFactory works with N-gram tokenizers 
> (CJKTokenizer, for example),
> but the rules in synonyms.txt have to be written as the tokenizer's output.
> For example, if I use CJKTokenizer (which works as a bi-gram tokenizer for 
> CJK chars) and want C1C2C3 to map to C4C5C6,
> I have to write the rule as follows:
> C1C2 C2C3 => C4C5 C5C6
> But I want to write it as "C1C2C3=>C4C5C6". This patch allows that. It is also 
> helpful for sharing synonyms.txt.
> HOW:
> A tokenFactory attribute is added to <filter class="solr.SynonymFilterFactory"/>.
> If the attribute is specified, SynonymFilterFactory uses that TokenizerFactory 
> to create a Tokenizer.
> SynonymFilterFactory then uses the Tokenizer to get tokens from the rules in 
> the synonyms.txt file.
> sample-1: CJKTokenizer
>     <fieldtype name="text_cjk" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.CJKTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ja.txt"
>                 ignoreCase="true" expand="true" tokenFactory="solr.CJKTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.CJKTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldtype>
> sample-2: NGramTokenizer
>     <fieldtype name="text_ngram" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ngram.txt"
>                 ignoreCase="true" expand="true"
>                 tokenFactory="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldtype>
> backward compatibility:
> Yes. If you omit the tokenFactory attribute from the 
> <filter class="solr.SynonymFilterFactory"/> tag, it works as usual.
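
To make the rule rewriting in the quoted description concrete, here is a minimal sketch of what "analyzing" a synonym rule with a bi-gram tokenizer amounts to. It is not the code from SOLR-319.patch: BigramRuleExpander and its methods are hypothetical names, and a plain character bi-gram split stands in for the real tokenizer (ASCII letters play the role of C1..C6 here, although the real CJKTokenizer does not split Latin text this way).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a plain character bi-gram split stands in for the tokenizer.
class BigramRuleExpander {

    // Returns the overlapping 2-character grams of text, e.g. "abc" -> [ab, bc].
    // Text shorter than two characters is returned as a single token.
    static List<String> bigrams(String text) {
        List<String> grams = new ArrayList<>();
        if (text.length() < 2) {
            if (!text.isEmpty()) {
                grams.add(text);
            }
            return grams;
        }
        for (int i = 0; i + 2 <= text.length(); i++) {
            grams.add(text.substring(i, i + 2));
        }
        return grams;
    }

    // Rewrites a single "lhs=>rhs" rule so both sides are space-separated bi-grams.
    static String expandRule(String rule) {
        String[] sides = rule.split("=>", 2);
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected a rule of the form lhs=>rhs: " + rule);
        }
        return String.join(" ", bigrams(sides[0].trim()))
                + " => "
                + String.join(" ", bigrams(sides[1].trim()));
    }
}
```

With this stand-in, expandRule("abc=>xyz") yields "ab bc => xy yz", which mirrors writing C1C2C3=>C4C5C6 and having it treated as C1C2 C2C3 => C4C5 C5C6.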

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
