Heh, I think so too!
There are still some serious gaps in my understanding of what you explained.
First, you pointed out that there's no real need to use ShingleFilter. I tried
every tokenizer and the result is the same: the species are not caught.
In the simplest scenario I have this:
<fieldType name="genus_type" class="solr.TextField"
positionIncrementGap="0">
<analyzer type="index">
<tokenizer class=""/> <!-- PUT YOUR FAVORITE TOKENIZER HERE -->
<filter class="solr.KeepWordFilterFactory" words="species.txt"
ignoreCase="true"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
</fieldType>
When I test this on the Analysis tab, it doesn't catch any tag that contains
a blank space, like "acacia acicularis". Am I missing something?
Then, when I add ShingleFilter, tags with a blank space are caught correctly.
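
To make the difference concrete, here is a toy model in Python of how I
understand the two chains. It is not Solr code: `tokenize`, `shingles` and
`keep_words` are made-up stand-ins, and the one-line species list is an
assumption for illustration.

```python
import re

def tokenize(text):
    # Toy stand-in for a word-splitting tokenizer: one token per word.
    return re.findall(r"\w+", text)

def shingles(tokens, max_size=3):
    # Toy stand-in for ShingleFilter with outputUnigrams="true":
    # every run of 1..max_size adjacent tokens, joined by a single space.
    return [" ".join(tokens[i:i + n])
            for i in range(len(tokens))
            for n in range(1, max_size + 1)
            if i + n <= len(tokens)]

def keep_words(tokens, words):
    # Toy stand-in for KeepWordFilter with ignoreCase="true":
    # a token survives only if it equals a whole line of the word file.
    allowed = {w.lower() for w in words}
    return [t for t in tokens if t.lower() in allowed]

species = ["acacia acicularis"]  # hypothetical content of species.txt
tokens = tokenize("Acacia acicularis seedling")

print(keep_words(tokens, species))            # []
print(keep_words(shingles(tokens), species))  # ['Acacia acicularis']
```

In this model the plain chain never produces a token with a space in it, so
a two-word line in the file has nothing to match against. Only the shingled
stream contains "Acacia acicularis" as a single token.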
But you said you have no trouble applying several successive KeepWord
filters (KWF), so I simply used two of them, each with its own word file:
<fieldType name="genus_type" class="solr.TextField"
positionIncrementGap="0">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ShingleFilterFactory" maxShingleSize="3"
outputUnigrams="true"/>
<filter class="solr.KeepWordFilterFactory" words="species.txt"
ignoreCase="true"/>
<filter class="solr.KeepWordFilterFactory" words="genus.txt"
ignoreCase="true"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
The species file contains a single line, "hey you".
The genus file also contains a single line, "hey".
The second KWF catches nothing at all:
<http://lucene.472066.n3.nabble.com/file/n4347541/1.png>
Well, I have to say I'm quite confused by this behaviour. Have I forgotten
something?
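
In case it helps pin down where my expectation goes wrong, here is one
possible reading as a toy model in Python (not Solr code). It assumes the
two KeepWord filters run one after the other on the same token stream, so
the second one only sees what the first one let through. The `keep_words`
helper and the hard-coded stream are made up for illustration. The stream is
what I would expect the shingle step to emit for the input "hey you".

```python
def keep_words(tokens, words):
    # Toy stand-in for KeepWordFilter with ignoreCase="true":
    # a token survives only if it equals a whole line of the word file.
    allowed = {w.lower() for w in words}
    return [t for t in tokens if t.lower() in allowed]

species = ["hey you"]  # the single line in species.txt
genus = ["hey"]        # the single line in genus.txt

# Assumed shingle output for the input "hey you" (unigrams plus the bigram).
stream = ["hey", "hey you", "you"]

after_species = keep_words(stream, species)      # first KWF
after_genus = keep_words(after_species, genus)   # second KWF, fed by the first

print(after_species)  # ['hey you']
print(after_genus)    # []

# Same model, but with both lists checked against the unfiltered stream.
print(keep_words(stream, species + genus))  # ['hey', 'hey you']
```

Under that assumption the first filter has already dropped "hey" before the
second one runs, which would match the empty result in the screenshot. In
this model both entries survive only when the two lists are checked against
the same unfiltered stream.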
--
View this message in context:
http://lucene.472066.n3.nabble.com/Copy-field-a-source-of-copy-field-tp4346425p4347541.html
Sent from the Solr - User mailing list archive at Nabble.com.