RE: stemming filter analyzers, any favorites?

2011-04-21 Thread Em
Hi Robert,

We often ran into the same issue with stemmers. This is why we created more
than one field, each with a different stemmer. It adds some overhead
but worked quite well.
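
A minimal sketch of what such a setup could look like in schema.xml, assuming
a source field called name. The field and type names (name, name_kstem,
name_porter, text_kstem, text_porter) are made up for illustration, and the
analyzer chains are cut down to the stemming step; only the two stemmer
factories come from this thread.

```xml
<!-- Hypothetical field types: same tokenizer, different stemmer -->
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- lowercase before stemming -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

<fieldType name="text_porter" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical fields: one copy of the text per stemmer, search-only -->
<field name="name_kstem"  type="text_kstem"  indexed="true" stored="false"/>
<field name="name_porter" type="text_porter" indexed="true" stored="false"/>

<!-- Feed both stemmed fields from the same (assumed) source field -->
<copyField source="name" dest="name_kstem"/>
<copyField source="name" dest="name_porter"/>
```

A query would then search both fields, so a term that one stemmer misses can
still match through the other.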

Regarding your off-topic question:
Look at the debug output of your searches. Sometimes a tool, especially the
WordDelimiterFilter (WDF), is misconfigured and the query parser builds an
unexpected query, so documents that are relevant do not match.

Please show us your debug output and the field definition so that we
can give you some help!

Regards,
Em



RE: stemming filter analyzers, any favorites?

2011-04-21 Thread Robert Petersen
Adding another field with another stemmer and searching both???  Wow, never
thought of doing that.  I guess that doesn't really double the size of your
index though, because all the terms are almost the same, right?  Let me look
into that.  I'll raise the other issue in a separate thread, and thanks.


RE: stemming filter analyzers, any favorites?

2011-04-21 Thread Em
As far as I know Lucene does not store an inverted index per field, so no, it
would not double the size of the index.

However, it could influence the score a little bit.

For example: if both stemmers reduce "schools" to "school" and you are
searching for "all schools in america", the term "school" carries more weight
in the resulting score, since it definitely occurs in two fields that hold
nearly the same value.

To reduce this effect you could write your own query parser which creates a
DisjunctionMaxQuery consisting of two boolean queries and a tie-breaker of 0,
so that only the better-scoring stemmed field contributes to the total score
of your document.
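
A minimal sketch of the tie-break idea, swapping in Solr's stock dismax
parser for a custom query parser: dismax builds a DisjunctionMaxQuery for each
query term across the fields listed in qf, and tie=0.0 means only the
best-scoring field counts for that term. The handler name /stemsearch and the
field names name_kstem and name_porter are hypothetical; they stand for two
fields holding the same text analyzed with different stemmers.

```xml
<!-- solrconfig.xml: hypothetical handler searching both stemmed fields -->
<requestHandler name="/stemsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- use the built-in dismax query parser -->
    <str name="defType">dismax</str>
    <!-- assumed field names, one per stemmer -->
    <str name="qf">name_kstem name_porter</str>
    <!-- 0.0 = pure max: the lower-scoring field adds nothing to the score -->
    <float name="tie">0.0</float>
  </lst>
</requestHandler>
```

With a tie above 0.0 the lower-scoring field would add a fraction of its score
again, which brings back some of the double counting described above.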

Regards,
Em



RE: stemming filter analyzers, any favorites?

2011-04-21 Thread Robert Petersen
Nice!  Thanks!


Re: stemming filter analyzers, any favorites?

2011-04-20 Thread Erick Erickson
You can get a better sense of exactly what transformations occur, and when,
if you look at the analysis page (be sure to check the verbose
checkbox).

I'm surprised that "bags" doesn't match "bag". What does the analysis
page say?

Best
Erick

On Wed, Apr 20, 2011 at 1:44 PM, Robert Petersen rober...@buy.com wrote:
 Stemming filter analyzers... anyone have any favorites for particular
 search domains?  Just wondering what people are using.  I'm using Lucid
 K Stemmer and having issues.  It seems to miss a lot of common
 stems.  We went to that because of excessively loose matches on the
 solr.PorterStemFilterFactory.

 I understand K Stemmer is a dictionary-based stemmer.  Seems to me like
 it is missing a lot of common stem reductions, e.g. "Bags" does not match
 "Bag" in our searches.

 Here is my analyzer stack:

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1"
             generateNumberParts="1"
             catenateWords="1"
             catenateNumbers="1"
             catenateAll="1"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!-- The LucidKStemmer currently requires a lowercase filter somewhere before it. -->
     <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1"
             generateNumberParts="1"
             catenateWords="1"
             catenateNumbers="1"
             catenateAll="1"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <!-- The LucidKStemmer currently requires a lowercase filter somewhere before it. -->
     <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>



RE: stemming filter analyzers, any favorites?

2011-04-20 Thread Robert Petersen
I have been doing that, and for the "Bags" example the trailing 's' is not
being removed by the KStemmer, so if you index the word "bags" and search on
"bag" you get no matches.  Why wouldn't the trailing 's' get stemmed off?
KStemmer is dictionary-based, so is "bags" not in the dictionary?  That
trailing 's' should always be dropped, no?  That seems like it would be
better; we don't want to make synonyms for basic use cases like this.  I fear
I will have to return to the Porter stemmer.  Are there other, better ones?
That is my main question.

Off topic secondary question: sometimes I am puzzled by the output of the 
analysis page.  It seems like there should be a match, but I don't get the 
results during a search that I'd expect...  

For example, when the WordDelimiterFilterFactory splits a term into several
terms before the K-stemmer is applied, the matching term sometimes ends up in
position 2 of the final index analysis, while the searcher entered just the
partial term alone, which puts it in position 1 of the query analysis, and
then the search finds no match.  Am I reading this correctly?  Is that
expected, or should that match and I am misreading my analysis output?

Thanks!

Robi

PS  I have a category named Bags and am catching flak for it not coming up in
a search for bag.  hah
PPS the term is not in protwords.txt


com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory {protected=protwords.txt}
term position     1
term text         bags
term type         word
source start,end  0,4
payload

