On 1/15/2013 11:54 AM, Lighton Phiri wrote:
> I would like to get a sense of the top terms for fields in my index
> and just enable the LukeRequestHandler [1] in my solrconfig.xml file.
> However, Luke seems to include stopwords as well.
>
> I've tried searching previous threads but nothing I've come across [2,
> 3, 4] has helped.
>
> How can I tell Luke not to include stopwords? Alternatively, what's
> the easiest way of getting top terms without stopwords?

If you don't want stopwords in the top terms report, you have to keep
them out of your index, which means adding StopFilterFactory to the
fieldType analysis chain and reindexing. That setup is very common, but
IMHO it is not a good idea, because you will lose search precision.
If you were to leave stopwords in your index but tell the tools to not
display them, then the top terms list would be lying to you, and it
would not be very useful as a troubleshooting tool. Troubleshooting is
one of Luke's primary purposes.
To get an idea of which non-stopwords are dominant in your index, just
ask for more top terms (the numTerms parameter on the Luke handler)
instead of just the top ten or top twenty. If you are using a program
to parse the information, have your program remove the terms that you
don't want to include, then trim the list to the proper size.
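
As a rough sketch of that last suggestion, something like the following
Python might work. It is only an illustration under assumptions: the
function names, the stopword set, and the Luke URL you pass in are all
made up for the example, and it assumes the JSON response lists top
terms under fields -> <field> -> topTerms as a flat list of alternating
term and count values. Check the actual response from your Solr version
before relying on that layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical stopword list; use the contents of your own stopwords file.
STOPWORDS = {"a", "and", "of", "the", "to"}


def fetch_top_terms(luke_url, field, num_terms=200):
    """Ask Luke for more top terms than are finally needed.

    luke_url is whatever URL your LukeRequestHandler is registered at
    (an assumption here, it depends on your solrconfig.xml).
    """
    query = urlencode({"fl": field, "numTerms": num_terms, "wt": "json"})
    with urlopen(luke_url + "?" + query) as response:
        data = json.load(response)
    # Assumed layout: a flat list [term, count, term, count, ...].
    return data["fields"][field]["topTerms"]


def filter_top_terms(flat_terms, stopwords, limit):
    """Drop unwanted terms, then trim the list to the requested size."""
    # Pair up the alternating term/count entries, preserving Luke's order.
    pairs = zip(flat_terms[0::2], flat_terms[1::2])
    kept = [(term, count) for term, count in pairs
            if term.lower() not in stopwords]
    return kept[:limit]
```

With sample data shaped like the assumed response, the filtering step
behaves like this:

```python
sample = ["the", 900, "of", 750, "solr", 400, "and", 380,
          "index", 210, "luke", 95]
print(filter_top_terms(sample, STOPWORDS, 2))
# [('solr', 400), ('index', 210)]
```

The index itself is untouched, so Luke still reports what is really
there and only your own report hides the stopwords.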
Thanks,
Shawn