On 1/15/2013 11:54 AM, Lighton Phiri wrote:
> I would like to get a sense of the top terms for fields in my index
> and just enable the LukeRequestHandler [1] in my solrconfig.xml file.
> However, Luke seems to include stopwords as well.
>
> I've tried searching previous threads but nothing I've come across [2,
> 3, 4] has helped.
>
> How can I tell Luke not to include stopwords?  Alternatively, what's
> the easiest way of getting top terms without stopwords?

If you don't want stopwords in the top terms report, you have to remove them from your index. IMHO, this is not a good idea because you will lose search precision, but using StopFilterFactory in a fieldType analysis chain is very common.

If you were to leave stopwords in your index but tell the tools to not display them, then the top terms list would be lying to you, and it would not be very useful as a troubleshooting tool. Troubleshooting is one of Luke's primary purposes.

To get an idea of which non-stopwords are dominant in your index, just ask for more top terms than the usual top ten or top twenty. If you are using a program to parse the information, have your program remove the terms that you don't want to include, then trim the list to the proper size.
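
As a rough sketch of that last suggestion, here is one way the filtering could look in Python. It assumes you have already fetched the Luke response as JSON (for example with the handler's fl and numTerms parameters set to a field name and a larger number such as 100) and parsed it into a dict, and that topTerms comes back as a flat list alternating term and count. The function name, the sample data and the stopword set are made up for illustration; check them against what your own Solr version actually returns.

```python
def top_terms_without_stopwords(luke_response, field, stopwords, limit=10):
    """Return up to `limit` (term, count) pairs for `field`, skipping stopwords.

    Assumes luke_response["fields"][field]["topTerms"] is a flat list of the
    form [term1, count1, term2, count2, ...]. Verify this layout against your
    own Luke output before relying on it.
    """
    flat = luke_response["fields"][field]["topTerms"]

    # Pair each term with the count that follows it in the flat list.
    pairs = list(zip(flat[0::2], flat[1::2]))

    # Sort by count, highest first, in case the input is not already ordered.
    pairs.sort(key=lambda pair: pair[1], reverse=True)

    # Drop the unwanted terms, then trim the list to the requested size.
    kept = [(term, count) for term, count in pairs
            if term.lower() not in stopwords]
    return kept[:limit]


# Hypothetical sample shaped like a parsed Luke response for one field.
sample = {
    "fields": {
        "title": {
            "topTerms": ["the", 120, "of", 95, "solr", 40,
                         "and", 38, "index", 22, "search", 17]
        }
    }
}

# Hypothetical stopword set; in practice you might load your stopwords.txt.
stopwords = {"the", "of", "and"}

for term, count in top_terms_without_stopwords(sample, "title", stopwords, limit=2):
    print(term, count)
```

The index itself is untouched here, so Luke still reports exactly what is indexed; only your own report hides the stopwords.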

Thanks,
Shawn
