I still don't understand your final goal, but if you want output of the form "run(40) => 20 from running, 10 from run, 8 from runners, 2 from runner", you need to index your documents with the standard analyzer, so that the original, unstemmed words are what get indexed. Then walk through the terms in the index using org.apache.lucene.index.IndexReader and stem each term with a stemmer. Storing each stem as a key and the list of original words as its value in a map will give you that kind of output.
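A rough sketch of the map idea, in plain Java with no Lucene dependency. Everything here is an assumption for illustration: the termFreqs map stands in for the terms and counts you would collect while walking the index with IndexReader, toyStem is a hypothetical placeholder and not the Porter stemmer, and the class and method names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class StemGroups {

    // Groups original terms under their stem: stem -> (original term -> frequency).
    static Map<String, Map<String, Integer>> groupByStem(
            Map<String, Integer> termFreqs, Function<String, String> stemmer) {
        Map<String, Map<String, Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            String stem = stemmer.apply(e.getKey());
            groups.computeIfAbsent(stem, k -> new LinkedHashMap<>())
                  .merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return groups;
    }

    // Formats one non-empty group as "stem(total) => n from word, ...",
    // most frequent original word first.
    static String format(String stem, Map<String, Integer> originals) {
        int total = originals.values().stream().mapToInt(Integer::intValue).sum();
        StringBuilder sb = new StringBuilder(stem + "(" + total + ") => ");
        originals.entrySet().stream()
                .sorted((a, b) -> b.getValue() - a.getValue())
                .forEach(en -> sb.append(en.getValue())
                                 .append(" from ").append(en.getKey()).append(", "));
        sb.setLength(sb.length() - 2); // drop the trailing ", "
        return sb.toString();
    }

    // Hypothetical toy stemmer: strips a few suffixes so the example runs.
    // Replace it with a real stemmer in practice.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ning", "ners", "ner"}) {
            if (word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // Made-up counts; in practice these come from the unstemmed index.
        Map<String, Integer> termFreqs = new LinkedHashMap<>();
        termFreqs.put("running", 20);
        termFreqs.put("run", 10);
        termFreqs.put("runners", 8);
        termFreqs.put("runner", 2);

        groupByStem(termFreqs, StemGroups::toyStem)
                .forEach((stem, originals) -> System.out.println(format(stem, originals)));
    }
}
```

With the made-up counts above this prints one line for the stem "run". Because each stem keeps its own map of original words, you can also pick the most frequent original as the display keyword instead of the stem.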
However, if seeing something like the following list on schema.jsp would help (not exactly what you want, but similar):

run=>run
run=>running
run=>runner
run=>runners

then add one line of code,

newstr = newstr + "=>" + new String(termBuffer, 0, len);

to org.apache.solr.analysis.EnglishPorterFilterFactory.java between lines #116 and #117. Rename the file (and the class to match), compile the code, and put your jar file in the lib directory under your Solr home. Now you can use your new FilterFactory in your schema.xml.

--- On Sat, 1/24/09, Thushara Wijeratna <thu...@gmail.com> wrote:

> From: Thushara Wijeratna <thu...@gmail.com>
> Subject: Re: Solr stemming -> preserve original words
> To: solr-user@lucene.apache.org, iori...@yahoo.com
> Date: Saturday, January 24, 2009, 1:53 AM
> Chris, Ahmet - thanks for the responses.
>
> Ahmet - yes, i want to see "run" as a top term + the original words that
> formed that term
> The reason is that due to mis-stemming, the terms could become non-english.
> ex: "permanent" would stem to "perm", "archive" would become "archiv".
>
> I need to extract a set of keywords from the indexed content - I'd like
> these to be correct full english words.
>
> thanks,
> thushara