We have a search system built on Solr, accessed from C# through the SolrNet
library, which supports some advanced search features such as fuzzy,
synonym, and stemming search.
While all of these work, *the expectation for stemming search seems to be a
combination of stemming by reduction and stemming by expansion, to cover
the grammatical variations of a word*. A use case makes this clearer:

 - a search for fish would also find fishing
 - a search for applied would also find applying, applies, and apply

We implemented stemming using a copyField into a field analyzed with
SnowballPorterFilterFactory. *As a result, a search for burning returns
results for both burning and burn, but a search for burn does not return
results for burning, burnt, or burns.*

Since all the stemmers supported by Lucene/Solr use stemming by reduction,
we are not sure how to go about this. According to the Solr Wiki:

> A related technology to stemming is lemmatization, which allows for
> "stemming" by expansion, taking a root word and 'expanding' it to all of
> its various forms. Lemmatization can be used either at insertion time or
> at query time. Lucene/Solr does not have built-in support for
> lemmatization but it can be simulated by using your own dictionaries and
> the SynonymFilterFactory

We are not sure exactly how to do this in Solr. Any ideas?
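One possible reading of the wiki's suggestion is to keep your own dictionary mapping each root to its inflected forms, write it out in the synonyms.txt format (one line of comma-separated equivalent terms per group), and load that file with SynonymFilterFactory. The sketch below assumes a small hand-built dictionary; the `LemmaDictionary` contents, the `SynonymFileBuilder` class, and the output file name are illustrative only and do not come from any existing resource.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch: turn a hand-built lemma dictionary (root -> inflected forms) into
// lines for a Solr synonyms.txt file. Each line lists equivalent terms
// separated by commas.
public static class SynonymFileBuilder
{
    // Hypothetical example data; a real dictionary would come from your own
    // word lists or a lexical database such as WordNet.
    public static readonly Dictionary<string, string[]> LemmaDictionary =
        new Dictionary<string, string[]>
        {
            ["fish"] = new[] { "fishes", "fished", "fishing" },
            ["apply"] = new[] { "applies", "applied", "applying" },
            ["burn"] = new[] { "burns", "burned", "burnt", "burning" },
        };

    // Builds one line per root, "root, form1, form2, ...", sorted by root.
    // Terms are trimmed, lower-cased, and de-duplicated in first-seen order.
    public static IEnumerable<string> BuildLines(IDictionary<string, string[]> lemmas)
    {
        foreach (var entry in lemmas.OrderBy(e => e.Key, StringComparer.Ordinal))
        {
            var seen = new HashSet<string>();
            var terms = new List<string>();
            foreach (var raw in new[] { entry.Key }.Concat(entry.Value))
            {
                var term = raw.Trim().ToLowerInvariant();
                if (term.Length > 0 && seen.Add(term))
                {
                    terms.Add(term);
                }
            }
            yield return string.Join(", ", terms);
        }
    }

    // Writes the lines to a file that SynonymFilterFactory could load
    // (the file name passed in is up to you).
    public static void WriteFile(string path, IDictionary<string, string[]> lemmas)
    {
        File.WriteAllLines(path, BuildLines(lemmas));
    }
}
```

The field type would then reference the file with something like `<filter class="solr.SynonymFilterFactory" synonyms="lemmas.txt" ignoreCase="true" expand="true"/>`, so that any one form expands to all forms on its line. If the same analyzer also runs SnowballPorterFilterFactory, the synonym filter likely needs to come before it, so that the unstemmed words still match the dictionary entries.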

We were also considering using a C#-based stemmer/lemmatizer library to get
the root of the word, then using a public database such as WordNet to
extract the grammatical variations of that root, and finally sending all of
these terms to Solr in the query. We have not yet done much research into
finding a stable C# stemmer/lemmatizer and a WordNet C# API, but this seems
likely to get too convoluted, and there should be a way to do it from
within Solr.
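The query-time alternative described above could look roughly like the following: reduce the user's term to its root, expand the root to all known forms, and OR those forms together against a field. This is a minimal sketch under stated assumptions. An in-memory dictionary stands in for both the lemmatizer and the WordNet lookup, `QueryExpander` is a hypothetical helper, and the field name `text` is a placeholder. None of it reflects a specific C# library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of query-time expansion: reduce a term to its root, expand the root
// to its known forms, and build a Solr query string that ORs them together.
// The dictionary is a hypothetical stand-in for a lemmatizer plus WordNet.
public static class QueryExpander
{
    private static readonly Dictionary<string, string[]> FormsByRoot =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["fish"] = new[] { "fish", "fishes", "fished", "fishing" },
            ["apply"] = new[] { "apply", "applies", "applied", "applying" },
        };

    // Reverse index so that every known form points back to its root.
    private static readonly Dictionary<string, string> RootByForm =
        FormsByRoot
            .SelectMany(kv => kv.Value.Select(form => (Form: form, Root: kv.Key)))
            .ToDictionary(p => p.Form, p => p.Root, StringComparer.OrdinalIgnoreCase);

    // Returns the term's root, or the term itself if it is unknown.
    public static string ToRoot(string term) =>
        RootByForm.TryGetValue(term, out var root) ? root : term;

    // Returns all forms of the term's root, or just the term if unknown.
    public static IReadOnlyList<string> Expand(string term) =>
        FormsByRoot.TryGetValue(ToRoot(term), out var forms) ? forms : new[] { term };

    // Builds a query such as text:(apply OR applies OR applied OR applying).
    // Assumes single-word terms with no Solr special characters to escape.
    public static string BuildQuery(string field, string term) =>
        $"{field}:({string.Join(" OR ", Expand(term))})";
}
```

For example, `QueryExpander.BuildQuery("text", "applied")` returns `text:(apply OR applies OR applied OR applying)`, a string that could be passed to SolrNet as a raw query (for instance through `SolrQuery`). The index stays untouched with this approach, but the dictionary then has to live and be maintained in the application, which is the extra complexity the paragraph above is wary of.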



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-query-in-Solr-tp4073862.html
Sent from the Solr - User mailing list archive at Nabble.com.
