Martin Grotzke wrote:
Try the SnowballPorterFilterFactory with German2 as the language attribute
first, and use synonyms for compound words, e.g. "Herrenhose" => "Herren",
"Hose".
So you use a combined approach?
Yes, we define the relevant parts of compound words (keywords only) as
synonyms and feed them into a special field that is used for searching and
for the product index. I hope there will be a filter that can split
compound words sometime in the future...
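
To make the idea concrete, here is a rough sketch in plain Python of what such a synonym mapping does to the tokens. It is not Solr code and not our actual setup; the table COMPOUND_PARTS and the function expand_compounds are made-up names, and the only entry is the "Herrenhose" example from above.

```python
# Hypothetical synonym table: compound keyword -> its relevant parts.
# Only the "Herrenhose" example from this thread is listed; a real table
# would be maintained by hand from the product data.
COMPOUND_PARTS = {
    "herrenhose": ["herren", "hose"],
}


def expand_compounds(tokens):
    """Lowercase each token and append the parts of any known compound."""
    expanded = []
    for token in tokens:
        key = token.lower()
        expanded.append(key)
        # Unknown words are kept as they are; known compounds get their parts.
        expanded.extend(COMPOUND_PARTS.get(key, []))
    return expanded


print(expand_compounds("Schwarze Herrenhose".split()))
# ['schwarze', 'herrenhose', 'herren', 'hose']
```

With the parts in the searchable field, a query for "Hose" can match a product that only mentions "Herrenhose".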
With stemming you may get some "interesting" results, but it is much
better to live with them than to have no results or far fewer results ;o)
Do you have an example of the "interesting" results I can expect, just to
get an idea?
You can find more information on the Snowball stemming algorithms here:
http://snowball.tartarus.org/
Thanks! I have already had a look at this site, but what is missing is a
demo where one can see what's happening. I think I'll play a little with
stemming to get a feel for it.
I think the Snowball stemmer is very good, so I have no practical example
for you. Maybe this helps to see what happens:
http://snowball.tartarus.org/algorithms/german/diffs.txt
If you have mixed languages in your content, which sometimes happens in
product data, you might get into some trouble.
Also have a look at the StopFilterFactory. Here is a sample stopword list
for the German language:
http://snowball.tartarus.org/algorithms/german/stop.txt
Our application handles products. Do you think such stopwords are useful
in this scenario as well? I wouldn't expect a user to search for "keine
hose" or something like this :)
I have seen much worse queries, so you never know ;o)
Think of a query like this: "Hose in blau für Herren"
You will definitely want to remove "in" and "für" during searching, and
removing them during indexing also reduces the index size. Maybe you will
even get better scores when only relevant terms are used. You should
optimize the stopword list based on your data.
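
As a small illustration of the effect, here is a sketch in plain Python (not Solr code). The set STOPWORDS is a tiny hand-picked subset for this example, not the full list linked above, and remove_stopwords is a hypothetical helper name.

```python
# Tiny illustrative stopword set; a real list would come from a file such as
# the Snowball one and be tuned to your own product data.
STOPWORDS = {"in", "für", "keine"}


def remove_stopwords(query):
    """Lowercase the query, split on whitespace and drop the stopwords."""
    return [term for term in query.lower().split() if term not in STOPWORDS]


print(remove_stopwords("Hose in blau für Herren"))
# ['hose', 'blau', 'herren']
```

Only the terms that describe the product are left for matching and scoring.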
Regards,
Tom