Martin Grotzke wrote:
Try the SnowballPorterFilterFactory with German2 as the language attribute first, and use synonyms for compound words, e.g. "Herrenhose" => "Herren", "Hose".
So you use a combined approach?
Yes, we define the relevant parts of compound words (keywords only) as synonyms and feed them into a special field that is used for searching and for the product index. I hope there will be a filter that can split compound words sometime in the future...
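
To illustrate the idea only (this is not how Solr implements synonyms), here is a minimal Python sketch. The COMPOUND_PARTS map and the expand_compounds function are made-up names for this example; in Solr the mapping would live in a synonyms file that feeds the special search field.

```python
# Hypothetical compound-to-parts map, using the example from this thread.
# In a real setup this would be maintained as a synonyms list.
COMPOUND_PARTS = {
    "herrenhose": ["herren", "hose"],
}


def expand_compounds(tokens):
    """Lowercase each token and append the parts of any known compound."""
    expanded = []
    for token in tokens:
        key = token.lower()
        expanded.append(key)
        # Unknown words are kept as-is; known compounds also get their parts.
        expanded.extend(COMPOUND_PARTS.get(key, []))
    return expanded


search_field = expand_compounds("Schwarze Herrenhose".split())
print(search_field)
```

A search for "Hose" or "Herren" can then match a product that only mentions "Herrenhose", because the parts are in the search field as well.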
With stemming you may get some "interesting" results, but living with them is much better than getting no results or far fewer ;o)
Do you have an example of what "interesting" results I can expect, just to
get an idea?
You can find more information on the Snowball stemming algorithms here:

http://snowball.tartarus.org/
Thanks! I had already looked at this site, but what is missing is a
demo where one can see what's happening. I think I'll play with stemming
a little to get a feeling for it.
I think the Snowball stemmer is very good so I have no practical example for you. Maybe this is of value to see what happens:

http://snowball.tartarus.org/algorithms/german/diffs.txt

If you have mixed languages in your content, which sometimes happens in product data, you might get into some trouble.

Also have a look at the StopFilterFactory. Here is a sample stopword list for German:

http://snowball.tartarus.org/algorithms/german/stop.txt
Our application handles products. Do you think such stopwords are useful
in this scenario as well? I wouldn't expect a user to search for "keine
hose" or something like this :)
I have seen much worse queries, so you never know ;o)

Think of a query like this: "Hose in blau für Herren"

You will definitely want to remove "in" and "für" during searching, and removing them during indexing also reduces index size. You may even get better scores when only relevant terms are used. You should optimize the stopword list based on your data.
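
As a rough sketch of what stopword removal does to that query (plain Python, not Solr's StopFilterFactory itself): the STOPWORDS set below is a tiny illustrative subset I picked for this example, and remove_stopwords is a made-up helper name.

```python
# Illustrative subset only; a real list would be tuned to the product data.
STOPWORDS = {"in", "für"}


def remove_stopwords(query):
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return [term for term in query.lower().split() if term not in STOPWORDS]


terms = remove_stopwords("Hose in blau für Herren")
print(terms)
```

Only "hose", "blau" and "herren" remain, which are the terms that actually describe the product.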

Regards,

Tom
