Currently it is not possible, which is not how it should be. I do not know whether QDBM (which stores file names associated with keywords) can be configured to split strings like "ziegler-nichols" into "ziegler" and "nichols" automatically for searching, or whether we need to split the strings ourselves.
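If QDBM cannot do this itself, splitting before storing the keywords is not much work. Below is a minimal sketch in Python, assuming keywords are stored in lower case and that we know whether a file is source code. `split_keywords` and its `source_code` flag are hypothetical names and are not part of QDBM or Tracker. In source mode, underscores are kept because they belong to identifiers in C, while a dash still separates words.

```python
import re

# Regular text: any run of letters/digits is a word, so both dashes and
# underscores act as separators ("ziegler-nichols" -> "ziegler", "nichols").
_TEXT_WORD = re.compile(r"[^\W_]+")

# Source code (assumed C-like): underscores are part of identifiers, but a
# dash is an operator, so "difference=alpha-beta" gives three identifiers.
_CODE_WORD = re.compile(r"\w+")


def split_keywords(text, source_code=False):
    """Return lower-cased keywords to store in the index (hypothetical helper)."""
    pattern = _CODE_WORD if source_code else _TEXT_WORD
    return [m.group(0).lower() for m in pattern.finditer(text)]
```

For example, `split_keywords("my_file-name.txt")` gives `["my", "file", "name", "txt"]`, while `split_keywords("max_value = a-b", source_code=True)` keeps `max_value` as a single keyword.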
This question was raised before, regarding filenames with dashes and underscores in them. At that time, the reply was that in C code, dashes and underscores often have a meaning. I think we'll need to be context-sensitive here: regular documents and filenames usually require word splitting, while source code usually doesn't. (However, in C, the string "difference=alpha-beta" actually has three interesting lexemes, and even here the dash should not produce the lexeme "alpha-beta".)

What I also dislike about libstemmer (which aims to "reduce" words to their stems, for instance to ignore plurals) is that it does not ignore accented characters, so if I have a file which contains "éléphant", then "élephant" or "elephant" will not be found. "éléphant" is the correct spelling, but French speakers very often miss some accents or add superfluous ones, and the same problem occurs in other languages.
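One possible approach (not something libstemmer does) is to fold accents both when indexing and when querying, so that the stored keyword and the search term are compared without diacritics. Below is a minimal sketch using Python's standard `unicodedata` module. `fold_accents` is a hypothetical helper, and stemming would still be a separate step.

```python
import unicodedata


def fold_accents(word):
    """Drop diacritics so "éléphant", "élephant" and "elephant" match."""
    # NFD splits "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", word)
    # Keep only the base characters, dropping the combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose anything left over and compare case-insensitively.
    return unicodedata.normalize("NFC", stripped).lower()
```

The same function has to be applied to the query, otherwise a search for "éléphant" would no longer match the folded keyword "elephant".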
Unfortunately, ignoring accents is not always applicable. For instance, in Swedish there's a big difference between the words "öst" and "ost", which mean "east" and "cheese", respectively. However, "café" is often spelled "cafe", with the same meaning. I'm not at all sure how to handle this.
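One way to handle the Swedish case would be to make the folding language-dependent. Letters that are separate letters of the alphabet (å, ä and ö in Swedish) are kept as they are, and the remaining accents, like the one in "café", are dropped. Below is a minimal sketch that assumes the language of each document and query is known, which is itself an open problem. The `PROTECTED_LETTERS` table is illustrative and incomplete, and `fold_accents_for` is a hypothetical name.

```python
import unicodedata

# Letters that count as distinct letters, not accented variants, in a
# given language. Illustrative only: just Swedish is listed here.
PROTECTED_LETTERS = {
    "sv": set("åäöÅÄÖ"),
}


def fold_accents_for(word, lang):
    """Fold accents, except on letters that are separate letters in `lang`."""
    protected = PROTECTED_LETTERS.get(lang, set())
    out = []
    # Work on composed (NFC) characters so "ö" is a single character here.
    for ch in unicodedata.normalize("NFC", word):
        if ch in protected:
            out.append(ch)  # keep "ö" distinct from "o" in Swedish
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out).lower()
```

With this table, "öst" and "ost" stay distinct for Swedish while "café" still matches "cafe". For a language without an entry, every accent is dropped.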
_______________________________________________ tracker-list mailing list [email protected] http://mail.gnome.org/mailman/listinfo/tracker-list
