> This example still doesn't seem very convincing --- why would you not
> merely attach the stopword list to the pl_ispell dictionary?

Because the ispell-based dictionaries first stem the lexeme and only then
search for the result in the stopwords file. The situation here is that a
stopword is first stemmed to produce another lexeme (which is not in the
stopwords file, as it's a perfectly valid word), and that lexeme then gets
indexed instead of being discarded.

To restate: the word 'od' in Polish is both a preposition and a declined
form of the noun 'oda'. The ispell dictionary, when passed the lexeme 'od',
first stems it to produce 'oda' and then fails to find 'oda' in the
stopwords file. If I included the word 'oda' in the stopwords file, I'd be
losing information about the noun 'oda' appearing in documents.

I'm still trying to find an English example, as I'm sure it would be easier
for most readers of this list to understand. Nothing comes to mind,
however - I guess some languages just have rotten luck with their grammar.

> If there is a use-case for it, IMHO it'd be better to add a boolean
> accept-or-pass-on parameter to the "simple" dictionary than to add a
> whole new dictionary type.

Ah, I never thought of that. You may well be right - it does look like an
easier solution. However, it would require coding some basic parsing logic
into the dex_init procedure, because right now the 'simple' dictionary
expects dict_initoption to be a path to the stopwords file. Do you mean
something like 'StopFile="/path/to/stopwords", AcceptUnknown=0'?

Regards,
Jan Urbanski

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
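
P.S. To make the ordering problem concrete, here is a toy Python model of a
dictionary chain. It is only a sketch, not the tsearch code: the function
names, the hard-coded stem table and the accept_unknown flag are all made up
for illustration. I'm also assuming the convention that a dictionary returns
nothing for a token it does not recognize, an empty list for a stopword, and
a list of lexemes otherwise.

```python
# Toy model of a text-search dictionary chain. Every name here is
# hypothetical; none of this is PostgreSQL's actual API.

STOPWORDS = {"od"}                    # the preposition we want discarded
STEMS = {"od": "oda", "oda": "oda"}   # stand-in for the ispell affix rules


def ispell_like(token):
    """Stem first, then check the *stem* against the stopword list."""
    stem = STEMS.get(token)
    if stem is None:
        return None        # unknown word: let the next dictionary try
    if stem in STOPWORDS:
        return []          # stopword: discard
    return [stem]          # recognized: index the stem


def simple_like(token, accept_unknown=True):
    """Check the raw token against the stopword list, with no stemming.

    accept_unknown=False models the proposed accept-or-pass-on switch:
    non-stopwords are handed on to the next dictionary instead of being
    accepted as they are.
    """
    if token in STOPWORDS:
        return []
    return [token] if accept_unknown else None


def run_chain(token, chain):
    """Return the result of the first dictionary that recognizes the token."""
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    return None


def stop_filter(token):
    """The simple-like dictionary used purely as a stopword filter."""
    return simple_like(token, accept_unknown=False)


print(run_chain("od", [ispell_like]))                # ['oda']: indexed
print(run_chain("od", [stop_filter, ispell_like]))   # []: discarded
print(run_chain("oda", [stop_filter, ispell_like]))  # ['oda']: still indexed
```

With the ispell-like dictionary alone, 'od' is turned into 'oda' and indexed.
With the pass-on filter placed in front of it, 'od' is discarded before
stemming, while the noun 'oda' still reaches the stemmer.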