Hi Lars, and all,
The current German dictionary maintained by Björn Jacke has 80,000
basic forms which expand to 300,000 variations, for a factor of
3.75. Swedish, Danish and Norwegian form basic words the same way
German does (with compounds). Basic words can often be
translated syllable by syllable, so the number of basic forms
should be about the same. But the Scandinavian languages use
endings instead of the definite article (the/der/die/das),
resulting in a larger number of expanded variations.
If we're into statistics, then the Polish dictionary has something like
3.5 million expanded forms, and about 300,000 base forms. The quality of
the dictionary is excellent.
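
To put the two sets of figures side by side, here is a rough sketch of the
arithmetic in Python. The function name is made up for illustration, and the
counts are just the approximate numbers quoted in this thread, not exact
dictionary sizes.

```python
def expansion_factor(base_forms: int, expanded_forms: int) -> float:
    """Average number of expanded forms generated per base form."""
    return expanded_forms / base_forms


# Approximate counts as quoted in this thread (assumed, not measured here).
german = expansion_factor(80_000, 300_000)
polish = expansion_factor(300_000, 3_500_000)

print(f"German: {german:.2f} expanded forms per base form")
print(f"Polish: {polish:.2f} expanded forms per base form")
```

By these numbers the Polish dictionary expands each base form roughly three
times as much as the German one does.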
How was that achieved? Simple: set up a local Scrabble-like community
and develop a Scrabble dictionary using the Scrabble players' linguistic
competence. It's incredibly efficient.
Then you simply tweak the Scrabble dict to your needs (like removing
rare and confusing forms).
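
The tweaking step could look something like the sketch below. It is only an
illustration, not the actual procedure used for the Polish dictionary: it
assumes the Scrabble list is a plain list of words and that someone has
collected the forms to drop in a second list. The function name and the
sample words are placeholders.

```python
def tweak_wordlist(words, excluded):
    """Return the words in their original order, minus the excluded forms.

    Matching is case-insensitive; this is an assumption of the sketch.
    """
    excluded_lower = {w.lower() for w in excluded}
    return [w for w in words if w.lower() not in excluded_lower]


# Placeholder data: a tiny "Scrabble" list and one form judged too rare.
scrabble_words = ["dom", "domy", "xyzzy", "kot"]
rare_or_confusing = ["XYZZY"]

spelling_words = tweak_wordlist(scrabble_words, rare_or_confusing)
print(spelling_words)
```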
I recommend this kind of technique to all l10n teams and dict developers.
Look at www.kurnik.pl to see how the site is managed; there is some
info on the dict at www.kurnik.pl/dictionary.
Best regards, and happy holidays,
Marcin