[native-lang] Re: [lingu-dev] Re: [native-lang] Status update season!

2006-12-26 Thread Lars Aronsson
Marcin Miłkowski wrote:

 If we're into statistics, then the Polish dictionary has something like 3.5
 million expanded forms, and about 300,000 base forms. The quality of the
 dictionary is excellent.
 [...]
 I recommend this kind of technique to all l10n teams and dictionary
 developers. Look at www.kurnik.pl to see how the site is managed, and in
 www.kurnik.pl/dictionary there is some info on the dict.

This is excellent, but we'd have to learn Polish before we could 
understand the full details of your method.  Is there any text in 
English (or French or German) that describes how this initiative 
was started, what problems it has met, and how these problems 
were overcome?  That's the kind of guideline that would be useful 
from Peru to Kazakhstan, and from Greenland to Malawi.

The Swedish Scrabble community has a policy of using the dictionary 
of the Swedish Academy (SAOL), which unfortunately is copyrighted 
and not available for free download.


-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Aronsson Datateknik - http://aronsson.se

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[native-lang] Re: [lingu-dev] Re: [native-lang] Status update season!

2006-12-22 Thread Marcin Miłkowski

Hi Lars, and all,


The current German dictionary maintained by Björn Jacke has 80,000 
basic forms which expand to 300,000 variations, for a factor of 
3.75.  Swedish/Danish/Norwegian form basic words (with compounds) 
in the same way as German.  Basic words can often be 
translated syllable by syllable, so the number of basic forms 
should be about the same. But the Scandinavian languages use 
endings instead of the definite article (the/der/die/das), 
resulting in a larger number of expanded variations.


If we're into statistics, then the Polish dictionary has something like 
3.5 million expanded forms, and about 300,000 base forms. The quality of 
the dictionary is excellent.
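
To make the comparison concrete, here is a minimal sketch of the 
arithmetic behind these figures. The function name expansion_factor is 
only illustrative, and the Polish numbers are the approximate ones 
quoted above ("something like"), so the result is a rough estimate.

```python
def expansion_factor(expanded_forms: int, base_forms: int) -> float:
    """Average number of expanded (inflected) forms per base form."""
    if base_forms <= 0:
        raise ValueError("base_forms must be positive")
    return expanded_forms / base_forms


# Figures as quoted in this thread; the Polish ones are approximate.
german = expansion_factor(300_000, 80_000)
polish = expansion_factor(3_500_000, 300_000)

print(f"German: {german:.2f} expanded forms per base form")
print(f"Polish: {polish:.2f} expanded forms per base form")
```

With these inputs the German factor is 3.75, as stated, and the Polish 
factor comes out at roughly 11.7.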


How was that achieved? Simple: set up a local Scrabble-like community 
and develop a Scrabble dictionary using Scrabble players' linguistic 
competence. It's incredibly efficient.


Then you simply tweak the Scrabble dict to your needs (like removing 
rare and confusing forms).
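
As a rough illustration only, the tweaking step could look something 
like the sketch below. It assumes the Scrabble word list and a 
hand-maintained list of rare or confusing forms are both available as 
plain sequences of words; the function name tweak_wordlist and the 
sample words are hypothetical and do not describe how the Polish 
dictionary is actually processed.

```python
def tweak_wordlist(scrabble_words, excluded):
    """Return a sorted, de-duplicated word list without the excluded forms.

    Both inputs are iterables of strings. Comparison is case-insensitive,
    which assumes the Scrabble list holds no case-sensitive entries.
    """
    excluded_norm = {w.strip().lower() for w in excluded}
    seen = set()
    kept = []
    for word in scrabble_words:
        w = word.strip().lower()
        # Skip blank lines, excluded forms, and duplicates.
        if not w or w in excluded_norm or w in seen:
            continue
        seen.add(w)
        kept.append(w)
    return sorted(kept)


# Hypothetical sample data, for illustration only.
sample = ["kot", "Kota", "qi", "kot", ""]
rare_or_confusing = ["QI"]
result = tweak_wordlist(sample, rare_or_confusing)
print(result)
```

Keeping the exclusion list separate from the Scrabble list means the 
community can keep extending the main list without redoing the 
tweaks.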


I recommend this kind of technique to all l10n teams and dictionary 
developers. Look at www.kurnik.pl to see how the site is managed, and in 
www.kurnik.pl/dictionary there is some info on the dict.


Best regards, and happy holidays,
Marcin
