For my personal use, I would like to acquire electronic dictionaries, 
principally for the major European languages, with the following 
characteristics:

- reputable source

- accessible "raw" data files: I appreciate the interfaces that 
dictionary vendors may provide, but I want to be able to write my own 
code to find the data I am looking for

- the wordlist is the principal aspect; I can live without definitions.

- "markup" describing the structure of words, such as hyphenation 
points (or data from which hyphenation can be derived)

- some form of frequency count would be nice

For example, I'd like to compute something like "the average French 
character occupies x bytes in UTF-8", with the average weighted by 
the frequency counts. I'd also like to compute things like the spelling 
changes introduced by hyphenation in Dutch.
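To make the first computation concrete, a frequency-weighted average could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the (word, frequency) pairs are made-up placeholders, not real French counts, and "character" is taken to mean a Unicode code point after NFC normalization (so "é" counts as one character of 2 bytes rather than two code points).

```python
import unicodedata

# Hypothetical (word, frequency) pairs; a real frequency-annotated
# wordlist would replace these placeholder values.
SAMPLE = [("le", 1000), ("\u00e9t\u00e9", 50), ("fran\u00e7ais", 20), ("\u0153uvre", 5)]


def avg_utf8_bytes_per_char(entries):
    """Frequency-weighted average UTF-8 size of one character.

    Each word contributes its byte and character counts once per
    occurrence, so frequent words dominate the average. Words are
    NFC-normalized first so precomposed and decomposed spellings
    give the same result.
    """
    total_bytes = 0
    total_chars = 0
    for word, freq in entries:
        w = unicodedata.normalize("NFC", word)
        total_bytes += freq * len(w.encode("utf-8"))
        total_chars += freq * len(w)  # code points, not grapheme clusters
    return total_bytes / total_chars


if __name__ == "__main__":
    print(avg_utf8_bytes_per_char(SAMPLE))
```

Weighting by frequency is what keeps a rare word full of accented letters from counting as much as "le"; with an unweighted list, every headword would contribute equally regardless of how often it occurs in text.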

Any pointers?

Thanks,
Eric.


