Thomas Lange - Sun Germany - ham02 - Hamburg wrote:

> 
> Yes, using UTF-8 as the charset encoding will show the strings properly,
> since the WBSWG6 binary file format uses that encoding for the strings.
> 
> But even though the strings are UTF-8 encoded, the dictionary file itself
> is not! It is a binary format. That is, you cannot expect all characters
> (here, the non-string parts of the file) to be properly displayed or
> even read!
> Thus saving such a binary file from within a text editor after modifying
> it is very unlikely to be a good idea.
> 
> To sum it up: as long as you do not save the edited file as a personal
> dictionary again, I see no problem with your approach.
> 
> 
> Regards,
> Thomas

Thanks, Thomas.

I have been working with Kelvin Eldridge, the maintainer of the
Australian dictionary, to produce a medical dictionary, currently about
5500 words. We have now reached the stage of munching the strings with
an affix file using hunspell, so the strings extracted from the binary
file are a very good source of useful words.
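
For anyone wanting to pull the strings out without opening the file in a
text editor at all, here is a minimal Python sketch. It assumes only that
the words sit in the file as runs of valid UTF-8 text separated by control
bytes; the actual layout of the binary format is not described in this
thread, so this is a generic "strings"-style scan rather than a parser for
that format. The names extract_strings, extract_from_file and MIN_LEN are
hypothetical and chosen for illustration.

```python
import re

# Hypothetical threshold: skip stray single characters that are
# unlikely to be dictionary words.
MIN_LEN = 2

# A run of bytes that are neither ASCII control characters nor DEL.
# Bytes >= 0x80 are allowed so multi-byte UTF-8 sequences stay intact.
_RUN = re.compile(rb"[^\x00-\x1f\x7f]+")


def extract_strings(data: bytes, min_len: int = MIN_LEN) -> list[str]:
    """Return the runs in `data` that decode cleanly as UTF-8 text.

    Assumption: strings are delimited by control bytes. Runs that are
    not valid UTF-8 are treated as binary structure and skipped.
    """
    words = []
    for match in _RUN.finditer(data):
        try:
            text = match.group().decode("utf-8")
        except UnicodeDecodeError:
            continue  # not text, so leave it alone
        text = text.strip()
        if len(text) >= min_len:
            words.append(text)
    return words


def extract_from_file(path: str) -> list[str]:
    """Read the file in binary, read-only mode; nothing is written back."""
    with open(path, "rb") as fh:
        return extract_strings(fh.read())
```

Because the file is only ever opened with "rb", the original dictionary
cannot be damaged by this. One limitation of the sketch: a word that sits
directly next to non-text bytes with no control byte in between is dropped
along with them, so the output may miss some entries.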

Regards

Russell
