Thomas Lange - Sun Germany - ham02 - Hamburg wrote:
>
> Yes using UTF-8 as charset encoding will show the strings properly since
> the WBSWG6 binary file format uses that encoding for the strings.
>
> But even though the strings are UTF-8 encoded the dictionary file itself
> is not! It is a binary format. That is you can not expect all characters
> (that is here the non-string parts of that file) to be properly
> displayed or even read!
> And thus it is particularly unlikely that saving such a binary file from
> within a text editor after modifying it will be a good idea.
>
> To sum it up: As long as you do not save the edited file as personal
> dictionary again I see no problem with your approach.
>
>
> Regards,
> Thomas
Thanks, Thomas.

I have been working with Kelvin Eldridge, the maintainer of the Australian dictionary, to produce a medical dictionary, which currently has about 5,500 words. We have now reached the stage of munching the word list against an affix file using hunspell, so extracting the strings from the binary file is a very good source of useful words.

Regards,
Russell
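A minimal sketch of extracting the words, in the spirit of the Unix `strings` tool. It opens the dictionary file read-only in binary mode, so nothing is ever saved back, which follows Thomas's warning above. It does NOT parse the WBSWG6 format; its only assumption is the one stated in this thread, that the strings inside the file are UTF-8 encoded. It also assumes that a "word" is a run of letters, optionally joined by internal hyphens or apostrophes. The function name `extract_utf8_strings` and the `min_len` cutoff are hypothetical choices made for illustration. Because the scan is heuristic, stray binary bytes that happen to be letters can be glued onto a word or can show up as short junk strings, so the output still needs a manual review before it is fed to hunspell.

```python
import re
import sys

# Assumption: a "word" is a run of letters, optionally joined by internal
# hyphens or apostrophes (e.g. "anti-inflammatory"). Digits and underscores
# are excluded via [^\W\d_].
WORD_RE = re.compile(r"[^\W\d_]+(?:['\-][^\W\d_]+)*")


def extract_utf8_strings(data: bytes, min_len: int = 3) -> list[str]:
    """Heuristically pull UTF-8 words out of a binary blob (hypothetical helper).

    This does not understand the WBSWG6 layout. It decodes the whole blob
    leniently and keeps letter runs, so the surrounding binary data only
    matters where it splits or pollutes those runs.
    """
    # Invalid UTF-8 sequences become U+FFFD, and control bytes decode to
    # control characters. Neither one is a letter, so both end a word.
    text = data.decode("utf-8", errors="replace")
    words = (m.group(0) for m in WORD_RE.finditer(text))
    # Drop very short runs (often binary noise) and de-duplicate while
    # preserving the order in which each word first appears.
    return list(dict.fromkeys(w for w in words if len(w) >= min_len))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read-only, binary mode: the dictionary file itself is never modified.
    with open(sys.argv[1], "rb") as f:
        for word in extract_utf8_strings(f.read()):
            print(word)
```

The words are printed one per line, so the output can be redirected into a plain word list and reviewed before the munching step.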
