Hi all!

> OS: Ubuntu 7.04, OOo: 2.2.0-1ubuntu-3
> 
> This happened with a Project Gutenberg Distributed Proofreaders file.
> The text file opened with 'Western Europe (ISO-8859-1)' Character set is
> giving proper accent marks and diacritical marks. While spell-checking I
> added those words to a special personal dictionary 'dd.dic'. Now, when I
> am opening this 'dd.dic' from '/home/dd/.openoffice.org2/user/wordbook/'
> I am getting a mixed up and mangled text: some of the accents and marks
> are showing, that too in a mixed up way, and some are not showing at
> all. Some of the words cannot be even recognized. I tried all the
> Character Sets, obviously starting from the 'Western Europe
> (ISO-8859-1)'. 
> 
> If anyone needs I can send the relevant files or the screen-shots.
> 
> Can anyone help about the file format of the '.dic' files?

If you mean the files that you find in user/wordbook AND can edit via
"Tools/Options/Language Settings/Writing Aids", it is somewhat
complicated, since they are actually in a binary format.
The strings for the words themselves, though, are UTF-8 encoded.
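
That UTF-8 encoding is probably part of why the accents look mangled when
the file is opened as ISO-8859-1. Here is a minimal Python sketch of the
effect. The function names and the sample word are only for illustration,
they are not taken from the OOo code, and the binary parts around the
words will look like garbage in a text editor either way.

```python
def misread_as_latin1(word: str) -> str:
    """Show how a UTF-8 encoded word looks when read as ISO-8859-1."""
    # Every accented character becomes two or more bytes in UTF-8,
    # and ISO-8859-1 shows each of those bytes as its own character.
    return word.encode("utf-8").decode("iso-8859-1")


def repair(mangled: str) -> str:
    """Undo the misreading: back to the raw bytes, then decode as UTF-8."""
    return mangled.encode("iso-8859-1").decode("utf-8")


sample = "café"  # hypothetical dictionary entry
mangled = misread_as_latin1(sample)
print(mangled)          # cafÃ©
print(repair(mangled))  # café
```

The round trip works here only because the string contains nothing but the
word. It will not work on a whole .dic file in the binary format.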

Since the format also needs to take care of several different file
format versions from the past 12+ years, the best complete documentation
would be the code. More than 5 different file format versions are
supported by now.

The latest change was a patch provided by Michael Meeks to allow for a
tagged version of the file format to be read.
It is probably the best one for you to use.
Have a look at the respective issue:
  http://www.openoffice.org/issues/show_bug.cgi?id=60698
There are also sample dictionaries attached. Download them and see what
the tagged file format looks like.

An existing dictionary is usually written back in the very same file
format version it was found in when loaded. Thus you should be fine
creating a tagged version and later on editing it via the UI.

If you need to know more details, you have to look at the source code:
http://sw.openoffice.org/source/browse/*checkout*/sw/linguistic/source/dicimp.cxx?rev=1.22
Look for the DictionaryNeo::loadEntries function. The code for the
tagged file format is the one where "nDicVersion" is set to 7.


Please note though that these .dic files are completely different from
the files the HunSpell implementation uses!


Regards,
Thomas
