Re: [users] Format of .dic Files

Thomas Lange - Sun Germany - ham02 - Hamburg Thu, 05 Jul 2007 02:03:47 -0700


Hi again,


I forgot to mention that the tagged file format also uses UTF-8
encoding, so you'll need a UTF-8 capable text editor to view and edit
those files properly.
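
To illustrate why the encoding matters, here is a minimal Python sketch
(not taken from the OOo code) of what happens when UTF-8 bytes are read
as ISO-8859-1. This may be one cause of the mangled accents described in
the quoted message. The sample word is only an illustration.

```python
# Sketch: the same bytes decoded with the right and the wrong charset.
word = "café"                    # sample word, chosen for illustration only
raw = word.encode("utf-8")       # the bytes a UTF-8 file would store

as_utf8 = raw.decode("utf-8")           # correct charset: round-trips
as_latin1 = raw.decode("iso-8859-1")    # wrong charset: one char per byte

print(as_utf8)
print(as_latin1)
```

The accented letter takes two bytes in UTF-8, so a viewer that assumes
ISO-8859-1 shows two unrelated characters in its place.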

Also, just in case:
The string following the language tag is the ISO locale of the
language the dictionary is to be used with.
E.g. en-US would be English (USA) and de-CH would be German
(Switzerland).
And the line
lang: <none>
is used for dictionaries that apply to all languages.
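
As a rough sketch of how such a line could be interpreted, the Python
below splits the locale into language and country. The function name
parse_lang_line is hypothetical, and the assumption of exactly one space
after the colon is taken only from the "lang: <none>" example above; it
is not a description of the actual OOo parser.

```python
from typing import Optional, Tuple

def parse_lang_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (language, country) for a 'lang: xx-YY' line.

    Returns None for 'lang: <none>' (dictionary for all languages).
    Assumption: exactly one space follows the colon.
    """
    prefix = "lang: "
    if not line.startswith(prefix):
        raise ValueError("not a language line: %r" % line)
    value = line[len(prefix):]
    if value == "<none>":
        return None
    # Country is left empty if the locale has no '-' part.
    language, _, country = value.partition("-")
    return language, country

print(parse_lang_line("lang: en-US"))
print(parse_lang_line("lang: <none>"))
```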

Please be aware that spaces matter in this file format!
If you have the wrong number of spaces, use tabs, or add extra spaces
at the end, it may not work.
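
A small check like the following Python sketch could help spot such
problems before loading a hand-edited file. The helper name
whitespace_problems is hypothetical, and it only looks for tabs and for
spaces at the end of a line; it cannot tell whether the number of spaces
inside a line is what the loader expects.

```python
def whitespace_problems(text: str) -> list:
    """Return (line_number, problem) pairs for suspicious whitespace."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip(" "):
            problems.append((number, "trailing space"))
    return problems

# Illustrative input only: line 2 ends with a space, line 3 starts with a tab.
sample = "lang: en-US\nword \n\tother"
for number, problem in whitespace_problems(sample):
    print(number, problem)
```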


Thomas




> Hi all!
> 
>> OS: Ubuntu 7.04, OOo: 2.2.0-1ubuntu-3
>> 
>> This happened with a Project Gutenberg Distributed Proofreaders file.
>> The text file opened with 'Western Europe (ISO-8859-1)' Character set is
>> giving proper accent marks and diacritical marks. While spell-checking I
>> added those words to a special personal dictionary 'dd.dic'. Now, when I
>> am opening this 'dd.dic' from '/home/dd/.openoffice.org2/user/wordbook/'
>> I am getting a mixed up and mangled text: some of the accents and marks
>> are showing, that too in a mixed up way, and some are not showing at
>> all. Some of the words cannot be even recognized. I tried all the
>> Character Sets, obviously starting from the 'Western Europe
>> (ISO-8859-1)'. 
>> 
>> If anyone needs I can send the relevant files or the screen-shots.
>> 
>> Can anyone help about the file format of the '.dic' files?
> 
> If you actually mean those files that you can find in user/wordbook AND
> edit via "Tools/Options/Language Settings/Writing Aids" it is somewhat
> complicated since it is actually a binary format.
> The strings for the words themselves, though, are UTF-8 encoded.
> 
> Since the format also needs to take care of several different file
> format versions from the past 12+ years, the best complete
> documentation would be the code. More than 5 different file format
> versions are supported by now.
> 
> The latest change was a patch provided by Michael Meeks to allow for a
> tagged version of the file format to be read.
> It is probably the best one for you to use.
> Have a look in the respective issue:
>   http://www.openoffice.org/issues/show_bug.cgi?id=60698
> There are also sample dictionaries attached. Download them and see
> what the tagged file format looks like.
> 
> An existing dictionary is usually written back in the very same file
> format version it was found in when loaded. Thus you should be fine
> with creating a tagged version and later on editing it via the UI.
> 
> If you need to know more details you have to look at the source code:
> http://sw.openoffice.org/source/browse/*checkout*/sw/linguistic/source/dicimp.cxx?rev=1.22
> Look for the DictionaryNeo::loadEntries function. The code for the
> tagged file format is the one where "nDicVersion" is set to 7.
> 
> 
> Please note though that these .dic files are completely different from
> the files the HunSpell implementation uses!
> 
> 
> Regards,
> Thomas
> 
