On 19.06.2012 05:07, Ariel Constenla-Haile wrote:
Hi there,

there have been some reports of users complaining that the Thesaurus
does not work.

The root of the issue is in the dictionary extensions we are shipping:
two of them collide due to lack of uniqueness in the configuration node
name, namely dict-en.oxt (the generic EN dictionary) and
dict-en-nz-2008-12-03.oxt. The conflict happens on the Thesaurus node:

* dict-en.oxt:

<node oor:name="ThesDic_en-US" oor:op="fuse">
     <prop oor:name="Locations" oor:type="oor:string-list">
         <value>%origin%/th_en_US_v2.dat</value>
     </prop>
     <prop oor:name="Format" oor:type="xs:string">
         <value>DICT_THES</value>
     </prop>
     <prop oor:name="Locales" oor:type="oor:string-list">
         <value>en-GB en-US en-ZA en-AU en-CA</value>
     </prop>
</node>

* dict-en-nz-2008-12-03.oxt:

<node oor:name="ThesDic_en-US" oor:op="fuse">
     <prop oor:name="Locations" oor:type="oor:string-list">
         <value>%origin%/th_en_US_v2.dat</value>
     </prop>
     <prop oor:name="Format" oor:type="xs:string">
         <value>DICT_THES</value>
     </prop>
     <prop oor:name="Locales" oor:type="oor:string-list">
         <value>en-NZ</value>
     </prop>
</node>

As you see, they have the same name, "ThesDic_en-US", even though the
official documentation clearly states that dictionary extension
developers should use a unique node name; see
http://wiki.services.openoffice.org/wiki/Extension_Dictionaries#Dictionary_entries_.28must_be_provided.29
especially the section "About node names for the dictionaries".
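
A minimal sketch of what a non-colliding entry for
dict-en-nz-2008-12-03.oxt could look like, assuming only the node name
needs to change. The name "ThesDic_en-NZ" is a hypothetical choice for
illustration, not taken from any shipped extension; the remaining values
are copied unchanged from the dict-en-nz fragment quoted earlier:

<!-- hypothetical unique node name, chosen for illustration only -->
<node oor:name="ThesDic_en-NZ" oor:op="fuse">
     <prop oor:name="Locations" oor:type="oor:string-list">
         <value>%origin%/th_en_US_v2.dat</value>
     </prop>
     <prop oor:name="Format" oor:type="xs:string">
         <value>DICT_THES</value>
     </prop>
     <prop oor:name="Locales" oor:type="oor:string-list">
         <value>en-NZ</value>
     </prop>
</node>

With distinct node names, the two entries would presumably be kept side
by side instead of one replacing the other, so the en-US locales listed
by dict-en.oxt would keep their thesaurus.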

The dict-en-au-2008-12-15 extension did rename its thesaurus file to th_en_AU_v2.dat. That avoids the conflict, but still wastes 18 MB of disk space.


I didn't research what the fuse operation is *supposed* to do there
(it's applied to the node, not to the properties), but the documentation
is clear in stating that the node name must be unique. The result is
that the properties are not fused but replaced, with the effect that
installing the en-NZ dictionary disables the thesaurus for en-US.

As this bug has its root in the dictionary extensions, the only thing we
can do to fix it is to ship just one extension, in this case
dict-en.oxt.

Dropping the other English dictionaries is a good idea for other reasons, too. Issue 119272 (https://issues.apache.org/ooo/show_bug.cgi?id=119272) describes the problem of all dictionaries together using more than 160 MB, most of which is taken up by the large thesaurus files. Including only one English dictionary would reduce this number considerably. Besides, dict-en.oxt already supports most variants of English anyway.

-Andre



Note that I only discovered this bug in the English dictionary
extensions; I didn't check other languages, but we should do so in all
cases where we're providing more than one dictionary extension.


Regards

