On Fri, 2011-12-30 at 14:43 -0500, webmaster for Kracked Press Productions wrote:
> I need some "guidance" with the extent of my dictionary files for
> LibreOffice and OOo.
>
> My largest dictionaries are about 638,000 words in the spelling word
> .dic file. I need to know how large is too large.
>
> I found out this morning that if I compare that word list with a
> combined list for chemical and medical words, over 98,000 words from
> that combined list are not in the current .dic word list[s].
>
> Now here is the issue: how far should I take this project?
>
> I am going to add all the "missing" words that are part of the
> open-source community's lexicon that are not in the current lists, but
> where do I stop, and how should I format the "finalized" files?
>
> Should there be one super large list, or should I break it up into
> sub-lists? Should the "standard" words go into one .dic file, while
> medical, chemistry, and computer/tech words each have their own .dic
> file within the .oxt file?
>
> Right now, there is an English dictionary [default one?] that includes
> US, British, Canadian, and some other versions of English put together
> as one .oxt file, but separate .dic files. I was wondering if that
> would be the route I should go with my super-size dictionaries.
>
> To be honest, 20 years ago the spelling dictionary project I was working
> on had about 177,000 words and I was told that the English language was
> about 250,000 words. Now I have looked at a combined word list and it
> has about 737K words in it and there are more words/terms still needing
> to be checked. The largest book style dictionary now has 25+ volumes to
> it when it had only 15 about 15-20 years ago. So I really think the
> final super-sized dictionary word list could one day go over one million
> in the next year or two. I just have to figure out if it is worth
> building a list for LO of that size.
>
> Your input would help me make the best US, British, and Canadian English
> dictionaries out there for LibreOffice. This is for our users to use,
> so it would be nice for users to let me know what they think.

Something to remember: the main dictionary for the language used by LO is a binary file kept in the installation folder. If a language pack is added, its dictionary is also binary and kept in the same place. These are large files.

User-created dictionary files (.dic) are kept in the personal settings folder. These are text files.

Some time ago, someone asked about dictionary file sizes, referring to the user-created .dic files. The reply was that 22K or less per file seemed like a good number. It was mentioned that OOo would not use a dictionary file if it was too large.

The dictionary files (.dic) are text documents whose first four lines are very important as far as content is concerned. Below are the first four lines of an English user-created .dic file, followed by those of a German user-created .dic file.

    OOoUserDict1              OOoUserDict1
    lang: en-US       OR      lang: de-DE
    type: positive            type: positive
    ---                       ---

It appears that the second line is the one that has to be changed from language to language. The letters before the hyphen are the language (en, English; de, Deutsch) and the letters afterward are the country (US, USA; GB, Great Britain; etc.).

But with the number of entries you have, you need to find some way to make a binary file that LO can read as a .dic file. From what I remember about the creation of the Australian dictionary, it is very time consuming to create the binary files.

--Dan
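
P.S. In case it helps, here is a minimal Python sketch of the four-line header described above. It is only an illustration, not how LO itself builds these files. The function names (user_dict_text, split_lang_tag) are made up for this example, and it assumes that the words follow the header one per line; the header lines themselves are the ones shown above.

```python
def user_dict_text(words, lang="en-US", dict_type="positive"):
    """Build the text of a user-created .dic file.

    The four header lines match the example above. Putting one word per
    line after the header is an assumption made for this sketch.
    """
    header = ["OOoUserDict1", f"lang: {lang}", f"type: {dict_type}", "---"]
    # Drop duplicates and sort so the output is stable and easy to diff.
    body = sorted(set(words))
    return "\n".join(header + body) + "\n"


def split_lang_tag(tag):
    """Split a tag such as 'en-US' into (language, country)."""
    language, _, country = tag.partition("-")
    return language, country
```

For example, user_dict_text(["Hunspell", "LibreOffice"], lang="en-GB") gives the text for a British English user dictionary, which could then be saved as a .dic file in the personal settings folder. Only the second header line changes between languages.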