There are 797,866 lines in the .dic file. The top line is the number of words, and every line after it is treated as one individual word.
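A minimal sketch of how that layout could be checked, assuming only what is described above: the first line holds the declared count and every later non-blank line is one entry. The function name `check_dic_count` and the sample lines are made up for illustration.

```python
from typing import Iterable, Tuple


def check_dic_count(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (declared_count, actual_count) for .dic-style lines."""
    it = iter(lines)
    # The first line is the word count, not a word.
    declared = int(next(it).strip())
    # Every remaining non-blank line counts as one entry.
    actual = sum(1 for line in it if line.strip())
    return declared, actual


# Tiny stand-in for a real file: one header line plus three words.
sample = ["3\n", "time\n", "timed\n", "timing\n"]
declared, actual = check_dic_count(sample)
print(declared, actual)
```

For a real file the same function can be fed an open file object, for example `check_dic_count(open(path, encoding="utf-8"))`, where `path` and the encoding are assumptions about your copy of the dictionary.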

Each line is a correct spelling of a word. The first part of the list is the capitalized words and the rest are the lowercased ones.

"timed" and "timing" are two forms of a single root word, but they are not considered the same word as "time". If you create a word list of all the words used in a document, "time", "timed", and "timing" are three individually listed words. Sharing a root word does not make them the same word.

Also, for a spell checker, a word with its first letter uppercased and the same word with that letter lowercased are treated differently. When a word is not the first word in a sentence, some words are allowed to have, or even need, an uppercased first letter, while others will be flagged as misspelled if the first letter is uppercased. That is defined in the spell-checking .dic file.
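As a toy model of the rule described above (not how any particular spell checker implements it), a case-sensitive lookup might look like the sketch below. The function `is_accepted`, the `sentence_start` parameter, and the two-word dictionary are all assumptions made for illustration.

```python
def is_accepted(word: str, dictionary: set, sentence_start: bool = False) -> bool:
    """Case-sensitive lookup with a first-word-of-sentence exception."""
    # An exact match, including case, is always accepted.
    if word in dictionary:
        return True
    # At the start of a sentence, a capitalized word may stand in
    # for an entry that is listed in lowercase.
    if sentence_start and word[:1].isupper():
        return (word[:1].lower() + word[1:]) in dictionary
    return False


words = {"Paris", "time"}  # one capitalized entry, one lowercased entry

print(is_accepted("Paris", words))                      # listed as capitalized
print(is_accepted("paris", words))                      # lowercase form is not listed
print(is_accepted("Time", words))                       # mid-sentence capital
print(is_accepted("Time", words, sentence_start=True))  # first word of a sentence
```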

You can either take a word and list each version, or you can work out the control "options" that follow the word so that the one entry also defines all of the prefixed and suffixed versions of that word. Since I do not know those control codes, I listed each form or version of the word out in the list, which also let me give a "good" word count.
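To illustrate the difference between the two approaches, here is a rough sketch of how one entry with options could expand into the same forms that are listed out one per line. The `word/FLAGS` shape follows the usual Hunspell-style entry, but the flag letters "D" and "G" and the rule table are made up for this example and are not taken from any real affix file.

```python
# Hypothetical suffix rules: flag -> (ending to strip, ending to add).
RULES = {
    "D": ("e", "ed"),   # time -> timed
    "G": ("e", "ing"),  # time -> timing
}


def expand(entry: str) -> list:
    """Expand a 'word/FLAGS' entry into the plain list of word forms."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:
        strip, add = RULES[flag]
        # Skip a rule whose stripped ending does not match the word.
        if strip and not word.endswith(strip):
            continue
        stem = word[: len(word) - len(strip)] if strip else word
        forms.append(stem + add)
    return forms


print(expand("time/DG"))  # one entry with options
print(expand("time"))     # a plain entry expands to itself only
```

Listing every form, as I did, is the same as storing the output of `expand` directly: the file gets longer, but the line count becomes a true word count.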

So the count of 797,865 words in the .dic file is correct.

Would you like to deal with my unpublished 3,068,588-word .dic file, which has even more versions and correct spellings of "en_US" words? It contains many, many suffix and prefix versions that are rarely seen but technically spelled correctly. I created that version just to see how massive it could get, but I will not publish it as a single dictionary. It would be divided into "common" and "rare" files to be enabled or disabled as the user chooses. For now, the spell-checking extension project will not be continued until a lot of other projects are finished: LO projects and many more non-LO projects.


On 05/21/2014 03:20 PM, Tom Davies wrote:
Hi :)
It's interesting that i believed it until i saw who posted it.  Now i have
no idea but think it's unlikely.  I could believe the US trying to dumb
things or be less confusing by removing words so that people have fewer to
choose from.
Regards from
Tom :)


On 21 May 2014 18:09, Urmas <[email protected]> wrote:

"Kracked_P_P---webmaster":

  I might suggest he try the en_US dictionary that contains over 797
thousand words in its list,

That dictionary contains just 476898 words actually.



--
To unsubscribe e-mail to: [email protected]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be
deleted



