There are 797,866 lines in the .dic file. The first line gives the
number of words, and every line after it holds exactly one word, each a
correct spelling. The capitalized words come first in the list, followed
by the lowercased ones.
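A minimal sketch of how that layout could be checked, assuming a plain
text file with the count on line one and one entry per line after it.
The name parse_dic and the sample text are made up for illustration. As
far as I know, Hunspell treats the first line only as an approximate
count, so a mismatch is not necessarily an error.

```python
def parse_dic(text):
    """Split .dic content into (declared_count, words)."""
    lines = text.splitlines()
    # The first line holds the declared number of words.
    declared = int(lines[0].strip())
    words = []
    for line in lines[1:]:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        # Anything after "/" would be affix flags; keep only the word.
        words.append(entry.split("/", 1)[0])
    return declared, words


# Made-up sample: capitalized entries first, then lowercased ones.
sample = "3\nTuesday\ntime\ntimed\n"
declared, words = parse_dic(sample)
print(declared, len(words), declared == len(words))
```

For a real file, read it with the encoding named in the matching .aff
file and pass the text to parse_dic.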
"timed" and "timing" are two forms of the root word "time", but they are
not considered the same word as "time". If you create a word list for a
document, covering all of the words used, then time, timed, and timing
are three individually listed words. Sharing the same root word does not
make them the same word.
Also, for a spell checker, a word with its first letter uppercased and
the same word with that letter lowercased are treated differently.
Away from the start of a sentence, some words are allowed, or even
required, to have the first letter uppercased, while others will be
flagged as misspelled if the first letter is uppercased. That is defined
in the spell checking .dic file.
You can either take a word and list each version separately, or you can
figure out the control "options" that follow the word so that one entry
also defines all of the prefixed and suffixed versions of that word.
Since I do not know those control codes, I listed each form or version
of the word out in the list, which also let me give a "good" word count.
So the count of 797,865 words in the .dic file is correct.
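To show the difference between the two approaches, here is a rough
sketch of how one flagged entry could expand into the individually
listed forms. The RULES table, the flag letters "D" and "G", and the
expand function are all hypothetical; the real rules live in the .aff
file and are more involved than this.

```python
# Hypothetical suffix rules, NOT copied from any real .aff file:
# flag -> (characters to strip, suffix to add, required word ending)
RULES = {
    "D": ("", "d", "e"),     # time -> timed
    "G": ("e", "ing", "e"),  # time -> timing
}


def expand(entry):
    """Return the root word plus every form its flags generate."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:
        rule = RULES.get(flag)
        if rule is None:
            continue  # unknown flag, ignored in this sketch
        strip, add, ending = rule
        if not word.endswith(ending):
            continue  # the rule does not apply to this word
        stem = word[: len(word) - len(strip)] if strip else word
        forms.append(stem + add)
    return forms


print(expand("time/DG"))  # ['time', 'timed', 'timing']
```

One flagged line stands in for three plain lines, which is why a list
with every form written out has a much higher line count than a flagged
list covering the same words.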
Would you like to deal with my unpublished 3,068,588-word .dic file that
has even more versions and correct spellings of "en_US" words? It
contains many, many suffix and prefix versions that are rarely seen but
technically spelled correctly. I created that version just to see how
massive it could get, but I will not publish it as a single dictionary.
It would be divided into "common" and "rare" files for the user to
enable or disable as they choose. For now, the spell checking extension
project will not be continued until a lot of other projects are
finished, both LO projects and many more non-LO projects.
On 05/21/2014 03:20 PM, Tom Davies wrote:
Hi :)
It's interesting that I believed it until I saw who posted it. Now I have
no idea, but I think it's unlikely. I could believe the US trying to dumb
things down or be less confusing by removing words so that people have
fewer to choose from.
Regards from
Tom :)
On 21 May 2014 18:09, Urmas <[email protected]> wrote:
"Kracked_P_P---webmaster":
I might suggest he try the en_US dictionary that contains over 797
thousand words in its list,
That dictionary contains just 476898 words actually.
--
To unsubscribe e-mail to: [email protected]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be
deleted