reminds me of "and the longest word in the English language is ... "

          or is it supercalifragilisticexpialidocious  ;-)

From: Kracked_P_P---webmaster <>
Date: Thu, May 22, 2014 at 8:58 AM
Subject: Re: [libreoffice-users] Re: Spell Check Dictionary

There are 797,866 lines in the .dic file.  The top line holds the number of
words, and the .dic file treats every other line as one individual word.
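
As a minimal sketch of that layout (assuming a plain one-entry-per-line
file; the function name `read_dic` and the sample data are made up for
illustration), the declared count on the first line can be checked against
the entries that follow it:

```python
def read_dic(lines):
    """Split .dic content into (declared_count, words).

    Assumes the layout described in the message: the first line is the
    word count and every following non-empty line is one entry.
    """
    lines = [ln.rstrip("\n") for ln in lines]
    declared = int(lines[0].strip())
    words = [ln for ln in lines[1:] if ln.strip()]
    return declared, words


# Tiny made-up sample: 1 count line + 3 word lines = 4 lines in total.
sample = ["3\n", "Time\n", "time\n", "timed\n"]
declared, words = read_dic(sample)
print(declared, len(words), declared == len(words))
```

For a real file the same function could be fed an open file object, so
797,866 lines would give a declared count of 797,865 if the file is
consistent.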

Each line is a correct spelling of a word.  The first part of the list
holds the capitalized words and the rest holds the lowercased ones.

"timed" and "timing" are two forms of a single root word and are not
considered the same word as "time".  If you create a word list of a
document, covering all of the words used, "time", "timed", and "timing"
are three individually listed words.  Sharing a root word does not make
them the same word.
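
A rough sketch of such a word list, assuming a simple letters-and-apostrophes
notion of a "word" (the regular expression, the function name `word_list`,
and the sample sentence are illustrative choices only):

```python
import re


def word_list(text):
    # Distinct words exactly as written; nothing is reduced to a root,
    # so "time", "timed", and "timing" stay separate entries.
    return sorted(set(re.findall(r"[A-Za-z']+", text)))


doc = "It is time. She timed the run, and the timing was good. Time flies."
words = word_list(doc)
print(words)
```

Plain string sorting happens to put the capitalized entries ahead of the
lowercased ones, much like the ordering of the .dic file.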

Also, a spell checker treats a word with the first letter uppercased
differently from the same word with that letter lowercased.  Away from the
start of a sentence, some words are allowed, or even required, to have the
first letter uppercased, while others count as misspelled if the first
letter is uppercased.  That is defined in the spell checking .dic file.
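
A toy check that follows the rule as stated here might look like the sketch
below.  It is only an illustration: the function name `is_accepted`, the
`sentence_start` parameter, and the two-word dictionary are assumptions,
and a real spell checker may apply further case rules.

```python
def is_accepted(word, dictionary, sentence_start=False):
    # Exact-case match against the word list.
    if word in dictionary:
        return True
    # At the start of a sentence, allow the capitalized form of an
    # entry that is stored lowercased.
    if sentence_start and word[:1].isupper():
        return word[:1].lower() + word[1:] in dictionary
    return False


dictionary = {"London", "time"}  # made-up two-entry word list
print(is_accepted("London", dictionary))
print(is_accepted("london", dictionary))
print(is_accepted("Time", dictionary, sentence_start=True))
print(is_accepted("Time", dictionary))
```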

You can either list each version of a word, or work out the control
"options" that follow the word so that one entry also defines all of its
prefixed and suffixed versions.  Since I do not know those control codes, I
listed each form of the word out in the list, which also let me give a
"good" word count.
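
To show the difference between the two approaches, here is a sketch that
expands one flagged entry into the explicit forms.  Hunspell-style
dictionaries, which LibreOffice uses, attach flags after a slash and keep
the rules in a companion .aff file; the flag letters and rules below are
invented for this example and are not taken from any real en_US
dictionary.

```python
# Hypothetical suffix rules: flag -> (ending to strip, ending to add).
SUFFIX_RULES = {
    "D": ("", "d"),     # time -> timed
    "G": ("e", "ing"),  # time -> timing
    "S": ("", "s"),     # time -> times
}


def expand(entry):
    """Turn 'root/FLAGS' into the list of explicit word forms."""
    root, _, flags = entry.partition("/")
    forms = [root]
    for flag in flags:
        strip, add = SUFFIX_RULES[flag]
        if strip and not root.endswith(strip):
            continue  # rule does not apply to this root
        base = root[: len(root) - len(strip)] if strip else root
        forms.append(base + add)
    return forms


forms = expand("time/DGS")
print(forms, len(forms))
```

Listing every form, as done in this dictionary, corresponds to writing out
the expanded result line by line, which is why each form adds to the word
count.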

So the count of 797,865 words in the .dic file is correct.

Would you like to deal with my unpublished 3,068,588-word .dic file that
has even more versions and correct spellings of "en_US" words?  It
contains many, many suffix and prefix versions that are rarely seen but
technically spelled correctly.  I created that version just to see how
massive it could get, but I will not publish it as a single dictionary.
It would be divided into "common" and "rare" files that the user can
enable or disable.  For now, the spell checking extension project will not
be continued until a lot of other projects are finished - LO projects and
many more non-LO projects.
