Re: spell checker issue

2012-10-26 Németh László
Hi,

2012/10/25 Caolán McNamara caol...@redhat.com:
 On Mon, 2012-10-15 at 09:37 +0200, Németh László wrote:
 Hi,

 Adding a simple new item to the en_US.dic, like

 men's

 will extend the dictionary. The biggest plus in the American English
 dictionary of LibreOffice is the morphological data (also based on
 Kevin's data and maybe WordNet) for stemming and morphological
 generation in thesaurus suggestions, see the attached conversion
 script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.

 So basically one attractive route to go would be to build our dictionary
 at LibreOffice build time ourselves from WordNet +
 custom-libreoffice-words patch + that script. That would give us
 something we can easily sync whenever WordNet gets updated without
 losing the extra morphological data. Or are there any gotchas with doing
 that?

Only a small part of WordNet, the list of irregular forms, is used
by the script. But the LibreOffice thesaurus is based on the full
WordNet, so it would be fine to add thesaurus generation to the
build process. We could add some attractive thesaurus
improvements, too, like Unicode symbols as synonyms, e.g. alpha - α,
skull - ☠, as in the Hungarian thesaurus.
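To make the "irregular forms" step concrete, here is a minimal sketch, not the conversion script from the bug report. It assumes WordNet's exception-list layout (one "inflected base [base ...]" pair per line, as in noun.exc, e.g. "men man") and emits Hunspell-style lines carrying the standard `st:` (stem) morphological field. The function name `irregular_entries` and the default `is:plural` tag value are hypothetical placeholders, not what en_US.dic actually uses.

```python
def irregular_entries(exc_lines, tag="is:plural"):
    """Turn WordNet exception-list lines into Hunspell-style .dic entries.

    Each input line is "inflected base [base ...]". "st:" (stem) is a
    standard Hunspell morphological field; the default `tag` value is a
    made-up placeholder, not the tag used in en_US.dic.
    """
    entries = []
    for line in exc_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # malformed line: no base form given
        # Only the first base form is used in this sketch.
        inflected, base = parts[0], parts[1]
        # WordNet joins multiword expressions with '_'; skip those here.
        if "_" in inflected:
            continue
        entries.append(f"{inflected}\tst:{base} {tag}")
    return entries


print(irregular_entries(["men man", "mice mouse"]))
```

With a stem recorded this way, "men" can be analysed back to "man", which is what the thesaurus lookup needs.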

Gotchas: there were some manual fixes (documented in
README_en_US.txt) to handle Unicode apostrophes and ligatures.
Adding a small list of the most urgent words would be easier for me.
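One way to cover the apostrophe issue when adding such a small word list is to emit each entry twice, once with the ASCII apostrophe (U+0027) and once with the typographic one (U+2019). This is a sketch under that assumption only; it is not necessarily the fix described in README_en_US.txt, and Hunspell's ICONV input-conversion table in the .aff file is an alternative that maps the character at lookup time instead. The helper name is hypothetical.

```python
APOSTROPHE = "'"          # U+0027, ASCII apostrophe
RIGHT_QUOTE = "\u2019"    # U+2019, typographic apostrophe


def with_apostrophe_variants(words):
    """Return the words plus a U+2019 copy of each entry containing U+0027."""
    out = []
    for word in words:
        out.append(word)
        if APOSTROPHE in word:
            out.append(word.replace(APOSTROPHE, RIGHT_QUOTE))
    return out


print(with_apostrophe_variants(["men's", "women's"]))
```

Duplicating entries keeps the .aff untouched, at the cost of a slightly larger .dic.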

I also tried to find an old OpenOffice.org issue about the quality
analysis/extension of the (American) English dictionary, but I found
only the en-GB-oed dictionary for international organizations, see
https://issues.apache.org/ooo/show_bug.cgi?id=51093,
http://ftp.nluug.nl/office/openoffice/contrib/dictionaries/README_en_GB-oed.txt.

Best regards,
László

___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: spell checker issue

2012-10-25 Caolán McNamara
On Mon, 2012-10-15 at 09:37 +0200, Németh László wrote:
 Hi,
 
 Adding a simple new item to the en_US.dic, like
 
 men's
 
 will extend the dictionary. The biggest plus in the American English
 dictionary of LibreOffice is the morphological data (also based on
 Kevin's data and maybe WordNet) for stemming and morphological
 generation in thesaurus suggestions, see the attached conversion
 script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.

So basically one attractive route to go would be to build our dictionary
at LibreOffice build time ourselves from WordNet +
custom-libreoffice-words patch + that script. That would give us
something we can easily sync whenever WordNet gets updated without
losing the extra morphological data. Or are there any gotchas with doing
that?
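A hypothetical shape for the merge step of such a build: combine an upstream word list with a local patch list and write .dic text whose first line is the entry count. The function name, the override rule (a custom entry replaces an upstream entry for the same word) and the '#' comments in the patch list are all assumptions for illustration; the morphological conversion script from the bug report is not reproduced here.

```python
def build_dic(upstream_lines, custom_lines):
    """Merge an upstream word list with a local patch list into .dic text.

    Hypothetical build step: entries are "word", "word/FLAGS", or either
    followed by morphological fields; lines starting with '#' are comments.
    """
    merged = {}
    for entry in list(upstream_lines) + list(custom_lines):
        entry = entry.strip()
        if not entry or entry.startswith("#"):
            continue
        # Key on the bare word: drop morphological fields, then /FLAGS.
        word = entry.split()[0].split("/", 1)[0]
        # Custom entries come last, so they override upstream ones.
        merged[word] = entry
    entries = sorted(merged.values())
    return "\n".join([str(len(entries))] + entries) + "\n"


print(build_dic(["man/M", "woman/M"], ["# local additions", "men's"]), end="")
```

Overriding whole entries keeps the rule simple; merging flag sets per word would be the alternative if upstream and local flags both matter.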

C.



Re: spell checker issue

2012-10-15 Németh László
Hi,

Adding a simple new item to the en_US.dic, like

men's

will extend the dictionary. The biggest plus in the American English
dictionary of LibreOffice is the morphological data (also based on
Kevin's data and maybe WordNet) for stemming and morphological
generation in thesaurus suggestions, see the attached conversion
script in https://issues.apache.org/ooo/show_bug.cgi?id=19563.
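A minimal sketch of that kind of edit, assuming the usual Hunspell .dic layout: the first line holds the approximate entry count (used as a hash-size hint), and each following line is "word" or "word/FLAGS", optionally followed by morphological fields. The helper `add_words` is hypothetical, not part of any LibreOffice or Hunspell tool.

```python
from pathlib import Path


def add_words(dic_path, new_words):
    """Append words to a Hunspell .dic file and refresh its count line.

    Hypothetical helper: assumes line 1 is the entry count and every other
    non-empty line starts with "word" or "word/FLAGS".
    """
    path = Path(dic_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    entries = [line for line in lines[1:] if line.strip()]
    # Compare on the bare word: drop morphological fields, then /FLAGS.
    existing = {e.split()[0].split("/", 1)[0] for e in entries}
    added = []
    for word in new_words:
        if word not in existing:
            entries.append(word)
            existing.add(word)
            added.append(word)
    # Hunspell only uses the count as a size hint, but keep it accurate.
    path.write_text("\n".join([str(len(entries))] + entries) + "\n",
                    encoding="utf-8")
    return added
```

For example, `add_words("en_US.dic", ["men's", "women's"])` appends only the entries that are not already present and returns them.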

By the way, Firefox and Google Chrome
(http://src.chromium.org/viewvc/chrome/trunk/deps/third_party/hunspell_dictionaries/en_US.dic_delta?revision=138928&view=markup)
have some new words, too, as patches.

Regards,
László



Re: spell checker issue

2012-10-11 Caolán McNamara
On Sun, 2012-09-30 at 12:47 -0700, Steven Howe wrote:
 Who deals with spell checker dictionary issues?
 
 I'm using the word "men's"; the spell checker thinks this is wrong,
 although the Gmail spell checker does not. I've checked Webster's
 dictionary online, and men's appears to be the correct spelling.

English (US), right? In general it's best to submit a bug about these
things. But it does raise the general question of what the
canonical upstream for the English dictionaries is.

e.g. for Fedora I consider Kevin's word list at
http://wordlist.sourceforge.net/ as the upstream of the en-US dictionary,
and in that light I've submitted
https://sourceforge.net/tracker/?func=detail&aid=3576342&group_id=10079&atid=1014602
which would allow men's, women's and other possessives of irregular
plural nouns.

I'm not entirely sure of the provenance of the en-US dictionaries we
have in LibreOffice. IIRC they are derived ultimately from
Kevin's list, but I don't know if they are resynced occasionally, if
Németh is maintaining them in some source format somewhere else, or if
they have accidentally forked over time.

They definitely appear to be at least affix-compressed, or otherwise turned
into something sufficiently unreadable that I can't trivially see the right
way to add men's, women's to the copies we have in our tree :-)
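Affix compression stores a stem plus flags in the .dic, and the full forms are generated from SFX (suffix) rules in the .aff. The toy expander below uses a made-up two-flag table in the spirit of those rules; the real en_US.aff flags and conditions may differ, and treating M as the possessive flag is an assumption. It shows why men's can be missing: if "men" is stored without a possessive flag, "men's" is never generated, so either "men/M" or a literal "men's" line would be needed.

```python
import re

# Toy suffix table in the spirit of Hunspell's SFX rules:
# flag -> [(strip, append, condition)], where "0" means "strip nothing"
# and the condition is a regex matched against the end of the word.
# These flags are illustrative; the real en_US.aff may differ.
SUFFIXES = {
    "M": [("0", "'s", ".")],                                # possessive
    "S": [("y", "ies", "[^aeiou]y"), ("0", "s", "[^sy]")],  # plural
}


def expand(entry):
    """Expand a "word/FLAGS" entry into every form its flags generate."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:  # single-character flags, Hunspell's default mode
        for strip, append, condition in SUFFIXES.get(flag, []):
            if not re.search(condition + "$", word):
                continue
            stem = word if strip == "0" else word[: -len(strip)]
            forms.append(stem + append)
    return forms


print(expand("dog/SM"))
print(expand("men"))
print(expand("men/M"))
```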

C.



spell checker issue

2012-09-30 Steven Howe
Who deals with spell checker dictionary issues?

I'm using the word "men's"; the spell checker thinks this is wrong,
although the Gmail spell checker does not. I've checked Webster's
dictionary online, and men's appears to be the correct spelling.