[Libreoffice-bugs] [Bug 136306] Dutch spell checker produces debatable suggestions

2023-11-12 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=136306

Lars Jødal changed:

           What    |Removed          |Added
------------------------------------------------------------
                 CC|                 |l...@rn.dk

--- Comment #10 from Lars Jødal ---
I do not read Dutch, but as I understand the discussion, the problem derives
from compounding of words that in themselves are correct words, but which
become gibberish when compounded. Right? This is not the fault of Hunspell or
LibreOffice, but it is still something we would like to handle.

Here follow some tips from the Danish dictionary.

Danish has a similar problem: the normal compound rules allow very many correct
compounds, but also very many nonsense ones. To minimize the problem, the
Danish dictionary has since 2012 excluded compound words from the suggestions
with this Hunspell option (in the .aff file):

MAXCPDSUGS 0

This instructs Hunspell to include at most zero compound words (i.e., none) in
the list of suggestions.

As an example, "kaffe" (coffee) and "klaver" (piano) are both in the dictionary,
and "kaffe" is marked for compounding with no added letters. If the user writes
"kaffeklaver" (coffee piano), it is accepted as a possible word, but
"kaffeklaver" will not be suggested by the dictionary. The quite common
compound word "kaffebønne" (coffee bean) is in the dictionary, so "kaffebønne"
can be suggested, not as a compound word but as a word in itself.
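
A toy sketch in Python may make this behaviour concrete. It is not Hunspell's
code: the word lists, the compound marking of "klaver" and the function names
check() and suggest() are assumptions made up for illustration, and the list
of candidate corrections is assumed to come from elsewhere.

```python
# Toy model (not Hunspell's implementation): rule-based compounds pass the
# spell check but are kept out of the suggestions when the limit is 0.

WORDS = {"kaffe", "klaver", "kaffebønne"}  # entries of a hypothetical .dic file
COMPOUND_PARTS = {"kaffe", "klaver"}       # entries assumed to allow compounding


def is_compound(word):
    """True if word is a concatenation of two or more compoundable parts."""
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in COMPOUND_PARTS and (tail in COMPOUND_PARTS or is_compound(tail)):
            return True
    return False


def check(word):
    """Accept dictionary words and rule-based compounds."""
    return word in WORDS or is_compound(word)


def suggest(candidates, max_cpd_sugs=0):
    """Keep dictionary words; let through at most max_cpd_sugs compounds."""
    out, compounds = [], 0
    for cand in candidates:
        if cand in WORDS:
            out.append(cand)
        elif is_compound(cand) and compounds < max_cpd_sugs:
            out.append(cand)
            compounds += 1
    return out
```

With max_cpd_sugs=0, "kaffeklaver" passes check() but never appears in the
output of suggest(), while "kaffebønne" can still be suggested because it is a
dictionary entry.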

More recently, as a current developer of the Danish dictionary, I have myself
added this option to the .aff file:

CHECKCOMPOUNDREP

This option asks Hunspell to check a possible compound word against the REP
list of common mistakes (as defined in the .aff file). If a possible
compounding differs only by a common mistake from a word that is already in the
dictionary (.dic file), then the word is rejected, even though it follows the
compound rules.
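
The idea can be sketched with a toy model in Python. This is not Hunspell's
code, and the English-looking entries, the single REP pair and the function
names are invented purely for illustration.

```python
# Toy model (not Hunspell's implementation) of the CHECKCOMPOUNDREP idea.

WORDS = {"scarecrow"}               # invented whole-word dictionary entry
COMPOUND_PARTS = {"scar", "crow"}   # invented entries that allow compounding
REP = [("ar", "are")]               # invented "common mistake -> correction" pair


def splits_into_parts(word):
    """True if word is a concatenation of two or more compoundable parts."""
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in COMPOUND_PARTS and (tail in COMPOUND_PARTS or splits_into_parts(tail)):
            return True
    return False


def rep_variants(word):
    """Yield each string obtained by applying one REP pair at one position."""
    for wrong, right in REP:
        start = word.find(wrong)
        while start != -1:
            yield word[:start] + right + word[start + len(wrong):]
            start = word.find(wrong, start + 1)


def accept(word, check_compound_rep=True):
    """Accept dictionary words and compounds, optionally applying the REP check."""
    if word in WORDS:
        return True
    if not splits_into_parts(word):
        return False
    if check_compound_rep and any(v in WORDS for v in rep_variants(word)):
        return False  # one common mistake away from a real dictionary word
    return True
```

Here "scarcrow" follows the compound rules (scar + crow), but it is one REP
correction away from "scarecrow", so it is rejected when the check is on.
"crowscar" has no such near neighbour and is still accepted.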

Adding this option will reduce the number of nonsense compounds that are
recognized. It will also once in a while weed out a valid compound word, but
such valid words can be added to the dictionary. I.e., if you are a dictionary
developer, it is worth checking the dictionary against a group of texts after
adding the CHECKCOMPOUNDREP option, to catch common words that were earlier
recognized as compounds but are better added as dictionary words.
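
Such a check could look roughly like the Python sketch below. The two checker
functions passed in are hypothetical stand-ins for "the dictionary before" and
"the dictionary after" the change; how they call the spell checker is left
open.

```python
import re
from collections import Counter


def newly_rejected(texts, accepts_before, accepts_after):
    """Count words the old setup accepted and the new one rejects.

    Returns (word, count) pairs, most frequent first, as candidates for
    adding to the dictionary as words in themselves.
    """
    counts = Counter()
    for text in texts:
        for word in re.findall(r"\w+", text.lower()):
            if accepts_before(word) and not accepts_after(word):
                counts[word] += 1
    return counts.most_common()
```

The most frequent words in the result are the ones most worth reviewing by
hand.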


As for the new LibreOffice option, I consider it valuable to give the normal
user the possibility to disallow compounding. Dictionary developers like myself
do our best to make a good dictionary, including compounding, but some users
will always have different needs than others, so giving this choice to the
individual user is a sure bonus. Thanks to László Németh for this option.

-- 
You are receiving this mail because:
You are the assignee for the bug.

2023-01-03 Thread bugzilla-daemon

--- Comment #9 from Telesto ---
(In reply to László Németh from comment #8)
1) I do get that there is no ideal solution for the problem, especially with a
rule-based methodology.

2) "The next step will be to remove them from the suggestions, too."
Oh well, work in progress: nice :-).

3) So the O-rule mentioned in bug 139319 comment 7 isn't involved?

4) The LibreOffice 4.2 suggestions are often better, in the sense of not
containing gibberish. The suggestions for "bovnmatige" (Dutch) don't contain
"boonmatige" (gibberish) in 4.2.

LibreOffice 4.4.0.3 *does* suggest "boonmatige" (and so do all newer versions).
The same goes for the example in bug 139319: 4.2 does not suggest
"sprachgebundene" or "sprachgebunden".

It's hard to say anything sensible or general. LibreOffice 4.2 isn't
consistently better: sometimes it's better, sometimes it shows the same
oddities as today, and sometimes it's worse than today.

Good in 7.6, bad in 4.2:
opinipeilingen
vacinatieprogramma

Odd suggestions, but no invented words or gibberish, in 7.6 as in 4.2:
overengekomen
gedetalleerde

Gibberish in 7.6 that is not seen in 4.2:
hoofdlettergevoligheid [gibberish: 'hoofdlettergelovigheid']
bvenmatig [gibberish: 'beenmatig' 'boenmatig' 'ovenmatig']
bovnmatige [gibberish: 'boonmatige']

2023-01-03 Thread bugzilla-daemon

László Németh changed:

           What    |Removed          |Added
------------------------------------------------------------
     Ever confirmed|0                |1
             Status|UNCONFIRMED      |NEW

--- Comment #8 from László Németh ---
(In reply to Telesto from comment #6)
> (In reply to Commit Notification from comment #5)
> > László Németh committed a patch related to this issue.
> > It has been pushed to "master":
> > 
> > https://git.libreoffice.org/core/commit/57d79744c77eef96b4c2bd3b16e0a04317ffcf9e
> > 
> > tdf#136306 offapi linguistic: add options to disable rule-based compounding
> 
> I'm not noticing any differences. Is the commit actually related to this bug?

Previously, the suggested words were accepted as correct words. With the new
spell-checking options it is now possible to reject them, including the English
hyphenated compound words. The next step will be to remove them from the
suggestions, too.

The Hunspell 1.7.2 update improved the strange suggestions a little bit: if the
suggestions include a dictionary word or a rule-based compound of 2 words,
rule-based closed compound words of 3 or more words won't be suggested.
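
Read literally, this rule can be sketched as the toy Python filter below. It
is not Hunspell's code; the (word, number of parts) pairs are an invented
representation, where 1 part means a plain dictionary word.

```python
# Toy sketch (not Hunspell's implementation) of the suggestion rule above.

def filter_suggestions(candidates):
    """Filter (word, parts) pairs.

    parts == 1 is a plain dictionary word, parts == 2 a rule-based compound
    of two words, and so on.
    """
    has_short = any(parts <= 2 for _, parts in candidates)
    if has_short:
        # A dictionary word or 2-word compound exists: drop longer compounds.
        return [word for word, parts in candidates if parts <= 2]
    # Otherwise nothing is filtered out.
    return [word for word, _ in candidates]
```

Compounds of 3 or more words therefore only survive when no shorter
alternative is available.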

Note: There is no ideal solution for the problem, especially because limiting
the suggestions can be very slow. I've added a limitation for it, but I had to
remove it; see this part of the code:

-rv = pAMgr->compound_check(word, 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, 1, 0);  // EXT
+int info = (cpdsuggest == 1) ? SPELL_COMPOUND_2 : 0;
+rv = pAMgr->compound_check(word, 0, 0, 100, 0, NULL, (hentry**)&rwords, 0, 1, &info);  // EXT
+// TODO filter 3-word or more compound words, as in spell()
+// (it's too slow to call suggest() here for all possible compound words)

in
https://github.com/hunspell/hunspell/commit/ff3591b0f76950f13d73123d03a03edd9a892945

2022-12-31 Thread bugzilla-daemon

--- Comment #7 from Telesto ---
Created attachment 184412
  --> https://bugs.documentfoundation.org/attachment.cgi?id=184412&action=edit
Additional example file (English)

Another example, in this case US English, with lots of noise. It is even
present in LibreOffice 3.3.0:
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4

2022-12-31 Thread bugzilla-daemon

Telesto changed:

           What    |Removed          |Added
------------------------------------------------------------
                 CC|                 |nem...@numbertext.org

--- Comment #6 from Telesto ---
(In reply to Commit Notification from comment #5)
> László Németh committed a patch related to this issue.
> It has been pushed to "master":
> 
> https://git.libreoffice.org/core/commit/57d79744c77eef96b4c2bd3b16e0a04317ffcf9e
> 
> tdf#136306 offapi linguistic: add options to disable rule-based compounding

I'm not noticing any differences. Is the commit actually related to this bug?

2022-12-31 Thread bugzilla-daemon

Telesto changed:

           What    |Removed          |Added
------------------------------------------------------------
           See Also|                 |https://bugs.documentfoundation.org/show_bug.cgi?id=139319

2022-12-30 Thread bugzilla-daemon

--- Comment #5 from Commit Notification ---
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/57d79744c77eef96b4c2bd3b16e0a04317ffcf9e

tdf#136306 offapi linguistic: add options to disable rule-based compounding

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

2022-12-30 Thread bugzilla-daemon

Commit Notification changed:

           What    |Removed          |Added
------------------------------------------------------------
         Whiteboard|                 |target:7.6.0

2021-07-27 Thread bugzilla-daemon

--- Comment #4 from Telesto ---
(In reply to Buovjaga from comment #3)
> Source seems to be https://github.com/OpenTaal/opentaal-hunspell

I posted a ticket on GitHub but, well, no response. And this isn't unique to
Dutch: the German dictionary also gives odd results (as far as I recall; I
would have to try around to find some odd examples).

I can't assess what causes the issue: the Hunspell dictionary or Hunspell
itself.

Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs


2021-07-26 Thread bugzilla-daemon

QA Administrators changed:

           What    |Removed          |Added
------------------------------------------------------------
         Whiteboard| QA:needsComment |

2021-07-26 Thread bugzilla-daemon

--- Comment #3 from Buovjaga ---
Source seems to be https://github.com/OpenTaal/opentaal-hunspell

2020-09-29 Thread bugzilla-daemon

QA Administrators changed:

           What    |Removed          |Added
------------------------------------------------------------
         Whiteboard|                 | QA:needsComment

2020-09-15 Thread bugzilla-daemon

BogdanB changed:

           What    |Removed          |Added
------------------------------------------------------------
             Blocks|                 |96000
         Whiteboard| QA:needsComment |
                 CC|                 |buzea.bog...@libreoffice.org


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=96000
[Bug 96000] [META] Spelling and grammar checking bugs and enhancements

2020-09-13 Thread bugzilla-daemon

QA Administrators changed:

           What    |Removed          |Added
------------------------------------------------------------
         Whiteboard|                 | QA:needsComment

2020-08-30 Thread bugzilla-daemon

Telesto changed:

           What    |Removed          |Added
------------------------------------------------------------
                 CC|                 |c...@nouenoff.nl

--- Comment #2 from Telesto ---
@Cor
Not too happy with the spelling suggestions. The compounds are way off, if you
ask me. But I'm not sure how the whole dictionary thing works, so I don't know
whether to blame it on Hunspell or on the dictionary. Already seen in 3.5.0.3.

2020-08-30 Thread bugzilla-daemon

--- Comment #1 from Telesto ---
Created attachment 164896
  --> https://bugs.documentfoundation.org/attachment.cgi?id=164896&action=edit
Example file
