Re: iconv warnings

2017-09-10 Thread Uwe Stöhr

On 10.09.2017 at 14:18, Uwe Stöhr wrote:

> I just compiled the current 2.3.x branch to perform some final tests. I
> noted some avoidable compiler warnings in libiconv, see below


I reported the warnings to the bug report email address and have already
received a reply. They will analyze the signed/unsigned mismatch warnings
for future releases; in the meantime I can ignore them. In their opinion
the other two kinds of warnings cannot be explained, and I should ignore
them too.


regards Uwe


iconv warnings

2017-09-10 Thread Uwe Stöhr
I just compiled the current 2.3.x branch to perform some final tests. I
noted some avoidable compiler warnings in libiconv, see below. The
question is whether we should, or could, fix this. If we don't, does
anybody know where I can or should report it?

There is a GitHub page:
https://github.com/win-iconv/win-iconv/issues
which seems to be inactive, but there is also
https://www.gnu.org/software/libiconv/ which suggests a mail address for
reporting bugs.


thanks and regards
Uwe

  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\utf7.h(162): warning 
C4018: '<': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\utf7.h(331): warning 
C4018: '<': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\hz.h(39): warning C4018: 
'<': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\hz.h(51): warning C4018: 
'<': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\hz.h(57): warning C4018: 
'<': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\hz.h(65): warning C4018: 
'<': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\hz.h(80): warning C4018: 
'<': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\loop_unicode.h(47): 
warning C4018: '<=': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\loop_unicode.h(91): 
warning C4018: '<=': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\loop_unicode.h(142): 
warning C4018: '<=': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\loop_unicode.h(258): 
warning C4018: '<=': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\loop_unicode.h(418): 
warning C4018: '<=': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\loop_unicode.h(422): 
warning C4018: '<=': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\loop_unicode.h(503): 
warning C4018: '<=': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\loop_unicode.h(519): 
warning C4018: '<=': signed/unsigned mismatch (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
  d:\lyxgit\2.3.x\3rdparty\libiconv\1.14\lib\loop_wchar.h(40): warning 
C4273: 'mbrtowc': inconsistent dll linkage (compiling source file 
D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c) 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
 D:\LyXGit\2.3.x\3rdparty\libiconv\1.14\lib\iconv.c(427): warning 
C4090: 'function': different 'const' qualifiers 
[D:\LyXGit\2.3.x\compile-2015\3rdparty\libiconv\iconv.vcxproj]
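
For readers unfamiliar with C4018: MSVC emits it when a signed and an unsigned integer are compared directly, because the signed operand is implicitly converted to unsigned before the comparison. The following is a minimal sketch of that pattern and of one common way to make the intent explicit. The function names are hypothetical and the code is not taken from libiconv.

```cpp
#include <cstddef>

// Mirrors what the compiler does implicitly for `count < limit` when the
// operands are int and std::size_t: the int is converted to unsigned first,
// so -1 turns into a very large value. The cast is written out here so the
// example itself compiles without the warning.
inline bool less_as_compiler_sees_it(int count, std::size_t limit) {
    return static_cast<std::size_t>(count) < limit;
}

// One common fix: rule out negative values, then cast explicitly.
inline bool less_checked(int count, std::size_t limit) {
    if (count < 0)
        return true; // any negative count is below any unsigned limit
    return static_cast<std::size_t>(count) < limit;
}
```

With these definitions, less_as_compiler_sees_it(-1, 10) is false while less_checked(-1, 10) is true. Whether any of the reported lines can actually see a negative operand is not established in this thread.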


Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Vincent van Ravesteijn
On Fri, Apr 11, 2014 at 1:40 AM, Cyrille Artho wrote:
> Regarding the idea I just mentioned before, there is a major flaw.
>
> Asian languages do not have spaces. Tokenizing a text into words requires a
> dictionary and is a non-trivial problem (due to inflection: different verb
> forms need to be recognized, etc.). We can therefore not just scan for
> whitespaces and forward anything in between to a spell checker, unless we
> restrict that workaround to Western languages.
>

Are there spell checkers for, e.g., Chinese? It sounds a bit
contradictory, as they don't have any "spelling". Of course there are
words consisting of multiple characters, but these characters can also
be used on their own.

Vincent


Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Stephan Witt
On 11.04.2014 at 01:36, Cyrille Artho wrote:

> I agree that it would be good to have all dictionaries in utf-8, but I'm not 
> sure if this is feasible for a typical user/installation.
> 
> Another option would be for LyX to tokenize the text and forward it word by 
> word to the spell checker.

That's the way the hunspell and aspell spell checker backends work.
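
To illustrate what "word by word" means here, the following is a minimal sketch of whitespace tokenization. The function name is hypothetical and this is not the code LyX uses; real word-boundary detection also has to handle punctuation, hyphenation and, as noted elsewhere in the thread, languages that are written without spaces.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits `text` on whitespace only. Each resulting token would then be
// handed to the spell checker backend separately.
inline std::vector<std::string> split_words(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word)
        words.push_back(word);
    return words;
}
```

A text without any whitespace comes back as a single token, which is exactly the limitation raised for Asian languages.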

For Mac builds there is another one - the "native" OS service for spell
checking.
The latter passes the complete paragraph to the spell checker engine.
This results in a) improved performance and b) better results because of
the builtin automatic language detection. So there are fewer false positives.

The paragraph passing mode can be used for languages without easily detectable
word boundaries. Perhaps that way LyX is already able to spell check Chinese
text on Mac. I never tried that and I'm unable to judge the result.

Stephan

> 
> This way, we could handle "Ignore All" in LyX itself rather than let the 
> spell checker ignore the word. LyX would never forward ignored words to the 
> spell checker but all the remaining words would be handled by the spell 
> checker.
> 
> Jürgen Spitzmüller wrote:
>> 2014-04-10 14:18 GMT+02:00 Jean-Marc Lasgouttes:
>> 
>> 
>>The point is that users cannot do something sensible with such marked
>>words (except for adding them into the personal dictionary).
>> 
>> 
>>Sure, but the same holds for "Lasgouttes", doesn't it?
>> 
>> 
>> No, if the encoding fits, I can hit "Ignore all" and only ignore you (or
>> your name's spelling, for that matter) in the current document (which is
>> what I do for names usually, except for very recurrent names). If the
>> encoding does not fit, hitting "Ignore all" just would not work. I think we
>> would need to at least disable the ignore all button/menu entry in that
>> case, otherwise users would rightly complain about that bug (they would
>> also, probably, not understand why the function is disabled for specific
>> names.).
>> 
>> So, to sum up: I agree with all of you that strings from non-matching
>> encodings should be marked as unknown, but only if we can provide sensible
>> action.
>> 
>> Jürgen
>> 
>> BTW German hunspell suggests "Ausgelastet" for "Lasgouttes", which means
>> "fully occupied" or "snowed with work".
>> 
>> 
>>JMarc
>> 
>> 
> 
> -- 
> Regards,
> Cyrille Artho - http://artho.com/
> The opposite of a correct statement is a false statement. But the
> opposite of a profound truth may well be another profound truth.
>   -- Niels Bohr



Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Jürgen Spitzmüller
2014-04-10 20:43 GMT+02:00 Georg Baum:

> Does this mean that we need to maintain our own versions? If not then it is
> probably the best solution, if yes then I'd rather not do it.
>

The Chromium project does maintain utf8 versions (or "deltas", for that
matter):
http://www.chromium.org/developers/how-tos/editing-the-spell-checking-dictionaries

Jürgen


>
>
> Georg
>
>
>


Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Jürgen Spitzmüller
2014-04-10 22:30 GMT+02:00 Stephan Witt:

> On 10.04.2014 at 20:43, Georg Baum wrote:
> > It is probably not difficult to implement sensible behaviour for "ignore"
> > and "ignore all" for these words: HunspellChecker already has a member
> > variable ignored_ which tracks ignored words, so if words which created an
> > encoding error on spell checking were kept in a different list as well,
> > then "ignore" and "ignore all" could simply add the affected words to the
> > ignored list.
>
> Like another personal word list, but not a persistent one.
>

Yes, this sounds like a good idea.

Jürgen
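
The idea quoted above, a second word list that is not persistent, can be sketched roughly as follows. This is an illustration under assumptions, not LyX's HunspellChecker: the class, the enum and the two boolean parameters are hypothetical stand-ins for the encoding conversion result and the backend's answer.

```cpp
#include <set>
#include <string>

enum class Verdict { Ok, Misspelled, Ignored };

// Hypothetical session-only list; it is never written to disk.
class SessionIgnoreList {
public:
    void ignoreAll(const std::string& word) { words_.insert(word); }
    bool contains(const std::string& word) const { return words_.count(word) > 0; }
private:
    std::set<std::string> words_;
};

// `convertible` says whether the word could be converted to the dictionary
// encoding; `dictionary_ok` is the backend's answer when it could.
inline Verdict check(const SessionIgnoreList& ignored, const std::string& word,
                     bool convertible, bool dictionary_ok) {
    if (ignored.contains(word))
        return Verdict::Ignored;     // never reaches the backend
    if (!convertible)
        return Verdict::Misspelled;  // marked as unknown
    return dictionary_ok ? Verdict::Ok : Verdict::Misspelled;
}
```

Because the list is consulted before any conversion takes place, "ignore all" keeps working for words that cannot be expressed in the dictionary encoding.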


Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Jean-Marc Lasgouttes

11/04/2014 08:23, Stephan Witt:

> On 11.04.2014 at 01:36, Cyrille Artho wrote:
>
>> I agree that it would be good to have all dictionaries in utf-8, but I'm not
>> sure if this is feasible for a typical user/installation.
>>
>> Another option would be for LyX to tokenize the text and forward it word by
>> word to the spell checker.
>
> That's the way the hunspell and aspell spell checker backends work.
>
> For Mac builds there is another one - the "native" OS service for spell
> checking.
> The latter passes the complete paragraph to the spell checker engine.
> This results in a) improved performance and b) better results because of
> the builtin automatic language detection. So there are fewer false positives.


So do you mean that if I write in an English text "somme" instead of
"some", it will be considered an OK word because "somme" exists in
French? Is that supposed to be a feature?


Recently I did some proofreading of a paper written with TeXShop (LaTeX
editor for Mac). It turned out that the text was peppered with French
words. From what I understand, this horror was a joint work of automatic
correction and automatic language detection :(


JMarc



Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Vincent van Ravesteijn
> So do you mean that if I write in an English text "somme" instead of "some",
> it will be considered an OK word because "somme" exists in French? Is that
> supposed to be a feature?
>

I guess that the guessing is done for the whole paragraph.

> Recently I did some proofreading of a paper written with TeXShop (LaTeX
> editor for Mac). It turned out that the text was peppered with French words.
> From what I understand, this horror was a joint work of automatic correction
> and automatic language detection :(

I always thought that the French did this on purpose to not surrender
to the fact that the English language dominates the world.

Vincent


Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Jean-Marc Lasgouttes

11/04/2014 10:31, Vincent van Ravesteijn:

>> Recently I did some proofreading of a paper written with TeXShop (LaTeX
>> editor for Mac). It turned out that the text was peppered with French words.
>> From what I understand, this horror was a joint work of automatic correction
>> and automatic language detection :(
>
> I always thought that the French did this on purpose to not surrender
> to the fact that the English language dominates the world.


In this case, it was just some evil North American programmer trying to
undermine our international credibility. Now you understand what we have
to endure.


JMarc



Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Cyrille Artho

> I always thought that the French did this on purpose to not surrender
> to the fact that the English language dominates the world.
>
> Vincent

They did this to the keyboard layout, too. If you ever tried to use a
computer in a French Internet cafe, good luck typing your password (even
your username will be a challenge to type)! ;-)

--
Regards,
Cyrille Artho - http://artho.com/
Those who will not reason, are bigots, those who cannot,
are fools, and those who dare not, are slaves.
-- George Gordon Noel Byron


Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Jean-Marc Lasgouttes

11/04/2014 11:06, Cyrille Artho:

>> I always thought that the French did this on purpose to not surrender
>> to the fact that the English language dominates the world.
>>
>> Vincent
>
> They did this to the keyboard layout, too. If you ever tried to use a
> computer in a French Internet cafe, good luck typing your password (even
> your username will be a challenge to type)! ;-)


There is something worse than that: trying to program on a Mac with a
French keyboard layout. Characters like \, [, ] or | require the
Shift+Option modifier. They probably did not have French coders working
there.


JMarc


Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Stephan Witt
On 11.04.2014 at 10:23, Jean-Marc Lasgouttes wrote:

> 11/04/2014 08:23, Stephan Witt:
>> On 11.04.2014 at 01:36, Cyrille Artho wrote:
>> 
>>> I agree that it would be good to have all dictionaries in utf-8, but I'm 
>>> not sure if this is feasible for a typical user/installation.
>>> 
>>> Another option would be for LyX to tokenize the text and forward it word by 
>>> word to the spell checker.
>> 
>> That's the way the hunspell and aspell spell checker backends work.
>> 
>> For Mac builds there is another one - the "native" OS service for spell 
>> checking.
>> The latter passes the complete paragraph to the spell checker engine.
>> This results in a) improved performance and b) better results because of
>> the builtin automatic language detection. So there are fewer false positives.
>
> So do you mean that if I write in an English text "somme" instead of "some",
> it will be considered an OK word because "somme" exists in French? Is that
> supposed to be a feature?


Indeed. The following is the debug output of text input while
instant-spellchecking is enabled:

AppleSpellChecker.cpp (95): spellCheck: "so" = OK, lang = en_US
AppleSpellChecker.cpp (95): spellCheck: "som" = FAILED, lang = en_US
Paragraph.cpp (4115): misspelled word: "som" [518..520]
AppleSpellChecker.cpp (95): spellCheck: "somm" = FAILED, lang = en_US
Paragraph.cpp (4115): misspelled word: "somm" [518..521]
AppleSpellChecker.cpp (95): spellCheck: "somme" = OK, lang = en_US
AppleSpellChecker.cpp (95): spellCheck: "somme " = OK, lang = en_US

Stephan

Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Jean-Marc Lasgouttes

11/04/2014 11:23, Stephan Witt:

>> So do you mean that if I write in an English text "somme" instead
>> of "some", it will be considered an OK word because "somme" exists
>> in French? Is that supposed to be a feature?
>
> Indeed. The following is the debug output of text input while
> instant-spellchecking is enabled:
>
> AppleSpellChecker.cpp (95): spellCheck: "so" = OK, lang = en_US
> AppleSpellChecker.cpp (95): spellCheck: "som" = FAILED, lang = en_US
> Paragraph.cpp (4115): misspelled word: "som" [518..520]
> AppleSpellChecker.cpp (95): spellCheck: "somm" = FAILED, lang = en_US
> Paragraph.cpp (4115): misspelled word: "somm" [518..521]
> AppleSpellChecker.cpp (95): spellCheck: "somme" = OK, lang = en_US
> AppleSpellChecker.cpp (95): spellCheck: "somme " = OK, lang = en_US


Is there a way to avoid this "feature"?

JMarc


Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Stephan Witt
On 11.04.2014 at 11:56, Jean-Marc Lasgouttes wrote:

> 11/04/2014 11:23, Stephan Witt:
>>> So do you mean that if I write in an English text "somme" instead
>>> of "some", it will be considered an OK word because "somme" exists
>>> in French? Is that supposed to be a feature?
>>
>> Indeed. The following is the debug output of text input while
>> instant-spellchecking is enabled:
>>
>> AppleSpellChecker.cpp (95): spellCheck: "so" = OK, lang = en_US
>> AppleSpellChecker.cpp (95): spellCheck: "som" = FAILED, lang = en_US
>> Paragraph.cpp (4115): misspelled word: "som" [518..520]
>> AppleSpellChecker.cpp (95): spellCheck: "somm" = FAILED, lang = en_US
>> Paragraph.cpp (4115): misspelled word: "somm" [518..521]
>> AppleSpellChecker.cpp (95): spellCheck: "somme" = OK, lang = en_US
>> AppleSpellChecker.cpp (95): spellCheck: "somme " = OK, lang = en_US
>
> Is there a way to avoid this "feature"?

I don't know. It's a black box. It's an OS service.
Perhaps it can be configured somewhere, via System Preferences or an API.

With LyX you can use hunspell as the spell checker backend instead.

Stephan


Re: [patch for 2.2] silence iconv warnings

2014-04-11 Thread Jean-Marc Lasgouttes

11/04/2014 12:01, Stephan Witt:

>> Is there a way to avoid this "feature"?
>
> I don't know. It's a black box. It's an OS service.
> Perhaps it can be configured somewhere, via System Preferences or an API.
>
> With LyX you can use hunspell as the spell checker backend instead.
>
> Stephan


It looks like there is at least some control.
http://macs.about.com/od/OSXLion107/qt/Os-X-Lion-Automatic-Spelling-Correction.htm

This feature is really scary. Can we limit the allowed languages? In
that case you could maybe only send strings in the same language.
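
The suggestion to send only strings in the same language could look roughly like the sketch below: consecutive words that carry the same language tag are merged into one run, and each run would then be passed to the checker on its own. The types and the function name are hypothetical, and whether the OS service can actually be pinned to one language per call is left open in this thread.

```cpp
#include <string>
#include <vector>

struct Word { std::string text; std::string lang; };
struct Run  { std::string lang; std::string text; };

// Merge consecutive words that share a language tag into one run.
inline std::vector<Run> runs_by_language(const std::vector<Word>& words) {
    std::vector<Run> runs;
    for (const Word& w : words) {
        if (!runs.empty() && runs.back().lang == w.lang)
            runs.back().text += " " + w.text;     // extend the current run
        else
            runs.push_back(Run{w.lang, w.text});  // language changed: new run
    }
    return runs;
}
```

The language tag here comes from the document's own markup, not from any automatic detection, so "somme" inside an en_US run would still be checked as English.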


JMarc


Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Cyrille Artho
How is the call to iconv implemented? On the application level, the 
interface is probably not flexible enough; it is easy to ignore 
non-convertible characters, but they are just removed from the output.


The C library interface is probably richer. Is it possible to convert text 
word by word, find out which words are not convertible, and ignore the 
dictionary for those words? (The user could still choose to ignore/add them 
to the custom dictionary.)
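
As a rough illustration of the word-by-word question, the sketch below uses the POSIX iconv() C interface to test whether a single UTF-8 word can be expressed in a given dictionary encoding. The helper name and the encoding names passed in are assumptions, and this is not how LyX calls iconv. Implementations differ in how they report a character with no counterpart in the target encoding: some fail with an error, others substitute a placeholder and return a positive count, so the sketch treats both as "not convertible".

```cpp
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: true if `utf8_word` converts to `dict_encoding`
// (for example "ISO-8859-1") without errors or substitutions.
inline bool convertible_to(const std::string& utf8_word, const char* dict_encoding) {
    iconv_t cd = iconv_open(dict_encoding, "UTF-8");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    std::string in = utf8_word;                 // iconv wants a mutable buffer
    std::vector<char> out(4 * in.size() + 16);  // generous output space
    char* inp = &in[0];
    char* outp = out.data();
    std::size_t inleft = in.size();
    std::size_t outleft = out.size();

    std::size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    // (size_t)-1 signals an error; a positive value counts non-reversible
    // substitutions. Only a clean, complete conversion is accepted.
    return r == 0 && inleft == 0;
}
```

A caller could run each word through such a check first and skip the dictionary lookup for words that fail, leaving ignore/add to the user as suggested above.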


Maybe this requires a different way of integrating spell checkers?

Jürgen Spitzmüller wrote:

> 2014-04-10 1:02 GMT+02:00 Cyrille Artho c.ar...@aist.go.jp:
>
>> I think we have to use Unicode for all the given operations and (a)
>> either risk a mismatch for each word that is not learned/ignored, or
>> (b) up-convert words in the dictionary before they are matched. The
>> latter solution implies that the dictionary tool supports this; does
>> anyone know if that is the case (for at least one tool)?
>
> This is the problem here. Hunspell dictionaries are often not
> unicode-encoded. So we are stuck with non-unicode encodings.
>
> Jürgen


--
Regards,
Cyrille Artho - http://artho.com/
No problem is so formidable that you can't just walk away from it.
-- C. Schulz


Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Stephan Witt
Am 10.04.2014 um 08:57 schrieb Cyrille Artho c.ar...@aist.go.jp:

 How is the call to iconv implemented? On the application level, the interface 
 is probably not flexible enough; it is easy to ignore non-convertible 
 characters, but they are just removed from the output.
 
 The C library interface is probably richer. Is it possible to convert text 
 word by word, find out which words are not convertible, and ignore the 
 dictionary for those words? (The user could still choose to ignore/add them 
 to the custom dictionary.)

The ignore operation is part of the spell checker API. 

You have to use the dictionary encoding for it, IMO.
At least it is safe to do so. The behavior for not
using the dictionary encoding when adding words at run 
time is not documented.

 Maybe this requires a different way of integrating spell checkers?

Ideally the dictionaries should be converted to UTF-8, IMHO.
But they are not provided by the LyX developers.

Stephan

 Jürgen Spitzmüller wrote:
 2014-04-10 1:02 GMT+02:00 Cyrille Artho c.ar...@aist.go.jp:
 
I think we have to use Unicode for all the given operations and (a)
either risk a mismatch for each word that is not learned/ignored, or
(b) up-convert words in the dictionary before they are matched. The
latter solution implies that the dictionary tool supports this; does
anyone know if that is the case (for at least one tool)?
 
 
 This is the problem here. Hunspell dictionaries are often not
 unicode-encoded. So we are stuck with non-unicode encodings.
 
 Jürgen
 
 -- 
 Regards,
 Cyrille Artho - http://artho.com/
 No problem is so formidable that you can't just walk away from it.
   -- C. Schulz



Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread JeanMarc Lasgouttes
There is something that I do not understand. If the word is not representable 
in the German dictionary, presumably it is not part of the language. 
Lasgoittes is perfectly representable in any latin encoding and yet the 
spell-checker will mark it as misspelled. Why should it be different for a name 
with weird accents?

JMarc

On 10 avril 2014 08:53:36 UTC+02:00, Jürgen Spitzmüller sp...@lyx.org wrote:
2014-04-10 1:02 GMT+02:00 Cyrille Artho c.ar...@aist.go.jp:

 I think we have to use Unicode for all the given operations and (a)
either
 risk a mismatch for each word that is not learned/ignored, or (b)
 up-convert words in the dictionary before they are matched. The
latter
 solution implies that the dictionary tool supports this; does anyone
know
 if that is the case (for at least one tool)?


This is the problem here. Hunspell dictionaries are often not
unicode-encoded. So we are stuck with non-unicode encodings.

Jürgen


Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Jürgen Spitzmüller
2014-04-10 10:11 GMT+02:00 JeanMarc Lasgouttes lasgout...@lyx.org:

 There is something that I do not understand. If the word is not
 representable in the German dictionary, presumably it is not part of the
 language. Lasgoittes is perfectly representable in any latin encoding and
 yet the spell-checker will mark it as misspelled. Why should it be
 different for a name with weird accents?


The point is that users cannot do something sensible with such marked words
(except for adding them into the personal dictionary).

Actually, I tend to convert all hunspell dictionaries to utf8. This seems
the only proper solution to this problem.

Jürgen


Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Jean-Marc Lasgouttes

10/04/2014 14:14, Jürgen Spitzmüller:

There is something that I do not understand. If the word is not
representable in the German dictionary, presumably it is not part of
the language. Lasgoittes is perfectly representable in any latin
encoding and yet the spell-checker will mark it as misspelled. Why
should it be different for a name with weird accents?


The point is that users cannot do something sensible with such marked
words (except for adding them into the personal dictionary).


Sure, but the same holds for Lasgouttes, doesn't it?

JMarc



Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Jürgen Spitzmüller
2014-04-10 14:18 GMT+02:00 Jean-Marc Lasgouttes lasgout...@lyx.org:


 The point is that users cannot do something sensible with such marked
 words (except for adding them into the personal dictionary).


 Sure, but the same holds for Lasgouttes, doesn't it?


No, if the encoding fits, I can hit Ignore all and only ignore you (or
your name's spelling, for that matter) in the current document (which is
what I do for names usually, except for very recurrent names). If the
encoding does not fit, hitting Ignore all just would not work. I think we
would need to at least disable the ignore all button/menu entry in that
case, otherwise users would rightly complain about that bug (they would
also, probably, not understand why the function is disabled for specific
names.).

So, to sum up: I agree with all of you that strings from non-matching
encodings should be marked as unknown, but only if we can provide sensible
action.

Jürgen

BTW German hunspell suggests Ausgelastet for Lasgouttes, which means
fully occupied or snowed with work.



 JMarc




Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Jean-Marc Lasgouttes

10/04/2014 14:29, Jürgen Spitzmüller:

No, if the encoding fits, I can hit Ignore all and only ignore you (or
your name's spelling, for that matter) in the current document (which is
what I do for names usually, except for very recurrent names). If the
encoding does not fit, hitting Ignore all just would not work. I think
we would need to at least disable the ignore all button/menu entry in
that case, otherwise users would rightly complain about that bug (they
would also, probably, not understand why the function is disabled for
specific names.).


I see.



BTW German hunspell suggests Ausgelastet for Lasgouttes, which means
fully occupied or snowed with work.


It is not so bad for a word that really does not look like the original. 
Does Hunspell know me or what?


JMarc



Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Georg Baum
Jürgen Spitzmüller wrote:

 The point is that users cannot do something sensible with such marked
 words (except for adding them into the personal dictionary).

It is probably not difficult to implement sensible behaviour for ignore 
and ignore all for these words: HunspellChecker has already a member 
variable ignored_ which tracks ignored words, so if words which created an 
encoding error on spell checking would be kept in a different list as well, 
then ignore and ignore all could simply add the affected words to the 
ignored list.
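
The idea above could be sketched roughly as follows. This is Python, not the actual HunspellChecker code; the class name SessionIgnoreList, the backend_check callable and the fake backend are all hypothetical. The point is only that the ignore list lives on the application side in Unicode, so it works even for words the dictionary encoding cannot represent.

```python
class SessionIgnoreList:
    """Non-persistent ignore list kept by the application, in Unicode."""

    def __init__(self, dict_encoding, backend_check):
        self.dict_encoding = dict_encoding
        self.backend_check = backend_check  # callable: bytes -> bool (word known?)
        self.ignored = set()

    def ignore(self, word):
        # Works for any word, encodable or not, because nothing is converted.
        self.ignored.add(word)

    def is_misspelled(self, word):
        if word in self.ignored:
            return False
        try:
            encoded = word.encode(self.dict_encoding)
        except UnicodeEncodeError:
            return True  # not representable in the dictionary: mark as unknown
        return not self.backend_check(encoded)


# Stand-in for a real spell checker backend (assumption for the demo).
known = {b"Haus"}
checker = SessionIgnoreList("iso-8859-1", lambda encoded: encoded in known)
before = checker.is_misspelled("Vološinov")
checker.ignore("Vološinov")
after = checker.is_misspelled("Vološinov")
print(before, after)
```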

 Actually, I tend to convert all hunspell dictionaries to utf8. This seems
 the only proper solution to this problem.

Does this mean that we need to maintain our own versions? If not then it is 
probably the best solution, if yes then I'd rather not do it.


Georg




Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Stephan Witt
Am 10.04.2014 um 20:43 schrieb Georg Baum georg.b...@post.rwth-aachen.de:

 Jürgen Spitzmüller wrote:
 
 The point is that users cannot do something sensible with such marked
 words (except for adding them into the personal dictionary).
 
 It is probably not difficult to implement sensible behaviour for ignore 
 and ignore all for these words: HunspellChecker has already a member 
 variable ignored_ which tracks ignored words, so if words which created an 
 encoding error on spell checking would be kept in a different list as well, 
 then ignore and ignore all could simply add the affected words to the 
 ignored list.

Like another personal word list, but not a persistent one.

BTW: it depends on the spellchecker how it works.

This is the debug output of the Apple builtin spell checker: 

AppleSpellChecker.cpp (95): spellCheck: This is mixing languages with writing 
systems, IMHO. In fact language sometimes has an implication on the spelling of 
names (if it comes to transliteration), but with rather surpring effects. For 
instance, the Russian name Воло́шинов is usually written Vološinov in German, 
but Voloshinov in English. Is š a German character?  = FAILED, lang = en_US
Paragraph.cpp (4115): misspelled word: surpring [174..181]
Paragraph.cpp (4115): misspelled word: Vološinov [253..261]
Paragraph.cpp (4115): misspelled word: Voloshinov [278..287]

The ignore button simply works.

 Actually, I tend to convert all hunspell dictionaries to utf8. This seems
 the only proper solution to this problem.
 
 Does this mean that we need to maintain our own versions? If not then it is 
 probably the best solution, if yes then I'd rather not do it.

+1

Stephan

Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Cyrille Artho
I agree that it would be good to have all dictionaries in utf-8, but I'm 
not sure if this is feasible for a typical user/installation.


Another option would be for LyX to tokenize the text and forward it word by 
word to the spell checker.


This way, we could handle Ignore All in LyX itself rather than let the 
spell checker ignore the word. LyX would never forward ignored words to the 
spell checker but all the remaining words would be handled by the spell 
checker.
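
A rough sketch of that filtering step, assuming a naive regular-expression tokenizer (the function name words_to_forward is made up for this example, and real word segmentation in LyX is certainly more involved):

```python
import re


def words_to_forward(text, ignored):
    """Tokenize text and drop the words the user chose to ignore."""
    tokens = re.findall(r"\w+", text)  # naive tokenizer, for illustration only
    return [word for word in tokens if word not in ignored]


pending = words_to_forward("Vološinov met Lasgouttes", {"Vološinov"})
print(pending)  # only these words would reach the spell checker
```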


Jürgen Spitzmüller wrote:

2014-04-10 14:18 GMT+02:00 Jean-Marc Lasgouttes lasgout...@lyx.org:


The point is that users cannot do something sensible with such marked
words (except for adding them into the personal dictionary).


Sure, but the same holds for Lasgouttes, doesn't it?


No, if the encoding fits, I can hit Ignore all and only ignore you (or
your name's spelling, for that matter) in the current document (which is
what I do for names usually, except for very recurrent names). If the
encoding does not fit, hitting Ignore all just would not work. I think we
would need to at least disable the ignore all button/menu entry in that
case, otherwise users would rightly complain about that bug (they would
also, probably, not understand why the function is disabled for specific
names.).

So, to sum up: I agree with all of you that strings from non-matching
encodings should be marked as unknown, but only if we can provide sensible
action.

Jürgen

BTW German hunspell suggests Ausgelastet for Lasgouttes, which means
fully occupied or snowed with work.


JMarc




--
Regards,
Cyrille Artho - http://artho.com/
The opposite of a correct statement is a false statement. But the
opposite of a profound truth may well be another profound truth.
-- Niels Bohr


Re: [patch for 2.2] silence iconv warnings

2014-04-10 Thread Cyrille Artho

Regarding the idea I just mentioned before, there is a major flaw.

Asian languages do not have spaces. Tokenizing a text into words requires a 
dictionary and is a non-trivial problem (due to inflection: different verb 
forms need to be recognized, etc.). We can therefore not just scan for 
whitespaces and forward anything in between to a spell checker, unless we 
restrict that workaround to Western languages.
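
A tiny illustration of the flaw, using plain whitespace splitting in Python (the two sample sentences are made up for this example):

```python
western = "This is a test".split()
japanese = "私は学生です".split()  # "I am a student", written without spaces

# The Western sentence splits into words; the Japanese one stays one token.
print(len(western), len(japanese))
```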


(Unfortunately we use gmail, which filters out my own messages on mailing 
lists, so I can't reply to my own message...)

--
Regards,
Cyrille Artho - http://artho.com/
The opposite of a correct statement is a false statement. But the
opposite of a profound truth may well be another profound truth.
-- Niels Bohr


Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Jürgen Spitzmüller
2014-04-08 22:42 GMT+02:00 Georg Baum:

 The change in src/support/unicode.cpp is problematic: It disables all error
 handling, not only the lyxerr output. Also, if you now throw an exception
 there, you need to make sure that all callers can cope with that. Maybe one
 solution would be to throw the exception at the very end (after the error
 handling), and give it exactly the error message which is now written to
 lyxerr. Then each caller can decide what it wants to do with the error
 message.


Thanks. I feared that. I put the exception that early in order to suppress
the lyxerr message. In this case we need to audit all callers. Will
postpone this.
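
The shape of the suggestion quoted above, as a hedged sketch in Python rather than the C++ of src/support/unicode.cpp: the conversion routine raises at the very end and puts the text it would otherwise have logged into the exception, and each caller decides what to do with it. ConversionError, convert and spellcheck_caller are hypothetical names, not LyX functions.

```python
class ConversionError(Exception):
    """Hypothetical exception carrying the text that was formerly logged."""


def convert(text, to_encoding):
    try:
        return text.encode(to_encoding)
    except UnicodeEncodeError as err:
        # Any error handling (cleanup, state reset) would run here first;
        # only afterwards is the message handed to the caller.
        message = f"cannot convert {text!r} to {to_encoding}: {err.reason}"
        raise ConversionError(message) from err


def spellcheck_caller(word):
    """One caller that chooses to stay silent and report the word as unknown."""
    try:
        convert(word, "iso-8859-1")
        return "checked"
    except ConversionError:
        return "unknown"


print(spellcheck_caller("Haus"), spellcheck_caller("Vološinov"))
```

Another caller could just as well log the message or show it to the user; the conversion routine itself no longer decides.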



 I would also propose to treat an encoding error as a spelling error: If the
 word can't be encoded in the dictionary of the current language, then it
 can't be correct, since we assume that Hunspell does not choose an encoding
 for the dictionary of a certain language which does not cover all words of
 that language.


But then, with instant spellchecker, the word will be underlined and the
user can not change that.

Jürgen





 Georg





Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Stephan Witt

Am 09.04.2014 um 08:14 schrieb Jürgen Spitzmüller sp...@lyx.org:

 2014-04-08 22:42 GMT+02:00 Georg Baum:
 The change in src/support/unicode.cpp is problematic: It disables all error
 handling, not only the lyxerr output. Also, if you now throw an exception
 there, you need to make sure that all callers can cope with that. Maybe one
 solution would be to throw the exception at the very end (after the error
 handling), and give it exactly the error message which is now written to
 lyxerr. Then each caller can decide what it wants to do with the error
 message.
 
 Thanks. I feared that. I put the exception that early in order to suppress 
 the lyxerr message. In this case we need to audit all callers. Will postpone 
 this.
  
 
 I would also propose to treat an encoding error as a spelling error: If the
 word can't be encoded in the dictionary of the current language, then it
 can't be correct, since we assume that Hunspell does not choose an encoding
 for the dictionary of a certain language which does not cover all words of
 that language.
 
 But then, with instant spellchecker, the word will be underlined and the user 
 can not change that.

Can you provide an example, please? If the word cannot be converted to the 
Hunspell dictionary encoding, the dictionary is broken or the language is 
not correct, isn't it?

You're right, the user does not have many options to get rid of the misspelled marker.
S(he) can change the language of the word or add it to the personal word list 
for the language.
The personal word list uses UTF-8, so it should be possible to store it there.

Stephan

Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Jürgen Spitzmüller
2014-04-09 8:40 GMT+02:00 Stephan Witt st.w...@gmx.net:

 Can you provide an example, please? If the word cannot be converted to
 Hunspell dictionary encoding
 the dictionary is broken or the language is not correct, isn't it?


Depends on how you define language. Think of names.


 You're right, the user has not many options to get rid of the misspelled
 marker.
 S(he) can change the language of the word or add it to the personal word
 list for the language.
 The personal word list uses UTF-8, it should be possible to store it there.


Maybe. Didn't test.

Jürgen



 Stephan


Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Jean-Marc Lasgouttes

09/04/2014 08:14, Jürgen Spitzmüller:

2014-04-08 22:42 GMT+02:00 Georg Baum:

The change in src/support/unicode.cpp is problematic: It disables
all error
handling, not only the lyxerr output. Also, if you now throw an
exception
there, you need to make sure that all callers can cope with that.
Maybe one
solution would be to throw the exception at the very end (after the
error
handling), and give it exactly the error message which is now written to
lyxerr. Then each caller can decide what it wants to do with the error
message.


Thanks. I feared that. I put the exception that early in order to
suppress the lyxerr message. In this case we need to audit all callers.
Will postpone this.


I think the lyxerr message could be rewritten to be at least useful. Who 
found a use for the hex dump anyway?


JMarc



Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Jürgen Spitzmüller
2014-04-09 9:52 GMT+02:00 Jean-Marc Lasgouttes lasgout...@lyx.org:

 I think the lyxerr message could be rewritten to be at least useful. Who
 found a use for the hex dump anyway?


Agreed.

Jürgen



 JMarc




Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Stephan Witt
Am 09.04.2014 um 08:53 schrieb Jürgen Spitzmüller sp...@lyx.org:

 2014-04-09 8:40 GMT+02:00 Stephan Witt st.w...@gmx.net:
 Can you provide an example, please? If the word cannot be converted to 
 Hunspell dictionary encoding
 the dictionary is broken or the language is not correct, isn't it?
 
 Depends on how you define language. Think of names.

That's a good example. So, my parents are from Hungary and named me István.
Let's assume the á isn't valid in the German ISO encoding. Then 
* I can change my name to Stephan - to avoid having to spell my name on every 
formal occasion 
* if I don't like that I can add István to my German personal word list (I 
didn't test it either)
* or I can change the language of the word István to Hungarian
* or I have to live with the red misspelled marker

It's not me, BTW :) It's only a fake on purpose.

Stephan

 You're right, the user has not many options to get rid of the misspelled 
 marker.
 S(he) can change the language of the word or add it to the personal word list 
 for the language.
 The personal word list uses UTF-8, it should be possible to store it there.
 
 Maybe. Didn't test.



Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Georg Baum
Jürgen Spitzmüller wrote:

 2014-04-09 9:52 GMT+02:00 Jean-Marc Lasgouttes lasgout...@lyx.org:
 
 I think the lyxerr message could be rewritten to be at least useful. Who
 found a use for the hex dump anyway?

 
 Agreed.

Me too. I think this output is still unchanged from the time when we had 
bugs in our own unicode code.

However, I still think that there is a problem unrelated to the error 
output. I understand that it is annoying if names are underlined, but on the 
other hand it is also annoying if a misspelled word is not underlined. 
Unfortunately I have no solution.



Georg



Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Stephan Witt
Am 09.04.2014 um 11:12 schrieb Jürgen Spitzmüller sp...@lyx.org:

 2014-04-09 10:59 GMT+02:00 Stephan Witt st.w...@gmx.net:
 
 That's a good example. So, my parents are from Hungary and named me István.
 Let's assume the á isn't valid in the German ISO encoding. Then
 * I can change my name to Stephan - to avoid having to spell my name on every 
 formal occasion
 * if I don't like that I can add István to my German personal word list (I 
 didn't test it either)
 * or I can change the language of the word István to Hungarian
 * or I have to live with the red misspelled marker
 
 This is mixing languages with writing systems, IMHO. In fact language 
 sometimes has an implication on the spelling of names (if it comes to 
 transliteration), but with rather surpring effects. For instance, the Russian 
 name Воло́шинов is usually written Vološinov in German, but Voloshinov in 
 English. Is š a German character?

I'm not a linguist and my knowledge about these things is limited. 
The change of language is the only possibility I know of to get out
of the broken dictionary encoding scenario.

 Also, I think that marking István as Hungarian absurds the language concept.
 
 More technically, I think it will be irritating for users that they can add 
 István to the personal dictionary, while Ignore and Ignore all just 
 won't work.

Yes, I agree.

With the given example István and having á in the dictionary encoding
the word is most probably mark as misspelled. But then it's possible to
Ignore it? Isn't there the option to discard the characters that cannot
be converted silently or replace them with something similar for the
dictionary lookup? Not quite correct, I know - but perhaps the better
strategy for the user?

Stephan

Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Cyrille Artho
Usually given names are not in a language dictionary, although many 
(translation) services have separate dictionaries for proper/given names.


We have two problems here:

(1) Language: I think most users are OK with proper names not being 
accepted by the spell checker (before learning them). However, other 
options such as Ignore should work, too.


(2) Encoding: Words having characters that are not part of the normal 
character set in a given language, should behave in the same way as words 
that are. This includes István, Vološinov, etc. So we have to use UTF-8 
to look up words.


When down-converting text to the character set of the target language, we 
can ignore non-convertible characters silently, but


echo 'István' | iconv -c -f utf-8 -t ascii

yields Istvn, which is not very useful.

I think we have to use Unicode for all the given operations and (a) either 
risk a mismatch for each word that is not learned/ignored, or (b) 
up-convert words in the dictionary before they are matched. The latter 
solution implies that the dictionary tool supports this; does anyone know 
if that is the case (for at least one tool)?




This is mixing languages with writing systems, IMHO. In fact language
sometimes has an implication on the spelling of names (if it comes to
transliteration), but with rather surpring effects. For instance, the
Russian name Воло́шинов is usually written Vološinov in German, but
Voloshinov in English. Is š a German character?


I'm not a linguist and my knowledge about these things is limited. The
change of language is the only possibility I know of to get out of the
broken dictionary encoding scenario.


Also, I think that marking István as Hungarian absurds the language
concept.

More technically, I think it will be irritating for users that they
can add István to the personal dictionary, while Ignore and
Ignore all just won't work.


Yes, I agree.

With the given example István and having á in the dictionary encoding
the word is most probably mark as misspelled. But then it's possible to
Ignore it? Isn't there the option to discard the characters that cannot
be converted silently or replace them with something similar for the
dictionary lookup? Not quite correct, I know - but perhaps the better
strategy for the user?

Stephan



--
Regards,
Cyrille Artho - http://artho.com/
Perilous to all of us are the devices of an art deeper than we
ourselves possess.
-- Gandalf the Grey [Tolkien, Lord of the Rings]


Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Jürgen Spitzmüller
2014-04-08 22:42 GMT+02:00 Georg Baum:

> The change in src/support/unicode.cpp is problematic: It disables all error
> handling, not only the lyxerr output. Also, if you now throw an exception
> there, you need to make sure that all callers can cope with that. Maybe one
> solution would be to throw the exception at the very end (after the error
> handling), and give it exactly the error message which is now written to
> lyxerr. Then each caller can decide what it wants to do with the error
> message.
>

Thanks. I feared that. I put the exception that early in order to suppress
the lyxerr message. In this case we need to audit all callers. Will
postpone this.


>
> I would also propose to treat an encoding error as a spelling error: If the
> word can't be encoded in the dictionary of the current language, then it
> can't be correct, since we assume that Hunspell does not choose an encoding
> for the dictionary of a certain language which does not cover all words of
> that language.
>

But then, with the instant spellchecker, the word will be underlined and the
user cannot change that.

Jürgen


>
>
>
> Georg
>
>
>


Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Stephan Witt

On 09.04.2014 at 08:14, Jürgen Spitzmüller wrote:

> 2014-04-08 22:42 GMT+02:00 Georg Baum:
>> The change in src/support/unicode.cpp is problematic: It disables all error
>> handling, not only the lyxerr output. Also, if you now throw an exception
>> there, you need to make sure that all callers can cope with that. Maybe one
>> solution would be to throw the exception at the very end (after the error
>> handling), and give it exactly the error message which is now written to
>> lyxerr. Then each caller can decide what it wants to do with the error
>> message.
>
> Thanks. I feared that. I put the exception that early in order to suppress
> the lyxerr message. In this case we need to audit all callers. Will postpone
> this.
>
>> I would also propose to treat an encoding error as a spelling error: If the
>> word can't be encoded in the dictionary of the current language, then it
>> can't be correct, since we assume that Hunspell does not choose an encoding
>> for the dictionary of a certain language which does not cover all words of
>> that language.
>
> But then, with instant spellchecker, the word will be underlined and the user
> cannot change that.

Can you provide an example, please? If the word cannot be converted to the
Hunspell dictionary encoding, then either the dictionary is broken or the
language is not correct, isn't it?

You're right, the user does not have many options to get rid of the misspelled
marker. He or she can change the language of the word or add it to the personal
word list for the language.
The personal word list uses UTF-8, so it should be possible to store the word
there.

Stephan

Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Jürgen Spitzmüller
2014-04-09 8:40 GMT+02:00 Stephan Witt :

> Can you provide an example, please? If the word cannot be converted to
> Hunspell dictionary encoding
> the dictionary is broken or the language is not correct, isn't it?
>

Depends on how you define "language". Think of names.


> You're right, the user has not many options to get rid of the misspelled
> marker.
> S(he) can change the language of the word or add it to the personal word
> list for the language.
> The personal word list uses UTF-8, it should be possible to store it there.
>

Maybe. Didn't test.

Jürgen


>
> Stephan


Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Jean-Marc Lasgouttes

09/04/2014 08:14, Jürgen Spitzmüller:

> 2014-04-08 22:42 GMT+02:00 Georg Baum:
>
>> The change in src/support/unicode.cpp is problematic: It disables all error
>> handling, not only the lyxerr output. Also, if you now throw an exception
>> there, you need to make sure that all callers can cope with that. Maybe one
>> solution would be to throw the exception at the very end (after the error
>> handling), and give it exactly the error message which is now written to
>> lyxerr. Then each caller can decide what it wants to do with the error
>> message.
>
> Thanks. I feared that. I put the exception that early in order to
> suppress the lyxerr message. In this case we need to audit all callers.
> Will postpone this.


I think the lyxerr message could be rewritten to be at least useful. Who
ever found a use for the hex dump anyway?


JMarc



Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Jürgen Spitzmüller
2014-04-09 9:52 GMT+02:00 Jean-Marc Lasgouttes :

> I think the lyxerr message could be rewritten to be at least useful. Who
> ever found a use for the hex dump anyway?
>

Agreed.

Jürgen


>
> JMarc
>
>


Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Stephan Witt
On 09.04.2014 at 08:53, Jürgen Spitzmüller wrote:

> 2014-04-09 8:40 GMT+02:00 Stephan Witt:
>> Can you provide an example, please? If the word cannot be converted to
>> Hunspell dictionary encoding
>> the dictionary is broken or the language is not correct, isn't it?
>
> Depends on how you define "language". Think of names.

That's a good example. So, my parents are from Hungary and named me István.
Let's assume the á isn't valid in the German ISO encoding. Then
* I can change my name to Stephan, to avoid having to spell out my name on
every formal occasion
* if I don't like that, I can add István to my "German" personal word list (I
didn't test it either)
* or I can change the language of the word "István" to Hungarian
* or I have to live with the red misspelled marker

It's not me, BTW :) It's a deliberately made-up example.

Stephan

> You're right, the user has not many options to get rid of the misspelled 
> marker.
> S(he) can change the language of the word or add it to the personal word list 
> for the language.
> The personal word list uses UTF-8, it should be possible to store it there.
> 
> Maybe. Didn't test.



Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Georg Baum
Jürgen Spitzmüller wrote:

> 2014-04-09 9:52 GMT+02:00 Jean-Marc Lasgouttes :
> 
>> I think the lyxerr message could be rewritten to be at least useful. Who
>> ever found a use for the hex dump anyway?
>>
> 
> Agreed.

Me too. I think this output is still unchanged from the time when we had 
bugs in our own unicode code.

However, I still think that there is a problem unrelated to the error
output. I understand that it is annoying if names are underlined, but on the
other hand it is also annoying if a misspelled word is not underlined.
Unfortunately I have no solution.



Georg



Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Stephan Witt
On 09.04.2014 at 11:12, Jürgen Spitzmüller wrote:

> 2014-04-09 10:59 GMT+02:00 Stephan Witt :
> 
>> That's a good example. So, my parents are from Hungary and named me István.
>> Let's assume the á isn't valid in the German ISO encoding. Then
>> * I can change my name to Stephan, to avoid having to spell out my name on
>> every formal occasion
>> * if I don't like that, I can add István to my "German" personal word list (I
>> didn't test it either)
>> * or I can change the language of the word "István" to Hungarian
>> * or I have to live with the red misspelled marker
>
> This is mixing languages with writing systems, IMHO. In fact language
> sometimes has an implication on the spelling of names (when it comes to
> transliteration), but with rather surprising effects. For instance, the Russian
> name Воло́шинов is usually written Vološinov in German, but Voloshinov in
> English. Is "š" a "German" character?

I'm not a linguist and my knowledge about these things is limited. 
The change of language is the only possibility I know of to get out
of the "broken" dictionary encoding scenario.

> Also, I think that marking István as "Hungarian" reduces the language concept
> to absurdity.
> 
> More technically, I think it will be irritating for users that they can add 
> "István" to the personal dictionary, while "Ignore" and "Ignore all" just 
> won't work.

Yes, I agree.

With the given example "István", and with á available in the dictionary
encoding, the word is most probably marked as misspelled. But is it then
possible to "Ignore" it? Isn't there an option to silently discard the
characters that cannot be converted, or to replace them with something similar
for the dictionary lookup? Not quite correct, I know, but perhaps the better
strategy for the user?

Stephan

Re: [patch for 2.2] silence iconv warnings

2014-04-09 Thread Cyrille Artho
Usually given names are not in a language dictionary, although many 
(translation) services have separate dictionaries for proper/given names.


We have two problems here:

(1) Language: I think most users are OK with proper names not being 
accepted by the spell checker (before learning them). However, other 
options such as "Ignore" should work, too.


(2) Encoding: Words containing characters that are not part of the normal
character set of a given language should behave in the same way as words
that do not. This includes "István", "Vološinov", etc. So we have to use UTF-8
to look up words.


When down-converting text to the character set of the target language, we 
can ignore non-convertible characters silently, but


echo 'István' | iconv -c -f utf-8 -t ascii

yields "Istvn", which is not very useful.
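
To make the difference between the two fallbacks concrete (silently dropping versus replacing with something similar), here is a minimal C++ sketch. It does not call iconv; drop_unencodable() and replace_unencodable() are hypothetical helpers, and the replacement table is a two-entry sample made up for this illustration, not a real transliteration scheme.

```cpp
#include <cassert>
#include <map>
#include <string>

// Mimics the effect of `iconv -c -t ascii`: every non-ASCII character is
// silently dropped.
std::string drop_unencodable(std::u32string const & word)
{
	std::string out;
	for (char32_t c : word)
		if (c < 0x80)
			out += static_cast<char>(c);
	return out;
}

// Alternative: replace a character with a similar ASCII letter where a
// mapping is known, and drop it otherwise.
// The table is a tiny hand-written sample (assumption for this sketch).
std::string replace_unencodable(std::u32string const & word)
{
	static std::map<char32_t, char> const similar = {
		{U'\u00E1', 'a'},  // a with acute
		{U'\u0161', 's'}   // s with caron
	};
	std::string out;
	for (char32_t c : word) {
		if (c < 0x80) {
			out += static_cast<char>(c);
			continue;
		}
		std::map<char32_t, char>::const_iterator it = similar.find(c);
		if (it != similar.end())
			out += it->second;
	}
	// Each input character yields at most one output byte.
	assert(out.size() <= word.size());
	return out;
}
```

With these helpers, "István" becomes "Istvn" when dropping and "Istvan" when replacing. Neither result is the original word, so a dictionary lookup on the result can still give a misleading answer.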

I think we have to use Unicode for all the given operations and either (a)
risk a mismatch for each word that is not learned/ignored, or (b)
up-convert words in the dictionary before they are matched. The latter
solution requires that the dictionary tool supports this; does anyone know
if that is the case (for at least one tool)?
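
Option (b) can be sketched as follows, under the assumption that the dictionary is a plain ISO 8859-1 word list held in memory. Whether any real spell checker exposes its word list this way is exactly the open question above; latin1_to_utf8() and upconvert() are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Up-convert one ISO 8859-1 string to UTF-8. Every Latin-1 byte maps to the
// code point of the same value, so this direction can never fail.
std::string latin1_to_utf8(std::string const & in)
{
	std::string out;
	for (unsigned char c : in) {
		if (c < 0x80) {
			out += static_cast<char>(c);
		} else {
			// Code points U+0080..U+00FF need two UTF-8 bytes.
			out += static_cast<char>(0xC0 | (c >> 6));
			out += static_cast<char>(0x80 | (c & 0x3F));
		}
	}
	assert(out.size() >= in.size());
	return out;
}

// Build a UTF-8 word set from a Latin-1 word list. Lookups then take UTF-8
// words directly, whatever characters they contain.
std::set<std::string> upconvert(std::set<std::string> const & latin1_words)
{
	std::set<std::string> out;
	for (std::string const & w : latin1_words)
		out.insert(latin1_to_utf8(w));
	return out;
}
```

The point of converting in this direction is that the lookup side never has to down-convert: a word such as "Vološinov" is simply not found, without any conversion error.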







--
Regards,
Cyrille Artho - http://artho.com/
Perilous to all of us are the devices of an art deeper than we
ourselves possess.
-- Gandalf the Grey [Tolkien, "Lord of the Rings"]


Re: [patch for 2.2] silence iconv warnings

2014-04-08 Thread Georg Baum
Jürgen Spitzmüller wrote:

> I am not very familiar with iconv/unicode and exception handling, so I
> would appreciate a critical review.

The change in src/support/unicode.cpp is problematic: It disables all error 
handling, not only the lyxerr output. Also, if you now throw an exception 
there, you need to make sure that all callers can cope with that. Maybe one 
solution would be to throw the exception at the very end (after the error 
handling), and give it exactly the error message which is now written to 
lyxerr. Then each caller can decide what it wants to do with the error 
message.
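
A minimal sketch of that suggestion, assuming the exception class is changed to carry the message text: the diagnostic is assembled first, the existing error handling runs, and the throw is the very last statement. report_conversion_error() is a hypothetical stand-in for the error branch of IconvProcessor::convert(), and this iconv_convert_failure differs from the one in the patch in that it takes a message.

```cpp
#include <cassert>
#include <cerrno>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical variant of iconv_convert_failure that carries the full
// diagnostic text instead of a fixed string.
class iconv_convert_failure : public std::runtime_error {
public:
	explicit iconv_convert_failure(std::string const & msg)
		: std::runtime_error(msg) {}
};

// Stand-in for the error branch of the conversion routine: build the
// message, do the cleanup, and throw as the very last step.
[[noreturn]] void report_conversion_error(int err, std::string const & from,
	std::string const & to)
{
	assert(err != 0);
	std::ostringstream msg;
	msg << "Error returned from iconv: ";
	switch (err) {
	case E2BIG:  msg << "E2BIG (output buffer too small)"; break;
	case EILSEQ: msg << "EILSEQ (invalid multibyte sequence)"; break;
	case EINVAL: msg << "EINVAL (incomplete multibyte sequence)"; break;
	default:     msg << "errno " << err; break;
	}
	msg << " when converting from " << from << " to " << to;
	// ... the existing error handling would go here, before the throw ...
	throw iconv_convert_failure(msg.str());
}
```

A caller such as the spell checker can then catch the exception and send e.what() to a debug channel, while other callers can still write it to lyxerr.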

I would also propose to treat an encoding error as a spelling error: If the 
word can't be encoded in the dictionary of the current language, then it 
can't be correct, since we assume that Hunspell does not choose an encoding 
for the dictionary of a certain language which does not cover all words of 
that language.
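
As a rough illustration of this proposal, the sketch below returns "unknown" instead of "OK" when the word cannot be encoded. It is self-contained and does not use the LyX or Hunspell APIs: to_latin1() is a toy stand-in for to_iconv_encoding(), and Result, check() and the std::set dictionary are simplified assumptions.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

enum Result { WORD_OK, WORD_UNKNOWN };

// Toy encoder: accepts only code points that ISO 8859-1 can represent
// and throws otherwise.
std::string to_latin1(std::u32string const & word)
{
	std::string out;
	for (char32_t c : word) {
		if (c > 0xFF)
			throw std::range_error("not encodable in ISO 8859-1");
		out += static_cast<char>(static_cast<unsigned char>(c));
	}
	return out;
}

// An unencodable word cannot be in the dictionary, so it is reported as
// unknown rather than as correct.
Result check(std::u32string const & word, std::set<std::string> const & dict)
{
	std::string encoded;
	try {
		encoded = to_latin1(word);
	} catch (std::range_error const &) {
		return WORD_UNKNOWN;
	}
	// Latin-1 uses exactly one byte per character.
	assert(encoded.size() == word.size());
	return dict.count(encoded) ? WORD_OK : WORD_UNKNOWN;
}
```

This is the opposite of the posted patch, where an empty conversion result makes check() return WORD_OK.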



Georg




[patch for 2.2] silence iconv warnings

2014-04-06 Thread Jürgen Spitzmüller
When scrolling through a document while instant spellchecking is enabled
and Hunspell is used, LyX spits out iconv errors whenever a word appears that
cannot be represented in the Hunspell dictionary's encoding. E.g.:

Error returned from iconv
EILSEQ An invalid multibyte sequence has been encountered in the input.
When converting from UCS-4LE to ISO8859-1.
Input: 0xc4 0x3 0x0 0x0 0xcd 0x3 0x0 0x0 0xc0 0x3 0x0 0x0 0xbf 0x3 0x0 0x0
0xc2 0x3 0x0 0x0

Since these messages are not very informative and also rather frightening,
the attached patch attempts to catch the error and output something more
understandable in debug mode.
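
For reference, the hex dump in the message above is the word itself in UCS-4LE, four bytes per character with the lowest byte first. The C++ sketch below decodes it; decode_ucs4le() and fits_latin1() are hypothetical helper names, not LyX functions. The bytes decode to U+03C4 U+03CD U+03C0 U+03BF U+03C2, the Greek word τύπος, none of which ISO 8859-1 can represent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode little-endian UCS-4 bytes into code points (4 bytes per character).
std::vector<std::uint32_t> decode_ucs4le(std::vector<unsigned char> const & bytes)
{
	assert(bytes.size() % 4 == 0);
	std::vector<std::uint32_t> out;
	for (std::size_t i = 0; i + 3 < bytes.size(); i += 4)
		out.push_back(std::uint32_t(bytes[i])
			| (std::uint32_t(bytes[i + 1]) << 8)
			| (std::uint32_t(bytes[i + 2]) << 16)
			| (std::uint32_t(bytes[i + 3]) << 24));
	return out;
}

// ISO 8859-1 covers exactly the code points U+0000..U+00FF.
bool fits_latin1(std::vector<std::uint32_t> const & cps)
{
	for (std::uint32_t cp : cps)
		if (cp > 0xFF)
			return false;
	return true;
}

// The input bytes from the error message above.
std::vector<unsigned char> const dump = {
	0xc4, 0x03, 0x00, 0x00, 0xcd, 0x03, 0x00, 0x00, 0xc0, 0x03, 0x00, 0x00,
	0xbf, 0x03, 0x00, 0x00, 0xc2, 0x03, 0x00, 0x00 };
```

A debug message that printed the decoded word and the target encoding would carry the same information as the dump in a readable form.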

I am not very familiar with iconv/unicode and exception handling, so I
would appreciate a critical review.

Thanks
Jürgen
diff --git a/src/HunspellChecker.cpp b/src/HunspellChecker.cpp
index ea4b88c..05dbae9 100644
--- a/src/HunspellChecker.cpp
+++ b/src/HunspellChecker.cpp
@@ -21,6 +21,7 @@
 #include "support/debug.h"
 #include "support/docstring_list.h"
 #include "support/filetools.h"
+#include "support/unicode.h"
 #include "support/Package.h"
 #include "support/FileName.h"
 #include "support/gettext.h"
@@ -47,6 +48,19 @@ typedef map<std::string, PersonalWordList *> LangPersonalWordList;
 
 typedef vector<WordLangTuple> IgnoreList;
 
+
+string encode_if_possible(docstring const word, string encoding)
+{
+   string result;
+   try {
+   result = to_iconv_encoding(word, encoding);
+   } catch (iconv_convert_failure const /* e */) {
+   LYXERR(Debug::GUI, "Word not encodable in the spell checker dictionary's encoding ("
+  << encoding << "): " << to_utf8(word));
+   }
+   return result;
+}
+
 } // anon namespace
 
 
@@ -287,7 +301,7 @@ void HunspellChecker::Private::remove(WordLangTuple const & wl)
if (!h)
return;
string const encoding = h->get_dic_encoding();
-   string const word_to_check = to_iconv_encoding(wl.word(), encoding);
+   string const word_to_check = encode_if_possible(wl.word(), encoding);
h->remove(word_to_check.c_str());
PersonalWordList * pd = personal_[wl.lang()->lang()];
if (!pd)
@@ -302,7 +316,7 @@ void HunspellChecker::Private::insert(WordLangTuple const & wl)
if (!h)
return;
string const encoding = h->get_dic_encoding();
-   string const word_to_check = to_iconv_encoding(wl.word(), encoding);
+   string const word_to_check = encode_if_possible(wl.word(), encoding);
h->add(word_to_check.c_str());
PersonalWordList * pd = personal_[wl.lang()->lang()];
if (!pd)
@@ -342,7 +356,10 @@ SpellChecker::Result HunspellChecker::check(WordLangTuple const & wl)
int info;
 
string const encoding = h->get_dic_encoding();
-   string const word_to_check = to_iconv_encoding(wl.word(), encoding);
+   string const word_to_check = encode_if_possible(wl.word(), encoding);
+
+   if (word_to_check.empty())
+   return WORD_OK;
 
LYXERR(Debug::GUI, "spellCheck: \"" <<
   wl.word() << "\", lang = " << wl.lang()->lang()) ;
@@ -400,7 +417,7 @@ void HunspellChecker::suggest(WordLangTuple const & wl,
if (!h)
return;
string const encoding = h->get_dic_encoding();
-   string const word_to_check = to_iconv_encoding(wl.word(), encoding);
+   string const word_to_check = encode_if_possible(wl.word(), encoding);
char ** suggestion_list;
int const suggestion_number = h->suggest(&suggestion_list, word_to_check.c_str());
if (suggestion_number <= 0)
@@ -419,7 +436,7 @@ void HunspellChecker::stem(WordLangTuple const & wl,
if (!h)
return;
string const encoding = h->get_dic_encoding();
-   string const word_to_check = to_iconv_encoding(wl.word(), encoding);
+   string const word_to_check = encode_if_possible(wl.word(), encoding);
char ** suggestion_list;
int const suggestion_number = h->stem(&suggestion_list, word_to_check.c_str());
if (suggestion_number <= 0)
diff --git a/src/support/unicode.cpp b/src/support/unicode.cpp
index 343b8de..29e8d0c 100644
--- a/src/support/unicode.cpp
+++ b/src/support/unicode.cpp
@@ -151,6 +151,7 @@ int IconvProcessor::convert(char const * buf, size_t buflen,
return maxoutsize - outbytesleft;
 
// There are some errors in the conversion
+   throw iconv_convert_failure();
lyxerr << "Error returned from iconv" << endl;
switch (errno) {
case E2BIG:
diff --git a/src/support/unicode.h b/src/support/unicode.h
index ddcd6c8..a702b6e 100644
--- a/src/support/unicode.h
+++ b/src/support/unicode.h
@@ -21,6 +21,16 @@
 
 namespace lyx {
 
+/// Exception thrown by to_convert if iconv returns an error
+class iconv_convert_failure : public std::exception {
+public:
+   virtual ~iconv_convert_failure() throw() {}
+   virtual const char* what() const throw()
+   {
+   return "Iconv