Re: ISO-8859-1 character in manpage

2011-03-17 Thread Boudewijn Dijkstra

On Thu, 17 Mar 2011 00:55:32 +0100, Ingo Schwarze
schwa...@usta.de wrote:

Anthony J. Bentley wrote on Wed, Mar 16, 2011 at 01:37:50PM -0600:

$ mandoc -Tlint /usr/src/share/man/man4/udl.4
/usr/src/share/man/man4/udl.4:42:6:
  ERROR: skipping bad character: ignoring byte


Thanks for reporting!

Indeed, that character had to be replaced, ...


The o with diaeresis should be replaced with the \(:o escape.
(See mandoc_char(7).)


... however the German noun Koenig does not contain o diaeresis,
but o umlaut.  As this is a common source of confusion even
among native Germans, here is my commit message to explain the
situation:

  Using mandoc_char(7) escapes like K\(:onig for German umlauts
  is incorrect.  The escape sequence \(:o represents o diaeresis,
  not o umlaut.  These are two very different phonological phenomena
  that only happen to be represented by the same diacritic mark.


This implies that it was a silly decision to use the same mark, which is
arguable.


  In -Tascii mode, all renderers correctly render \(:o (o diaeresis)
  as plain o, but that rendering is incorrect for o umlaut, which
  must be transliterated to the digraph oe in -Tascii.


That is not due to incorrect conversion, but due to missing language
information.  Your phrase "must be" is only true in a pure German-language
context, which is not applicable here.  In Dutch, for example, when
converting to ASCII it would be more correct to remove the diaeresis
for German loanwords without adding an 'e', due to different pronunciation
rules.  Regardless, the König website http://www.koniggaming.com/ uses
both König and Konig, but not Koenig.
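To make the point about missing language information concrete, here is a
minimal Python sketch.  The rule table and the to_ascii() helper are
hypothetical names invented for illustration and are not part of mandoc;
the two rules merely restate the German and Dutch conventions described
in this thread.

```python
# Hypothetical per-language rules for rendering "o with two dots" in ASCII.
# These tables only restate the conventions discussed in this thread.
ASCII_RULES = {
    "de": {"\u00f6": "oe", "\u00d6": "Oe"},  # German umlaut: use the digraph
    "nl": {"\u00f6": "o", "\u00d6": "O"},    # Dutch: drop the mark
}


def to_ascii(text, lang):
    """Replace o-with-two-dots according to the rules for `lang`."""
    rules = ASCII_RULES[lang]
    return "".join(rules.get(ch, ch) for ch in text)


print(to_ascii("K\u00f6nig", "de"))  # Koenig
print(to_ascii("K\u00f6nig", "nl"))  # Konig
```

The same input yields two different results, so a renderer that sees only
the character and not the language cannot pick one that is right for
every reader.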


  There is no mandoc_char(7) escape for o umlaut,


That is no wonder, because Unicode, since version 1.0, has decided not to
distinguish between diaeresis and umlaut.  See the specification for
U+0308 and the Unicode mailing list archive.  According to your explanation,
every single German text on the Internet is encoded wrong.  But there is
no alternative[1].  Suck it up and use the diaeresis like everybody[1]
else.
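
As a quick check of the Unicode side of this argument, the following
Python sketch uses only the standard unicodedata module to show how the
precomposed character is named and what it decomposes into.  It is an
illustration only and says nothing about how mandoc handles the
character internally.

```python
import unicodedata

o_dots = "\u00f6"  # the precomposed character used in German text

# Unicode names the precomposed character after the diaeresis.
print(unicodedata.name(o_dots))  # LATIN SMALL LETTER O WITH DIAERESIS

# Canonical decomposition gives a plain "o" followed by U+0308.
decomposed = unicodedata.normalize("NFD", o_dots)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+006F', 'U+0308']
print(unicodedata.name(decomposed[1]))  # COMBINING DIAERESIS
```

There is a single combining mark, U+0308, for both uses; the code points
carry no information about whether an umlaut or a diaeresis was meant.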

[1] Yes, the exception is ISO 5426, but that is only used for collating
(sorting); it is unlikely that any widely recognized browser will ever
support it.

--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
(Remove the obvious prefix to reply.)



Re: ISO-8859-1 character in manpage

2011-03-16 Thread Ingo Schwarze
Hi Anthony,

Anthony J. Bentley wrote on Wed, Mar 16, 2011 at 01:37:50PM -0600:

 $ mandoc -Tlint /usr/src/share/man/man4/udl.4  
 /usr/src/share/man/man4/udl.4:42:6:
   ERROR: skipping bad character: ignoring byte

Thanks for reporting!

Indeed, that character had to be replaced, ...

 The o with diaeresis should be replaced with the \(:o escape.
 (See mandoc_char(7).)

... however the German noun Koenig does not contain o diaeresis,
but o umlaut.  As this is a common source of confusion even
among native Germans, here is my commit message to explain the
situation:

  Using mandoc_char(7) escapes like K\(:onig for German umlauts
  is incorrect.  The escape sequence \(:o represents o diaeresis,
  not o umlaut.  These are two very different phonological phenomena
  that only happen to be represented by the same diacritic mark.
  In -Tascii mode, all renderers correctly render \(:o (o diaeresis)
  as plain o, but that rendering is incorrect for o umlaut, which
  must be transliterated to the digraph oe in -Tascii.
  There is no mandoc_char(7) escape for o umlaut, so we must give
  the digraph as plain text in the mdoc(7) source code.

  For manuals, ASCII rendering is clearly much more important than
  PostScript or HTML rendering.  Besides, we should not sacrifice
  correct rendering in any mode in order to get slightly nicer rendering
  in some other mode.

 Would send a patch, but I've never had luck sending high-bit files
 to the list.

No sweat, that fix was easy enough without a patch.

Thanks,
  Ingo