Thanks for your report. This is primarily a bug in the col program (in
the bsdmainutils package), which man uses to filter some special
characters out of groff output when writing to a file. Unfortunately col
does not deal correctly with UTF-8, and the result is a file containing
invalid UTF-8, which editors will quite reasonably refuse to detect
automatically as UTF-8 (although some editors provide a way to force
the issue). This is another symptom of the same problem reported in
Debian as
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=319952.

In this case, groff outputs the UTF-8-encoded sequence of Unicode
codepoints U+2010 U+0008 U+2010 to represent an overstruck (i.e. bold)
continuation hyphen. col mangles that into the byte sequence E2 80 E2 80
90, constructed by removing the last byte from the UTF-8 representation
of U+2010 and then appending the full representation of that same
character. Correct behaviour would be for the U+0008 (backspace)
character to backspace over the whole first character, not just part of
it.
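
To make the byte arithmetic concrete, here is a small Python sketch of
the effect. It is only a simplified model, not col's actual code: the
function names are made up for illustration, and it assumes the
mangling is equivalent to a backspace erasing one byte instead of one
character.

```python
# Illustrative model only; this is NOT col's real implementation.
BACKSPACE = 0x08

def byte_backspace(data: bytes) -> bytes:
    """Hypothetical buggy filter: a backspace erases a single byte."""
    out = bytearray()
    for b in data:
        if b == BACKSPACE:
            if out:
                out.pop()
        else:
            out.append(b)
    return bytes(out)

def char_backspace(data: bytes) -> bytes:
    """Hypothetical fixed filter: a backspace erases a whole character."""
    out = []
    for ch in data.decode("utf-8"):
        if ch == "\b":
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out).encode("utf-8")

# U+2010 U+0008 U+2010, as emitted by groff for a bold hyphen.
groff_output = "\u2010\b\u2010".encode("utf-8")

mangled = byte_backspace(groff_output)
fixed = char_backspace(groff_output)

print(mangled.hex(" "))  # e2 80 e2 80 90 (invalid UTF-8)
print(fixed.hex(" "))    # e2 80 90 (a single U+2010)
```

The first result matches the byte sequence seen in the broken file; the
second is what a character-aware backspace would leave behind.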

I'm leaving a bug task open on man-db at a lower importance because I do
think man-db bears some responsibility for the tools it uses, even if
they're clearly buggy. Given the historical problems with col, I have
been wondering for a while if I shouldn't produce a miniature
implementation of it and embed it into man. Normally duplication is bad,
and it makes me feel uncomfortable, but in this case col's
implementation is pretty stable and unlikely to need to vary
significantly among systems.

** Changed in: bsdmainutils (Ubuntu)
   Importance: Undecided => High
       Status: New => Triaged

** Changed in: man-db (Ubuntu)
   Importance: Undecided => Low
       Status: New => Triaged

** Also affects: bsdmainutils (Debian) via
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=319952
   Importance: Unknown
       Status: Unknown

-- 
"man man > man.txt" produces invalid characters
https://bugs.launchpad.net/bugs/320842
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.