Public bug reported:

Binary package hint: groff-base

The "man" command displays man pages with cool-looking Unicode quotation
marks, hyphens and more.  This very often leads to incorrect content,
when the page actually tries to explain the meaning of an ASCII symbol.

A few examples follow, each giving the manual page name, the line number
(assuming 80 columns) and a description:

bash L1428 and many more places:   ‘command‘ (U+2018) instead of
`command` (U+0060) demonstrates command substitution.

bash L274 and many other places use │ (U+2502) instead of the standard
pipe symbol: | .

gawk L605, L608:  \‘ instead of \`, \’ instead of \' as possible escapes
for regexps.  L1348 and others use ’ (U+2019) instead of ' making the
examples wrong.

links L33 talks about the ‐‐enable‐graphic option to ./configure; I'm
pretty sure the configure script wouldn't understand those U+2010
hyphens.

There are a *lot* more man pages suffering from these kinds of problems.

I haven't checked the specification of the man page format, so I don't
know whether these particular man pages are buggy or the rendering
software is.
Oh, by the way, this one is my favorite:

groff L503 yet again uses ‘ (U+2018) instead of the old-fashioned
backtick.  This means that groff itself fails to properly render its own
manual page.  Sigh...
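
The substitutions in the examples above can be checked mechanically.  The
following is only a minimal sketch in Python, not part of groff or man-db:
the names LOOKALIKES, find_lookalikes and to_ascii are hypothetical, and the
table covers only the four code points mentioned in this report.  It assumes
the rendered page is already available as decoded text.

```python
# Code points taken from the examples in this report, mapped to the ASCII
# characters the pages are actually describing.  Illustrative, not complete.
LOOKALIKES = {
    "\u2018": "`",  # LEFT SINGLE QUOTATION MARK, shown in place of a backtick
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK, shown in place of an apostrophe
    "\u2502": "|",  # BOX DRAWINGS LIGHT VERTICAL, shown in place of a pipe
    "\u2010": "-",  # HYPHEN, shown in place of a hyphen-minus
}


def find_lookalikes(rendered):
    """Yield (line, column, char, ascii_equivalent) for each suspicious character."""
    for lineno, line in enumerate(rendered.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in LOOKALIKES:
                yield lineno, col, ch, LOOKALIKES[ch]


def to_ascii(rendered):
    """Return the text with every listed lookalike replaced by its ASCII form."""
    return rendered.translate(str.maketrans(LOOKALIKES))
```

Note that to_ascii replaces blindly, so it would also flatten quotation marks
that a page uses on purpose; it is meant for locating and copy-pasting
examples, not as a fix.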


These bugs make these manual pages
- incorrect;
- misleading;
- not suitable for copy-pasting;
- not searchable for these particular special characters;
- even more incorrect if the terminal has limited font support (such as the
Linux console with a font that completely lacks these Unicode symbols).


One possible solution would be to fix all these man pages (my guess is that
there are some hundreds of them).  I don't think this approach is feasible.

Another possible solution is to patch groff to be less eager to use Unicode 
characters.  We chose this approach in the distribution I used to maintain, 
and came up with this patch, which you might want to consider applying:
https://svn.uhulinux.hu/packages/2.1/groff/patches/02-sane-ascii-characters.patch


Note that there's one more problem with the handling of these UTF-8 
characters:  if one of these symbols is bold or underlined and you redirect 
the output of "man" into a file, you get garbage (invalid UTF-8) there 
instead of the simple non-highlighted version.
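
For reference, here is a minimal sketch in Python of how highlighting can be
removed from such output without damaging multi-byte characters.  It rests on
an assumption that was not verified for this bug: that bold and underline are
encoded as backspace overstrikes (character, backspace, character), as
nroff-style terminal output commonly does.  The name strip_overstrike is
hypothetical.  The key point is that the text is decoded first, so the
pattern works on whole characters rather than on single bytes.

```python
import re

# One character followed by a backspace: the overstruck half of a
# bold ("x\bx") or underline ("_\bx") sequence.
_OVERSTRIKE = re.compile(r".\x08")


def strip_overstrike(text):
    """Drop overstruck characters from decoded text, keeping the last one written."""
    return _OVERSTRIKE.sub("", text)
```

With a bold U+2018 represented as "\u2018\x08\u2018", the function returns a
single "\u2018", which encodes to valid UTF-8 again.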


Don't get me wrong: I'm a great fan of proper typesetting as well as Unicode, 
and I always try to use the proper quotation marks, proper hyphens and so on.  
I just think that there are places where this is not so necessary.  Manual 
pages formatted in terminals are usually for slightly more advanced users, not 
for those who only use fancy graphical apps.  Here, getting the quote marks 
and hyphens typographically incorrect is not such a big issue; it's much more 
important that the characters displayed are actually those the man pages are 
talking about.  UI strings of Gnome, KDE, OpenOffice.org and so on are proper 
places for all these fancy Unicode characters, but I think that, shamefully, 
they are not used properly there; I wonder why...  For manual pages they are 
simply not important at all, IMHO.


I'm using Hardy 8.04.1, including groff-base 1.18.1.1-16 and man-db 2.5.1-3.

** Affects: groff (Ubuntu)
     Importance: Undecided
         Status: New

-- 
Man pages show wrong Unicode characters instead of ASCII
https://bugs.launchpad.net/bugs/272290
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
