Re: Markup inside verbatim blocks in POD (was Re: Reasons to not use quote signs directly?)

2020-05-10 Thread Russ Allbery
Guillem Jover  writes:

> While fiddling with this I stumbled over a behavior in Pod::Man that
> gives exactly what I want, but it might be "undefined behavior" that
> I should probably not be relying on? (I'd love to be wrong here :).

> ,---
> =head1 NAME

> verbatim - test verbatim formatted hack

> =head1 EXAMPLE

> Some text here.

> The verbatim formatted hack:
>  Z<>
>  This is a C
>  with B and I,
>  and F or
>  even L references.
> `---

> The key here is that the first line in the paragraph starts at column 0,
> while the rest have leading spaces. Pod::Man then outputs these lines
> as is, respecting the spacing and only formatting the text, which makes
> groff add the usual line breaks at those leading space points (this can
> be changed with the .lsm macro). po4a also parses this as desired
> and marks this paragraph as 'no-wrap' in the resulting msgid.  The first
> line and the Z<> are a bit of a wart, but oh well.

> This of course does not work with other formatters, but then I'm not
> sure I care about those as the purpose in this case is to just create
> man pages.

> Is this something I could rely on? Because that'd be lovely. :D

Oh, huh.  Interesting.

I think you can rely on the preservation of the line breaks and formatting
through Pod::Man.  I philosophically don't believe in changing things like
that when it can be avoided and try to pass through the original file as
much as possible while making markup transformations.

The whitespace there has no semantic meaning in POD, so it's *possible*
that a future version of Pod::Simple, which is doing the underlying
parsing, might throw away the whitespace for some reason.  But it seems
relatively unlikely.

So this is undefined behavior, but I suspect in practice it's relatively
unlikely to break.

-- 
Russ Allbery (r...@debian.org)  



Markup inside verbatim blocks in POD (was Re: Reasons to not use quote signs directly?)

2020-05-10 Thread Guillem Jover
Hi!

Coming back to unearth this now that I'm looking again at converting
the man pages to POD, and most of the hurdles except this one have been
fixed in supporting tools or in the man pages.

On Thu, 2016-10-27 at 16:58:38 -0700, Russ Allbery wrote:
> Guillem Jover  writes:
> > In deb-changelog(5) there is currently this:
> 
> > ,---
> > .nf
> > \fIpackage\fP (\fIversion\fP) \fIdistributions\fP; \fImetadata\fP
> >   [optional blank line(s), stripped]
> >   * \fIchange-details\fP
> > \fImore-change-details\fP
> >   [blank line(s), included in output of \fBdpkg\-parsechangelog\fP(1)]
> >   * \fIeven-more-change-details\fP
> >   [optional blank line(s), stripped]
> >  \-\- \fImaintainer-name\fP <\fIemail-address\fP>  \fIdate\fP
> > .fi
> > `---
> 
> > which I had to convert by surrounding with «=begin man» and «=end man».
> > If you know of a better way, I'm interested!
> 
> Oh, markup inside verbatim.  Yeah, this is a topic of some discussion in
> the Perl community.  There was occasionally some talk of a =begin verbatim
> block that would act like a verbatim block but markup sequences would be
> allowed, but nothing really came of it.

While fiddling with this I stumbled over a behavior in Pod::Man that
gives exactly what I want, but it might be "undefined behavior" that
I should probably not be relying on? (I'd love to be wrong here :).

,---
=head1 NAME

verbatim - test verbatim formatted hack

=head1 EXAMPLE

Some text here.

The verbatim formatted hack:
 Z<>
 This is a C
 with B and I,
 and F or
 even L references.
`---

The key here is that the first line in the paragraph starts at column 0,
while the rest have leading spaces. Pod::Man then outputs these
lines as is, respecting the spacing and only formatting the text, which
makes groff add the usual line breaks at those leading space points
(this can be changed with the .lsm macro). po4a also parses this
as desired and marks this paragraph as 'no-wrap' in the resulting msgid.
The first line and the Z<> are a bit of a wart, but oh well.
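
A minimal sketch of how such a paragraph could be assembled by a script,
written in Python purely for illustration. The helper name verbatim_hack
is hypothetical and is not part of dpkg, po4a or Pod::Man; it only lays
out the text as described above (first line at column 0, a Z<> line, then
space-indented lines) and rejects blank lines, since a blank line would
end the POD paragraph:

```python
def verbatim_hack(intro, lines):
    """Return one POD paragraph laid out like the hack described above."""
    if not intro.strip() or intro[0].isspace():
        raise ValueError("intro must be non-blank and start at column 0")
    if any(not line.strip() for line in lines):
        raise ValueError("blank lines would end the POD paragraph")
    # The first line at column 0 keeps this an ordinary, formatted
    # paragraph; the Z<> line and the leading spaces carry the layout.
    body = [intro, " Z<>"] + [" " + line for line in lines]
    return "\n".join(body) + "\n"
```

For example, verbatim_hack("Some format:", ["I<package> (I<version>)"])
returns a three-line paragraph that can be dropped into a .pod file. The
markup in that call is made up for the example.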

This of course does not work with other formatters, but then I'm not
sure I care about those as the purpose in this case is to just create
man pages.

Is this something I could rely on? Because that'd be lovely. :D

Thanks,
Guillem



Re: Reasons to not use quote signs directly?

2016-12-03 Thread Russ Allbery
Guillem Jover  writes:

> Ah right, indeed it does. And it's explained in that same man page I
> referred to. O:) The escape sequence would be something like \[u0021] or
> \[u0041_0300].

Oh!  So, if I can just convert all Unicode characters to their numeric
codes, this becomes very easy to do.  No tables and other machinery
required.

I'm a little worried about the \[u0041_0300] form, though.  Does that mean
that \[u0041]\[u0300] does not work, and Pod::Man has to know whether
characters are combining or not?  I suppose that's possible with the Perl
Unicode support, if necessary.

Are the numbers there the hex digits of a Unicode code point?  The
groff_char man page is maddeningly light on details about this escape
form, mentioning it only in a REFERENCE section.
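
A minimal sketch of the straight conversion, in Python rather than Perl.
It assumes the digits are uppercase hexadecimal code points padded to at
least four digits, and that a base character followed by combining marks
is written in the joined \[uXXXX_YYYY] form. Whether separate escapes
would also work is the open question above, so the grouping is an
assumption. The name groff_escape is hypothetical, and the sketch does
not escape backslashes already present in the input:

```python
import unicodedata


def groff_escape(text):
    """Replace non-ASCII characters with groff Unicode escapes."""
    # Group each base character with the combining marks that follow it.
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1].append(ch)
        else:
            clusters.append([ch])

    out = []
    for cluster in clusters:
        if len(cluster) == 1 and ord(cluster[0]) < 128:
            # Plain ASCII with no combining marks passes through as is.
            out.append(cluster[0])
        else:
            codes = "_".join(f"{ord(c):04X}" for c in cluster)
            out.append("\\[u" + codes + "]")
    return "".join(out)
```

With this grouping, "A" followed by U+0300 comes out as the single escape
\[u0041_0300], and the only Unicode knowledge needed is the combining
class of each character.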

>> For Pod::Man usage, the output format I'd want would be a hash mapping
>> Unicode code points to the correct groff escape.  Or, in an absolutely
>> ideal world, to have an Encode encoding for groff escapes, similar to how
>> the Encode::MIME::Header encoding works to generate RFC 2047 strings.

> I happened to stumble over an old patch by Brendan O'Dea that might be
> helpful, including a reference here to not lose track of that:

>   
> 

Oh, aha, that's basically the table I was looking for, although that's
very limited compared to all Unicode characters, so it seems easier to
just do a straight conversion to the \[u] form.

>> B<> and I<> could just be surrounding normal words that should use
>> normal hyphens.  L is a link to a section in the same
>> document entitled some-command, so the assumption there is also that it
>> could be a regular English word.

> Oh, at least perlpod(1) says that L links to a Perl manual page,
> so I'd expect it to be equivalent to the L style when
> processing minus chars, and L does the inter-section linking?

Oh, sorry, yes, I was thinking of L.  So the idea is that
L should always use \- for all embedded hyphens?

>> As you say, though, I'm not entirely sure the distinction is worth all
>> the trouble we've put into it over the years.  nroff at least seems to
>> have just given up and maps them all to "-" in the output anyway.  That
>> used to be a Debian-specific change, but it looks like upstream has
>> switched to treating - as \-, I think?  For HTML output, upstream maps
>> \- to  and Debian still overrides that to - instead.  (If
>> upstream thinks \- is a minus sign and not ASCII 45, I'm really
>> confused what's going on with this, though.)

> We should probably ask Colin about this. :)

Yes, please -- Colin, do you have any idea what the current best practice
is here?  I'm trying to figure out what to have Pod::Man do.

-- 
Russ Allbery (r...@debian.org)   



Re: Reasons to not use quote signs directly?

2016-11-29 Thread Guillem Jover
[ Colin CCed for some input on groff vs minus situation.  ]

On Thu, 2016-10-27 at 17:10:59 -0700, Russ Allbery wrote:
> Guillem Jover  writes:
> > For the current conversion in dpkg, I've taken most of the common
> > symbols from groff_char(7) and created a very simple sed script, I'm not
> > sure if you were thinking about something along those lines (although in
> > proper perl)?
> 
> >   
> > 
> 
> Yeah, that would work, although aren't there quite a few more sequences
> than that?  Does groff have a way of representing an arbitrary Unicode
> code point?

Ah right, indeed it does. And it's explained in that same man page I
referred to. O:) The escape sequence would be something like \[u0021] or
\[u0041_0300].

> For Pod::Man usage, the output format I'd want would be a hash mapping
> Unicode code points to the correct groff escape.  Or, in an absolutely
> ideal world, to have an Encode encoding for groff escapes, similar to how
> the Encode::MIME::Header encoding works to generate RFC 2047 strings.

I happened to stumble over an old patch by Brendan O'Dea that might be
helpful, including a reference here to not lose track of that:

  


> > If you could specify exactly which symbols you'd like to see supported I
> > might take a stab at this, when I have some spare time. Say everything
> > in groff_char(7) or similar. :)
> 
> As much as possible is of course ideal, but I'm happy to take partial
> work!  :)

Ok! :)

> > The other major issue are commands, which I'm not sure are so easy to
> > detect. Maybe they could get to use the \- minus if they are inside some
> > other markup. I see that C escapes them, as does
> > L, but L does not (any reason?), which
> > could be handy to use I guess. Filenames are also safe with
> > F. The only problem is using the proper markup that
> > also preserves the same output as the current man pages.
> 
> B<> and I<> could just be surrounding normal words that should use normal
> hyphens.  L is a link to a section in the same document
> entitled some-command, so the assumption there is also that it could be a
> regular English word.

Oh, at least perlpod(1) says that L links to a Perl manual page,
so I'd expect it to be equivalent to the L style when
processing minus chars, and L does the inter-section linking?

> As you say, though, I'm not entirely sure the distinction is worth all the
> trouble we've put into it over the years.  nroff at least seems to have
> just given up and maps them all to "-" in the output anyway.  That used to
> be a Debian-specific change, but it looks like upstream has switched to
> treating - as \-, I think?  For HTML output, upstream maps \- to 
> and Debian still overrides that to - instead.  (If upstream thinks \- is a
> minus sign and not ASCII 45, I'm really confused what's going on with
> this, though.)

We should probably ask Colin about this. :)

> > I've always found the AUTHORS, COPYRIGHT or LICENSE sections to be
> > distracting, and in dpkg we got rid of all of them, because in addition
> > they were getting usually out-of-sync with the actual copyright
> > statements, and required adding names and updating years in two places.
> 
> Yeah, that part is irritating.  The alternative, which I use in my
> packages these days, is to have these reflect the authors, copyright, and
> license of the *manual page*, but that's also weird.

Right, that's what dpkg used to have. But even then I've still found this
distracting.

> =for license, resulting in a comment in the generated man page, seems like
> a better general solution (and then it probably makes sense for this to
> always reflect the license of the documentation file itself, not the
> larger package).

Yeah.

Thanks,
Guillem



Re: Reasons to not use quote signs directly?

2016-10-27 Thread Russ Allbery
Guillem Jover  writes:

> Yeah the Xs were really annoying. On the AIX and Mac OS X systems I
> tested on, AFAIR they produced garbage when rendering, but I can recheck
> to be sure. I think I might have also tested on a system that used man
> (w/o Unicode support) instead of man-db, but I'd need to reverify. And I
> think the various BSDs use groff, but it might need checking too.

Oh, okay, so proprietary UNIX is still a problem for just using Unicode
everywhere, but Linux and BSD may be okay.

> Just to clarify (because I think I was a bit vague previously), on
> systems that didn't support Unicode using the groff macros produced no
> output (so no garbage), which is better IMO than the Xs or garbage. :)

Still not great, though.  :(  Sigh.  So there's no silver bullet still.
But I think the scale has tipped at this point to the degree where it's
worth having good output with groff, even if that means one gets bad
output without groff.

> For the current conversion in dpkg, I've taken most of the common
> symbols from groff_char(7) and created a very simple sed script, I'm not
> sure if you were thinking about something along those lines (although in
> proper perl)?

>   
> 

Yeah, that would work, although aren't there quite a few more sequences
than that?  Does groff have a way of representing an arbitrary Unicode
code point?

For Pod::Man usage, the output format I'd want would be a hash mapping
Unicode code points to the correct groff escape.  Or, in an absolutely
ideal world, to have an Encode encoding for groff escapes, similar to how
the Encode::MIME::Header encoding works to generate RFC 2047 strings.

If groff doesn't have a way of encoding arbitrary Unicode code points,
what do you think Pod::Man should do with characters that don't have a
mapping (Chinese characters, for instance)?
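
A minimal sketch of the table approach, in Python for brevity. The glyph
names are a small sample taken from groff_char(7), not a complete table.
The function name to_groff and its fallback argument are hypothetical;
the fallback defaults to the X that pod2man currently emits, so unmapped
characters still need a real policy:

```python
# Sample of Unicode characters and their named groff glyph escapes.
GROFF_NAMES = {
    "\u2018": r"\[oq]",  # left single quotation mark
    "\u2019": r"\[cq]",  # right single quotation mark
    "\u201c": r"\[lq]",  # left double quotation mark
    "\u201d": r"\[rq]",  # right double quotation mark
    "\u2013": r"\[en]",  # en dash
    "\u2014": r"\[em]",  # em dash
    "\u00a9": r"\[co]",  # copyright sign
}


def to_groff(text, fallback="X"):
    """Map known non-ASCII characters to groff escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Characters missing from the table get the fallback string.
            out.append(GROFF_NAMES.get(ch, fallback))
    return "".join(out)
```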

> If you could specify exactly which symbols you'd like to see supported I
> might take a stab at this, when I have some spare time. Say everything
> in groff_char(7) or similar. :)

As much as possible is of course ideal, but I'm happy to take partial
work!  :)

> I guess field names might be easy to spot as they have the standard form
> Field-Name(-Other)* which is probably not common for English words?
> This might trip over on other languages such as German for example which
> tends to capitalize many words.

A bit tricky for, say, book titles, too.  :(
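
A minimal sketch of that heuristic, in Python for illustration. The
pattern is one reading of Field-Name(-Other)*: capitalized ASCII words
joined by at least one hyphen. The names FIELD_NAME and find_field_names
are hypothetical:

```python
import re

# Capitalized word, then one or more "-Capitalized" parts.
FIELD_NAME = re.compile(r"\b[A-Z][a-z0-9]*(?:-[A-Z][a-z0-9]*)+\b")


def find_field_names(text):
    """Return substrings that look like control-file field names."""
    return FIELD_NAME.findall(text)
```

It finds Build-Depends and Multi-Arch in running text, but it equally
matches hyphenated proper nouns such as Nord-Ostsee-Kanal, which is the
false-positive problem discussed here.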

> The other major issue are commands, which I'm not sure are so easy to
> detect. Maybe they could get to use the \- minus if they are inside some
> other markup. I see that C escapes them, as does
> L, but L does not (any reason?), which
> could be handy to use I guess. Filenames are also safe with
> F. The only problem is using the proper markup that
> also preserves the same output as the current man pages.

B<> and I<> could just be surrounding normal words that should use normal
hyphens.  L is a link to a section in the same document
entitled some-command, so the assumption there is also that it could be a
regular English word.

As you say, though, I'm not entirely sure the distinction is worth all the
trouble we've put into it over the years.  nroff at least seems to have
just given up and maps them all to "-" in the output anyway.  That used to
be a Debian-specific change, but it looks like upstream has switched to
treating - as \-, I think?  For HTML output, upstream maps \- to 
and Debian still overrides that to - instead.  (If upstream thinks \- is a
minus sign and not ASCII 45, I'm really confused what's going on with
this, though.)

> I've always found the AUTHORS, COPYRIGHT or LICENSE sections to be
> distracting, and in dpkg we got rid of all of them, because in addition
> they were getting usually out-of-sync with the actual copyright
> statements, and required adding names and updating years in two places.

Yeah, that part is irritating.  The alternative, which I use in my
packages these days, is to have these reflect the authors, copyright, and
license of the *manual page*, but that's also weird.

=for license, resulting in a comment in the generated man page, seems like
a better general solution (and then it probably makes sense for this to
always reflect the license of the documentation file itself, not the
larger package).

-- 
Russ Allbery (r...@debian.org)   



Re: Reasons to not use quote signs directly?

2016-10-27 Thread Russ Allbery
Guillem Jover  writes:

> In deb-changelog(5) there is currently this:

> ,---
> .nf
> \fIpackage\fP (\fIversion\fP) \fIdistributions\fP; \fImetadata\fP
>   [optional blank line(s), stripped]
>   * \fIchange-details\fP
> \fImore-change-details\fP
>   [blank line(s), included in output of \fBdpkg\-parsechangelog\fP(1)]
>   * \fIeven-more-change-details\fP
>   [optional blank line(s), stripped]
>  \-\- \fImaintainer-name\fP <\fIemail-address\fP>  \fIdate\fP
> .fi
> `---

> which I had to convert by surrounding with «=begin man» and «=end man».
> If you know of a better way, I'm interested!

Oh, markup inside verbatim.  Yeah, this is a topic of some discussion in
the Perl community.  There was occasionally some talk of a =begin verbatim
block that would act like a verbatim block but markup sequences would be
allowed, but nothing really came of it.

Perl 6 POD solved this problem with their =begin code block that takes as
an argument a list of sequences to allow.  But no one seems to be using
Perl 6 still?  I haven't really looked at their POD stuff at all.  It
started on a weirdly parallel track without much interaction with those of
us who were maintaining all this stuff for Perl 5.

I generally just give up on this and use the normal text markup
conventions of angle brackets and whatnot, although I see why you don't
want to do that here.

-- 
Russ Allbery (r...@debian.org)   



Re: Reasons to not use quote signs directly?

2016-10-19 Thread Guillem Jover
Hi!

On Wed, 2016-10-19 at 12:54:10 -0700, Russ Allbery wrote:
> Guillem Jover  writes:
> > Using raw UTF-8 in the roff source is not portable, and some (most?)
> > implementations might not be happy about that. But using the escape
> > sequences should always be safe(?). (I've just verified at least on AIX
> > and Mac OS X systems.)
> 
> Internationalization of man pages has a bunch of irritating problems that
> come down to picking which non-portable problem you want to have.

Right. :/

> I know that eight-bit characters in *roff source caused serious problems
> (segfaults, etc.) on very old *roff implementations on proprietary UNIXes
> (Solaris 2.4, that sort of thing), which is why I've always avoided using
> that approach with the output of pod2man without a special flag (-u).  But
> I'm not sure it makes sense to still be that cautious, and the default
> output of pod2man is awful (replacing all non-ASCII characters with X,
> which just isn't acceptable any more).

Yeah the Xs were really annoying. On the AIX and Mac OS X systems I
tested on, AFAIR they produced garbage when rendering, but I can recheck
to be sure. I think I might have also tested on a system that used man
(w/o Unicode support) instead of man-db, but I'd need to reverify. And I
think the various BSDs use groff, but it might need checking too.

> Various people have asked for a groff macro output mode, and I think that
> would be a fine idea, except that it requires some effort to build the
> large table of Unicode code point to groff macro mappings.  I'm not sure
> if it makes sense to have that be the default output mode or to have raw
> Unicode be the default output mode (I want to get rid of the current
> default).  It sounds like from your portability investigation that using
> groff macros as the default output mode might work, which is valuable
> information!

Just to clarify (because I think I was a bit vague previously), on
systems that didn't support Unicode using the groff macros produced no
output (so no garbage), which is better IMO than the Xs or garbage. :)

For the current conversion in dpkg, I've taken most of the common
symbols from groff_char(7) and created a very simple sed script, I'm
not sure if you were thinking about something along those lines
(although in proper perl)?

  


> Needless to say, if anyone wanted to put together the mapping table to
> enable that, I would be very interested.  I'll add it to my personal to-do
> list, but that's quite long and the time I have available to work on free
> software at the moment is sadly limited.

If you could specify exactly which symbols you'd like to see supported
I might take a stab at this, when I have some spare time. Say
everything in groff_char(7) or similar. :)

> > But coming back to the source code, yes, I pretty much agree that roff
> > can be very noisy and non-readable, to the point I've actually gotten
> > bothered enough to check for possible alternatives this last month. The
> > problem is finding a format that is clear, expressive enough, supported
> > by po4a, does not require huge Build-Depends and produces portable and
> > nicely formatted man pages. The obvious candidate is perl's POD, because
> > we are already using that for the perl modules and require perl to
> > build.
> 
> > But I've found some quirks and issues that while not unsurmountable,
> > might need to be looked at first and perhaps fixed or workarounds found
> > to avoid "regressions", and I'm not sure which ones Russ would be happy
> > to get bug reports for? :)
> 
> I'm definitely happy to get bug reports!  I do try to slowly work through
> issues like this (for instance, I've now added separate flags to control
> the left and right quote marks, from a bug report you filed quite some
> time ago).  Obviously, patches make things even faster, and I'm slowly
> trying to modernize and improve the coding style of the podlators code,
> although it's a rather long process.

Ok, noted! Then I'll start filing reports upstream.

> > I'm attaching a PoC conversion (can be tested with «pod2man
> > deb-symbols.pod|man -l -», and is available also from [G]) and here's a
> > list of potential differences/issues:
> 
> >   - References are in italic not bold.
> 
> I can change this (a bug report to remind me to do so is very welcome).
> For the record, italics actually used to be the correct convention
> somewhere (I know I didn't make that up), probably Solaris since I took a
> lot of the conventions from there, but I see that man-pages(7) now
> recommends bold.  This is one of those things that was never standardized,
> but at this point I think the Linux man-pages Project is sufficiently
> widespread and authoritative that, as long as it's not in complete
> disagreement with BSD, I'm happy to go with their conventions.
> Particularly over old 

Re: Reasons to not use quote signs directly?

2016-10-19 Thread Guillem Jover
Hi!

On Wed, 2016-10-19 at 12:55:30 -0700, Russ Allbery wrote:
> Guillem Jover  writes:
> > Can use B(N) instead of L. Which might be needed anyway
> > to reduce the amount of fuzzy strings. The same goes for using I<> instead
> > of the more semantic F<>.
> 
> I'd definitely prefer to make L do the right thing instead, since
> it would be nice to allow, say, a smart HTML converter to do proper links
> between man pages.

Ah perfect then, because that'd be my preference too!

> > Seems to be really needed, mostly to markup verbatim blocks, otherwise
> > the formatting would need to be dropped. :/
> 
> What sort of verbatim formatting problems have you run into?

In deb-changelog(5) there is currently this:

,---
.nf
\fIpackage\fP (\fIversion\fP) \fIdistributions\fP; \fImetadata\fP
  [optional blank line(s), stripped]
  * \fIchange-details\fP
\fImore-change-details\fP
  [blank line(s), included in output of \fBdpkg\-parsechangelog\fP(1)]
  * \fIeven-more-change-details\fP
  [optional blank line(s), stripped]
 \-\- \fImaintainer-name\fP <\fIemail-address\fP>  \fIdate\fP
.fi
`---

which I had to convert by surrounding with «=begin man» and «=end man».
If you know of a better way, I'm interested!

But after an automated mass conversion and review, it seems it might be
the only instance, so it's not too onerous, although it sucks that it
will not be visible when converting to other output formats. :/

Also the strings to be translated go from the nice POD format that po4a
helpfully uses internally, to the raw roff markup, because the source is
POD and not roff. So we go from the previous msgid to the new one:

,---
#. type: verbatim
#: deb-changelog.5.pod
#, fuzzy, no-wrap
#| msgid ""
#| "I (I) I; I\n"
#| "  [optional blank line(s), stripped]\n"
#| "  * I\n"
#| "I\n"
#| "  [blank line(s), included in output of B(1)]\n"
#| "  * I\n"
#| "  [optional blank line(s), stripped]\n"
#| " -- I EIE  I\n"
msgid ""
".nf\n"
"\\fIpackage\\fP (\\fIversion\\fP) \\fIdistributions\\fP; \\fImetadata\\fP\n"
"  [optional blank line(s), stripped]\n"
"  * \\fIchange-details\\fP\n"
"\\fImore-change-details\\fP\n"
"  [blank line(s), included in output of \\fBdpkg-parsechangelog\\fP(1)]\n"
"  * \\fIeven-more-change-details\\fP\n"
"  [optional blank line(s), stripped]\n"
" -- \\fImaintainer-name> <\\fIemail-address\\fP>  \\fIdate\\fP\n"
".fi\n"
"\n"
msgstr ""
`---

which is harder to translate, but as long as it's the only instance I
guess translators can survive. :)

Regards,
Guillem



Re: Reasons to not use quote signs directly?

2016-10-19 Thread Russ Allbery
Guillem Jover  writes:

> Using raw UTF-8 in the roff source is not portable, and some (most?)
> implementations might not be happy about that. But using the escape
> sequences should always be safe(?). (I've just verified at least on AIX
> and Mac OS X systems.)

Internationalization of man pages has a bunch of irritating problems that
come down to picking which non-portable problem you want to have.

groff macros are portable to various different levels of maturity around
Unicode handling... but not to *roff implementations other than groff, as
most of the macros used for Unicode characters seem to be groff inventions
not present in traditional UNIX *roff implementations.  Using Unicode
directly in the *roff source is probably more portable these days, since
groff seems to handle it acceptably and I suspect more *roff
implementations handle that than handle groff-specific escapes.  But I no
longer have access to a wide variety of traditional UNIX platforms to
check.

I know that eight-bit characters in *roff source caused serious problems
(segfaults, etc.) on very old *roff implementations on proprietary UNIXes
(Solaris 2.4, that sort of thing), which is why I've always avoided using
that approach with the output of pod2man without a special flag (-u).  But
I'm not sure it makes sense to still be that cautious, and the default
output of pod2man is awful (replacing all non-ASCII characters with X,
which just isn't acceptable any more).

Various people have asked for a groff macro output mode, and I think that
would be a fine idea, except that it requires some effort to build the
large table of Unicode code point to groff macro mappings.  I'm not sure
if it makes sense to have that be the default output mode or to have raw
Unicode be the default output mode (I want to get rid of the current
default).  It sounds like from your portability investigation that using
groff macros as the default output mode might work, which is valuable
information!

Needless to say, if anyone wanted to put together the mapping table to
enable that, I would be very interested.  I'll add it to my personal to-do
list, but that's quite long and the time I have available to work on free
software at the moment is sadly limited.

> But coming back to the source code, yes, I pretty much agree that roff
> can be very noisy and non-readable, to the point I've actually gotten
> bothered enough to check for possible alternatives this last month. The
> problem is finding a format that is clear, expressive enough, supported
> by po4a, does not require huge Build-Depends and produces portable and
> nicely formatted man pages. The obvious candidate is perl's POD, because
> we are already using that for the perl modules and require perl to
> build.

> But I've found some quirks and issues that while not unsurmountable,
> might need to be looked at first and perhaps fixed or workarounds found
> to avoid "regressions", and I'm not sure which ones Russ would be happy
> to get bug reports for? :)

I'm definitely happy to get bug reports!  I do try to slowly work through
issues like this (for instance, I've now added separate flags to control
the left and right quote marks, from a bug report you filed quite some
time ago).  Obviously, patches make things even faster, and I'm slowly
trying to modernize and improve the coding style of the podlators code,
although it's a rather long process.

> I'm attaching a PoC conversion (can be tested with «pod2man
> deb-symbols.pod|man -l -», and is available also from [G]) and here's a
> list of potential differences/issues:

>   - References are in italic not bold.

I can change this (a bug report to remind me to do so is very welcome).
For the record, italics actually used to be the correct convention
somewhere (I know I didn't make that up), probably Solaris since I took a
lot of the conventions from there, but I see that man-pages(7) now
recommends bold.  This is one of those things that was never standardized,
but at this point I think the Linux man-pages Project is sufficiently
widespread and authoritative that, as long as it's not in complete
disagreement with BSD, I'm happy to go with their conventions.
Particularly over old Solaris conventions, since Solaris is now mostly
dead.

>   - Does not map ‘’, “”, and other UTF-8 quotes to roff escape sequences
>     (or have to use non-portable --utf8 option).

See above for a rather extended discussion of that.

>   - Needs raw roff for some formatting, as POD is not expressive enough
>     (this will have to do with «=begin man» as pod2man cannot change
>     the POD syntax anyway).

Yes.  POD is sadly a somewhat limited syntax, and while there was a Perl 6
take on POD that was trying to expand it, I don't think it ever caught on.
These days, everyone seems to have switched to Markdown or
reStructuredText, which certainly have their merits but which don't seem
to be good fits for man page generation.

So, for things like tables, you're 

Re: Reasons to not use quote signs directly?

2016-10-19 Thread Russ Allbery
Guillem Jover  writes:

> Can use B(N) instead of L. Which might be needed anyway
> to reduce the amount of fuzzy strings. The same goes for using I<> instead
> of the more semantic F<>.

I'd definitely prefer to make L do the right thing instead, since
it would be nice to allow, say, a smart HTML converter to do proper links
between man pages.

> Seems to be really needed, mostly to markup verbatim blocks, otherwise
> the formatting would need to be dropped. :/

What sort of verbatim formatting problems have you run into?

-- 
Russ Allbery (r...@debian.org)   



Re: Reasons to not use quote signs directly?

2016-10-11 Thread Helge Kreutzmann
Hello Guillem,
On Thu, Oct 06, 2016 at 11:24:22PM +0200, Guillem Jover wrote:
> On Sun, 2016-09-25 at 16:46:58 +0200, Helge Kreutzmann wrote:
> > On Sun, Sep 25, 2016 at 04:21:31PM +0200, Guillem Jover wrote:
> > > On Wed, 2016-09-21 at 01:59:10 +0200, Guillem Jover wrote:
> > > > But I've found some quirks and issues that while not unsurmountable,
> > > > might need to be looked at first and perhaps fixed or workarounds found
> > > > to avoid "regressions", and I'm not sure which ones Russ would be happy
> > > > to get bug reports for? :) I'm attaching a PoC conversion (can be tested
> > > > with «pod2man deb-symbols.pod|man -l -», and is available also from [G])
> > > > and here's a list of potential differences/issues:
> > > 
> > > I've been playing with this a bit, converted few more pages and
> > > updated the build infrastructure, and it might be workable after all.
> > > One ideal goal would be to try to get as few fuzzied strings as
> > > possible after a conversion. Here's a list of alternatives/workarounds
> > > for some of the issues/differences:
> 
> Ok given your comments below, and your earlier comments, I think I
> might go for an alternate solution, which I've tentatively implemented
> locally, which would look like this:
> 
>  * Rename all man pages to foo.man (from foo.1 or similar).
>  * Replace the 3rd and 4th arguments to .TH with placeholders for the
>    release-date and version, which will get replaced at build time,
>    for both English and translations. This should stop adding fuzzies
>    on date updates, as you'll just see something like @RELEASE_DATE@.
>  * Convert all roff escape sequences to proper UTF-8 for the English
>    and translations (po files); and map all of these back to escape
>    sequences at build time. So you'll have more readable input and
>    translations, and we'll have more portable generated man pages, as
>    they will be usable even on systems w/o proper UTF-8 support!
> 
> I'll take care of unfuzzing anything involved in the above. Hope this
> sounds like a better plan for now? :)

This sounds like a good plan to me.

> > Thanks for your analysis. Given that we are closing in on a release my
> > request is simple: Delay any update which causes (lots of) fuzzy
> > strings just for formatting to the next cycle.
> > 
> > The formatting updates are simply a pain for translators, and at least
> > in the beginning of the cycle you can review them in pieces. This late
> > in the cycle you just are frustrated because many pages (which are
> > translated looking at the content) are failing to translate because of
> > formatting.
> 
> Right, I always feel between a rock and a hard place on this, because
> due to translations I'm sometimes reluctant to do some kinds of changes
> because they might seem like just churn, but at the same time I also
> want to clean up stuff. :/

I perfectly understand. If this does not happen too often then at
least I can cope with it. 

> > Automatic conversion might be quite difficult, because each language has a
> > different status, some (e.g. German) are current, some have already
> > done the latest formatting changes, some only the second latest and
> > some are really old. (Obviously, the last ones might be ignorable).
> > And, of course, some might have blindly followed your formatting
> > (which I did and now start to divert), some might have not or only
> > partially …
> 
> I don't think a possible migration to use POD necessarily implies many
> fuzzied strings, but I'll postpone any such thing for the next major
> dpkg series.

Thanks. 

This (at least to me) naturally leads to the next question: Do you
have any pending major updates for the man pages planned? If not, I
could contact the other translators and ask for updates, as the time
for the freeze gets close, especially since the other languages have
quite some strings to cover and a review usually also takes some time.

Greetings

 Helge
-- 
  Dr. Helge Kreutzmann deb...@helgefjell.de
   Dipl.-Phys.   http://www.helgefjell.de/debian.php
64bit GNU powered gpg signed mail preferred
   Help keep free software "libre": http://www.ffii.de/




Re: Reasons to not use quote signs directly?

2016-10-06 Thread Guillem Jover
Hi!

On Sun, 2016-09-25 at 16:46:58 +0200, Helge Kreutzmann wrote:
> On Sun, Sep 25, 2016 at 04:21:31PM +0200, Guillem Jover wrote:
> > On Wed, 2016-09-21 at 01:59:10 +0200, Guillem Jover wrote:
> > > But I've found some quirks and issues that while not unsurmountable,
> > > might need to be looked at first and perhaps fixed or workarounds found
> > > to avoid "regressions", and I'm not sure which ones Russ would be happy
> > > to get bug reports for? :) I'm attaching a PoC conversion (can be tested
> > > with «pod2man deb-symbols.pod|man -l -», and is available also from [G])
> > > and here's a list of potential differences/issues:
> > 
> > I've been playing with this a bit, converted a few more pages and
> > updated the build infrastructure, and it might be workable after all.
> > One ideal goal would be to try to get as few fuzzied strings as
> > possible after a conversion. Here's a list of alternatives/workarounds
> > for some of the issues/differences:

Ok given your comments below, and your earlier comments, I think I
might go for an alternate solution, which I've tentatively implemented
locally, which would look like this:

 * Rename all man pages to foo.man (from foo.1 or similar).
 * Replace the 3rd and 4th arguments to .TH with placeholders for the
   release-date and version, which will get replaced at build time,
   for both English and translations. This should stop adding fuzzies
   on date updates, as you'll just see something like @RELEASE_DATE@.
 * Convert all roff escape sequences to proper UTF-8 for the English
   and translations (po files); and map all of these back to escape
   sequences at build time. So you'll have more readable input and
   translations, and we'll have more portable generated man pages, as
   they will be usable even on systems w/o proper UTF-8 support!
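
A minimal sketch of the build-time step from the last two items, written in Python purely for illustration. It assumes the sources carry @RELEASE_DATE@ as mentioned above; the @RELEASE_VERSION@ name, the render_man() helper and the small escape table are hypothetical and do not reflect the actual dpkg build rules:

```python
# Hypothetical subset of a UTF-8 -> roff escape table; a real table would
# cover every special character used in the man pages and translations.
ROFF_ESCAPES = {
    "\u2018": r"\(oq",  # left single quote
    "\u2019": r"\(cq",  # right single quote
    "\u201c": r"\(lq",  # left double quote
    "\u201d": r"\(rq",  # right double quote
    "\u00ab": r"\(Fo",  # left guillemet
    "\u00bb": r"\(Fc",  # right guillemet
}


def render_man(source, release_date, version):
    """Fill in the .TH placeholders, then map UTF-8 characters to escapes."""
    # @RELEASE_VERSION@ is an assumed placeholder name for the 4th .TH argument.
    text = source.replace("@RELEASE_DATE@", release_date)
    text = text.replace("@RELEASE_VERSION@", version)
    # Characters without an entry in the table pass through unchanged.
    return "".join(ROFF_ESCAPES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    page = (
        '.TH dpkg 1 "@RELEASE_DATE@" "@RELEASE_VERSION@" "dpkg suite"\n'
        "or what «date -R» generates\n"
    )
    print(render_man(page, "2016-10-06", "1.18.11"), end="")
```

Since the date and version only exist as placeholders in the source, a release no longer changes any translatable string, and the generated page contains only ASCII for the characters covered by the table.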

I'll take care of unfuzzing anything involved in the above. Hope this
sounds like a better plan for now? :)

> Thanks for your analysis. Given that we are closing in on a release my
> request is simple: Delay any update which causes (lots of) fuzzy
> strings just for formatting to the next cycle.
> 
> The formatting updates are simply a pain for translators, and at least
> in the beginning of the cycle you can review them in pieces. This late
> in the cycle you just are frustrated because many pages (which are
> translated looking at the content) are failing to translate because of
> formatting.

Right, I always feel between a rock and a hard place on this, because
due to translations I'm sometimes reluctant to do some kinds of changes
because they might seem like just churn, but at the same time I also
want to clean up stuff. :/

> Automatic conversion might be quite difficult, because each language has a
> different status, some (e.g. German) are current, some have already
> done the latest formatting changes, some only the second latest and
> some are really old. (Obviously, the last ones might be ignorable).
> And, of course, some might have blindly followed your formatting
> (which I did and now start to divert), some might have not or only
> partially …

I don't think a possible migration to use POD necessarily implies many
fuzzied strings, but I'll postpone any such thing for the next major
dpkg series.

Thanks,
Guillem



Re: Reasons to not use quote signs directly?

2016-09-25 Thread Helge Kreutzmann
Hello Guillem,
On Sun, Sep 25, 2016 at 04:21:31PM +0200, Guillem Jover wrote:
> On Wed, 2016-09-21 at 01:59:10 +0200, Guillem Jover wrote:
> > But I've found some quirks and issues that while not unsurmountable,
> > might need to be looked at first and perhaps fixed or workarounds found
> > to avoid "regressions", and I'm not sure which ones Russ would be happy
> > to get bug reports for? :) I'm attaching a PoC conversion (can be tested
> > with «pod2man deb-symbols.pod|man -l -», and is available also from [G])
> > and here's a list of potential differences/issues:
> 
> I've been playing with this a bit, converted a few more pages and
> updated the build infrastructure, and it might be workable after all.
> One ideal goal would be to try to get as few fuzzied strings as
> possible after a conversion. Here's a list of alternatives/workarounds
> for some of the issues/differences:

Thanks for your analysis. Given that we are closing in on a release my
request is simple: Delay any update which causes (lots of) fuzzy
strings just for formatting to the next cycle.

The formatting updates are simply a pain for translators, and at least
in the beginning of the cycle you can review them in pieces. This late
in the cycle you just are frustrated because many pages (which are
translated looking at the content) are failing to translate because of
formatting.

I won't have much time in the coming months and I'd rather spend it on
updating the German translation (which started this thread) and/or 
helping other translation teams to get as many translated man pages as 
possible. 

Automatic conversion might be quite difficult, because each language has a
different status, some (e.g. German) are current, some have already
done the latest formatting changes, some only the second latest and
some are really old. (Obviously, the last ones might be ignorable).
And, of course, some might have blindly followed your formatting
(which I did and now start to divert), some might have not or only
partially …

Greetings

 Helge
-- 
  Dr. Helge Kreutzmann deb...@helgefjell.de
   Dipl.-Phys.   http://www.helgefjell.de/debian.php
64bit GNU powered gpg signed mail preferred
   Help keep free software "libre": http://www.ffii.de/




Re: Reasons to not use quote signs directly?

2016-09-20 Thread Guillem Jover
[ Russ (CCed), please see below for some inquiries about pod2man. ]

Hi!

On Mon, 2016-09-19 at 18:30:49 +0200, Helge Kreutzmann wrote:
> the dpkg man pages were converted during the recent months from direct
> quote signs to groff macros for the quote signs.

Right, this was done for multiple reasons, at least:

 * To unify and clarify the formatting.
 * To get nice output characters (if available) when rendering
   (‘’, “”, «», etc).
 * To get rid of the ugly `' pairs.

> When we discussed this on debian-l10n-german, we wondered why you use
> the macros like \\(Fo and not simply the unicode character which it
> produces? In the processed output it does not matter, in the source
> code it is much easier to read and translate e.g.
> 
> or what «Fodate -R» generates
> than
> or what \\(Fodate -R\\(Fc generates

Using raw UTF-8 in the roff source is not portable, and some (most?)
implementations might not be happy about that. But using the escape
sequences should always be safe(?). (I've just verified at least on
AIX and Mac OS X systems.)
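
As a rough illustration of how one might check a roff source for this, here is a small Python sketch that lists every non-ASCII character with its position. The find_non_ascii() helper is hypothetical and not part of any existing tool:

```python
def find_non_ascii(roff_source):
    """Return (line, column, character) for each non-ASCII character."""
    hits = []
    for lineno, line in enumerate(roff_source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits


if __name__ == "__main__":
    sample = (
        "or what \\(Fodate -R\\(Fc generates\n"  # escape sequences only
        "or what «date -R» generates\n"          # raw UTF-8 guillemets
    )
    for lineno, col, ch in find_non_ascii(sample):
        print(f"line {lineno}, column {col}: U+{ord(ch):04X}")
```

A source that uses only escape sequences yields an empty list, so a check like this could guard the English pages while leaving the translations free to use raw UTF-8.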

But coming back to the source code, yes, I pretty much agree that roff
can be very noisy and non-readable, to the point I've actually gotten
bothered enough to check for possible alternatives this last month. The
problem is finding a format that is clear, expressive enough, supported
by po4a, does not require huge Build-Depends and produces portable and
nicely formatted man pages. The obvious candidate is perl's POD, because
we are already using that for the perl modules and require perl to build.

But I've found some quirks and issues that while not unsurmountable,
might need to be looked at first and perhaps fixed or workarounds found
to avoid "regressions", and I'm not sure which ones Russ would be happy
to get bug reports for? :) I'm attaching a PoC conversion (can be tested
with «pod2man deb-symbols.pod|man -l -», and is available also from [G])
and here's a list of potential differences/issues:

  - References are in italic not bold.
  - Does not map ‘’, “”, and other UTF-8 quotes to roff escape sequences
    (or have to use non-portable --utf8 option).
  - Needs raw roff for some formatting, as POD is not expressive enough
    (this will have to do with «=begin man» as pod2man cannot change
    the POD syntax anyway).
  - Many minus signs are output as hyphens (for example for field names).
  - Default for pod2man is non-justified text.
  - The license blurb is only present as a comment on the source.

I should probably try converting a more complex man page to see if there
are other issues. But on the plus side, the source is way way more
readable, and as a side-effect it would also fix the problem with
out-dated version and date in man pages. :)

[G] 


> (I know, I changed that myself because for some reason po4a did not
> like the first part which looks like a bug in po4a or some broken
> encoding somewhere).

Hmm, probably using -M UTF-8 in the po4a.cfg would fix this, but as
stated above, that would probably be a bad idea anyway.

> Btw. the German man page project uses (and relies on) UTF-8 for many
> years already.

Right, I don't mind the translated man pages using raw UTF-8 text, as
otherwise we'd need to use escapes also for accented letters which
would be even more cumbersome. :/ As long as the users on “lesser”
systems can use the English man pages I'm happy enough, though.

> As the outcome of this discussion I will update the quotes in the
> German text to the correct ones, either with groff macros or with
> direct input.

For now, and for translated man pages I'd probably just use whatever
UTF-8 text you think is appropriate, but take into account those will
not be usable on systems w/o UTF-8 support, which TBH we can probably
ignore for this purpose.

Thanks,
Guillem
# dpkg manual page - deb-symbols(5)
#
# Copyright © 2007-2012 Raphaël Hertzog 
# Copyright © 2011, 2013-2015 Guillem Jover 
#
# This is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see .

=encoding utf8

=head1 NAME

deb-symbols - Debian's extended shared library information file

=head1 SYNOPSIS

symbols

=head1 DESCRIPTION

The symbol files are shipped in Debian binary packages, and its format
is a subset of the template symbol files used by L
in Debian source packages.

The format for an