Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread John Gardner
First, leave performance expectations at the door. The ambitious experiment
I describe below is intended to provide airtight handling for a conversion
medium which is inherently lossy (Roff -> HTML/SVG/CSS/et al, Markdown, and
Markdown with GitHub-flavoured options).


*1. Handling semantics*
We all know you can't draw semantics from cold, low-level formatting
commands. But for certain contexts (hierarchically sorted documents,
consistently indented code samples, and tables marked as tables) I believe
(okay, *hope*) it's possible to reconstruct meaning from... well, stuff
that looks like this:

n12000 0 V84000 H72000
x X devtag:.NH 1
x font 36 TB
f36s10950V84000H72000


How? See the x X devtag line? That's what inspired this whole landslide of
absurd ambition. I wondered what we could do if more metadata were provided
that way – as device-specific control strings from, say, a preprocessor.
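
As a rough illustration of the idea, here is a minimal Python sketch of how a post-processor might pick device-control lines out of the intermediate output. The function names (device_controls, tags) are hypothetical, the "devtag:" prefix is taken from the sample above, and continuation lines are not handled.

```python
def device_controls(lines):
    """Yield (line_number, payload) for each `x X` device-control command."""
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(None, 2)
        if len(fields) == 3 and fields[0] == "x" and fields[1] == "X":
            yield number, fields[2]


def tags(lines, prefix="devtag:"):
    """Return (line_number, tag) pairs for payloads starting with `prefix`."""
    return [
        (number, payload[len(prefix):])
        for number, payload in device_controls(lines)
        if payload.startswith(prefix)
    ]


sample = [
    "n12000 0 V84000 H72000",
    "x X devtag:.NH 1",
    "x font 36 TB",
    "f36s10950V84000H72000",
]
print(tags(sample))  # [(2, '.NH 1')]
```

Everything else in the stream is left untouched, so a scan like this could run ahead of whatever interprets the drawing commands.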

I intend to have a complementary preprocessor (probably named infer)
perform preliminary scans in the document pipeline to unintrusively tag
regions of particular interest. "Particular interest" here refers mainly to
preprocessors like tbl, eqn and pic which generate output that's mangled
beyond recognition.

It also refers to tracking any macro packages like mdoc(7) which *may* carry
semantic meaning with their command-set. Bear in mind these are really just
hints it's dropping for the post-processor phase: it certainly doesn't
attempt to go any further than recognising unparsed requests and macro
calls. It's not trying to be a genius. It's just annotating context for
more reliable interpretation.

Now, about that...

*2. We're gonna abuse metrics as a cloudy way to predict what the reader is
supposed to see*
We know the widths and heights of each mounted device-font, their
kerning-pairs, ligatures, and lord knows what else. We milk this for all
it's worth: by plotting each glyph's bounding box in a scaled space
representing the output medium, we identify the most obvious constructs
first.
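
A very rough sketch of the first step, in Python. It assumes each glyph arrives as a baseline origin plus a width and height in device units; the names (Box, glyph_box, rows, left_margins) are made up for illustration, and the sample metrics are not taken from any real font.

```python
from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def glyph_box(h, v, width, height):
    """Bounding box for a glyph whose baseline origin is (h, v)."""
    return Box(h, v - height, h + width, v)


def rows(boxes):
    """Group boxes that share a baseline, top to bottom, left to right."""
    by_baseline = defaultdict(list)
    for box in boxes:
        by_baseline[box.bottom].append(box)
    return [sorted(by_baseline[v]) for v in sorted(by_baseline)]


def left_margins(boxes):
    """Left edge of each row; a change hints at indentation."""
    return [row[0].left for row in rows(boxes)]


boxes = [
    glyph_box(77000, 84000, 5000, 7000),
    glyph_box(72000, 84000, 5000, 7000),
    glyph_box(96000, 96000, 5000, 7000),
]
print(left_margins(boxes))  # [72000, 96000]
```

Grouping by exact baseline is the crudest possible rule; anything with superscripts or mixed sizes would need a tolerance.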

This is actually where it becomes impossible to continue explaining without
illustrations or diagrams, and the whole process I'm envisioning is very
indirect, and influenced by numerous assumptions about output.

Now, this might turn out to be fantastic. Or it might be a flop. One way
or the other, I'm gonna have a hell of a lot of fun seeing how far I can
get and whether it's possible.


Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread John Gardner
>
> *Hmm, that must be new in CSS (i stopped at CSS2).*


Do you mean attribute selectors?

these[ones-like$="this?"] { }


They've actually enjoyed universal support for quite some time now... =)
Exact-match selectors like [href="#name"] were included in the first
revision of the CSS2 specification, IIRC; substring matchers like $= came
later, with CSS3 selectors.

*But that has nothing to do with roff, let alone groff, right?*


It does, but not directly. What I intend to write is a *post-processor*.
Something which will accept the following:

$ groff -Z -Tutf8 /path/to/manpage.1 | webroff

$ groff -Z -T pdf /some/book.me | w3conv


I haven't decided on what to name the actual executable (you see two
different names above). You'll also notice I'm piping output intended for
-Tutf8 and -T pdf into a foreign postprocessor. That's the point where I
divert the stream of intermediate output commands (which I've nicknamed
"DITROFF DATA" for slang, because damn that mouthful).

Unfortunately, this means it won't support mandoc, because it lacks a
similar intermediate output language (I *did* email Ingo some time ago about
a stable AST format I could work with... he probably thought I was nuts
after I said I was interested in writing my own HTML processor).

*I referred to for example mdoc(7)'s .Sx command.*


Yes, that will be possible in the sense of a conventional HTML anchor. To
use a frivolous example:

Yeah John, go back to that thing


Hrm, I'm pretty sure I can hear the thoughts of somebody on this list
reading this email...

*"Hah! Good luck building your magical semantic detection from
pixel-drawing commands, kid!"*


Brace yourselves for the gory details of how I'm gonna have a crack at
this...



Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread Steffen Nurpmeso
John Gardner  wrote:
 |Every instance of the "SHOUTED" headings can be uppercased too, even when
 |used outside their role as a heading.
 |
 |The CSS to achieve this:
 |
 |a[href="#name"],
 |a[href="#description"],
 |a[href="#authors"] {
 |
 |text-transform: uppercase;
 |}
 |
 |Will typeset any link pointing to "#name" in majuscule: "NAME".
 |It's all CSS. =)

Hmm, that must be new in CSS (i stopped at CSS2).
But that has nothing to do with roff, let alone groff, right?
I referred to for example mdoc(7)'s .Sx command.  And i think even
Kristap's and Ingo's mandoc C parse tree will not automatically
perform this adjustment (so that the tag for less(1) that mandoc
can generate, somehow, is correct), but i have not verified that.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread John Gardner
Every instance of the "SHOUTED" headings can be uppercased too, even when
used outside their role as a heading.

The CSS to achieve this:

a[href="#name"],
a[href="#description"],
a[href="#authors"] {
    text-transform: uppercase;
}

Will typeset any link pointing to "#name" in majuscule: "NAME".
It's all CSS. =)



Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread Steffen Nurpmeso
Ralph Corderoy  wrote:
 |Ingo wrote:
 |> The name of that standard section in man(7) and mdoc(7) is "EXIT
 |> STATUS", not "Exit Status" nor "Exit status" nor "exit status".
 |
 |The shouting section heading makes it easier to find that heading rather
 |than the same word occurring elsewhere, e.g. `ENVIRONMENT'.

Alternatively you have active links and an index and can jump to
whatever section or anchor there is.  For print-outs normal case
would be much nicer.

 |And if the .SH's parameter isn't shouting then perhaps there's a reason
 |for it and it should be preserved, even if it just shows up the bug to
 |fix.

Much nicer.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread John Gardner
>
> Begging your pardon ... whose pdfmark macros?
>

Ahaha, my bad. I recall well the credit Deri gave in pdf.tmac:


> *Much of the code in this macro has come from the excellent original work
> by Keith Marshall (see attribution in the pdfmark.tmac file). I, however, am
> solely responsible for any bugs I may have introduced into this file.*


I apologise for my dimwittedness nonetheless...

@Ralph:

> The shouting section heading makes it easier to find that heading rather
> than the same word occurring elsewhere, e.g. `ENVIRONMENT'.
> And if the .SH's parameter isn't shouting then perhaps there's a reason
> for it and it should be preserved, even if it just shows up the bug to
> fix.


They're still presented as such. CSS is used to stylise the capitalisation
so the heading appears in uppercase. The effect is purely presentational
for readers, whereas literal ALLCAPS text can have an adverse effect on
screen-readers and other accessibility software, which may interpret it as
a list of letters to read back to the user, one by one...




Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread Ralph Corderoy
Hi,

Ingo wrote:
> The name of that standard section in man(7) and mdoc(7) is "EXIT
> STATUS", not "Exit Status" nor "Exit status" nor "exit status".

The shouting section heading makes it easier to find that heading rather
than the same word occurring elsewhere, e.g. `ENVIRONMENT'.
And if the .SH's parameter isn't shouting then perhaps there's a reason
for it and it should be preserved, even if it just shows up the bug to
fix.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread Keith Marshall
On 20/04/18 19:19, John Gardner wrote:
> And as if that weren't enough, the renderer includes first-class
> support for Deri James's pdfmark macros ...

Begging your pardon ... whose pdfmark macros?



Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread Ingo Schwarze
Hi John,

John Gardner wrote on Sat, Apr 21, 2018 at 06:21:33AM +1000:
> Ingo Schwarze wrote:

>> and blanks in fragment names replaced by underscores rather than
>> hyphens, for example:

> The underscores look really jarring...
> what's the argument against using dashes instead?

   $ man -k Sh,Ss=- | wc -l
   43
   $ man -k Sh,Ss=_ | wc -l
   11

Dashes are much more common in normal English text (which section
and subsection headings usually consist of).  If you see a hyphen
there, you expect that it represents a hyphen, right?  Besides,
i regard the underscore as the ASCII printable character most
visually similar to the blank as it draws nothing *inside* the
box, but just at the edge.

But really, it's no big deal, i could have gone with the hyphen
back then, but nobody said they would prefer it when the decision
was made.  I'm merely pointing out there is an opportunity here to
consciously choose to be compatible, or to not be compatible...  :)

>> man://mandoc.1#EXIT_STATUS

> Now, as for the SHOUTY SHOUTY...

That's not a matter of SHOUTING, but of case sensitivity.
The name of that standard section in man(7) and mdoc(7)
is "EXIT STATUS", not "Exit Status" nor "Exit status"
nor "exit status".  Case is preserved, consider:

  https://man.openbsd.org/mandoc.1#EXIT_STATUS
  https://man.openbsd.org/mandoc.1#PAGER
  https://man.openbsd.org/mandoc.1#HTML_Output
  https://man.openbsd.org/mandoc.1#Output_Formats
  https://man.openbsd.org/mandoc.1#Syntax_tree_output
  https://man.openbsd.org/mandoc.1#Errors_related_to_tables
  https://man.openbsd.org/mandoc.1#error
  https://man.openbsd.org/mandoc.1#mdoc
  https://man.openbsd.org/mandoc.1#c
  https://man.openbsd.org/mandoc.1#K
  https://man.openbsd.org/ls.1#a
  https://man.openbsd.org/ls.1#A
  https://man.openbsd.org/ksh.1#pwd
  https://man.openbsd.org/ksh.1#PWD
  https://man.openbsd.org/ksh.1#[[
  https://man.openbsd.org/ksh.1##
  https://man.openbsd.org/ksh.1#-
  https://man.openbsd.org/ksh.1#_
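
The rule illustrated by these links can be sketched in a few lines of Python. This is only an approximation under the two properties stated in this thread (blanks become underscores, case is preserved); the helper names fragment_for and page_url are hypothetical, and a real implementation may apply further rules.

```python
def fragment_for(heading):
    """Return a fragment identifier: blanks become underscores, case is kept."""
    return heading.replace(" ", "_")


def page_url(base, page, heading):
    """Join a base URL, a page such as 'mandoc.1', and a heading's fragment."""
    return f"{base}/{page}#{fragment_for(heading)}"


print(page_url("https://man.openbsd.org", "mandoc.1", "EXIT STATUS"))
# https://man.openbsd.org/mandoc.1#EXIT_STATUS
```

Because nothing is case-folded, "pwd" and "PWD" stay distinct targets, which is the point of the ksh.1 examples.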

> for HTML output, I'll be using correctly cased section headings,
> with the correct application of `text-transform: uppercase;`
> being applied by CSS.

That's a bad idea.  I admit that many authors use unusual and even
inconsistent casing in section headers (even in the very mandoc.1)-:,
which may sometimes seem awkward.  But in technical documentation,
casing is often deliberate, and automatically changing it based on
natural language rules is prone to make information incorrect in
some cases.

> In fact you can see a start I made on a semantic HTML5 output example:
> https://rawgit.com/Alhadis/Stylesheets/master/complete/manpage/example.html#name

Oops, you are rolling your own CSS from scratch.

I see absolutely nothing semantic in there, it looks like a purely
presentational style sheet to me.

Are you aware of this semantic style sheet for manual pages:

  https://man.openbsd.org/mandoc.css
  http://mandoc.bsd.lv/cgi-bin/cvsweb/mandoc.css

It has matured a lot since Kristaps started it in Dec. 2008.  With
HTML code containing the correct attributes, it can be used with
both man(7) and mdoc(7) documents - of course it is much more
powerful with mdoc(7), though.  Man(7) HTML output can never be
very pretty (nor semantically rich) due to the limitations of the
language, but it more or less kind of works all the same:

  https://man.openbsd.org/gcc.1
  https://man.openbsd.org/perl.1
  https://man.openbsd.org/curses.3

Yours,
  Ingo



Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread John Gardner
>
>
>
> *Unless you have strong reasons for the different syntax, please consider
> using the syntax established in the new man.cgi(8) a few years ago:*


> *  protocol://[manpath/][arch/]name[.sec][#fragment]*


Thank you for bringing this to me. =) Yes I most certainly will use this
syntax (didn't consider the possibility of including $MANPATH in the URI).

> *and blanks in fragment names replaced by underscores rather than hyphens,
> for example:*

The underscores look really jarring... what's the argument against using
dashes instead? Slugs like "#camel-kebab-case" tend to be formatted that
way, for example...


man://mandoc.1#EXIT_STATUS


Now, as for the SHOUTY SHOUTY... for HTML output, I'll be using correctly
cased section headings, with the correct application of `text-transform:
uppercase;` being applied by CSS. In fact, you can see a start I made on a
semantic HTML5 output example:

https://rawgit.com/Alhadis/Stylesheets/master/complete/manpage/example.html#name

This will be generated by one of two projects I intend to start once this
is finished - postprocessors for purely semantic web technologies that
follow WAI-ARIA accessibility practices and uphold contemporary
web-authoring recommendations.

Note the minimalism in the code I've linked you to... So far, this is what
I've written for its stylesheet

:



Re: [groff] Release Candidate 1.22.3.rc1

2018-04-20 Thread Dave Kemper
On 2/15/18, Bertrand Garrigues  wrote:
> If you think there are some important fixes that
> must be passed (I haven't reviewed the Savannah bug list for a while,
> I'll check in the next days) please feel free to comment on this list.

I wouldn't consider this an *important* fix, but bug #42191 does have
a patch that solves the problem and seems to introduce no others.  The
patch's author, Werner, is not happy with it, having labelled it a
workaround rather than a fix -- I'm not sure why, and am too ignorant
of -me internals to have an opinion on it.

But the patch results in a net improvement of -me output, so it may as
well be applied, unless Werner strongly disagrees.

http://savannah.gnu.org/bugs/?42191



Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread Ingo Schwarze
Hi John,

John Gardner wrote on Sat, Apr 21, 2018 at 04:19:06AM +1000:

> My Troff previewer will be doing just that for
> man://mandoc/1/. =)
> Will probably add support for subsection-linking with fragment
> identifiers too:
> man://mandoc/1/#exit-status

Unless you have strong reasons for the different syntax, please
consider using the syntax established in the new man.cgi(8) a few
years ago:

  protocol://[manpath/][arch/]name[.sec][#fragment]

with all components case-sensitive and blanks in fragment names
replaced by underscores rather than hyphens, for example:

  man://mandoc.1#EXIT_STATUS
  man://sparc64/lom.4
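
For illustration only, a small Python sketch of how a client might split such a URI into its parts. The names (MAN_URI, parse_man_uri) are hypothetical, the leading directory components are returned as a plain list because manpath and arch cannot be told apart by syntax alone, and the rule that a section suffix must start with a digit is an assumption of this sketch, not part of the syntax above.

```python
import re

# protocol://[manpath/][arch/]name[.sec][#fragment]
MAN_URI = re.compile(
    r"^(?P<protocol>[A-Za-z][A-Za-z0-9+.-]*)://"
    r"(?P<dirs>(?:[^/#]+/)*)"      # optional manpath/ and arch/ components
    r"(?P<page>[^/#]+)"            # name, possibly followed by .sec
    r"(?:#(?P<fragment>.*))?$"     # fragment is kept verbatim, case included
)


def parse_man_uri(uri):
    """Split a manual page URI into its components."""
    match = MAN_URI.match(uri)
    if match is None:
        raise ValueError(f"not a manual page URI: {uri!r}")
    page = match.group("page")
    name, dot, section = page.rpartition(".")
    # Assumption: a suffix only counts as a section if it starts with a digit.
    if not dot or not section[:1].isdigit():
        name, section = page, None
    return {
        "protocol": match.group("protocol"),
        "dirs": [d for d in match.group("dirs").split("/") if d],
        "name": name,
        "section": section,
        "fragment": match.group("fragment"),
    }


print(parse_man_uri("man://mandoc.1#EXIT_STATUS"))
print(parse_man_uri("man://sparc64/lom.4"))
```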

I'm not saying either syntax is better - as a matter of fact, the
differences are minimal, but avoiding gratuitous variations may
benefit the overall ecosystem in the long term.

The [manpath/] component can be used to identify operating systems
and operating system releases; you may not need it in your context,
to access local manual pages only.

Note that i didn't invent a new syntax lightly, but there was no
precedent to follow that i could find.  The old syntax of the
classical man.cgi was a horrible thing involving
  ?query=...=...=...
and so on, so reusing it was not an acceptable option (though
the new man.cgi still supports it for backward compatibility).

Note that Debian mostly follows that syntax, too:

  https://manpages.debian.org/stretch/mandoc/mandoc.1.en.html#HTML_Output

Except for the .lang.html insertion.
They are using [manpath/] for "release/package/",
so that component is somewhat flexible depending on the context.

Yours,
  Ingo



Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread John Gardner
>
>
> *This is the thing I miss most about Konqueror: you could type a URI like
> “man:mdoc” and it would format and display the page*


There'll be a feature like that in Atom. The editor recently introduced a
feature where extension authors can register an external/custom protocol to
open links in Atom from browsers/emails. My Troff previewer will be doing
just that for man://mandoc/1/. =) Will probably add support for
subsection-linking with fragment identifiers too:
man://mandoc/1/#exit-status

I'll even make manpage refs like groff(1) hotlinks for navigating between
manpages in real-time. And as if that weren't enough, the renderer includes
first-class support for Deri James's pdfmark macros, enabling display and
traversal of PDF bookmarks in the viewport itself.

I've been so anxious to finish this and show everybody, but I'm blocked on a
frustrating issue of panning/zooming transformations that requires a
math-empowered brain that's better than mine... :(



Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread Larry Kollar

Ingo Schwarze  wrote:

> So yeah, even though proportional font is slowly becoming more
> widely used, you may be right:  The legacy of Wolfram Schneider's
> FreeBSD man.cgi is still pretty widespread and even motivated Michael
> Stapelberg to use a fixed width font for Debian, even though the
> rendering engine he uses would happily support proportional fonts.

This is the thing I miss most about Konqueror: you could type a URI like
“man:mdoc” and it would format and display the page. Whatever they were
using for an algorithm, it worked better to display a manpage as HTML than
anything else available at the time.

When Apple first announced Safari, and its Webkit origins, I had high hopes
they would carry that feature over. No such luck.

Larry


Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread Larry Kollar

John Gardner  wrote:

> It's easier than you think. You just have to separate presentational
> semantics from structural and content-related ones.

I’m fond of saying ‘All you have to do is…’ is one of the biggest lies ever
told. ;-D

> I've seen grohtml's complexity and was bewildered.  Hence why I intend to
> write my own. The procedures for inferring structural or semantic metadata
> from low-level intermediate output commands will be an entertaining
> challenge. =)

Good luck! I’ve actually had some limited success using ps2ascii’s more
complex output modes to derive structure from font/size specifications and
spacing/location, converting a PDF file to some kind of markup. Of course,
it’s very specific to individual documents. It’s actually a collection of
scripts, one of which returns a list of fonts and sizes used in a document
and the number of characters used for each. I would use that to build a
table, specifying whether strings with that format were inline or block,
and the kind of markup to wrap them in.
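
To give a flavour of the approach, here is a small Python sketch. It is not the original scripts: the input format (a list of font, size, text runs), the function names font_census and mark_up, and the sample fonts and tags are all assumptions made for illustration.

```python
from collections import Counter


def font_census(runs):
    """Count characters per (font, size) pair, most frequent first."""
    counts = Counter()
    for font, size, text in runs:
        counts[(font, size)] += len(text)
    return counts.most_common()


def mark_up(runs, table):
    """Wrap each run in the tag the table assigns to its (font, size)."""
    pieces = []
    for font, size, text in runs:
        kind, tag = table.get((font, size), ("inline", None))
        wrapped = f"<{tag}>{text}</{tag}>" if tag else text
        pieces.append(wrapped + ("\n" if kind == "block" else ""))
    return "".join(pieces)


runs = [
    ("Helvetica-Bold", 14, "Overview"),
    ("Times-Roman", 10, "Run "),
    ("Courier", 10, "make"),
    ("Times-Roman", 10, " first."),
]
# Hand-built table: (font, size) -> (inline or block, markup to wrap with)
table = {
    ("Helvetica-Bold", 14): ("block", "h2"),
    ("Courier", 10): ("inline", "code"),
}
print(font_census(runs))
print(mark_up(runs, table))
```

The census is what makes the table practical to write by hand: the most frequent pair is almost always body text, so it can be left unmapped.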

Paragraph detection is, um, fun. Some books use indents, others use vertical
space. And don’t get me started on definition lists or tables…

Larry


Re: [groff] groff as the basis for comprehensive documentation?

2018-04-20 Thread Colin Watson
On Thu, Apr 19, 2018 at 06:48:19PM +0200, Ingo Schwarze wrote:
> Colin Watson wrote on Thu, Apr 19, 2018 at 10:06:28AM +0100:
> > "man ./apropos.1", as Nate pointed out.  man-db's heuristic is that if
> > the page name contains a slash then it's surely a path name instead and
> > should be treated as such; I think that's a reasonable one.
> 
> Thank you for explaining the heuristic and for pointing out the
> missing feature in mandoc.  Given the existence of the -l option,
> having the heuristic is maybe not absolutely required, but i
> agree that it is not unreasonable, and we have seen that the absence
> of the heuristic can confuse casual users who are used to man-db.
> 
> So with the commit below, i added the same heuristic to mandoc.

Thanks.
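
For readers skimming the archive, the heuristic boils down to something like the following Python sketch. The function names are hypothetical and this is not man-db's or mandoc's actual code, only the rule as described above.

```python
def looks_like_path(argument):
    """An argument containing a slash is treated as a file, not a page name."""
    return "/" in argument


def classify(arguments):
    """Label each command-line argument as a page name or a file path."""
    return [("file" if looks_like_path(a) else "name", a) for a in arguments]


print(classify(["man", "./man.1"]))
# [('name', 'man'), ('file', './man.1')]
```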

> By the way, the old version of man-db in jessie exhibits a strange
> behaviour in that case, but probably that has been fixed long ago:
> 
>   $ lsb_release -d
>   Description:    Debian GNU/Linux 8.10 (jessie)
>   $ dpkg-query -l man-db | tail -n 1
>   ii  man-db  2.7.0.2-5  i386  on-line manual pager
>   $ man --version
>   man 2.7.0.2
>   $ man -w man ./man.1
>   man: man-./man.1: No such file or directory
>   man: man_./man.1: No such file or directory
>   /usr/share/man/man1/man.1.gz
>   ./man.1

This is still incorrect in current versions: man(1)'s command-line
parsing is not quite as elegantly well-factored as it ought to be ...  I
don't quite have time to sort it out just now, but I've filed
https://savannah.nongnu.org/bugs/index.php?53708 so that I don't forget
about it.  Thanks for the report!

-- 
Colin Watson   [cjwat...@debian.org]