[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-15 Thread Deri James
Follow-up Comment #27, bug#63074 (group groff):

[comment #25 comment #25:]
> I hear your expression of urgency but I don't think "stringhex" is a good
long-term solution to what ails us.  You are correct in comment #22 that I did
not correctly apprehend at first what it was for.  I thought you developed it
because we had no way to reliably transmit arbitrary byte sequences to device
control commands.  But we did, sort of--it just needed to be made consistent
and reliable.  That it wasn't is what my test case attempts to illustrate and
what the fix to bug #64484 attempts to prove.

Yes, I want .device to pass everything exactly "as-is" just as it does now,
all escapes left untouched. I don't understand your opposition to stringhex.
If I remember correctly you didn't like "shouty" capitals in .SH/Sh and people
generally agreed, so you added .stringup/down to enforce the case change -
fair enough. Similarly people on the list have asked for pdf support for
non-latin languages, and to support all the features currently supported for
latin languages I need stringhex or a suitable alternative. So far none of
your suggestions will solve the problem, which means that perhaps I have not
explained it well enough. Let's look at a real example. The file mom-pdf.mom
in the mom/examples directory contains:-

.HEADING 2 NAMED internal "Creating internal links"

.PP
Furthermore, \*[cod]NAMED\|\*[codx] stores the text of the
heading for use later on when linking to it (see
.PDF_LINK internal SUFFIX ). +
If headings are being numbered, the heading number is prepended.

Which produces what is shown in the attached png file. However, if this was
written by a Greek person (or Google Translate!) it might look more like:-

.HEADING 2 NAMED εσωτερικό "Δημιουργία εσωτερικών
συνδέσεων"

.PP
Επιπλέον, \*[cod]NAMED\|\*[codx] αποθηκεύει το
κείμενο του επικεφαλίδα
για χρήση αργότερα κατά τη σύνδεση με
αυτήν (βλ
.PDF_LINK εσωτερικό SUFFIX ). +
Εάν οι επικεφαλίδες αριθμούνται,
προηγείται ο αριθμός της επικεφαλίδας.

In either case the string at the end of the HEADING line has to be available
(using the .PDF_LINK key) so that the "+" symbol can be replaced by the given
text. This means that whatever is given as the NAMED argument has to be used as
part of the array which holds the heading strings. For text converted to
\[u] form this would not be a legal identifier name, hence the need to
munge it into something acceptable which can be reconstituted.
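
To illustrate the kind of munging meant here, a minimal Python sketch follows.
It is not the stringhex implementation; the function names and the choice of
uppercase hex digits are assumptions made for the example. It turns the UTF-8
bytes of a name into plain ASCII hex digits, which can later be turned back
into the original text:

```python
def name_to_hex(name: str) -> str:
    """Return the UTF-8 bytes of `name` as a string of uppercase hex digits.

    The result contains only 0-9 and A-F, so it can serve as part of an
    identifier regardless of the script the name was written in.
    (Hypothetical helper, for illustration only.)
    """
    return name.encode("utf-8").hex().upper()


def hex_to_name(hexed: str) -> str:
    """Reverse name_to_hex(): rebuild the original text from its hex form."""
    return bytes.fromhex(hexed).decode("utf-8")


if __name__ == "__main__":
    key = name_to_hex("εσωτερικό")
    print(key)               # ASCII-only key derived from the Greek name
    print(hex_to_name(key))  # the original name, reconstituted
```

The point of the sketch is only that the mapping is reversible and that its
output is safe to use as a name, which is what the NAMED handling needs.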

Your example using \A'' is not helpful since it would reject any non-Latin
language being used as a NAMED argument.

> No, I accept your premise that the main driver behind "stringhex" was this:
>
> > The problem lies in the original pdfmark API, if you look at the
pdfmark.pdf you will see that in the sections describing .pdfhref M and
.pdfhref L which both refer to a "dest-name" and "descriptive text", it says
that if a dest-name is not given the first word in the description is used as
the dest-name.
>
> I appreciate your explanation.  If the problem was with the pdfmark API,
then let's fix the pdfmark API.
>
> In particular, this:
>
> > if a dest-name is not given the first word in the description is used as
the dest-name
>
> ...strikes me as short-sighted, especially without any validation going on. 
A textual description of a hyperlink/bookmark might contain all sorts of crazy
stuff.  (Like Cyrillic or CJK characters or, worse, motion or type-size or
font-selection escape sequences.)

The pdfmark API was around well before I wrote gropdf, so I just had to
support it.

What about embedded calls to macros (i.e. \*[macro arg])? I deal with those as
well. Here's an example from mom-pdf.mom again:-

.AUTHOR "Deri James" "\*[UP .5p]and" "Peter Schaffter"

Mom's AUTHOR macro predates pdf integration so it made sense to just add:-

.pdfinfo /Author \\$*

(or similar) to the macro. Which produces in the grout file:-

x X ps:exec [/Author (Deri James, \v'-.5p'and, Peter Schaffter,) /DOCINFO
pdfmark

(using the pdf.tmac on my branch) and if you look at the finished pdf you will
see a pristine author attribution in the document properties.

>Assuming that it was going to be a well-behaved sequence of ASCII bytes or
even that one could "sanitize" or "cln" one's way through was a hopeless
notion.  That won't be practical until we have a string iterator and more
conditional expressions that enable the user of an iterator to identify the
type of each item in an iterated string/macro/diversion.  But if I understand
you correctly, we don't need that fancy new stuff to solve the present
problem, with stringhex or without.

No such assumption is made; I expect the data to be dirty. I have removed
an*cln and asciify, and never used sanitize. Stringhex is there to make the
NAMED argument acceptable for use as a string name. I need none of your
"fancy new stuff".

> It would probably benefit me to look up Peter's documentation on _mom_'s
"HEADING" macro.  It is a bit baffling to me that one has to repeat arguments
like 

[bug #59962] soelim(1) man page uses pic diagram--should it?

2024-01-15 Thread G. Branden Robinson
Follow-up Comment #11, bug#59962 (group groff):

Thanks to Dave for covering some of these points.  That may help make my reply
shorter (but don't count on it).

[comment #8 comment #8:]
> I think we discussed this already, I still think this is not correct, but it
does not make sense to discuss this further

I would just point out (as I learned from Ingo Schwarze) that mixed-case
section and subsection headings are better for accessibility purposes. 
Apparently, screen readers pronounce sequences of capitals letter-by-letter,
not as words.
 
> #2
> Thanks for the explanation. 
> a) consistency is key (and also for tools processing the man pages), so for
me youngster (I only started Linux/Unix mid 90s) putting references in bold is
all I know, hence I ask to make it consistent when I see a deviation. 
> 
> And at least some tools (external to roff) "know" these rules and hyperlink
(e.g. in HTML) B(x).
> 
> The new macro is a great idea and I agree that it will be picked up properly
very slowly. Is it a GNU extension and which minimum requirement is attached
to it? It would be great if there is a web page explaining the use of this
(and the minimum requirements as well) so I could point people to this. I
expect people to tell me "my software runs on xyz" so I cannot use it or "I
support this old Linux distribution(s)".

As Dave said, [https://github.com/9fans/plan9port _plan9port_] and _groff_
1.23.0, released in July, support it. 
[https://cvsweb.bsd.lv/mandoc/man_html.c?rev=1.187=text/x-cvsweb-markup=date
Likely the next release of _mandoc_ will.]

Here's what _groff_man_(7) says nowadays:


 .MR topic manual‐section [trailing‐text]
(since groff 1.23) Set a man page cross reference as
“topic(manual‐section)”.  If trailing‐text (typically
punctuation) is specified, it follows the closing
parenthesis without intervening space.  Hyphenation is
disabled while the cross reference is set.  topic is set in
the font specified by the MF string.  The cross reference
hyperlinks to a URI of the form
“man:topic(manual‐section)”.
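
As a rough illustration of what that description specifies, a small Python
sketch (the function name format_mr is hypothetical and not part of groff)
builds the visible text and the hyperlink target from the macro's arguments.
It ignores the font selection and hyphenation control that the real macro
also performs:

```python
def format_mr(topic: str, section: str, trailing: str = "") -> tuple[str, str]:
    """Return (visible text, URI) for a man page cross reference.

    Mirrors the documented shape only: "topic(section)" followed directly
    by any trailing text, and a URI of the form "man:topic(section)".
    """
    text = f"{topic}({section}){trailing}"
    uri = f"man:{topic}({section})"
    return text, uri


if __name__ == "__main__":
    # Corresponds to a call such as: .MR soelim 1 ,
    print(format_mr("soelim", "1", ","))
```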

 
> And yes, the probably best thing would be to request all those markup
languages with converters into *roff to use this macro during conversion, this
would probably speed up the conversion quite a bit.

I don't _expect_ any converters to start taking it up until some readers of
man pages see the macro in the wild and wonder why their converters aren't
producing it.  And it is probably easier for the users of such converters to
convince their maintainers to adopt the new macro than for me to approach them
singing the praises of my own invention.  (I use the term "invention" with
some reluctance.  Ingo says that `MR` "copies" _mdoc_(7)'s `Xr`, but as far as
I can tell, both have a common ancestor [or at least an analogue] in the `MS`
macro of DEC Ultrix _man_(7), which _groff_ has documented for decades.)

>  To be honest, I know very little about *roff myself, so if I were to write
a man page, I would use some higher level tool myself, in the distant past I
used SGML.

In my opinion it is not difficult, and I have spent many hours revising
_groff_man_(7) and _groff_man_style_(7) trying to make the subject
approachable.

> b) The "bold is literal and italic is variable" is very helpful to me both
as reader and as a translator.

I'm glad to hear it.  It's a good rule of thumb, but please don't over-apply
this principle.

As _groff_man_style_(7) says:


Use bold for literal portions of syntax synopses, for
command‐line options in running text, and for literals that
are major topics of the subject under discussion; for
example, this page uses bold for macro, string, and register
names.  In an .EX/.EE example of interactive I/O (such as a
shell session), set only user input in bold.
[...]
Use italics for file and path names, for environment
variables, for C data types, for enumeration or preprocessor
constants in C, for variant (user‐replaceable) portions of
syntax synopses, for the first occurrence (only) of a
technical concept being introduced, for names of journals
and of literary works longer than an article, and anywhere a
parameter requiring replacement by the user is encountered.
An exception involves variant text in a context already
typeset in italics, such as file or path names with
replaceable components; in such cases, follow the convention
of mathematical typography: set the file or path name in
italics as usual but use roman for the variant part (see .IR
and .RI below), and italics again in running roman text when
referring to the variant material.
[...]
 Observe what is not prescribed for setting in bold or italics
 above: elements of “synopsis 

line drawing and cargo cults (was: [bug #64285] [troff] \D't')

2024-01-15 Thread G. Branden Robinson
[migrating discussion to groff list, as bug-groff is mainly a reflector
for Savannah traffic, which this isn't]

At 2024-01-14T19:31:43-0500, Steve Izma wrote:
> On Sun, Jan 14, 2024 at 05:23:28PM -0500, G. Branden Robinson wrote:
> > I pretty much do, yeah.  Every crazy feature we keep dragging
> > along with us makes the language harder to acquire, remember,
> > and work with.  Where the size of the impacted user community
> > is small, as it surely is here--I fear the most prolific users
> > of drawing escape sequences outside of macro packages or
> > preprocessors are cargo cultists--it seems an easy choice to
> > make.
> 
> I agree that this should be fixed. But I can't imagine any
> situation where the current behaviour can be considered a
> feature. It's more likely that everyone using the \Z'' fix is
> doing so as a workaround and any such documents would be
> unaffected by improving the \D't' behaviour.

That appears to be the consensus view.

[rearranging]
> There are scads of situations where one doesn't want to bother with
> macro packages, e.g., one-page posters, flyers, announcements. I've
> probably made hundreds of them with groff. Those are the kinds of
> situations where one draws lots of lines and where this bug becomes a
> nuisance.

Right.  I'd like to serve those users better.

> I don't understand your reference to "cargo cultists".
> The term "cargo cult" is almost always used in a pejorative way.

I'm employing it in this sense:

https://en.wikipedia.org/wiki/Cargo_cult_programming

...and I do mean it pejoratively.

> But I've been to Vanuatu and it's clear to me that the real history of
> cargo cults is a history of anti-colonial movements that engaged in
> communal, ritualized resistance, not so-called superstition.

That's a fair point.  Colonized societies experience wealth extraction--
that is the point of establishing a colony in the imperial sense--and
consequently tend to focus their citizens' attention on material
realities.  When a superstition is costly in terms of physical
resources, it faces strong selection pressures in such an environment.
(If nothing else, colonial administrators will eventually receive word
of such displays, and the shrewder among them will start wondering about
the opportunity costs, and the wisdom of foregoing a redirection of
equivalent value back to the imperial hub in London, DC, Moscow, or
Beijing.)

By contrast, programmers in highly capitalized firms paid to bring to
market the next thing their manager characterizes as a "revolution" work
in a resource-rich environment but are taught to emphasize
"velocity" over mastery.[1]  Mastery remains valued as a social currency
among the laborers, however, with the predictable game-theoretic
consequences of reducing the overall supply of it to benefit those who
already possess it[2], and substitution of other indicators as
(ultimately unreliable) proxies.  Many of these can be summarized as
"at every opportunity, favor the implication of great expertise over its
demonstration".  Thus, I'd argue that professional software engineers
are far more prone to cargo-cult superstition than any population of
Pacific Islanders.

And I'm envious of your experience in Vanuatu.  I haven't gotten even as
far as Port Moresby!

Regards,
Branden

[1] and promises of future compensation of potentially zero value
(bonuses and stock options) over salary increments

[2] I would term this the "Goldfinger" technique, recalling the film's
stratagem of rendering useless a large stockpile of a valuable
commodity one doesn't own to increase the value of that which one
_does_ possess.  It's unwise to assume that a founding CTO or VP of
Engineering is more brilliant in any meaningful way than someone
hired two years down the road.  The former was simply "there first",
which often means they started out with more capital or had better
connections...or both.  When one purchases a degree from a
prestigious academic institution, the prospect of a network of
people with whom one can exchange scratched backs is the value one
is bidding to acquire, not a superior specimen of the thing we term
"education".

For more on this and "meritocracy" generally, see Chris Hayes,
_Twilight of the Elites_, Penguin, 2012.




[bug #64285] [troff] \D't' (set line thickness) drawing command alters drawing position

2024-01-15 Thread G. Branden Robinson
Follow-up Comment #15, bug#64285 (group groff):

[comment #14 comment #14:]
> we now know that in the six months that 1.23 has been out, people have
complained about various changes debuting in it, but not this one (at least
not where I've seen it, though of course I don't follow every forum where such
complaints might be voiced).

I've been keeping an eye out as well, in many places (doing Web searches,
checking out distributors' change logs and bug trackers, monitoring techie Q&A
forums, and so forth).

Not a peep about \s.  This isn't a surprise to me, because most usage of the
escape sequence that I have seen isn't ambiguous.  Mostly what I see is man
pages (likely because they constitute a majority of *roff documents in the
world), doing stuff like:


foo\s-2bar\s0baz


...that.  These aren't ambiguous and we didn't change them.

We can revisit the matter in another six months, maybe, to see if we need to
update our observations, but assuming the level of consternation remains low
to zero, then I'd say the \s change was a good example of the sort of
regularizing, simplifying reform we _should_ be undertaking.

Just like \D't' not altering the drawing position!  ;-)

I trust I will not draw contradiction when I venture that explicit/manual use
of that escape sequence in documents is unlikely to be more prevalent than
\s.

If we want to start up another argument along these...lines, we can debate
whether the \D'p' request should automatically close the specified polygon, or
whether that drawing command is better thought of as a "polyline" operator. 
[https://lists.gnu.org/archive/html/groff/2023-08/msg00041.html This thread is
probably the place to resurrect the discussion initially.]
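
For anyone joining that discussion, the two readings differ only in whether a
final segment back to the starting point is implied.  A toy Python sketch
(names are hypothetical; this is not how troff or any output driver is
implemented), assuming the vertices are given as successive relative
displacements from the current drawing position:

```python
def vertices(start, displacements, close):
    """Return the absolute points visited when drawing from `start`.

    `displacements` is a sequence of (dx, dy) pairs, each relative to the
    previous point.  With close=True the path returns to `start`
    ("polygon" reading); with close=False it stops at the last vertex
    ("polyline" reading).
    """
    x, y = start
    points = [(x, y)]
    for dx, dy in displacements:
        x, y = x + dx, y + dy
        points.append((x, y))
    if close and points[-1] != points[0]:
        points.append(points[0])
    return points


if __name__ == "__main__":
    moves = [(10, 0), (0, 10)]
    print(vertices((0, 0), moves, close=True))   # ends where it began
    print(vertices((0, 0), moves, close=False))  # ends at the last vertex
```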


___
Message sent via Savannah
https://savannah.gnu.org/