Re: Draft: London and Reiser's UNIX/32V paper, reconstructed

2024-06-11 Thread Ralph Corderoy
I didn't read your reply.  As others have found, you write too much,
drowning their free time, and so they give up participating in the
community.  It takes enough of my time to write a carefully considered
and polite email in the first place.  You normally don't reciprocate.

-- 
Cheers, Ralph.



Re: [TUHS] Draft: London and Reiser's UNIX/32V paper, reconstructed

2024-06-11 Thread Ralph Corderoy
G. Branden Robinson wrote:
> For groff list subscribers, I will add, because people are accustomed
> to me venturing radical suggestions for reforms of macro packages,
> I suggest that we can get rid of groff mm's "MOVE" and "PGFORM"
> extensions.  They're buggy (as the man page has long conceded), and
> I don't think anyone ever mastered them, not even their author.

I have quite a lot of old troff -mm source containing lines like

.PGFORM 21c-2i 29.7c-1.5i 1i 1

and they worked fine for me.

Part of troff's attraction is it has reached an age where it doesn't
have breaking changes.  Perhaps they should be in a fork of groff.
gbroff?  Though I'd have thought an entirely new formatter would give
much more freedom for experimentation given modern input and output
formats and greater processing power.

Meanwhile, Werner's earlier groff is still available and other troffs
exist.

-- 
Cheers, Ralph.



Re: bitmaps in groff documents

2023-06-04 Thread Ralph Corderoy
Hi Doug,

> I would like to assure that a 1000x1000 bitmap, say, occupies exactly
> 1000x1000 pixels on the display, regardless of the physical size of a
> pixel.

What's the output device?

There's the .H and .V number registers which give the output device's
resolution in basic units.

But a bigger problem might be the route from output device to eyeball.
If it's PDF to X Windows, say, then many installations now treat a
screen as 96 DPI.  An easy test is using a PDF viewer to show a piece of
‘paper’ of known size at 100% and measuring its width on screen.

http://wok.oblomov.eu/tecnologia/mixed-dpi-x11/ covers the mess.
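
To put a number on that, here is a rough sketch of the arithmetic,
assuming the viewer really does show 96 pixels to the inch at 100%.
px_to_points is a name made up for the illustration, and the 96 is the
assumption that may well be wrong on a given screen.

```shell
# px_to_points PIXELS DPI
# Print the length, in points (1/72 inch), that PIXELS occupy on a
# display assumed to show DPI pixels per inch.  Hypothetical helper.
px_to_points() {
    awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.2f\n", px * 72 / dpi }'
}

px_to_points 1000 96
```

Under that assumption the 1000-pixel side would need 750 points on the
page, a little over 10.4 inches.  A different effective DPI gives a
different answer, which is the route-to-eyeball problem.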

-- 
Cheers, Ralph.



Re: How to contribute

2023-06-04 Thread Ralph Corderoy
Hi Michał,

> > > > Alas, groff's requirement for copyright assignment to the FSF
> > > > ruled out contributions from me many years ago after the FSF's
> > > > legal counsel confirmed they'd hold partial copyright on a
> > > > non-groff work which contained code of copyrightable expression
> > > > if I later used it in groff; the order in time doesn't matter.
> > > > The FSF want 100% ownership to enforce copyright but deny 100%
> > > > ownership of that earlier work in doing so.
> > > 
> > > I do not get this.  Can you elaborate or provide an example?
> > 
> > Say I write a 25-line function in a program which is all my own
> > work.  A year later, I'm adding a feature to GNU groff and realise
> > that function can be re-used. GNU groff's maintainer rightly
> > requires a copyright assignment before accepting the contribution
> > which includes that function.  If I provide the assignment, I am
> > granting the FSF partial copyright over the program written a year
> > earlier which is all my own work because of that sole re-used
> > function.

I note you didn't question that the order in time doesn't matter.

> > Amongst other problems, signing copyright assignments can cause a
> > lot of hassle when lawyers do due diligence. What did you sign away?
> > What did you think it covered? What did it actually cover? Are you
> > the sole owner of what you're selling to our client?
>
> I do not believe it works this way.  It would be extremely stupid.

It is surprising to programmers how the law works and thus how lawyers
think.

> I can imagine this applies when you copy large amount of code, but
> single 25 lines function?

It applies if ‘foo() [is] a copyrightable work with copyrightable
expression’, to use the FSF's legal counsel's words.  Which I think is
the same judgement GNU groff's maintainer uses when deciding if a
copyright assignment is required for a contribution.

A bug fix of a few lines doesn't count.  But here, the counsel and I were
talking about a single function, foo(), which was not ‘so short’ or
simply ‘functional’ that expression could be judged not to exist.

> No one would contribute to GNU projects if it worked like this.

I read the copyright assignment carefully.  I had questions which I put
to the FSF's Assignments department.  They couldn't answer and passed me
onto their legal counsel.  The counsel were very helpful.

I expect most of those who complete the copyright assignment do not read
it carefully, do not then think up what-if's, and do not then make the
effort to get them answered.

Many who do sign it will not have any problems afterwards.  They will
not have their own work checked by lawyers.  They will not be violating
an agreement with their employer or future employer.  Or they won't
realise they are and it won't come to light.  But that doesn't mean the
FSF's interpretation of their copyright assignment doesn't exist.

-- 
Cheers, Ralph.



Re: How to contribute

2023-06-03 Thread Ralph Corderoy
Hi Michał,

> Alas, groff's requirement for copyright assignment to the FSF ruled
> out contributions from me many years ago after the FSF's legal
> counsel confirmed they'd hold partial copyright on a non-groff work
> which contained code of copyrightable expression if I later used it
> in groff; the order in time doesn't matter.  The FSF want 100%
> ownership to enforce copyright but deny 100% ownership of that
> earlier work in doing so.
>
> I do not get this.
> Can you elaborate or provide an example?

Say I write a 25-line function in a program which is all my own work.
A year later, I'm adding a feature to GNU groff and realise that function
can be re-used.  GNU groff's maintainer rightly requires a copyright
assignment before accepting the contribution which includes that function.
If I provide the assignment, I am granting the FSF partial copyright over
the program written a year earlier which is all my own work because of
that sole re-used function.

Amongst other problems, signing copyright assignments can cause a lot of
hassle when lawyers do due diligence.  What did you sign away?  What did
you think it covered?  What did it actually cover?  Are you the sole
owner of what you're selling to our client?

-- 
Cheers, Ralph.



Re: How to contribute

2023-06-03 Thread Ralph Corderoy
Hi Michał,

> Where do I need to register?

For any significant contribution in copyright terms, you'll have to sign
an assignment of copyright of your work to the FSF and its GNU groff.

https://www.fsf.org/licensing/contributor-faq

That put me off.

Alas, groff's requirement for copyright assignment to the FSF ruled
out contributions from me many years ago after the FSF's legal
counsel confirmed they'd hold partial copyright on a non-groff work
which contained code of copyrightable expression if I later used it
in groff; the order in time doesn't matter.  The FSF want 100%
ownership to enforce copyright but deny 100% ownership of that
earlier work in doing so.

— https://www.mail-archive.com/groff@gnu.org/msg15539.html

-- 
Cheers, Ralph.



Re: Perl and linguistics. (Was: neatroff for Russian.)

2023-04-30 Thread Ralph Corderoy
Hi Oliver,

BTW, are you subscribed to groff@gnu.org?  If so, I'll stop mailing you
directly too.

> and from there directly jumped to Perl the moment I familiarized
> myself with X11 workstations at our university, due to its wonderfully
> elliptical style (I am a linguist by training, and many of the Perl
> language constructs just got alive in my brain the very instant I used
> them for the first time).

I assume you know that Perl's creator, Larry Wall, studied linguistics
at the University of California, Berkeley?  That shows up in Perl's
design and what he has written about Perl over the years, e.g. on
Usenet.  https://en.wikipedia.org/wiki/Larry_Wall

> Later I started learning Prolog, but never made it to Py (and anything
> that follows).

If you ever feel the need for a modern compiled language for that bit
more speed, consider Go.  It has a simple clean syntax and two-thirds of
its designers are Bell Labs alumni.  Arguably three-quarters given Russ
Cox joining early on to shape the standard library.  https://go.dev

-- 
Cheers, Ralph.



Re: Count number of spaces at beginning of line?

2023-04-30 Thread Ralph Corderoy
Hi Oliver,

Dave wrote:
> > Is there any possibility to count leading spaces in groff?
>
> See the documentation for the \n[lsn] register.

Which can be seen with

info groff 'Leading Spaces Traps' | cat

I'd been playing with an input trap, ‘.it’, so you might be interested
to learn about those too, but Dave's suggestion is more relevant for
what you asked.

-- 
Cheers, Ralph.



Re: Count number of spaces at beginning of line?

2023-04-29 Thread Ralph Corderoy
Hi Oliver,

> .\" Start of example text
>
> Learning Language X \"should return 0
>
> [] Prerequisites \" should return 1
>
> [][] Tools \" should return 2
...
> \n[Text_of_my_line] would contain the level of indentation, like
> \n[Tools] and \n[Documentation] would both return 2 which could be
> used to set the section numbering via .NH x.

Unlike TeX, where everything is written in TeX, troff favours using a
preprocessor which produces troff, e.g. pic(1) and tbl(1).  These can be
quite simple, say an awk script which processes what it recognises and
passes through the rest.  chem and dformat are both awk scripts:
https://troff.org/prog.html#chem

Whilst it's interesting to wonder how it can be done in troff, longer
term you'll probably write preprocessors for this kind of thing.  :-)
It's common to create ‘little languages’ particular to you using Unix
programs.
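
As a minimal sketch of such a preprocessor, here is one way it might
look in sh and awk.  It assumes one leading space per level, that every
non-blank line is a heading, and that -ms's .NH with levels starting at
1 is wanted.  indent2nh is a made-up name.

```shell
# indent2nh: hypothetical preprocessor turning leading spaces into -ms
# numbered headings.  Assumes one space per level and that every
# non-blank line is a heading; real input would need more rules.
indent2nh() {
    awk '
    /^ *$/ { next }                        # skip blank lines
    {
        n = 0
        while (substr($0, n + 1, 1) == " ")    # count leading spaces
            n++
        print ".NH " (n + 1)               # .NH levels start at 1
        print substr($0, n + 1)            # the heading text itself
    }'
}

printf '%s\n' 'Learning Language X' ' Prerequisites' '  Tools' | indent2nh
```

The output is plain troff input, so it could sit at the front of a
pipeline in the same way as pic or tbl.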

-- 
Cheers, Ralph.



Re: [tbl] Setting the widths of the columns

2023-04-29 Thread Ralph Corderoy
Hi Branden,

> Per tbl(1) from groff 1.23.0:
>
>   Column modifiers

> Any number of modifiers can follow a column classifier.
> Arguments to modifiers, where accepted, are case‐sensitive.
> If the same modifier is applied to a column specifier more than once,
> or if conflicting modifiers are applied,
> only the last occurrence has effect.
> The modifier x is mutually exclusive with e and w,
> but e is not mutually exclusive with w;
> if these are used in combination, x unsets both e and w,
> while either e or w overrides x.

The above is unclear to me.

It starts with a clear rule: last one wins a conflict.

If the same modifier is applied to a column specifier more than once,
or if conflicting modifiers are applied,
only the last occurrence has effect.  

Does ‘mutually exclusive’ below mean it is an error to give xew?
Are xe and xw also an error?  Or does ‘mutually exclusive’ mean
conflicts as used above?  If so, it should stick to the same term
for clarity.

The modifier x is mutually exclusive with e and w,

This next bit is only needed to clear up the ‘e and w’.

but e is not mutually exclusive with w;

And then this is repeating the ‘last one wins a conflict’ rule AFAICS.
But it shouldn't be needed.

if these are used in combination,
x unsets both e and w, while either e or w overrides x.

Or is it trying to tell me that xe has x win as an exception to the
earlier rule?  As someone trying to learn, there are multiple
interpretations.

I think it gets simpler if it begins with

The x modifier conflicts with e and with w.

and as much as can be inferred, stripped.

-- 
Cheers, Ralph.



Re: bc and dc.

2023-04-28 Thread Ralph Corderoy
Hi Alejandro,

> I could only see this:
>
> $ echo 'l(1114112) / l(2)' | bc -lc
> @iK1114112:C2,0:K2:C2,0:/W@r
> @i

That's GNU bc.  Its -c dumps its internal byte code rather than dc code
because it never runs dc.

> $ echo 'l(1114112) / l(2)' | /usr/lib/plan9/bin/bc -c
>  1114112 l<12>x 2 l<12>x/ps.
> q

That's correct output.  The <12> is ASCII FF and is the name of the l()
function as defined by the dc code in bc's -l maths library.

> $ echo 'l(1114112) / l(2)' | /usr/lib/plan9/bin/bc -lc
> c[cannot open input file:1, ]pc
>  1114112 l<12>x 2 l<12>x/ps.
> q

There's your problem.  When your plan9/bin/bc is given -l, it produces
dc to print ‘cannot open input file:1, ’.  Looks like the installation
is incomplete or the file is present but in the wrong place.

At a guess, have you a /usr/lib/plan9/lib/bclib?

-- 
Cheers, Ralph.



Re: bc and dc. (Was: neatroff for Russian.)

2023-04-28 Thread Ralph Corderoy
Hi Alejandro,

None of the below may apply to GNU's bc and dc.  I prefer Unix.

> bc(1) on the contrary, is likely to be using 'long double', for being
> able to provide so many digits.

No, bc doesn't use a C language or machine type.  The precision can
be set.

$ bc -l
scale=42
l(1114112) / l(2)
20.087462841250339408254066010810404354011270
$

bc's l() function is written in bc rather than a built-in and can be
read for fun.  bc uses dc(1) to do the work and can be asked to ‘compile
only’ with -c.  dc has k to set the precision; bc's scale simply uses k.

dc uses a byte to store each pair of decimal digits.  This allows
overflow within the byte during calculations and makes it quick to
perform the common case of formatting the many-byte number to
decimal-digit text.
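
To picture the pairs-of-digits idea, here is a small sketch that splits
a decimal number into base-100 ‘bytes’.  It only illustrates the
representation and is not dc's code; the least-significant-first order
and the name base100_digits are choices made for the example.

```shell
# base100_digits NUMBER
# Print NUMBER's decimal digits grouped in pairs, least significant
# pair first, one pair per "byte".  Illustration only, not dc's code.
base100_digits() {
    printf '%s\n' "$1" | awk '{
        s = $0
        if (length(s) % 2) s = "0" s       # pad to an even number of digits
        out = ""
        for (i = length(s) - 1; i >= 1; i -= 2)
            out = out (out == "" ? "" : " ") substr(s, i, 2)
        print out
    }'
}

base100_digits 1114112
```

Each pair is at most 99, so a byte holding it has headroom for carries
during arithmetic, and printing a pair as two decimal digits needs no
division of the whole number.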

-- 
Cheers, Ralph.



Re: neatroff for Russian.

2023-04-27 Thread Ralph Corderoy
Hi Oliver,

> I am not familiar with modern incarnations of C/C++.  Is there really
> no char data type that is Unicode-compliant?

IIRC, neatroff gets by quite happily using C's char and UTF-8 encoding.
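
The general idea, that plain bytes are enough to work with UTF-8, can
be sketched in the shell.  This is not taken from neatroff's source; it
just counts characters by ignoring UTF-8 continuation bytes, and
utf8_chars is a made-up name.

```shell
# utf8_chars: count characters in UTF-8 text on stdin by counting the
# bytes that are not continuation bytes (10xxxxxx, octal 200-277).
# A sketch of the general idea only.
utf8_chars() {
    LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

# The six Cyrillic letters of "Privet", written as octal escapes so the
# example does not depend on the encoding of this text: 12 bytes.
printf '\320\237\321\200\320\270\320\262\320\265\321\202' | utf8_chars
```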

-- 
Cheers, Ralph.



neatroff for Russian. (Was: Questions concerning hyphenation patterns for non-Latin languages, e.g. Russian)

2023-04-26 Thread Ralph Corderoy
Hi Oliver,

Are you aware there are other troff implementations than GNU's groff?
Neatroff is one.  Ali Gholami Rudi wrote it because he wanted better
Unicode support for foreign languages, including right-to-left text.
He seems very much of your mould in needs.

A good summary of its features is http://litcave.rudi.ir/neatroff.pdf
I see UTF-8 hyphenation files mentioned.
There's also whole-paragraph formatting and lots of other delights.
Rudi's http://litcave.rudi.ir has a Typesetting section past the initial
list of recent changes to his software.

Feel free to continue discussing neatroff here along with general troff
questions.

-- 
Cheers, Ralph.



Re: Comprehension problem with macros

2023-04-24 Thread Ralph Corderoy
Hi Branden,

> > yes, I remember having heard of the two different modes
>
> "Copy mode" and (not copy mode), which didn't have a name in CSTR #54.
> (Terser is better.  :-| )

No name is needed.  It would be clutter to add it.  troff is either in
copy mode or is not in copy mode.  There is no need for a not-copy mode
term.

There are quite a few modes in troff, e.g. ligature.  There isn't a
special term for not being in ligature mode.  Creating a mode ‘bar’ to
indicate the mode isn't ‘foo’ increases what needs to be learnt and
remembered from one term to two terms and the relationship between them.

-- 
Cheers, Ralph.



Re: user-defined characters, translation maps, and environment binding

2023-04-24 Thread Ralph Corderoy
Hi Branden,

> > .char £ pound sterling
> > .char $ United States dollar
> > .
> > The £ and $ are almost at par.
> > .
> > .tr aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
> > £ crashes overnight!
> > .
> > .pl \n(nlu
> > ^D
> > The pound sterling and United States dollar are almost at par.
> > POUND STERLING CRASHES OVERNIGHT!
...
> > I'd want to see shouty caps.
>
> I think this an excellent example of user-defined character abuse.

It's a long-standing idiom.  Not abuse.

> There's no reason not to use strings here.
...
> .  de UP
> .tr aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
> .ds \\$1 \\*[\\$1]
> .tr AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ
> .  .
...
> .ds P pound sterling\"
> .ds D United States dollar\"
> .
> The \*P and \*D are almost at par.
> .
> .ds news \*P crashes overnight!
> .UP news
> \*[news]
> .pl \n(nlu

Reasons not to use strings here:

- It avoids having to re-work existing documents.
- The text should be typed with £ and $, not \*P and \*D.
- The text to shout now has to be put into a string, a macro called, and
  the string interpolated.  Before, the text was just written.

It sounds like transliteration local to an environment is a new feature,
not one worthy of breaking .tr.

-- 
Cheers, Ralph.



Re: Comprehension problem with macros

2023-04-23 Thread Ralph Corderoy
Hi Oliver,

> I was not quite sure if I could abuse single-digit numbers in such a
> way.

Single-digit strings.

Sure, it's not abuse.  It's the language.  Just as dc(1) allows a byte
as the name of a string.

$ dc -e '6 7*s
> cl
> p'
42
$

If you think it might clash with other uses of strings 0, 1, ... then
prefix their name.

.ds lut0 0
...
\[u216\*[lut\nn]]

-- 
Cheers, Ralph.



Re: Comprehension problem with macros

2023-04-23 Thread Ralph Corderoy
Hi Oliver,

> .nr number 0 1
> .while (\n[number] < 16) \{\
> .ie (\n[number] < 10) \[u4E0\n[number]]
> .el \[u4E0\*[\n[number]]]
> .nr number +1
> .\}
>
> So far, everything works perfectly.
>
> However, if I wrap this loop in a macro like
>
> .de myline
> .\" material as above
> ..

The text after .de is read twice.  Once when the macro is defined and
again when it is executed.  The reading occurs in two different modes.
See section 7 of CSTR 54.  https://troff.org/54.pdf

For example, \nn puts the value of number-register n when the macro is
defined into the macro whereas \\nn delays getting the value of n to
when the macro is executed.

Here's a variation on your test.

$ cat oliver.tr
.pl 2
.
.ds 0 0
.ds 1 1
.ds 2 2
.ds 3 3
.ds 4 4
.ds 5 5
.ds 6 6
.ds 7 7
.ds 8 8
.ds 9 9
.ds 10 A
.ds 11 B
.ds 12 C
.ds 13 D
.ds 14 E
.ds 15 F
.
.nr n 0 1
.while (\n[n] < 16) \{\
\[u216\*[\nn]]
.nr n +1
.\}
.br
.
.de m
.nr n 0 1
.while (\\n[n] < 16) \\{\\
\\[u216\\*[\\nn]]
.nr n +1
.\\}
..
.m
$ troff -Tutf8 oliver.tr | grotty
Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ Ⅼ Ⅽ Ⅾ Ⅿ
Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ Ⅼ Ⅽ Ⅾ Ⅿ
$

-- 
Cheers, Ralph.



Re: groff injects blank page

2023-04-22 Thread Ralph Corderoy
Hi Carlos,

> That's my result from the teletype.  And xterm confirms the same
> pretty much

As I said elsewhere, give us an example which doesn't work, the command
you run, and the result on the teletype and xterm.

-- 
Cheers, Ralph.



Re: groff injects blank page.

2023-04-22 Thread Ralph Corderoy
Hi Carlos,

> .TH, that is normally in the beginning of the page, causes it, or it's
> indirectly involved in this blank page.
...
> But if I were to type an .ig and two dots in between that pesky `.TH`
> macro, the blank page disappears.

Can you give the list a short example input and the command which shows
the error?

-- 
Cheers, Ralph.



Chinese texts. (Was: A new ignoramus question about user-installed fonts)

2023-04-22 Thread Ralph Corderoy
Hi Deri,

> The problem is probably not ghostscript, current gropdf moans about
> "too many glyphs" when it can't allocate all used to 8 bit code
> points.  This too will be addressed in the next version of gropdf,
> which addresses problems of very large fonts with thousands of
> different glyphs.

Does this mean ‘troff -Tps | grops’ and GhostScript's ps2pdf(1) will
allow Oliver to experiment with Chinese texts in the meantime?

-- 
Cheers, Ralph.



Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type

2023-04-21 Thread Ralph Corderoy
Hi Alejandro,

> when it has structure types with multi-line comments, you see what
> happens in the first PDFs I sent (mis-aligned comments).

Fix the formatting commands in the troff source so the comments are
aligned.  The man page is troff source for producing beautifully typeset
pages.

-- 
Cheers, Ralph.



Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type

2023-04-21 Thread Ralph Corderoy
Hi Alejandro,

> > (a)  Use .nf/.fi for the function prototypes, and .EX/.EE for the
> >  types.
> > 
> > (b)  .EX/.EE for everything, as you did.
> > 
> > Please have a look at the PDF versions
...
> Which one looks better to you?  I've attached two PDF files

The Synopsis should not be in a fixed-width font.

-- 
Cheers, Ralph.



Re: Newbie question - Indents and paragraph filling

2023-04-17 Thread Ralph Corderoy
Hi Ben,

> I was looking to insert a question Num and then without a break indent
> the question text. Similar to the IP command with a bullet, however,
> I wish to append an additional mark allocation to the end of this body
> text also without a break.

Attached is some source which has a go at what you describe.
It's intended to be useful to study rather than just run.
I developed it using

nroff -ww ben.tr

but it formats as PDF okay too with

troff -ww -Tpdf ben.tr | gropdf >ben.pdf

> So pointers to sources to read or inaccuracies in my thinking would
> also be greatly appreciated.

One terse resource with handy indexes at the start is Kernighan and
Ossanna's CSTR 54.  It also has a tutorial macro set at the end.
https://troff.org/54.pdf

Here's the nroff output of ben.tr.

0.........1.........2.........3.........4.........5.........6.........7.........8
          _____
          ____________________________________________________________
               _______________________________________________________

               Question goes here.                               Right

            i) Q1estion goes here.                               Right

           ii) Question goes here.                               Right

          iii) Another question.                                   [2]

           iv) A much longer question.  Far longer than before.     12

            v) A much longer question.  Far longer than before.     12

           vi) This  question takes several lines.  Both in the input.
               And in the output.                                  123

          vii) This question takes several lines.  Both in the  input.
               And in the output.                                  124

         viii) This  question takes several lines.  Both in the input.
               And in the output.                                  125

           ix) A question that doesn’t  leave  enough  space  for  the
               marks to sit on the same line forcing a break.   123456

            x) A  question  that  doesn’t  leave  enough space for the
               marks to sit on the same line forcing a break.
                                                               1234567

           xi) A question that doesn’t  leave  enough  space  for  the
               marks to sit on the same line forcing a break.
                                                              12345678

          xii) A  question  that  doesn’t  leave  enough space for the
               marks to sit on the same line forcing a break.
                                                             123456789

         xiii) A question that doesn’t  leave  enough  space  for  the
               marks to sit on the same line forcing a break.
                                                            1234567890

          xiv) Final question.                                      42

-- 
Cheers, Ralph.
.sp |1i
.\" Leave adjustment on, even though it looks worse, because it's a
.\" trickier prospect.
.\" .na
.
.\" Diagnostic ruler.
.po 0
.in 0
.ll 9i
.ta 1i +1i +1i +1i +1i +1i +1i +1i
.tc .
0	1	2	3	4	5	6	7	8
.tc
.br
.
.\" The line starts after the page offset.
.\" The indent eats into the line's length.
.po 1i
.ll 6i
.in 0.5i
\h'-\n(.iu'\l'\n(.iu'\" The indent.
.br
\h'-\n(.iu'\l'\n(.lu'\" The line length.
.br
\l'\n(.lu-\n(.iu'\" The line used for text: length less indent.
.sp
.
.
.\" Manually, without macros.
Question goes here.
.ds r Right
\h'\n(.lu-\n(.iu-\n(.ku-\w'\*r'u'\*r
.sp
.
.
.\" Question number.
.nr n 0 1
.af n i \" Complicate its width.
.
.\" Pose a question.
.de q
.ds n \\n+n)\\ \" Trailing unstretchable space.
.if t .as n \\ \" Add another one for troff.
.as n \c
.nr ti 0-\\w'\\*n'u \" How much to move left by.
\\h'\\n(tiu'\\*n
..
.
.q
Q1estion goes here.
.ds r Right
\h'\n(.lu-\n(.iu-\n(.ku-\w'\*r'u'\*r
.sp
.
.
.\" Marks available.
.de m
.ds mk \\ \\$1
.cg
.if \\n(gp<0 \{\
.br
.cg
.\}
\\h'\\n(gpu'\\*(mk
.sp
..
.\" Calculate gap in gp.
.de cg
.nr gp \\n(.lu-\\n(.iu-\\n(.ku-\\w'\\*(mk'u
..
.
.
.q
Question goes here.
.m Right
.
.q
Another question.
.m [2]
.
.q
A much longer question.
Far longer than before.
.m 12
.
.q
A much longer question.
Far longer than before.
.m 12
.
.q
This question takes several lines.
Both in the input.
And in the output.
.m 123
.
.q
This question takes several lines.
Both in the input.
And in the output.
.m 124
.
.q
This question takes several lines.
Both in the input.
And in the output.
.m 125
.
.q
A question that doesn't leave enough space
for the marks to sit on the same line
forcing a break.
.m 123456
.
.q
A question that doesn't leave enough space
for the marks to sit on the same line
forcing a break.
.m 1234567
.
.q
A question that doesn't leave enough space
for the marks to sit on the same line
forcing a break.
.m 12345678
.
.q
A question that doesn't leave enough space
for the marks to sit on the same line
forcing a break.
.m 123456789
.
.q
A question that doe

Re: groff 1.23.0.rc4 on Solaris 11 OpenIndiana

2023-04-17 Thread Ralph Corderoy
Hi Branden,

> pdfinfo \
> | tr -d '\000' \
> | sed -n -e '/Page *size:/s/Page * size: *\([0-9.]*\) *x * \([0-9.]*\).*$/\.nr pdfpic*width (p;\1)\
> .nr pdfpic*height (p;\2)/;tprint
> b
> :print
> p'

Why the dance with ‘tprint’?  sed -n s/foo/bar/p

The \ in \.nr isn't needed.  It isn't in the other one.

To match one or more p's in a BRE, the idiom is pp* rather than p*p.
Though I'm not sure it's necessary here for the spaces.

The substitution's address is different from its pattern in that
‘Pagesize:’ matches the former but not the latter.

One could

sed -ne '/^Page *size: *\([0-9.][0-9.]*\) *x *\([0-9.][0-9.]*\).*$/s//.nr pdfpic*width (p;\1)\
.nr pdfpic*height (p;\2)/p'

It's idiomatic to have the pipe at the end of the line.
By design, this also avoids the backslash clutter in the shell.
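
For anyone wanting to try it without a PDF to hand, here is a sketch
which feeds the sed a made-up sample of pdfinfo's output.  The sample
sizes and the function name page_size_nr are assumptions for the
illustration; the pipes sit at the ends of the lines as described.

```shell
# page_size_nr: read pdfinfo-style text on stdin and print two .nr
# requests for the page size.  Hypothetical wrapper for testing.
page_size_nr() {
    tr -d '\000' |
    sed -ne '/^Page *size: *\([0-9.][0-9.]*\) *x *\([0-9.][0-9.]*\).*$/s//.nr pdfpic*width (p;\1)\
.nr pdfpic*height (p;\2)/p'
}

# Made-up sample input; only the "Page size" line should survive.
printf '%s\n' 'Title:  example' 'Page size:      612 x 792 pts (letter)' |
page_size_nr
```

The empty regular expression in s// reuses the address, so the pattern
is written only once and the address and substitution can't disagree.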

-- 
Cheers, Ralph.



Re: groff 1.23.0.rc4 on Solaris 11 OpenIndiana

2023-04-16 Thread Ralph Corderoy
Hi Bruno,

> When I run
>
> pdfinfo doc/automake.pdf | tr -d '\000' | grep "Page *size" | sed -e 's/Page *size: *\\([[:digit:].]*\\) *x *\\([[:digit:].]*\\).*$/\
> .nr pdfpic*width (p;\\1)\\n\
> .nr pdfpic*height  (p;\\2)/'

Are you running those three lines in a shell or have you edited the troff
to be that text?  IOW, is troff filtering the text before sh sees it?

-- 
Cheers, Ralph.



Re: groff 1.23.0.rc4 on AIX

2023-04-16 Thread Ralph Corderoy
Hi Branden,

> > > xlc -q64 ... -lSM -lICE -lXaw -lXmu -lXt -lX11  -lm libxutil.a 
> > > lib/libgnu.a 
> > > ld: 0711-317 ERROR: Undefined symbol: .XpmReadFileToPixmap
> > > ld: 0711-317 ERROR: Undefined symbol: .XShapeCombineMask
> > > ld: 0711-317 ERROR: Undefined symbol: .XShapeQueryExtension
...
> > > Hmm. In the 'configure' output I see two lines
> > > 
> > >   checking for Xaw library and header files... yes
> > >   checking for Xmu library and header files... yes
> > > 
> > > Should there also be a line
> > > 
> > >   checking for Xpm library and header files...
...
> > I'd expect that to resolve
> >
> > > ld: 0711-317 ERROR: Undefined symbol: .XpmReadFileToPixmap
> >
> > but not
> >
> > > ld: 0711-317 ERROR: Undefined symbol: .XShapeCombineMask
> > > ld: 0711-317 ERROR: Undefined symbol: .XShapeQueryExtension
> >
> > It has been nearly 20 years since I did X11 work, but the SHAPE X11
> > protocol extension is not directly related to Xpm.  I think.
...
> > I thought Xpm was (just?) an extension to the X11 core bitmap file
> > format to support color depths other than 1.

And transparent pixels IIRC.  But I agree it doesn't use Shape which is
for ‘shaping’ non-rectangular windows.

> > These missing external symbols suggest to me that libXext will be
> > required too, to get the client library interface to the SHAPE
> > extension.

Yes.

> I question whether there should in fact be any "checking for Xpm"
> configuration test in groff for the Xpm library or for the SHAPE
> extension client library interface (which is in Xext).
>
> The reason is that the groff code doesn't use these interfaces or
> symbols.  gxditview should link and run just fine on a libXpm-free
> system with no SHAPE extension support (client- _or_ server-side).
>
> It sounds like one of AIX's versions of the standard X11 libraries
> links to these (my money's on Xaw, a popular site for vendor
> extensions since its defaults are so minimalistic and ugly).

Check your local systems to hand.
Here, Linux, libXaw wants libXext and libXmu for Shape.

$ ldd /bin/gxditview | g -o '/usr[^ ]+' | sort -u | _ -t readelf -s _ |& g '^readelf|Shape'
readelf -s /usr/lib64/ld-linux-x86-64.so.2
readelf -s /usr/lib/libc.so.6
readelf -s /usr/lib/libdl.so.2
readelf -s /usr/lib/libICE.so.6
readelf -s /usr/lib/libm.so.6
readelf -s /usr/lib/libSM.so.6
readelf -s /usr/lib/libuuid.so.1
readelf -s /usr/lib/libX11.so.6
   221: 00000000000a1b40    25 FUNC    GLOBAL DEFAULT    9 XkbAllocGeomShapes
   428: 00000000000a15d0    41 FUNC    GLOBAL DEFAULT    9 XkbFreeGeomShapes
   522: 00000000000a2260   286 FUNC    GLOBAL DEFAULT    9 XkbAddGeomShape
   604: 000000000001fd90   286 FUNC    GLOBAL DEFAULT    9 _XTryShapeBitmapCursor
   628: 0000000000098500   242 FUNC    GLOBAL DEFAULT    9 XkbComputeShapeBounds
  1270: 0000000000098600   218 FUNC    GLOBAL DEFAULT    9 XkbComputeShapeTop
readelf -s /usr/lib/libXau.so.6
readelf -s /usr/lib/libXaw.so.7
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND XmuCvtStringToShapeStyle
    37: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND XShapeCombineMask
   233: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND XmuCvtShapeStyleToString
   245: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND XShapeQueryExtension
readelf -s /usr/lib/libxcb.so.1
readelf -s /usr/lib/libXdmcp.so.6
readelf -s /usr/lib/libXext.so.6
    75: 00000000000087c0   490 FUNC    GLOBAL DEFAULT    9 XShapeQueryExtents
    76: 00000000000086e0   223 FUNC    GLOBAL DEFAULT    9 XShapeOffsetShape
   137: 0000000000008040    73 FUNC    GLOBAL DEFAULT    9 XShapeQueryExtension
   140: 0000000000008090   282 FUNC    GLOBAL DEFAULT    9 XShapeQueryVersion
   143: 00000000000085d0   271 FUNC    GLOBAL DEFAULT    9 XShapeCombineShape
   155: 00000000000089b0   195 FUNC    GLOBAL DEFAULT    9 XShapeSelectInput
   170: 0000000000008b90   541 FUNC    GLOBAL DEFAULT    9 XShapeGetRectangles
   173: 00000000000081b0   447 FUNC    GLOBAL DEFAULT    9 XShapeCombineRegion
   175: 0000000000008370   351 FUNC    GLOBAL DEFAULT    9 XShapeCombineRectangles
   183: 00000000000084d0   255 FUNC    GLOBAL DEFAULT    9 XShapeCombineMask
   187: 0000000000008a80   266 FUNC    GLOBAL DEFAULT    9 XShapeInputSelected
readelf -s /usr/lib/libXmu.so.6
    19: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND XShapeCombineMask
   151: 0000000000011b90   264 FUNC    GLOBAL DEFAULT    9 XmuCvtShapeStyleToString
   153: 00000000000119f0   404 FUNC    GLOBAL DEFAULT    9 XmuCvtStringToShapeStyle
readelf -s /usr/lib/libXpm.so.4
readelf -s /usr/lib/libXt.so.6
$

And libXaw uses XpmReadFileToPixmap() pulling in libXpm.

> I don't know anything about how linking on AIX works.

The main difference I remember is it loops around until symbols are
resolved so things like ‘-la -lb -la’ 

Re: groff 1.23.0.rc4 on Solaris 11 OpenIndiana

2023-04-16 Thread Ralph Corderoy
Hi Branden,

> pdfpic.tmac does a pretty hairy thing.
>
> .  \" Get image dimensions.  The `tr` command to strip null bytes is
> .  \" distasteful, but its necessity is imposed on us.  See
> .  \" .
> .  ec @
> .  sy pdfinfo @$1 | \
> tr -d '\000' | \
> grep "Page *size" | \
> sed -e 's/Page *size: *\\([[:digit:].]*\\) *x *\\([[:digit:].]*\\).*$/\
> .nr pdfpic*width (p;\\1)\\n\
> .nr pdfpic*height  (p;\\2)/' \
> > @*[pdfpic*temporary-file]
> .  ec

Doesn't that look a bit odd?  Both tr and sed want to see a single
backslash in their argv[] string.  tr for \000 and sed for \( and \).
The arguments to both are in sh's single quotes.  Yet the backslashes
for tr are single whereas sed's are doubled.

This suggests some variation between sh implementations.  I'd double
tr's to \\000.

Also, the grep isn't needed if the sed is made -n and the s/// becomes
s///p.  This would also remove the oddity of using "" for grep when
nothing in its argument needs interpolation.

And there is an extra space after ‘height’.

If the multi-line sed is a portability problem, and it probably isn't,
then the sed could just replace with /\1 \2/ and the pipeline be captured
in ‘set $(...)’.  Then the .nr could be printed using the shell's $1
and $2.
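
A rough sketch of that last idea, written as plain shell outside the
.sy and its escaping, so the backslashes here are single.  The function
name page_size_nr2 and the sample A4 line are made up for the example.

```shell
# page_size_nr2: hypothetical variant that keeps the sed on one line
# and lets the shell print the two .nr requests.  Reads pdfinfo-style
# text on stdin.
page_size_nr2() {
    # Unquoted on purpose: word splitting puts width in $1, height in $2.
    set -- $(tr -d '\000' |
        sed -n 's/^Page *size: *\([0-9.][0-9.]*\) *x *\([0-9.][0-9.]*\).*$/\1 \2/p')
    [ $# -eq 2 ] || return 1               # no usable "Page size" line
    printf '.nr pdfpic*width (p;%s)\n.nr pdfpic*height (p;%s)\n' "$1" "$2"
}

printf 'Page size:      595.276 x 841.89 pts (A4)\n' | page_size_nr2
```

The grep has gone, the sed fits on one line, and a missing ‘Page size’
line shows up as a non-zero exit status rather than an empty file.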

-- 
Cheers, Ralph.



Re: Compressed man pages

2023-04-13 Thread Ralph Corderoy
Hi Mingye,

> [Zstd and brotli each have a "dictionary mode" to deal with this, but
> (a) Zstd dict-file requires an extra flag on decompress (b) nobody has
> brotli, which has a default dictionary, installed.]

I found brotli was already installed here.
So here's some numbers, just for the lists' info.

$ ls | grep '\.gz$' | shuf -n10 |
> while read -r f; do
> printf '%32s  %5d  %5d\n' "$f" `wc -c <"$f"` \
> `zcat "$f" | brotli | wc -c`
> done |
> awk '{print $0 "  " $3/$2}'
                    postmap.1.gz   4125   3333  0.808
           gnutls-cli-debug.1.gz   2627   2108  0.802436
                      cwebp.1.gz   5074   4106  0.809223
                    findsmb.1.gz   1810   1474  0.814365
                    ppmntsc.1.gz   1282    973  0.75897
                      libuv.1.gz  76363  62274  0.8155
                      xmlwf.1.gz   3486   2760  0.791738
                      users.1.gz    763    572  0.749672
               gpgparsemail.1.gz    294    231  0.785714
           perl561delta.1perl.gz  51764  42957  0.829862
$

-- 
Cheers, Ralph.



Re: Compressed man pages

2023-04-12 Thread Ralph Corderoy
Hi Mingye,

> the thing is we are talking about storage for distribution on every
> single person's computer

No, I was talking to s...@gentoo.org so I assumed Gentoo as the target.

> We are looking at a world where almost every system has xz installed
> because of some past decisions, unfortunate or not.

That's not the kind of thing I expect to bother Gentoo.  :-)

-- 
Cheers, Ralph.



Re: Compressed man pages

2023-04-12 Thread Ralph Corderoy
Hi Sam,

> I started looking into changing to xz (or just.. not bz2, anyway)

If you're putting effort into researching another compressor then
consider lzip(1).  https://www.nongnu.org/lzip/lzip.html

Its author compares it against xz in particular.
https://www.nongnu.org/lzip/xz_inadequate.html

-- 
Cheers, Ralph.



Re: Proposed v2: an eqn keyword change: gfont -> gifont

2023-04-11 Thread Ralph Corderoy
You type ‘s’ and spot an error.

> The bold font is used by the bold primitive.

The bold primitive uses the bold font.

-- 
Cheers, Ralph.



Re: Proposed v2: an eqn keyword change: gfont -> gifont

2023-04-11 Thread Ralph Corderoy
Hi Branden,

> Further feedback is welcome.

I see groff's documentation has Roman numerals but roman fonts.
I try to parse roman as something+man.  :-)

>   Fonts
> eqn uses up to three typefaces to set an equation: an italic face
> for letters, a roman face for everything else, and a bold face.

If the Roman face is for everything else then it seems odd at this point
that bold is needed.

> The defaults for these correspond to the groff font styles I, R,
> and B, respectively, using the font family that is current when
> the equation is set.
> The primitives gifont, grfont, and gbfont assign a groff typeface
> to each of eqn's faces.
> Control which characters are treated as letters (and therefore set
> in italics) with the chartype primitive described above.
> A character assigned the type letter is set in italics; a digit is
> set in roman.

It's only at the end that everything else is probably seen to be ‘digits’.

eqn uses three typefaces to set an equation: italic, Roman, and bold.
Set them to a groff font style with primitives gifont, grfont, and gbfont.
The defaults are I, R, and B in the current font family.
The chartype primitive sets a character's type, see above.
A letter character is set in italics.
A digit character in Roman.
The bold font is used by the bold primitive.

-- 
Cheers, Ralph.



Re: reformatting man pages at SIGWINCH

2023-04-11 Thread Ralph Corderoy
Hi Branden,

> see man pages as they would have formatted for Western Electric
> Teletype machines, which printed to long spools of paper with 66 lines
> to the nominal page.

In case it isn't obvious, it was normal for teletypes and line printers
to print six lines per inch onto letter-height fan-fold paper perforated
every eleven inches giving 66 lines per real page, not nominal.

As long as the paper was positioned so it started printing just after a
perforation, the page breaks occurred over a perforation.  To allow for
a bit of leeway, the page often started and ended with blank lines.

-- 
Cheers, Ralph.



Re: A file suffix for troff's output.

2023-04-10 Thread Ralph Corderoy
Hi DJ,

> Since ditroff stands for “Device Independent troff”

It does.  It means the troff which doesn't have a device hard-coded
within it but can instead take -Tps, -Tutf8, etc.

The output of the device-independent troff is specific to the chosen -T
device.  Continuing to include the ‘independent’ as part of the name for
the output seems wrong.  It's the output of a DI troff but is not itself
DI.

> what about something like ‘dee-vie-roff’ or ‘di-vie-roff’ depending on
> how you pronounce “device”?

Those not in the know won't see a ‘v’ when looking at ‘ditroff’ so ‘vie’
won't feature in its pronunciation.

-- 
Cheers, Ralph.



Re: A file suffix for troff's output.

2023-04-10 Thread Ralph Corderoy
Hi Alejandro,

> I'd use .cat.set for UTF8/ASCII pages, and .html.set for HTML pages.

Yes, I was thinking if a .tr was being turned into several formats then
I'd include troff's -T device in the filename.

But as a general case, where just one -T is being targeted, a plain
chapter.set seems sufficient.

-- 
Cheers, Ralph.



Re: A file suffix for troff's output.

2023-04-10 Thread Ralph Corderoy
Hi Steve,

> > troff chapter.tr >chapter.set
> > grops chapter.set >chapter.ps
> > 
> > Short, simple, not already widely used by another program,
> > pronounceable, a clear derivation.
>
> Maybe I'm mis-reading the problem here, but Postscript output from
> groff in my experience has always been a temporary file.

The issue is the output from troff which is later turned into
PostScript.  Or PDF, or whatever, depending on the device troff was
given; default PostScript.  The chapter.set above.

> And under Unix one could reasonably argue that an un-suffixed file
> outside of bin directories was by default a text file.

I agree text files on Unix don't need a suffix.

Some programs can act on patterns in filenames, e.g. make(1).
It can be told how to turn a .tr into a .set and a .set into a .ps.
Even though all three are text files, it's useful to suffix them
descriptively.

-- 
Cheers, Ralph.



Re: A file suffix for troff's output.

2023-04-10 Thread Ralph Corderoy
Hi DJ,

> > troff lays out its input.
> > The input has been placed on the page.
> > It is typeset.
> > It is set.
> >
> > troff chapter.tr >chapter.set
>
>   $ file chapter.set
>   chapter.set: ditroff output text for PostScript, ASCII text
>
> Wouldn’t “.ditroff” be more appropriate?

The question was what else to use other than .dit or .ditroff given
Kernighan has ‘never been fond of it’.

I don't like ditroff either.  It's too long as a suffix.  Given troff is
tee-roff is it die-tee-roff, unfortunate, or dee-eye-tee-roff?  Both are
long to say.

$ file chapter.set
chapter.set: troff typeset output for PostScript, ASCII text

-- 
Cheers, Ralph.



A file suffix for troff's output. (Was: pdfroff in groff 1.23.0.rc3 changes compared to 1.22.4)

2023-04-10 Thread Ralph Corderoy
Hi Branden,

> Perhaps you can think of some alternative names for distinguishing
> formatter output that we expect non-groff output drivers to be able to
> cope with from those that exercise the extension.

troff lays out its input.
The input has been placed on the page.
It is typeset.
It is set.

troff chapter.tr >chapter.set
grops chapter.set >chapter.ps

Short, simple, not already widely used by another program,
pronounceable, a clear derivation.

-- 
Cheers, Ralph.



Re: Compressed man pages

2023-04-09 Thread Ralph Corderoy
Hi Alejandro,

> Sure; do you have a mailing list, or should I send them to you and
> CC linux-man@?  I have at least one bug report for you.

Start from https://man-db.gitlab.io/man-db/,
which is the home page according to Arch Linux's package,
and you'll end up in all the typical places:
mailing list, issue tracker, etc.

-- 
Cheers, Ralph.



Re: Accessibility of man pages

2023-04-09 Thread Ralph Corderoy
Hi,

(Colin, something for you near the end; search ‘interesting’.)

Eli wrote:
> Dirk wrote:
> > $ find /usr/share/man -type f -exec bzgrep -l RLIMIT_NOFILE {} \;
...
> > find... man -K...
> >
> > real 107.45 real 96.34
> > user 117.06 user 70.11
> > sys 14.43   sys 26.86
...
> > $ time -p find /usr/share/man -type f | xargs bzgrep -l RLIMIT_NOFILE
...
> > real 24.30
> > user 32.34
> > sys 6.84
>
> Multiprocessing, obviously.  Your CPU has more than one execution
> unit, so the pipe via xargs runs 'find' and 'bzgrep' in parallel on
> two different execution units.  By contrast, "find -exec" runs them
> sequentially, in a single thread.

No, I don't think it's that.

With the first, find(1) does stop whilst waiting for bzgrep to grep a
single file.  bzgrep may or may not run on the same core.  The important
thing is the one bzgrep per file and its fork() and exec() overhead.

The second has find fill a pipe's buffer with paths and when that's
full, xargs's read can return.  This continues until xargs either reads
end-of-file or reaches the argv[] limits.  It then runs a single bzgrep
with many filenames.  The fork+exec overhead is much reduced.

bzgrep is a shell script and has overhead before it gets to the
argument-processing loop.  That overhead is suffered many times if
bzgrep is run once per file.

The *zgrep scripts are a poor option in general due to this
one-grep-per-file overhead.  Better than nothing, but a grep which can
internally decompress all the different compression formats avoids this
shell overhead.
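
A rough way to see the one-process-per-file difference without strace.
This is only a sketch: ‘sh -c 'echo started'’ stands in for bzgrep and
the five empty files are made up for the purpose.

```shell
# Count how many times a command is started for the same five files.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do : >"$dir/f$i"; done

# One process per file, as with find's -exec ... \;
per_file=$(find "$dir" -type f -exec sh -c 'echo started' sh {} \; |
    wc -l | tr -d ' ')

# One process per batch of files, as with find | xargs.
batched=$(find "$dir" -type f | xargs sh -c 'echo started' sh |
    wc -l | tr -d ' ')

echo "per file: $per_file, batched: $batched"
rm -r "$dir"
```

Five starts against one here; with thousands of man pages the ratio is
what separates the timings quoted above.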

Here is an example.  260 files cause eight times as many clone(2)s,
i.e. forks.  I've added an extra ‘×...’ column.  The ls and xargs will
complete their work nearly instantly.  All the wall-clock time is the
single run of zgrep.

$ pwd
/usr/share/man/man7
$ ls *.gz | wc -l
260
$
$ ls *.gz | LC_ALL=C strace -fc xargs -rd\\n zgrep -H not-to-be-found
% time     seconds  usecs/call     calls         errors syscall
------ ----------- ----------- --------- ---- --------- ----------------
 93.70   27.510039        7555      3641  ×14      1560 wait4
  0.85    0.248763          26      9389                mmap
  0.68    0.198674          11     17166                rt_sigprocmask
  0.56    0.165702          35      4691                mprotect
  0.52    0.153146           6     22637                rt_sigaction
  0.50    0.146029          21      6780                read
  0.43    0.125451          10     12235           1040 close
  0.31    0.091715          29      3132                openat
  0.25    0.073542          11      6513            522 fcntl
  0.24    0.070822          12      5728           2080 stat
  0.23    0.068825          16      4171                fstat
  0.20    0.057703          24      2348                brk
  0.19    0.054838          69       786              4 execve
  0.18    0.052849          25      2081   ×8           clone
  0.17    0.051089          17      2862            782 access
  0.15    0.043284          55       782                munmap
  0.14    0.040012          48       819                write
  0.11    0.031992          11      2870            260 lseek
  0.11    0.031393           8      3902                dup2
  0.08    0.023038          22      1041                pipe
  0.07    0.021190          13      1560                rt_sigreturn
  0.06    0.018363          11      1564            782 arch_prctl
  0.05    0.016013           7      2081   ×8           getgid
  0.05    0.015314           7      2081   ×8           getegid
  0.05    0.015251           7      2081   ×8           getuid
  0.05    0.014780           7      2081   ×8           geteuid
  0.02    0.004703          18       260   ×1           sigaltstack
  0.01    0.003685          13       264   ×1           prlimit64
  0.01    0.003523           1      2084   ×8           getpid
  0.01    0.003443          13       260   ×1           set_tid_address
  0.01    0.003425          13       260   ×1           set_robust_list
  0.00    0.000094          23         4                getdents64
  0.00    0.000053          17         3              2 ioctl
  0.00    0.000033          16         2                poll
  0.00    0.000018          18         1                sysinfo
  0.00    0.000014          14         1                getppid
  0.00    0.000013          13         1                uname
  0.00    0.000013          13         1                getpgrp
------ ----------- ----------- --------- ---- --------- ----------------
100.00   29.358834                128163           7032 total

Compare with running sh(1) to run zcat and grep on each bunch of xargs's files.

$ ls *.gz | LC_ALL=C strace -fc sh -c 'xargs -rd\\n zcat | grep not-to-be-found'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.18    0.150049       37512         4         1 wait4
  5.92    0.0

Re: sensitivity vs. specificity in software testing

2023-04-08 Thread Ralph Corderoy
Hi Branden,

> My personal test procedures, I think, adequately do this for man(7);
> every time I'm about to push I render all of our man pages (about 60
> source documents) to text and compare them to my cache of the ones I
> rendered the last time I pushed.

Yes, that's good as a lone developer.  Making it a hurdle for others
could be nice to have.

> I think there is a risk here of confounding macro package and
> formatter problems with output driver problems.  All should be tested,
> but not necessarily together except for inputs designed as integration
> tests.

I think a distinction between us is I'm not talking about designed
inputs.  ‘.DS .bp .DE’ isn't a typical designed input.

> With the benefit of a few years experience, I would claim that our
> defect rate in output drivers is pretty low compared to that in the
> formatter and (particularly) macro packages.

Yes, I'd have thought that likely.  Though fuzzing device drivers would
be fun.  :-)

> That is why the tests I've written have demonstrated an increasing
> bent toward use of "groff -Z" and groff "-a"; these produce
> device-independent output and "an abstract preview of output",
> respectively.

troff's output is device dependent, as I just mentioned in another
thread, but I know what you mean.

$ sdiff -s -w64 <(troff <<

> I reiterate though, that the bugs we tend to encounter are detectable
> before getting to the output driver.

The encountered bugs are, yes.  There are the bugs unseen.

The aim of formatting a corpus to pixels would be to quickly test a
growing set of real-world documents.  It would be cheap to add another
document.  The output of a preprocessor, troff, or a device driver may
change intentionally.  Eyeballing those changes for the corpus would be
tedious and error prone.  The pixels intentionally change less often.
And eyeballing pixels to see the nature of the change tends to be quick
compared to comprehending what a diff at a stage of the pipeline
represents.

So a corpus diffed as pixels serves a different purpose to hand-written
coverage or regression tests.  Just as fuzzing attacks from yet another
angle.

> Several ways to skin this cat.  :)

Yes.  I'd be tempted to have a standard encoding which gives a readable
rendering but compresses two or more blank lines.

awk '
!length   { b++; next }
b == 1    { print "" }
b > 1     { print "-" b }
          { b = 0 }
/^-[0-9]/ { printf "-" }
1
' 

One could also highlight or encode tabs or end-of-line white-space to
make it obvious to the reader and protect it from incorrect change.
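
To show what that awk does, here it is wrapped in a function and fed a
small made-up input.  The name squeeze_blanks is only for illustration.

```shell
# The awk encoding of blank-line runs, wrapped in a hypothetical function.
squeeze_blanks() {
    awk '
    !length($0) { b++; next }       # count blank lines, print nothing yet
    b == 1      { print "" }        # a lone blank line stays as it is
    b > 1       { print "-" b }     # a run of N blank lines becomes -N
                { b = 0 }
    /^-[0-9]/   { printf "-" }      # escape real lines that look like -N
    1
    '
}

# Made-up input: one blank line, then a run of three, then a line
# which would otherwise be mistaken for an encoded run.
printf 'a\n\nb\n\n\n\nc\n-2-\n' | squeeze_blanks
```

That prints a, an empty line, b, -3, c, and --2-.  As written, blank
lines at the very end of the input are dropped because nothing flushes
the count at end of file.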

-- 
Cheers, Ralph.



Re: pdfroff in groff 1.23.0.rc3 changes compared to 1.22.4

2023-04-08 Thread Ralph Corderoy
Hi John,

> I've always just called it "ditroff" (*"device-independent troff
> [output]"*), with *.dit and *.ditroff being my typical choice of file
> extensions.

The ‘dit’ suffix is probably what I've seen the most.

> I'm aware that it's a reappropriation of an obsolete name for all
> post-Osanna troff(1) implementations, but its meaning is clearer to
> readers familiar with the term *"device-independent [gt]roff output"*.

Yes.  Though it contains device-dependent troff output.  :-)

> The names "grout" and "trout", OTOH, are a lot less obvious.

Also, the ‘out’ seems wrong.  So many files are the output of something
yet don't have that in their suffix.  a.out seems to have that honour as
it nabbed it first.

The file contains a rendering of the troff for a device.
In case that helps trigger better suggestions.

-- 
Cheers, Ralph.



Re: reformatting man pages at SIGWINCH

2023-04-08 Thread Ralph Corderoy
Hi,

> > > (1) what part of the screen was the reader actually looking at?

less(1) has -j; that would be a good start.

> > > (2) how is the pager supposed to know how to map any given
> > > location on the screen back to a place in the unrendered source
> > > document so it can be accurately found when the document is
> > > rerendered?

I would assume the pager looks for the same place in its input, not in
the man-page source.  It keeps seeking forward to the best matching run
of words, jumping to the best so far.

Problems I can think of:

- the formatter's input may be ephemeral and so need buffering,
- the originator may not have intended that and limited its size,
- seeking the best match after being WINCH'd must also buffer and may
  never reach EOF,
- the input formatter may alter its output based on the terminal's size,
  e.g. a pic(1) diagram disappears, and
- a solution which re-starts the pager loses the pager's ephemeral
  settings.

I expect more would be found in practice.

-- 
Cheers, Ralph.



Re: man page rendering speed (was: Playground pager lsp(1))

2023-04-08 Thread Ralph Corderoy
Hi Branden,

> You're referring to cat pages.  As far as I know, these are on their
> way out if not already gone.

catman must die.  It was never a good solution to the problem.  As well
as ignoring different TERMs, it also didn't handle a user's variations
to a terminal's definition.  I'm glad to see Colin is open to the idea,
though accept it's initial and on-going work for him.

> On my system, all groff man pages but one render in between a tenth and
> a fortieth of a second.

Colin made the point I was going to make: how long must my eyeballs wait
to be pleasured?

$ strace -ttt -fe read,write -o /tmp/st man ffmpeg-all
$ cat /tmp/st
 →  19788 1680952657.119429 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0  \0\0\0\0\0\0"..., 832) = 832
...
19801 1680952658.350823 write(1, "FFMPEG-ALL(1)   "..., 1023 <unfinished ...>
19801 1680952658.352054 <... write resumed>) = 1023
19801 1680952658.353074 write(1, "ified by a plain output url.\33[m\n"..., 1023 <unfinished ...>
19801 1680952658.353357 <... write resumed>) = 1023
19801 1680952658.354272 write(1, "e command line multiple times. E"..., 1023) = 1023
19801 1680952658.357171 write(1, "aw input files.\33[m\n\33[m\n\33[1mDETAI"..., 1009) = 1009
19801 1680952658.357478 read(0, "--- | encoded data | <+\n"..., 4096) = 4096
19801 1680952658.358752 write(1, "   | output | <-"..., 1023) = 1023
19801 1680952658.359556 write(1, "peg\33[0m can process raw audio an"..., 574) = 574
 →  19801 1680952658.359735 read(3,  <unfinished ...>
...
19801 1680952662.323859 <... read resumed>"q", 1) = 1
...
$

1680952658.359735 - 1680952657.119429 = 1.240306

strace adds a bit of overhead.

$ PAGER=true time -p man ffmpeg-all
real 0.99
user 1.07
sys 0.15
$

Hard to find a slower CPU.

$ grep name /proc/cpuinfo | uniq -c
  4 model name  : Intel(R) Atom(TM) CPU D525   @ 1.80GHz

-- 
Cheers, Ralph.



Re: Proposed: an eqn keyword change: gfont -> gifont

2023-04-08 Thread Ralph Corderoy
Hi Branden,

> gbfont f
> Set the bold font to f.
>
> gifont f
> Set the italic font to f.
>
> grfont f
> Set the roman font to f.
>
> For AT&T eqn compatibility, gfont is recognized as a synonym for
> gifont.

gbfont f
Set the bold font to f.

gfont f
gifont f
Set the italic font to f.  gifont is a GNU extension.

grfont f
Set the roman font to f.

I think ‘is a GNU extension’ is typical language for GNU documentation.

> -  { "gfont", GIFONT },
> +  { "gifont", GIFONT },
> +  { "gfont", GIFONT }, // for backward compatibility
...
> -lex_error("invalid argument to gfont primitive");
> +lex_error("invalid argument to gifont primitive");

The user should see a message which uses the keyword they entered.
It's annoying to search for ‘gifont’ and not find it, as will be typical.

One way to do this would be to have GFONT and GIFONT and only treat them
the same later on.

-- 
Cheers, Ralph.



sensitivity vs. specificity in software testing

2023-04-07 Thread Ralph Corderoy
Hi Branden,

> On the one hand I like the idea of detecting inadvertent changes to
> vertical spacing (or anything else) in a document, but on the other,
> I find narrowly scoped regression tests to be advantageous.

Agreed.  I assume groff is a long way from a set of tests which give
high code coverage.  I think that swings in favour of detecting
inadvertent changes.

> I think maybe the best-of-both-worlds solution is to have a model
> document-based automated test--perhaps one that exercises as many
> ms(7) macros as possible.

A bit of a torture test?  Yes, worthy.

> would add the highly sensitive Rumsfeldian "unknown unknowns" problem
> detection that I think your suggestion is tuned to.

I don't think it would catch the things not thought of, like the .bp
within a display.  I've probably mentioned it before, but a corpus of
real-life documents would be good input to a troff test harness.  Render
each at, say, 150 pixels per inch in monochrome by default and compare
against a golden version made earlier.

Commands like ‘gm compare’ or gmic(1) can do the pixel comparison.
A differing pixel would leave the diffs of each stage of the pipeline
for eyeballing.  So tbl's output is saved rather than just piped into
troff.

This needn't be part of groff's test suite, or anything to do with the
FSF.  It could be useful for comparing versions of other troffs.
Documents could be tagged with what troffs they're compatible with and
whether the test needs upgrading, say to colour pixels.  A buildbot or
similar could run the suite against a new release or Git commit.

When a new document is thrown into the pot, a golden test result is
eyeballed and saved.  It's not too important whether it's perfect so it
doesn't need costly proof-reading.  What matters is a later
change is detected and ensured to be deliberate.
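
A sketch of the save-then-compare step, assuming each document's
rendering arrives on stdin as bytes, whether pixels or text.
check_golden and the directory layout are hypothetical; a real harness
would compare images with something like ‘gm compare’ rather than cmp.

```shell
# Hypothetical helper: compare a rendering against its saved golden copy.
# $1 = golden directory, $2 = document name; the rendering is on stdin.
check_golden() {
    golden_dir=$1 name=$2
    new=$(mktemp) || return 2
    cat >"$new"
    if [ ! -e "$golden_dir/$name" ]; then
        mv "$new" "$golden_dir/$name"            # first sight: save as golden
        echo "new: $name"
    elif cmp -s "$new" "$golden_dir/$name"; then
        rm -f "$new"
        echo "same: $name"
    else
        mv "$new" "$golden_dir/$name.changed"    # kept for eyeballing
        echo "CHANGED: $name"
        return 1
    fi
}

golden=$(mktemp -d)
printf 'page one\n' | check_golden "$golden" chapter
printf 'page one\n' | check_golden "$golden" chapter
printf 'page 1\n'   | check_golden "$golden" chapter || echo "needs eyeballing"
```

The three calls report new, same, and CHANGED in turn.  Whether a
change is deliberate is still a human's decision; the harness only
makes sure it is noticed.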

> > output=\
> > ',,The first page is 1.,, display,
> > ,,, -2-,,,The second page is 2.
> > '
> > output=$(echo "$output" | tr , \\012)
>
> This is a good suggestion for handling blank line-happy output, of
> which we have quite a bit in groff.

I of course produced it by doing the opposite, line feeds into commas,
and then reverted a comma by hand where I wanted to show a page break
with a linefeed.
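
Both directions, as a small sketch with a shortened stand-in for the
expected output.  It assumes the text itself has no commas; another
character would be needed otherwise.

```shell
# A shortened stand-in for the expected output, blank lines included.
expected='The first page is 1.

 display


 -2-'

# Encode: linefeeds into commas, giving one compact line.
encoded=$(printf '%s\n' "$expected" | tr '\012' ,)
echo "$encoded"

# Decode: commas back into linefeeds, as the test would.
decoded=$(printf '%s' "$encoded" | tr , '\012')
[ "$decoded" = "$expected" ] && echo 'round trip ok'
```

The command substitution strips the final linefeed, so the decoded
text matches the original exactly.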

-- 
Cheers, Ralph.



[PATCH] fix for groff Git regression (Savannah #64005)

2023-04-06 Thread Ralph Corderoy
Hi Branden,

> Feedback welcome.
...
> +input='.pl 18v
> +.LP
> +The first page is \n%.
> +.DS
> +display
> +.DE
> +.bp
> +.LP
> +The second page is \n%.
> +.pl \n(nlu'
> +
> +output=$(printf '%s\n' "$input" | "$groff" -Tascii -P-cbou -ms)
> +echo "$output"
> +echo "$output" | grep -Fqx 'The second page is 2.'

Would it be worth testing all of $output is exactly as expected?
This would widen what's being tested which may catch a future regression
outside the scope of this test, e.g. with .DS/.DE.
The downside is a deliberate change might ripple through more tests but
the fix-up should be straightforward and would preserve the wider
testing.

$output can be encoded if it's considered too long.  These two are
equivalent:

output='





The first page is 1.

 display












 -2-


The second page is 2.'

output=\
',,The first page is 1.,, display,
,,, -2-,,,The second page is 2.
'
output=$(echo "$output" | tr , \\012)

-- 
Cheers, Ralph.



Re: Proposed: stop subjecting right-hand sides of `char` family requests to character translation

2023-04-02 Thread Ralph Corderoy
Hi Branden,

> Namely,
>
> .tr @--@
>
> is not a no-op!

Is that surprising, even if troff isn't known?

$ tr ab ba <<

> In fact, it works a lot like file descriptor redirections in the
> shell.
>
> foo >/dev/null 2>&1 | grep error

That pipeline's not doing what ‘grep error’ suggests.  And tr and
redirection don't, to my mind, work in a similar manner.

Transliteration is conceptually done by a simple look-up table which is
only indexed once per character.  ‘tr a b’ or ‘.tr ab’ stores ‘b’ at
index ‘a’.  If ‘@--@’ were a no-op then how would swapping ‘@’ and ‘-’
be achieved?

Redirection is done left to right.  So the above pipeline maps stdout to
null and then stderr to stdout which is by now null.  It's ‘.tr 1n2n’ in
file-descriptor terms.  The grep for ‘error’ suggests ‘.tr 1n21’ is
wanted.

$ ls error >/dev/null 2>&1 | grep error | nl
$
$ ls error 2>&1 >/dev/null | grep error | nl
 1  /bin/ls: cannot access 'error': No such file or directory
$

If redirection were like transliteration then order wouldn't matter.
Just as ‘.tr 1n21’ and ‘.tr 211n’ are the same, these would be too.

ls error  >/dev/null 2>&1
ls error 2>&1 >/dev/null

Which would result in a loss of expression.
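
The two behaviours side by side, as a sketch.  The noisy function is a
made-up stand-in for a command which writes to both stdout and stderr.

```shell
# A stand-in command writing one line to each of stdout and stderr.
noisy() { echo out; echo err >&2; }

# A look-up table: each character is looked up once, so a and b swap.
swapped=$(echo abba | tr ab ba)

# Redirections are applied left to right, so order matters.
first=$(noisy >/dev/null 2>&1 | cat)    # stderr follows stdout to /dev/null
second=$(noisy 2>&1 >/dev/null | cat)   # stderr gets the pipe, stdout is dropped

echo "tr: $swapped, first: '$first', second: '$second'"
```

This prints baab for tr, nothing for the first ordering, and err for
the second.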

-- 
Cheers, Ralph.



Re: Proposed: stop subjecting right-hand sides of `char` family requests to character translation

2023-04-02 Thread Ralph Corderoy
Hi Dave,

> tl;dr: For this input:
>
> .tr zx
> .char \(zz zeezee
> \(zz top
>
> Would you want the output to be "zeezee top" or "xeexee top"?

$ preconv | nroff
.na
.nf
.
.char £ pound sterling
.char $ United States dollar
.
The £ and $ are almost at par.
.
.tr aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
£ crashes overnight!
.
.pl \n(nlu
^D
The pound sterling and United States dollar are almost at par.
POUND STERLING CRASHES OVERNIGHT!
$

I'd want to see shouty caps.

-- 
Cheers, Ralph.



Re: z/OS porting issues, UTF-8 support, and the groff man(1) page

2023-04-01 Thread Ralph Corderoy
Hi Dave,

> whereas < and > are pretty common for this and no one will bat an eye
> at those in non-UTF-8 contexts.

   ‘The angle-bracket "<" and ">" and double-quote (") characters are
excluded because they are often used as the delimiters around URI in
text documents and protocol fields.’
   — https://www.rfc-editor.org/rfc/rfc2396, §2.4.3


   ‘The recommendation is that the angle brackets (less than and greater
than signs) of the ASCII set be used for this purpose.

   ‘...

   ‘Example

   ‘Yes, Jim, I found it under  but
 you can probably pick it up from .’
— https://www.w3.org/Addressing/URL/5.1_Wrappers.html

-- 
Cheers, Ralph.

‘Short words are best and the old words when short are best of all.’
 — Winston Churchill



Re: Greek in email

2023-03-24 Thread Ralph Corderoy
Hi Branden,

Thanks for passing on the original email.  It's as I suggested:

- the text/html in the multipart/alternative got axed by the mailing
  list, and
- the multipart/alternative was left with a sole text/plain which got
  promoted into the multipart/alternative's place.

The text/plain part is truncated mid-quote.  It doesn't even end in a
linefeed.  Let's blame Kmail.  :-)

$ mhlist -nov -file greek-email
 msg part  type/subtype  size description
   0   multipart/mixed55K
 1 multipart/alternative  37K
 1.1   text/html  22K
 1.2   text/plain5689
 2 application/pdf13K
$
$ mhcat -file greek-email -part 1.2 | tail -3; echo
>
> What is happening is that letters with the acute accent (Greek: tonos)
> are getting dropped.  preconv(1) produces them in precomposed form
$
$ mhcat -file greek-email -part 1.1 | tail -3
> 
Branden


$

> When I view my GMail inbox via IMAP with NeoMutt, I don't have two
> copies of the message, but one.

So you're seeing different results with Gmail's web interface and Gmail's
IMAP through NeoMutt?  If so then I think one is showing you the
text/html part and the other the text/plain part.

Traditionally, an MUA should show the first alternative it can handle.
This will be the ‘best’ quality because they're ordered best to worst.
But many decent MUAs allow that to be overridden to say text/plain is
preferred to text/html.

> When I look at the headers in NeoMutt, it appears to be the
> directly-delivered copy (since I was CCed), not the one to the list.

Agreed.

$ received )
   id 1pfWXB-0002DR-7X
01:48:15 +00:00:14 from smarthost01c.sbp.mail.zen.net.uk 
(smarthost01c.sbp.mail.zen.net.uk. [212.23.1.5])
   by mx.google.com
   with ESMTPS
   id 
o7-20020adfcf0700b002cfe44486dbsi15902598wrj.940.2023.03.23.18.48.15
   for  (version=TLS1_2 
cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256)
01:48:16 +00:00:15 by 2002:a05:6358:9211:b0:110:644f:f37b
   with SMTP
   id d17csp208408rwb
Delivered-To:  g.branden.robin...@gmail.com
$

-- 
Cheers, Ralph.



Re: Greek in Groff

2023-03-24 Thread Ralph Corderoy
Hi Branden,

> GMail shows me the response part of your message, but Neomutt does
> not, making it look like you did not respond at all except for (part
> of) the quotation--and the attachment.

The email I received from the list had a MIME type of multipart/mixed
with two parts as peers.

part  type/subtype
  multipart/mixed
1 text/plain
2 application/pdf

The text/plain contained just the quoted text, as you describe.

My guess is Deri's reply was only in a text/html part which was in a
multipart/alternative along with the text/plain.  The mailing list
deleted the text/html and promoted the text/plain out of the
multipart/alternative as it was the only alternative left.

Why was Deri's reply only in the text/html part?

I assume the two different views you see, Gmail and Neomutt, are due to
looking at two different emails: the one directly from Deri to you and
the other via the mailing list.

-- 
Cheers, Ralph.



Re: Greek in Groff

2023-03-18 Thread Ralph Corderoy
Hi Oliver,

> The encoding of choice would probably be ISO 8859-7 in order to remain
> within the 8 bit character encoding space.
...
> 4. Write your documents in ISO 8859-7 or convert them from Unicode to
> ISO 8859-7

I'd recommend your second option; that Mortadelas writes in UTF-8 and
uses preconv(1).

> 2. Localize necessary strings (like "abstract", "contents", days of the
> week etc.)

This may not be needed, e.g. if a macro set isn't being used.

> > P.S: This
> > 
> > is the post I am referring to.

For others who may reply, this thread is worth a read to see what's
already been suggested to Mortadelas.

-- 
Cheers, Ralph.



Re: [eqn] error: invalid input (character code 159)

2023-03-10 Thread Ralph Corderoy
Hi Alejandro,

> To me man(1) is still a black box (I have looked at its source code
> occasionally, but not very often), and there's no way to know what
> it's doing, since it provides no equivalent to groff(1)'s -V.

Colin Watson's man(1) uses his libpipeline(3).

$ PIPELINE_DEBUG=1 man ls 2>&1 >/dev/null | grep '^Starting pipeline:'
Starting pipeline:
zcat
[input: {0, /usr/share/man/man1/ls.1p.gz}, output: {-1, NULL}]
Starting pipeline:
zcat
[input: {0, /usr/share/man/man1/ls.1.gz}, output: {-1, NULL}]
Starting pipeline:
zcat
[input: {0, /usr/share/man/man1/ls.1.gz}, output: {-1, NULL}]
Starting pipeline:
(echo .nh && echo .de hy && echo .. && echo .na && echo .de ad && echo .. && zcat)
[input: {0, /usr/share/man/man1/ls.1.gz}, output: {-1, NULL}]
Starting pipeline:
(cd /usr/share/man && /usr/lib/man-db/zsoelim) |
(cd /usr/share/man && /usr/lib/man-db/manconv -f UTF-8:ISO-8859-1 -t UTF-8//IGNORE) |
(cd /usr/share/man && preconv -e UTF-8) |
(cd /usr/share/man && tbl) |
(cd /usr/share/man && nroff -mandoc -rLL=265n -rLT=265n -Tutf8)
[input: {-1, NULL}, output: {-1, NULL}]
Starting pipeline:
col -b -p -x |
sed -e '/^[[:space:]]*$/{ N; /^[[:space:]]*\n[[:space:]]*$/D; }'
[input: {-1, NULL}, output: {0, NULL}]
Starting pipeline:
zcat
[input: {3, NULL}, output: {-1, NULL}]
Starting pipeline:
Waiting for pipeline: zcat
[input: {3, NULL}, output: {-1, NULL}]
$

> > $ groff -Tascii -kpt -V
> > preconv | pic | tbl | troff -Tascii | grotty
> > $
>
> Does man(1) run pic(1)?

It can do.  It sometimes reads the man page to find out the
preprocessors to use.  Read around ‘MANROFFSEQ’ in man(1).

> BTW, you may notice I added some trick to make eqn(1) fail:
>
> 

I'd grep for ^ as otherwise the code doesn't match the commit message.
:-)

-- 
Cheers, Ralph.



Re: [eqn] error: invalid input (character code 159)

2023-03-10 Thread Ralph Corderoy
Hi Alejandro,

> > I haven't studied what you're doing, but are you aware of preconv(1)?
...
> So, the actual pipeline that I should be using is more like
>
> preconv | tbl | eqn | troff | grotty

Yes.  The problem with groff(1) existing is it hides information and
stops newcomers learning the basics.  :-)  ‘-V’ helps.

$ groff -Tascii -kpt -V
preconv | pic | tbl | troff -Tascii | grotty
$

> and then `... | col | grep` for checking the 80-col limit is respected
> in the output.  Right?

col would need at least -x and -b.  I'd check grotty(1) to see if its
options can remove the need for col.
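
For the width check itself, something like this might do in place of
the grep.  It is a sketch: too_wide is a made-up name and it assumes
the text reaching it is already free of overstriking and escape
sequences.

```shell
# Hypothetical check: list lines wider than a limit (default 80) and
# fail if there are any.  Assumes plain text on stdin.
too_wide() {
    awk -v max="${1:-80}" '
    length($0) > max { printf "%d: %d columns\n", NR, length($0); bad = 1 }
    END              { exit bad + 0 }
    '
}

printf '%s\n' 'short line' | too_wide && echo fits
printf '%081d\n' 0 | too_wide || echo 'too wide'
```

The second command feeds it a line of 81 zeros, which is reported with
its line number.  awk's length counts characters or bytes depending on
the implementation and locale, so non-ASCII output needs care.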

> Do you recommend that I use a fallback-encoding (-D)?  Or should it be
> unnecessary?

If this is within your environment then I'd have thought the testing
framework would mean -D is not needed.

-- 
Cheers, Ralph.



Re: [eqn] error: invalid input (character code 159)

2023-03-10 Thread Ralph Corderoy
Hi Alejandro,

> I guess I'm missing some flags to eqn(1) maybe?

I haven't studied what you're doing, but are you aware of preconv(1)?

-- 
Cheers, Ralph.



Re: [Optional] versus parameters

2023-02-23 Thread Ralph Corderoy
Hi John,

> this is something that's been eating away at me every time I've
> resorted to using square-brackets as a logical grouping mechanism;
> i.e., stuff like
>
> [[*upgrade* | *update*] *package*]

That's using [] to mean two different things so is clearly a bad idea.
:-)

> This could be interpreted in two different ways (expressed using BNF):
>
>  := ("upgrade" | "update") 
> Or
>  := ("upgrade" | "update" )

I think you're answering it yourself.  Use () to group.  Those familiar
with grammars will understand it.  The rest will probably guess.  If it
gets too complex then be happy to repeat some parts rather than factor.

-- 
Cheers, Ralph.



Re: using .so in a macro

2023-02-16 Thread Ralph Corderoy
Hello Riza,

> Hello Ralph,
>
> cat main.ms | soelim | pic | tbl | groff -ms > main.ps
...
> What is soelim used for?

soelim(1) attempts to explain.  It doesn't tend to be needed when one
starts using troff but might be wanted in special cases later.

Consider the difference between these two if foo does a .so of bar.

pic foo | troff
soelim foo | pic | troff

Given the .so is processed by troff at the end of the pipeline,
the first leaves bar unseen by pic whereas the second includes
bar's content in the standard input seen by pic.

-- 
Cheers, Ralph.



Re: using .so in a macro

2023-02-16 Thread Ralph Corderoy
Hello Riza,

> Is it possible to use the ".so" command in a macro, and pass in an
> argument from the macro.
>
> .de MC
> .so $1
> ..
>
> .MC file.txt

Yes.

$ cat >foo
.de source
.so \\$1
..
foo
.source bar
foo
^D
$
$ cat >bar
bar
^D
$
$ nroff foo | grep .
foo bar foo
$

You're escaping the ‘$’ too much.  You're seeing:

$ nroff foo | grep .
troff: foo:5: can't open '\$1': No such file or directory
foo foo
$

-- 
Cheers, Ralph.



Re: groff 1.23.0.rc2 available for testing

2023-02-05 Thread Ralph Corderoy
Hi Branden,

Some drive-by comments from a quick skim.

> o New requests `soquiet` and `msoquiet` are available.  They operate as
>   `so` and `mso`, respectively, except that they do not emit a warning
>   diagnostic if the file named in their argument does not exist.

Given the ‘file’ warning also controls this, AIUI, I wonder if it would
have been more orthogonal to have a new command to alter the warnings
for just what follows.

.warn -file so might-be-missing
.warn -el historicalmacro foo bar

> o nroff now supports spaces and tabs between option flag letters and
>   option arguments, like groff and troff themselves.

I think that's trying to say

nroff -o 3,1,4

is now okay, i.e. the option's value can be a separate argument to the
option, but it reads to me that

nroff -o' 3,1,4'

will ignore the space.  Having to mention spaces and tabs smells wrong.

> o The `PDFPIC` macro (provided by the `pdfpic` package) no longer aborts
>   upon encountering trouble.  Instead, it reports an error and abandons
>   processing of its argument(s).  It is also more sensitive to other
>   kinds of problems and handles them the same way, by issuing a
>   diagnostic and returning.  If you wish `PDFPIC` to abort document
>   processing immediately upon error, you can append an `ab` request to
>   the package's error-handling macro.
>
> .am pdfpic@error
> .  ab
> ..
>
> o The pspic package now also has an error hook macro, which you can use
>   to make failed image loads fatal (or attempt fallback or recovery).
>
> .am pspic@error-hook
> .  ab
> ..

Were those ‘.ab’ written with the lack of a default message in mind?

> o The new rfc1345 macro package, contributed by Dorai Sitaram, defines
>   special character identifiers implementing RFC 1345 mnemonics (plus
>   some additions from Vim, which itself uses RFC 1345 for its digraphs).
>   It is documented in the groff_rfc1345(7) man page.

Mention ‘digraphs’ earlier and more prominently as that's their common
name.

>   you should now write
> .MR ls 1 .

Is text given to include in one's man-page preamble which tests if .MR
is available and, if not, defines it?  This would encourage .MR to be
used.

>   The default is "b" (adjust lines to both margins) as has been
>   the Unix man(7) default since 1979.

Probably just because it was showing off, similar to UNIX with small
caps.  :-)  It looks ugly.

> o On output devices using the Latin-1 character encoding ("groff -T
>   latin1" and the X11 devices) the special character escape sequence
>   \[oq] (opening quote) is now rendered as code point 0x27 (apostrophe)
>   instead of 0x60 (grave accent).  The ISO 8859/ECMA-94 Latin character
>   sets do not define any glyphs for directional ("typographer's")
>   quotation marks, but the apostrophe is depicted in the defining
>   standard as a neutral (vertical) glyph, whereas the grave accent 0x60
>   and acute accent 0xB4 are mirror-symmetric diacritical marks.
>
>   This change has no effect on _input_ conventions for roff source
>   documents.  You can still get directional single quotes on UTF-8,
>   PostScript, PDF, and other output devices supporting them by typing
>   sequences like `this' in the input (character remapping with 'char'
>   requests and similar notwithstanding).

-Tascii could do with a mention to place it in the Latin-1 or UTF-8
camp.

What's producing those funky ‘o’ bullet points?  And the `hip`
backticks?  Could UTF-8 be produced instead with • and ‘elegant’?

-- 
Cheers, Ralph.



Re: Wanted: your historical me(7) documents

2023-01-23 Thread Ralph Corderoy
Hi Dave,

> Even the draft above from Ralph, who champions brevity above all else,

Lies!  Correctness is one of several things which I'd rank higher.  :-)

> Perhaps Ralph will have ideas for tightening it further without losing
> information.

Anyone for a round of golf?

I don't know if this is correct, I just played with the words.

On typesetting devices,
because the e (me) macro package's line length
is now set by the device description,
it has changed from 6i to 6.5i.
papersize.tmac can override this,
e.g. troff(1)'s ‘-dpaper=a4l’,
without needing to alter the document.
Terminal line length remains unchanged.

-- 
Cheers, Ralph.



Off-Topic: Modular C library

2022-12-29 Thread Ralph Corderoy
Hi Alejandro,

> I'm considering writing a new C library that is designed as a hurd of
> microlibraries, which can be replaced independently.

Please don't post off-topic content to the Groff list.
Replies just amplify the noise.


Werner, can you please fill in Mailman's text which appears
under ‘About groff’ at https://lists.gnu.org/mailman/listinfo/groff
It's probably found at
https://lists.gnu.org/mailman/admin/groff/?VARHELP=general/info

If you're stuck for ideas, I'd suggest something with wide scope:

   ‘For users and developers of GNU groff and the other troff
implementations, macros, pre- and post-processors to request help,
share techniques, uncover history, and develop improvements.’

-- 
Cheers, Ralph.



Re: Wanted: your historical me(7) documents

2022-12-28 Thread Ralph Corderoy
Hi Dave,

> > > The thing I was saying NEWS should mention is that for -me users
> > > targeting troff, the default line length has changed from (its
> > > long-entrenched historical) 6i to (the arguably saner) 6.5i.
> >
> > Well, maybe just quoting the NEWS file in its current state would
> > help.
> >
> > o On typestting output devices, the e (me) macro package now derives
> >   the line length from the device description, which can be
> >   overriden by the "papersize.tmac" macro file (usually configured
> >   via the "-d paper" groff command-line option).  The package thus
> >   adapts to landscape orientation and paper formats other than U.S.
> >   letter.  It continues to use a line length of 6 (notional) inches
> >   on terminals.
> >
> > Does that seem to get the right information across?

It lacks the clarity of ‘the default line length has changed
from 6i to 6.5i’.

It doesn't have the end user in mind who wants to know what affects him.
It's long.  It puts the meat at the end so the user has to wade without
knowing why.  This trains the user to skim.

The -me macros' line length is now 6.5i instead of 6i on a
typesetter as it is set from the device or papersize.tmac.
It remains unchanged on a terminal.

> Side note: the blurb misspells "overridden."

And ‘typestting’.

-- 
Cheers, Ralph.



Re: Viewing PDFs. (Was: Happy new Sun)

2022-12-27 Thread Ralph Corderoy
Hi Steffen,

> On Saturday, 24 December 2022 17:31:38 GMT Steffen Nurpmeso wrote:
> > "o" for the former.
> > 'That said, https://ftp.sdaoden.eu/code-mailx-1.pdf (beware:
> > 1MB!), realized with newest minor of mdocmx on groff
> > 1.23.0.rc1.2915-c6d7, via pdfmark.tmac, causes the outline pane to
> > show things twice; i cannot recall whether that was true on 1.22.3
> > already, it is very ugly and so i would surely have seen this in
> > the past hmm.  It is like
> > 
> >   v NAME
> > Synopsis
> >   v SYNOPSIS
> > Table of contents
> >   v TABLE OF CONTENTS
> > Description
>
> Hi Ralph,

Deri means you.

> Since this was produced by gropdf you don't need -mpdfmark since it
> contains its own versions, however, I thought there was a test to stop
> both being loaded, so may not be the cause. Any chance of seeing the
> source file.

-- 
Cheers, Ralph.



Re: man(7), hyphen, and minus

2022-12-27 Thread Ralph Corderoy
Hi Branden,

> > # Get a count of the number of lines before the first blank line, which
> > # we'll pass to .Vb as its parameter.  This tells *roff to keep that many
> > # lines together.  We don't want to tell *roff to keep huge blocks
> > # together.
> > my @lines = split (m{ \n }xms, $text);
> > my $unbroken = 0;
> > for my $line (@lines) {
> > last if $line =~ m{ \A \s* \z }xms;
> > $unbroken++;
> > }
> > if ($unbroken > 12) {
> > $unbroken = 10;
> > }
...
> Well, you can throw away that line counting logic in Perl altogether
> and simply use `ne` _before_ EX (not EE).

I think the code is counting the number of lines in the first
‘paragraph’ although I find it misleading given it uses \A and \z with
//ms on a string which will only contain one line.

If that's its aim then it would be simpler to just count the number of
leading non-blank lines.

$ for s in '' $'\n' $'a\n' $'a\na\n' $'a\n\n' $'a\n\na\n'; do
> perl -n0777e '$n = () = /\G^.*\S.*\n/mg; print "$n\n"' <<<"$s"
> done
0
0
1
2
1
1
$ 
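
The same count can be sketched in C.  This is only an illustration:
leading_nonblank_lines is a hypothetical name, and unlike the Perl
one-liner above it also counts a final line which lacks its newline.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the lines before the first blank (empty or whitespace-only)
 * line of text.  Hypothetical helper, not taken from any real code. */
static size_t leading_nonblank_lines(const char *text)
{
    size_t count = 0;

    while (*text != '\0') {
        int nonblank = 0;

        /* Scan one line, noting whether it has any non-space char. */
        while (*text != '\0' && *text != '\n') {
            if (!isspace((unsigned char)*text))
                nonblank = 1;
            text++;
        }
        if (!nonblank)
            break;              /* First blank line ends the count. */
        count++;
        if (*text == '\n')
            text++;             /* Step over the line's newline. */
    }
    return count;
}
```

Given the here-string inputs of the loop above, each with its trailing
newline, this returns the same six counts.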

-- 
Cheers, Ralph.



Re: Old version of vgrind no longer working

2022-12-24 Thread Ralph Corderoy
Hi Damian,

>   echo "$i" | sed -e 's%[[./*$^\\]%\\&%g' -e 's%.*%/^&:/d%'
...
>   sed: -e expression #1, char 18: unterminated `s' command

It's the first expression which is the problem.

$ sed -e 's%[[./*$^\\]%\\&%g' 

Re: Viewing PDFs. (Was: Happy new Sun)

2022-12-24 Thread Ralph Corderoy
Hi Deri,

> You may need to open the outline pane on the left, which acts like a
> table of contents.

Is it mandatory that a PDF viewer must supply this mechanism to be
worthy of the name?  Off-hand, I don't recall mupdf(1) or llpp(1) having
an option or keystroke to show it.

-- 
Merry Christmas to the list, Ralph.



Inline TTY Pixel Rendering. (Was: groff 1.23.0.rc2 status report)

2022-12-19 Thread Ralph Corderoy
Hi Branden,

> Probably not a lot of this will be visible to terminal-only users.

Given some TTY emulators now support pixel-level graphics,
e.g. https://sw.kovidgoyal.net/kitty/graphics-protocol/,
I have been wondering if anyone is getting man(1) to render to pixels,
say via PDF, to display inline in the terminal.

Sounds like it's up John Gardner's alley.  :-)

-- 
Cheers, Ralph.



Re: [BUG] gropdf, tbl: Completely broken table

2022-12-17 Thread Ralph Corderoy
Hi Branden,

> The suffixes(7) page, which I've managed to never see in 25 years as a
> GNU/Linux user!

Me neither.

.text│ text file
.txt │ equivalent to .text

I don't recall seeing .text used as it's the default on Unix.
.txt is an import from foreign lands.

BUGS
This list is not exhaustive.

Just delete the page and anything that refers to it.  Bug fixed.
Either the user has the Internet, which is exhaustive, or they're savvy
enough to use a system without the Internet and don't need suffixes(7).

-- 
Cheers, Ralph.



Re: a Q quotation macro for man(7) (was: groff man(7) extensions)

2022-12-13 Thread Ralph Corderoy
Hi Branden,

> Even with that wrinkle, a `Q` macro would be dead simple.
...
> .\" Define opening and closing quotation marks as appropriate to your
> .\" language and/or output device.
> .ds oq \(lq
> .ds cq \(rq

If the idea is to entirely do away with producing quotes manually then
what about the second set which are normally used when quotes are needed
inside quotes?  :-)  For example, here in English-land, we'd use ‘…“…”…’.

-- 
Cheers, Ralph.



Groff History in Git. (Was: groff in git)

2022-12-12 Thread Ralph Corderoy
Hi Dave,

> Eric, can reposurgeon retroactively add an earlier release to git
> without changing all the existing git hashes (which are referenced all
> over the place, in the bug tracker and elsewhere)?  I know nothing
> about how these hashes are generated, so this may be utterly
> infeasible.

A Git commit ID is effectively a hash of its ancestry so that history
can't be changed in this case without the unwanted ripple.
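
A toy illustration of that ripple, assuming a 64-bit FNV-1a-style hash
in place of Git's real object hashing.  toy_commit_id and the version
strings fed to it are hypothetical; nothing here reads a real
repository.

```c
#include <stdint.h>

/* Toy commit ID: mix the parent's ID into the starting state and then
 * hash the commit's content, FNV-1a style.  This is NOT Git's
 * algorithm; it only shows that an ID depends on its whole ancestry. */
static uint64_t toy_commit_id(uint64_t parent_id, const char *content)
{
    uint64_t h = 0xcbf29ce484222325ULL ^ parent_id;

    for (; *content != '\0'; content++) {
        h ^= (unsigned char)*content;
        h *= 0x100000001b3ULL;
    }
    return h;
}
```

With 1.04 committed on top of 1.02, slipping a 1.01 in beneath 1.02
gives 1.02 a different parent, so its ID changes, and so does the ID of
every descendant, 1.04 included.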

If the groff Git repository had an empty ‘epoch’ commit from which
everything descended then other old versions could descend from that
without affecting existing descendants, but I don't think it has.  The
oldest commit looks to be for 1.02.
https://git.savannah.gnu.org/cgit/groff.git/commit/?h=1.04&id=351da0dcdf702cf243d26ffa998961bce2aa8653

If the epoch commit had existed then the new contributions wouldn't have
been in their correct location, e.g. 1.03 wouldn't be between 1.02 and
1.04, but Git's various searches could still have included them.

The alternative is to have a Git repo specifically for maintaining
historical versions, not for development, and then the commit IDs can be
completely regenerated as new discoveries are inserted.  This is what
Spinellis does for his
https://github.com/dspinellis/unix-history-repo#readme

-- 
Cheers, Ralph.



Re: Ping^1: Chapters of the manual (was: Bug#1018737: /usr/bin/rst2man: rst2man: .TH 5th field shouldn't be empty)

2022-12-12 Thread Ralph Corderoy
Hi Michael,

> I don't see a good reason to break an established term and instead
> suggest to follow the above and s/chapter/section/g.

man(1), apropos(1), and other commands use -s to specify sections and
many finger muscles won't change now.

-- 
Cheers, Ralph.



Re: words (and commands) that I learnt because of Branden

2022-12-10 Thread Ralph Corderoy
Hi John,

Nate wrote:
> > > Actually, "horde" and "hoard" are homophones
> >
> > I keep getting those spellings mixed up.
>
> It's a penchant of collectors to hoard a horde of whatever interests
> them.

If you want a mnemonic,

affect  hoard
effect  horde

-- 
Cheers, Ralph.



Footie. (Was: Dynamic Paperlength for PS or PDF device)

2022-12-10 Thread Ralph Corderoy
Hi John,

> I sure as shit aren't gonna miss England taking on France.

Yes, should be fun.  Les rosbifs and the Frogs have been sparring
partners quite a few times over the centuries.  Including The Hundred
Years War.  Makes me smile every time I see Waterloo Station is the
terminus for Eurostar trains from Paris.  :-)

-- 
Cheers, Ralph.



Online Dictionaries. (Was: words (and commands) that I learnt...)

2022-12-10 Thread Ralph Corderoy
Hi John,

> > Your emails are the reason I know and often use dict(1).
>
> Branden's e-mails are the reason I consult the Oxford English
> dictionary

Given the open-source bias of this list's readers, I recommend
Wiktionary.  I've used dict(1) for a very long time and OED if I'm
giving a reference for a bug report to fix a British version of English
dictionary, but I typically enter ‘wikt unix’ into the browser and the
‘wikt’ keyword I've created for the site's top-right search box takes me
to https://en.wiktionary.org/wiki/unix

Wiktionary often gives translations, is multi-lingual, though they quite
rightly put English first :-), and isn't too hard to edit once you've
made one or two changes.  A DICT server for its content keeps being
discussed, but there isn't one last time I checked.

-- 
Cheers, Ralph.



Re: Dynamic Paperlength for PS or PDF device

2022-12-10 Thread Ralph Corderoy
Hi John,

> .\" A4 portrait
> .pagesize 8.3i 11.7i
...
> .\" A4 landscape
...
> .pagesize 11.7i 8.3i

May as well be precise with 21c and 29.7c?
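
The arithmetic behind that, as a small sketch; inches_to_cm is a
hypothetical helper and the only fact used is 2.54 cm to the inch.

```c
/* Centimetres in one inch, exact by definition. */
#define CM_PER_INCH 2.54

/* Convert a length in inches to centimetres. */
static double inches_to_cm(double inches)
{
    return inches * CM_PER_INCH;
}
```

8.3i comes to 21.082c and 11.7i to 29.718c, so the rounded inch values
overshoot A4's 21c by 29.7c by a fraction of a millimetre each way.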

-- 
Cheers, Ralph.



Re: Pic question

2022-12-02 Thread Ralph Corderoy
> You're assigning an expression to a variable and ‘n/m <...>’ is
> available in expressions as far as I recall.

*unavailable*



Re: Pic question

2022-12-02 Thread Ralph Corderoy
Hi Riza,

> I have a location using a fraction like so "1/3 ".  How can
> I get the x component of that location?
...
> a = 1/3 ;
> print a.x;
>
> This fails at the assignment and it says "syntax error before ','"

You're assigning an expression to a variable and ‘n/m <...>’ is
available in expressions as far as I recall.

This should put you on the right track.

$ cat third.tr
.PS
B: box
A: 1/3 
print A
print A.x
.PE
$
$ pic /dev/null
0.75, 0.083
0.75
$

-- 
Cheers, Ralph.



Deviating the Inch Margin. (Was: 08/21: [me]: Integrate better with papersize.tmac.)

2022-11-28 Thread Ralph Corderoy
Hi Dave,

> > I think I'd prefer to assume that people want 1 inch margins all
> > around the page;
>
> I agree.

You've reminded me of a general point about margins, one which may have
no bearing on the topic under discussion...

Margins of about an inch around the page are good, but I came to the
conclusion that it's worth allowing them to deviate a bit so the area
within the margins, given the paper's size, is a nice typographical
number to allow for ease in sizing, subdivision, etc.

I've since noticed others do the same, within troff and without.
The oddball values are the margins and once their calculation is
understood it can be ignored.  The nice values for the printable area
are easily manipulable with mental arithmetic as they're encountered.
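
A worked sketch of the trade, with hypothetical numbers: A4 is 210 mm
wide, and 160 mm is picked here purely as an example of a ‘nice’ text
width.  The helper names are made up for illustration.

```c
/* Width left for text when both side margins are `margin' wide.
 * All lengths are in the same unit. */
static double text_width(double paper, double margin)
{
    return paper - 2.0 * margin;
}

/* Side margin left when text of width `text' is centred on the paper. */
static double side_margin(double paper, double text)
{
    return (paper - text) / 2.0;
}
```

Strict one-inch (25.4 mm) margins on A4 leave a 159.2 mm measure;
choosing 160 mm instead makes each margin 25 mm, which is the oddball
value that can then be forgotten.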

-- 
Cheers, Ralph.



Re: Chapters of the manual (was: Bug#1018737: /usr/bin/rst2man: rst2man: .TH 5th field shouldn't be empty)

2022-11-17 Thread Ralph Corderoy
Hi Alejandro,

> 'chapter' definitely makes more sense, at least considering the manual
> as a book.  Since it seems to have been in general use in the past,
> it's not so much of a breaking change to start using it now again.

Yes, it is a breaking change.  This is a terrible idea.  Colin Watson's man(1)
has -s to specify a section and talks of sections throughout.

Plan 9's man page refers to section.

Working with Unix from the ’80s, including many different commercial
versions, and reading dozens of published Unix books which pre-date the
web, I have only seen section be used, not chapter.

> With time, I expect to replace all occurrences of section that should
> be chapter in the man-pages.

There are projects which need custodians rather than radicals.

-- 
Cheers, Ralph.



Re: C Strings and String Literals.

2022-11-14 Thread Ralph Corderoy
Hi Alejandro,

> > > C doesn't _really_ have strings, except at the library level.
> > > It has character arrays and one grain of syntactic sugar for encoding
> > > "string literals", which should not have been called that because
> > > whether they get the null terminator is context-dependent.
> > >
> > >  char a[5] = "fooba";
> > >  char *b = "bazqux";
> > >
> > > I see some Internet sources claim that C is absolutely reliable about
> > > null-terminating such literals, but I can't agree.  The assignment to
> > > `b` above adds a null terminator, and the one to `a` does not.  This
> > > is the opposite of absolute reliability.  Since I foresee someone
> > > calling me a liar for saying that, I'll grant that if you carry a long
> > > enough list of exceptional cases for the syntax in your head, both are
> > > predictable.  But it's simply a land mine for the everyday programmer.
> > 
> > - C defines both string literals and strings at the language level,
> >e.g. main()'s argv[] is defined to contain strings.
>
> I must disagree.  The string concept is very broad, and you can define
> your own string, as for example:
>
> struct str_s {
>   size_t  len;
>   u_char  *s;
> };

The point under discussion was whether the language specification of C
has strings or just character arrays and whether string literals should
have been called that because whether they have a terminating NUL is
‘context-dependent’.

To contradict what I've written, you're widening the discussion to
arbitrary data structures which can be used to implement a string.  That
is not relevant.

> However, assuming that the concept of string is a NUL-terminated char
> array, there's little in the core language about it.

But little is not nothing and so the C language does have both strings,
as the specification states that is what is sitting in main()'s argv[],
and string literals.

> Sure, string literals are the only true strings in the language

Your ‘Sure’ implies you're agreeing with someone.  If so, it's not me.
You're wrong on this point.

> You can prove that string literals are really strings (i.e.,
> NUL-terminated char arrays), by applying sizeof to them, and then
> looping over their contents to see that there's exactly one NUL byte
> at its last position.

Your definitions are wrong.  Proving "foo\0bar" ends with a NUL does not
make it a C string because a NUL-terminated char array is not a C string
if it contains a NUL before that.  A C string is zero or more non-NUL
chars followed by a NUL.

> > - In C, "foo" is a string literal.  That is the correct name as it is
> >not a C string because a string literal may contain explicit NUL bytes
> >within it which a string may not: "foo\0bar".
>
> I wouldn't discard them as string literals only for that.

I'm not discarding them as anything.  I am pointing out that according
to the language definition, "foo\0bar" is a string literal but not a C
string because of the embedded NUL thus the distinction is necessary and
terms are needed for each.

> Writing by accident a NUL byte is not usual, anyway.

I didn't claim it was.  I was arguing why ‘they should not have been
called string literal’ is wrong and that whether they get a NUL
terminator is not ‘context dependent’.

> > - A character array may be initialised by a string literal.  Successive
> >elements of the array are set to the string literal's characters,
> >including the implicit NUL if there is room.
> > 
> >  char     two[2] = "foo";   // 'f' 'o'
> >  char   three[3] = "foo";   // 'f' 'o' 'o'
> >  char    four[4] = "foo";   // 'f' 'o' 'o' '\0'
> >  char    five[5] = "foo";   // 'f' 'o' 'o' '\0' '\0'
> >  char implicit[] = "foo";   // 'f' 'o' 'o' '\0'
>
> Ahh my friend, you're too used to some dialect of C that allows this,
> I believe.  ISO C11 doesn't, and I'm guessing any older ISO C versions
> behave in the same way:
>
> $ cat str.c
>  char     two[2] = "foo";   // 'f' 'o'
>  char   three[3] = "foo";   // 'f' 'o' 'o'
>  char    four[4] = "foo";   // 'f' 'o' 'o' '\0'
>  char    five[5] = "foo";   // 'f' 'o' 'o' '\0' '\0'
>  char implicit[] = "foo";   // 'f' 'o' 'o' '\0'
>
> $ cc str.c -Wpedantic -pedantic-errors
> str.c:1:23: error: initializer-string for array of ‘char’ is too long
>  1 | char two[2] = "foo";   // 'f' 'o'
>|   ^

You are showing compiler output and claiming its error proves the
standard.  It would be handier to have a reference to the standard.

Here's a compiler which has been told I want C11.

$ gcc -std=c11 -c str.c
str.c:1:19: warning: initializer-string for array of chars is too long
 char two[2] = "foo";   // 'f' 'o'
   ^
$ objdump -sj .data str.o

str.o: file format elf64-x86-64

Contents of section .data:
 0000 666f666f 6f666f6f 00666f6f 0000666f  fofoofoo.foo..fo
 0010 6f00                                 o.

C Strings and String Literals. (Was: Pascal rides again)

2022-11-13 Thread Ralph Corderoy
Hi Branden,

> C doesn't _really_ have strings, except at the library level.
> It has character arrays and one grain of syntactic sugar for encoding
> "string literals", which should not have been called that because
> whether they get the null terminator is context-dependent.
>
> char a[5] = "fooba";
> char *b = "bazqux";
>
> I see some Internet sources claim that C is absolutely reliable about
> null-terminating such literals, but I can't agree.  The assignment to
> `b` above adds a null terminator, and the one to `a` does not.  This
> is the opposite of absolute reliability.  Since I foresee someone
> calling me a liar for saying that, I'll grant that if you carry a long
> enough list of exceptional cases for the syntax in your head, both are
> predictable.  But it's simply a land mine for the everyday programmer.

- C defines both string literals and strings at the language level,
  e.g. main()'s argv[] is defined to contain strings.

- In C, "foo" is a string literal.  That is the correct name as it is
  not a C string because a string literal may contain explicit NUL bytes
  within it which a string may not: "foo\0bar".

- A string literal has an implicit NUL added at its end thus "foo" fills
  four bytes.

- A character array may be initialised by a string literal.  Successive
  elements of the array are set to the string literal's characters,
  including the implicit NUL if there is room.

char     two[2] = "foo";   // 'f' 'o'
char   three[3] = "foo";   // 'f' 'o' 'o'
char    four[4] = "foo";   // 'f' 'o' 'o' '\0'
char    five[5] = "foo";   // 'f' 'o' 'o' '\0' '\0'
char implicit[] = "foo";   // 'f' 'o' 'o' '\0'

That's it.

- The string literal is reliably terminated by a NUL.
- It is not context dependent whether a string literal has a terminating
  NUL.
- It is absolutely reliable and clearly stated in the C standard and in
  any other C reference worth its salt.
- There is no need to ‘carry a long enough list of exceptional cases for
  the syntax in your head’.
- An ‘everyday C programmer’ will know this simple behaviour by dint of
  being a C programmer who writes it every day; there is no landmine
  upon which to step.  :-)

Hope that helps clear up this corner of C.
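
The points above can be checked with a short sketch.  It assumes a C
compiler rather than C++, where three[3] = "foo" is rejected, and it
leaves out the contested two[2] case.  holds_c_string is a hypothetical
helper named only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Arrays initialised from the string literal "foo". */
static const char three[3] = "foo";     /* 'f' 'o' 'o': no room for the NUL */
static const char four[4] = "foo";      /* 'f' 'o' 'o' '\0' */
static const char five[5] = "foo";      /* 'f' 'o' 'o' '\0' '\0' */
static const char implicit[] = "foo";   /* Sized by the literal: four bytes. */

/* A string literal with an explicit NUL inside it. */
static const char embedded[] = "foo\0bar";

/* Hypothetical helper: return 1 if the n-byte array at p contains a
 * NUL, i.e. a C string starts at p and ends within the array. */
static int holds_c_string(const char *p, size_t n)
{
    return memchr(p, '\0', n) != NULL;
}
```

Here sizeof implicit is 4, holds_c_string() is false only for three,
and sizeof embedded is 8 while strlen(embedded) is 3, which is the
difference between a string literal and a C string.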

-- 
Cheers, Ralph.



Re: [groff] 19/40: [devpdf]: Tweak generation of "download" file.

2022-11-13 Thread Ralph Corderoy
Hi Deri,

> Sorry, this needs a revert too.

Are speculative ‘feature’ branches worth using for this kind of thing?
They could just be deleted after perhaps being merged when there's
agreement.

-- 
Cheers, Ralph.



sizeof in Macros. (Was: Specifying dependencies more clearly)

2022-11-10 Thread Ralph Corderoy
Howdy Alejandro,

> > Okay, here we go for a rant.

Consider the cost of lost opportunities.

> Since I wrote the code from memory, I had a few typos, but the idea
> was there...
>
> >      typedef struct {
> >      size_t  length;
> >      u_char  *start;
> >      } str_t;
> > 
> >      #define length(s)   (sizeof(s) - 1)
> > 
> >      #define str_set(str, text)  do   \
> >      {    \
> >      (str)->length = length(test);    \
> >      (str)->start = (u_char *) text;  \
>  } while (0)

And s/test/text/.

The lack of parentheses around ‘text’ in the assignment to ‘start’ looks
wrong.

> > Of course, cowboy programmers don't need terminating NUL bytes

The Unix filesystem didn't terminate full-length filenames in the
sixteen-byte directory entries; two for the inode number, fourteen bytes
for the filename.

Not using terminating NULs allows substrings to be sliced from the
original without copying the bytes.

> > And the use of that macro is typically for things like:
>
> str_t  str;
> str_set(&str, "foo");
>
> > And then some new programmer in the team writes a line of code
> > that's something like:
>
> str_set(&str, cond ? "someword" : "another");
>
> > Then testing reveals some issue if cond is true.  You see only
> > "somewor"; hmm. 
...
> > I quickly realize that it is due to the ternary operator decaying
> > the array into a pointer and sizeof later doing shit.
...
> > My patch just changes
> >  #define length(s)   (sizeof(s) - 1)
> > to:
> >  #define length(s)   (nitems(s) - 1)
> >
> > (nitems() is defined to be the obvious sizeof division (called
> > ARRAY_SIZE(9) in the Linux kernel)

The bug is the original macro str_set() doesn't document its ‘text’
parameter must be a string constant.  Macros I have to hand are more
explicit: ‘string constant s’.

/* DIM gives the number of elements in the one-dimensional array a. */
#define DIM(a) (sizeof (a) / sizeof (*(a)))

/* LEN gives the strlen() of string constant s, excluding the
 * terminating NUL. */
#define LEN(s) (sizeof (s) - 1)

By passing a char pointer on what's presumably a 64-bit pointer machine,
P64, sizeof gives the pointer's eight bytes, one is knocked off as if
there were a terminating NUL, and the first seven bytes of "someword"
result.

Your patch switches to nitems() and your description makes me think it's
like DIM() above.  That takes the eight bytes of the pointer given by
the first sizeof and divides it by one, doing nothing, as the second
sizeof gives the size of a u_char.  So we still arrive at eight, knock
one off for the NUL as you write ‘nitems(s) - 1’, giving seven.  So I
don't see why this wouldn't also give "somewor" instead of "someword".
How did it fix the problem?

Unless the value needs to be known at compile time, using strlen(3)
would work in all cases, save much human time at the cost of a little
accumulated machine time, and in the simple common case of a string
constant it will probably be evaluated by any optimising compiler into a
constant.

typedef struct {
    size_t length;
    u_char *start;
} str_t;

#define str_set(str, text) \
    do { \
        (str)->length = strlen(text); \
        (str)->start = (u_char *)(text); \
    } while (0)
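
A minimal sketch of the difference, assuming nothing beyond standard C.
The three len_* functions are hypothetical names used only to show what
each expression evaluates to; LEN is the macro given earlier.

```c
#include <stddef.h>
#include <string.h>

/* LEN gives the strlen() of string constant s, excluding the
 * terminating NUL. */
#define LEN(s) (sizeof (s) - 1)

/* sizeof sees the array "someword": nine bytes less the NUL. */
static size_t len_direct(void)
{
    return LEN("someword");
}

/* The conditional operator decays both arrays to char pointers, so
 * sizeof sees a pointer and the result is sizeof (char *) - 1
 * whichever branch is taken. */
static size_t len_via_ternary(int cond)
{
    return LEN(cond ? "someword" : "another");
}

/* strlen(3) follows the pointer and counts up to the NUL. */
static size_t len_strlen(int cond)
{
    return strlen(cond ? "someword" : "another");
}
```

With eight-byte pointers len_via_ternary() gives seven for either
branch, hence "somewor", whereas len_strlen() gives eight and seven.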

-- 
Cheers, Ralph.



Re: [groff] 27/33: eqn(1): Fix content and style nits.

2022-10-24 Thread Ralph Corderoy
Hi Branden,

> I've introduced or retained "Limitations" (sub)sections in several
> groff man pages; often I find it a better fit for discussion of issues
> than the historically well-attested "Bugs".  Against Ingo's advice I
> tend not to use that section title.  We have a bug tracker for bugs;
> as far as I know, Room 1127 in Murray Hill didn't.  "Limitations"
> seems like a better characterization of features

It may be, but I don't think that outweighs users knowing to search for
‘bugs’ when they want to see if the man page has that section on
encountering odd behaviour.

-- 
Cheers, Ralph.



Re: 3-word compound adjectives; the return of the '-'

2022-10-16 Thread Ralph Corderoy
Hi Branden,

> as there isn't in "hot fudge sundae" (even though it is only the fudge
> that is hot, not the whole sundae).

It's a ‘fudge sundae’ and a ‘hot-fudge sundae’ as the fudge is hot but
it is not a ‘hot sundae’ so ‘hot fudge sundae’ is wrong as it can't be
read as ‘hot and fudge sundae’.

> "two-fisted drinker"

Agreed.  He is not a ‘two and fisted drinker’.

> Similarly, we say "thirty year-old bug"

‘We’ don't.  :-)  It's a thirty-year-old bug.

-- 
Cheers, Ralph.



Re: 3-word compound adjectives; the return of the '-'

2022-10-16 Thread Ralph Corderoy
Hi Alejandro,

> In a patch to linux-man@ there's a 3-word compound adjective.  I don't
> know what are the rules for such a thing, and I'd like to have some
> consistency (and correctness) in the manual pages.
>
> I've seen many different things in the past:
>
>   a) block device-based filesystems
>   b) block-device-based filesystems
>   c) block- device-based filesystems
>
> Which form would you recommend me to use?

‘Block filesystems’ is taken to mean one which sits on a block device.
To be specific, ‘block-device filesystems’ could be used.

One way which might help when there are multiple words is to ask which
ones can be omitted and still give a correct meaning for this case.

Without hyphens, this is hard to parse.

Deleting the cron job lock file fixed the problem.

There are three spaces so 2³ ways of hyphenating.

 1   Deleting the cron job lock file fixed the problem.
 2   Deleting the cron job lock-file fixed the problem.
 3   Deleting the cron job-lock file fixed the problem.
 4   Deleting the cron job-lock-file fixed the problem.
 5   Deleting the cron-job lock file fixed the problem.
 6   Deleting the cron-job lock-file fixed the problem.
 7   Deleting the cron-job-lock file fixed the problem.
 8   Deleting the cron-job-lock-file fixed the problem.
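
The eight variants can also be generated mechanically.  This is only a
sketch; the function name hyphenations is made up for illustration.  Each
of the three gaps is either a space or a hyphen, and the loop order
matches the numbering above.

```shell
# Print every way of joining four words where each of the three gaps is
# either a space or a hyphen: 2 * 2 * 2 = 8 lines.
hyphenations() {
    for a in ' ' '-'; do
        for b in ' ' '-'; do
            for c in ' ' '-'; do
                printf '%s%s%s%s%s%s%s\n' "$1" "$a" "$2" "$b" "$3" "$c" "$4"
            done
        done
    done
}

# Number the variants so they can be referred to individually.
hyphenations cron job lock file | nl
```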

I'd go for 6 because omitting space-delimited adjectives still gives
accurate descriptions.  Not just correct English, but a meaning which
matches what's being described.  I'm taking ‘lock-file’ to be the noun;
some would write ‘lockfile’.

 6a  Deleting the  lock-file fixed the problem.
 6b  Deleting the cron-job lock-file fixed the problem.

If the cron-job context is already provided or if trying to shorten the
text then the first could be used if it wasn't ambiguous, but both are
correct.

In contrast,

 3a  Deleting the   file fixed the problem.
 3b  Deleting the  job-lock file fixed the problem.
 3c  Deleting the cron  file fixed the problem.
 3d  Deleting the cron job-lock file fixed the problem.

3a's ‘file’ is vague; removing all adjectives can be ambiguous.
It's not a file for a ‘job lock’ so 3b and 3d are out.
It's not a cron file either; a crontab(5) is an example of those.
So none are apt and 3 is ruled out.

> And now I found one more:
>
>   d) block device\[en]based filesystems
>
> Where the en dash is used to distinguish it from 'a block filesystem 
> based on a device'.

Using an en-dash seems very odd-ball advice which I haven't seen in
print and wouldn't recommend.  It will jar the reader and make him
switch to wondering its meaning; just stick with hyphens.

You might find this site helpful; I know a non-English speaker who liked
its plain descriptions and many examples.  IIRC, it was started by an
ex-military Englishman who valued clear unambiguous reporting.
https://www.grammar-monster.com/punctuation/using_hyphens.htm has an
overview with links off to more detailed pages.

Oh, and it's most definitely a ‘three-year-old bug’ with all those
hyphens.

-- 
Cheers, Ralph.



Re: APA Format

2022-10-09 Thread Ralph Corderoy
Hi Aaron,

> refer -PS -e -p ./references.ref arb.tr | groff -ms -T pdf > output.pdf
>
> The references seem "close" to APA format but there are still issues
> with the formatting.

Here's what I have from what you've said so far.

$ cat references.ref
%K congress108-narcoterrorism
%Q ONE HUNDRED EIGHTH CONGRESS
%T "NARCO-TERRORISM: INTERNATIONAL DRUG TRAFFICKING AND TERRORISM--A 
DANGEROUS MIX"
%O 
https://www.govinfo.gov/\%content/\%pkg/CHRG-108shrg90052/\%html/CHRG-108shrg90052.htm
%D May 20, 2003
%L Senate Hearing
$
$ cat arb.tr
.na
The drug
.[
congress108-narcoterrorism
.]
hearing.
$
$ refer -PS -e -p references.ref arb.tr | nroff -ms -Tutf8 | cat -s

The drug (ONE HUNDRED EIGHTH CONGRESS, 2003) hearing.

References

ONE HUNDRED EIGHTH CONGRESS, 2003.
 ONE HUNDRED EIGHTH CONGRESS, "NARCO‐TERRORISM: INTERNA‐
 TIONAL DRUG TRAFFICKING AND TERRORISM‐‐A DANGEROUS MIX"
 (May 20, 2003). https://www.govinfo.gov/content/‐
 pkg/CHRG‐108shrg90052/html/CHRG‐108shrg90052.htm.

$

It would help if you could detail how the above differs from what you'd
like as APA's style could be unknown to many of us on the list.

-- 
Cheers, Ralph.



Re: PDF outline not capturing Cyrillic text

2022-09-20 Thread Ralph Corderoy
Hi Branden,

> A shorter pole might be to establish a protocol for communication of
> Unicode code points within device control commands.  Portability isn't
> much of an issue here: as far as I know there has been no effort to
> achieve interoperation of device control escape sequences among
> troffs.
>
> That convention even _could_ be UTF-8, but my initial instinct is
> _not_ to go that way.  I like the 7-bit cleanliness of GNU troff
> output, and when I've mused about solving The Big Unicode Problem
> I have given strong consideration to preserving it, or enabling
> tricked-out UTF-8 "grout" only via an option for the kids who really
> like to watch their chrome rims spin.

Adding an option seems more needless complexity.
I am not a kid and have never had chrome rims.

> I realize that Heirloom and neatroff can both boast of this

I expect they just think it mundane.

> but how many people _really_ look at device-independent troff output?
> A few curious people, and the poor saps who are stuck developing and
> debugging the implementations, like me.  For the latter community,
> a modest and well-behaved format saves a lot of time.

I read it, diff(1) it, etc.  Skipping the device-specific rendering
often simplifies the comparison and removes another layer of potential
mud and error.

There's nothing great about the device-independent format being ASCII.
I strongly suggest using UTF-8 encoding for the Unicode runes that need
passing through to the device driver.  This will continue to make it
easy to read, grep, etc., and avoid yet another encoding format because
none of the existing ones are ‘good enough’.  The device drivers will
probably have UTF-8 parsing code to hand.

If groff ever reaches ‘UTF-8 everywhere’, an ad-hoc encoding for this
one thing will appear to be an anachronism when it is really a poor
recent decision.

-- 
Cheers, Ralph.



Re: 1.23: UTF-8 device: more display oddities

2022-09-18 Thread Ralph Corderoy
Hi Steffen,

These may give some ideas.  The first link allows your choice of
preview text.

- https://fonts.google.com/?category=Monospace&preview.text=troff%20%E2%80%98%E2%80%99%20%60%C2%B4%20%27&preview.size=14&preview.text_type=custom

- https://en.wikipedia.org/wiki/List_of_monospaced_typefaces

-- 
Cheers, Ralph.



Re: 1.23: UTF-8 device: more display oddities

2022-09-17 Thread Ralph Corderoy
Hi Steffen,

>   ` U+0060, GRAVE ACCENT, "backtick"
>
> is displayed as
>
>   ‘ U+2018, LEFT SINGLE QUOTATION MARK
>
> which in Liberation Mono (at least!) this reverses the direction of
> the tick.

This shows the problem.

pango-view --backend ft2 --header --font 'Liberation Mono' \
--dpi 96 --hinting none \
-t "$(troff -Tutf8 <<<'`'\'' \`\'\'' \(aq' | grotty)"

The font's designers have chosen to make the 6 and 9 quotes both lean to
the right and, as is common today, to have very slight bulges so they
are similar in appearance.

Fortunately, the command also provides a way to preview a better choice
of font.  :-)

-- 
Cheers, Ralph.



Re: 1.23: UTF-8 device produces mysterious characters

2022-09-14 Thread Ralph Corderoy
Hi Steffen,

> > > En dash would look nice, i could imagine.
> > 
> > Those ASCII ‘-’ above should be rendered as a hyphen in nicely
> > typeset output.  An en-dash is far too big.  Oh, there's another
> > one!
...
> But i was talking -Tutf8, and these are fixed width font

Given we use terminal emulators on pixel-based devices and our choice
of font, I still see a significant difference with

$ troff -Tutf8 <<<'Re-sort with \-u.' | grotty | grep .
Re‐sort with −u.
$

The hyphen is narrower so doesn't crash into the following rune.  It also
sits at a different height.  Whereas the option's dash is heavier and
more noticeable, as it should be given its significance.
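
If the two are hard to tell apart in a given font, the bytes settle it.
A small sketch, with hex_bytes being a made-up helper name: the octal
escapes below are the UTF-8 encodings of U+2010 HYPHEN and U+2212 MINUS
SIGN, the two runes in the grotty output above.

```shell
# Print standard input as one run of lowercase hex bytes.
hex_bytes() {
    od -An -tx1 | tr -d '[:space:]'
}

printf '\342\200\220' | hex_bytes; echo   # U+2010 HYPHEN, prints e28090
printf '\342\210\222' | hex_bytes; echo   # U+2212 MINUS SIGN, prints e28892
```

Piping the real grotty output through the same helper would show which
of the two was emitted at each position.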

-- 
Cheers, Ralph.



Re: 1.23: UTF-8 device produces mysterious characters

2022-09-13 Thread Ralph Corderoy
Hi Steffen,

> Hyphen is good at the end of line when a word is hyphenated, otherwise
> it is misplaced.

Not in English.  A hyphen may be used to join compound adjectives, as a
two-minute Google would show.  :-)  An ‘American-football player’ isn't
necessarily American whereas an ‘American football player’ is, but he
may be playing what the Yanks call soccer.  It's also used when adding a
prefix to a word, as in ‘re-sort’ to re-run sort(1) as ‘resort’ is where
one holidays.

> En dash would look nice, i could imagine.

Those ASCII ‘-’ above should be rendered as a hyphen in nicely typeset
output.  An en-dash is far too big.  Oh, there's another one!

-- 
Cheers, Ralph.



Re: Warn about long lines

2022-09-05 Thread Ralph Corderoy
Hi Alejandro,

> If you know a (hopefully trivial) filter that transforms any
> multi-byte sequences in exactly the number of bytes that will be
> visible (and hopefully those bytes should be similar to the original
> UTF-8 content), that would greatly help.
...
> tbl man1/memusage.1 \
> | eqn -Tutf8 \
> | troff -man -t -M ./etc/groff/tmac -m checkstyle -rCHECKSTYLE=3 \
>  -ww -Tutf8 -rLL=78n \
> | grotty -c \
> | col -b -x \
> | toplaintext \
> | (! grep -n '.\{80\}.' >&2)

I'm unclear on the problem being solved.  grep(1) in a UTF-8 locale
already matches a multi-byte UTF-8 sequence for one rune with a single
‘.’.  That leaves the terminal's escape sequences, which grotty's ‘-c’
disables, and the over-striking used for underlining, which col(1)
deals with.

In other words, what's wrong with

zcat man7/groff_char.7.gz |
eqn -Tutf8 |
troff -man -t -ww -Tutf8 -rLL=78n |
grotty -c |
col -pbx |
(! grep -n '.\{80\}.' >&2)

Does it miss overlong lines or wrongly report a short line as too long?
If so, an example would help target further suggestions.
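
The last stage can be tried on its own, without any troff input.  A
minimal sketch, where no_long_lines is a made-up name and the test lines
are plain ASCII so the result doesn't depend on the locale:

```shell
# Succeed only if no input line exceeds 80 characters.  Offending lines
# go to standard error, prefixed with their line numbers.
no_long_lines() {
    ! grep -n '.\{80\}.' >&2
}

# printf pads 0 with leading zeros to make lines of an exact length.
printf '%080d\n' 0 | no_long_lines && echo 'ok: 80 characters fit'
printf '%081d\n' 0 | no_long_lines 2>/dev/null || echo 'too long: 81 characters'
```

The pattern needs 81 characters to match, so a line of exactly 80
passes.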

-- 
Cheers, Ralph.



Re: groff maintainership, release, and blockers (was: groff 1.23.0.rc2 readiness)

2022-08-29 Thread Ralph Corderoy
Hi Branden,

> > Wouldn't it be better to simply abandon the GNU roff project
...
> The FSF provides useful infrastructure.  Consider the risks of
> relocating to a site like GitHub

(Just a reminder that the FSF also provide https://savannah.nongnu.org
which is presumably the same underlying system.)

> > Sorry, i fail to understand that.  The acronym "RC" stands for
> > "release candidate".  I would define a "release candidate" as "a
> > version that is believed to be ready for release".
>
> Apparently we have a terminological and/or philosophical disagreement.

I urge resolving this disagreement by moving towards the standard
definitions which Ingo, other packagers, and the RotW use.

> Therefore, by that standard, any commit not marked "Test fails at this
> commit."..._is a release candidate_.

No, an RC is, all things going well, what should be the next release:
a culmination of effort into a neat boundary of features, code, tests,
and documentation.

Announcing an RC causes work for others and to lessen that, the changes
between RCs should be just what's needed to polish the RC to make it
nearer to being the release.  Ongoing development not intended for this
release should occur elsewhere.

> > These tests cause non-trivial work for significant numbers of
> > people, most of whom are *not* groff developers, so an RC should
> > only be made when the software is really believed to be ready - both
> > out of respect for testers' time and because releasing multiple RCs
> > will weary out testers and increase the likelihood of serious bugs
> > slipping into the release: some testers will not have the time to
> > test over and over again, so the more RCs you ship, the less test
> > coverage you get.
>
> I agree with this, but I reiterate that, in a sense, we've had
> literally thousands of RCs since groff 1.22.4.

No, commits are not RCs because an RC is announced as such.

> > In particular, i'm firmly convinced that issuing an RC while even
> > one single blocker issue is unresolved is a blatant contradiction.
> > Before an RC, all blockers must either be resolved or explicitly
> > re-classified as "not release critical" and re-scheduled for the
> > subsequent release.
>
> Well, then, in a sense we don't have _any_ blockers, because "the tree
> isn't red".

Right, blocker is being misused for what's an aspiration to see an issue
fixed in a release.  Again, this confuses by misusing a well-understood
term.

> On the other hand, applying your definition strictly, we'd almost
> never have any blocker bugs at all.

If a document unintentionally renders differently then the release
shouldn't occur.  An expanding corpus of real-life documents can be
turned to pixels for comparing with the golden version.  Then things
like https://savannah.gnu.org/bugs/index.php?24047 wouldn't slip
through.  Writing individual tests for features isn't a substitute; both
are needed.

> So, we could indeed stop using the Savannah Blocker severity for the
> purpose I'm employing it.

Should rather than could.

> But I ask you: what _good_ would that do, apart from satisfying your
> personal esthetic of release management?

It would avoid the confusion caused by deviating from standard
definitions, which in turn might put off contributors because of the
impression it gives.

-- 
Cheers, Ralph.



Re: Warn about long lines

2022-08-29 Thread Ralph Corderoy
Hi Alejandro,

> > Piping grotty's output into ‘col -pbx’ may also be useful.
>
> Changed; thanks!  BTW, I didn't add -p, since I don't think it's
> necessary.

I added it so if you've thought wrong the unrecognised control sequences
will show up rather than be silenced.  That might quickly show it should
be omitted, but I thought it interesting to find out.

> Now it only uses normal ASCII chars.

Be aware you're now testing something different from what most of your
users will experience.

-- 
Cheers, Ralph.



Re: Warn about long lines

2022-08-28 Thread Ralph Corderoy
Hi Alejandro,

> > > | sed 's/\x1b\[[^@-~]*[@-~]//g' \
> > 
> > Out of interest, what's the sed(1) attempting to do?
> > (I know what it's doing.)
...
> I need to get rid of the highlighting.

There's probably a better way; see grotty(1)'s -c option or GROFF_NO_SGR
environment variable.  Piping grotty's output into ‘col -pbx’ may also
be useful.
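
For comparison, here is what that sed(1) amounts to, as a sketch that
avoids the GNU-only ‘\x1b’ by having printf(1) produce the ESC.  The
name strip_sgr is made up, and the sample input is hand-written rather
than real grotty output.

```shell
# Delete control sequences of the form ESC [ ... final-byte, where the
# final byte is in the range @ to ~.  LC_ALL=C keeps the bracket ranges
# byte-based.
strip_sgr() {
    esc=$(printf '\033')
    LC_ALL=C sed "s/${esc}\[[^@-~]*[@-~]//g"
}

printf '\033[1mNAME\033[0m\n' | strip_sgr   # prints NAME
```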

> > No, the page dimensions, etc., are set within the troff source and
> > it's up to that source to allow for external specification if
> > required.  If a document is written without specifying them then the
> > defaults apply and the user can override these by using one of the
> > paper-size macro sets, e.g. ‘-ma4’ for A4 paper from groff's
> > a4.tmac.
> > 
> >  troff -ma4 letter.tr
>
> Ahh, it makes sense: normally documents are designed to be rendered in
> a specific device and characteristics.

Well, relative to the page, yes, but many documents needn't assume much
about the output device or page size.  If you look at a4.tmac, for
example, you'll see it just sets the page length to the paper size and
the line length to the paper width less an inch margin both sides.

.pl 29.7c
.ll 21c-2i
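
Those two lengths can be worked out in inches as a quick check.  This
is only arithmetic in awk(1), not anything troff runs, and
a4_in_inches is a made-up name; troff takes an inch to be 2.54c.

```shell
# Convert the two a4.tmac lengths to inches.
a4_in_inches() {
    awk 'BEGIN {
        cm_per_inch = 2.54
        printf "page length %.2fi\n", 29.7 / cm_per_inch
        printf "line length %.2fi\n", 21 / cm_per_inch - 2
    }'
}

a4_in_inches
# page length 11.69i
# line length 6.27i
```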

Boundaries of the text, like the header and footer position, are set
relative to the edge of the paper: the header a distance from the top;
the footer up from the bottom, not the top.  Thus the page length isn't
of immediate concern.

-- 
Cheers, Ralph.



Re: [PATCH] lint-man.mk: Use a pipeline instead of the groff(1) wrapper

2022-08-28 Thread Ralph Corderoy
Hi Alejandro,

> +DEFAULT_EQNFLAGS := -Tutf8
...
> +DEFAULT_TROFFFLAGS   += -Tutf8

I'd have a variable set to ‘utf8’ to ease changing to another output
device.

> - $(GROFF) $(GROFFFLAGS) $< \
> + $(TBL) <$< \

You've ditched passing the filename, instead using standard input.
This prevents the filename being passed through the pipeline which will
presumably result in poorer messages.

$ tbl /etc/passwd | grep '^\.'
.if !\n(.g .ab GNU tbl requires GNU troff.
.if !dTS .ds TS
.if !dTE .ds TE
 →  .lf 1 /etc/passwd
$

-- 
Cheers, Ralph.



Re: Warn about long lines

2022-08-27 Thread Ralph Corderoy
Hi Alejandro,

>   | sed 's/\x1b\[[^@-~]*[@-~]//g' \

Out of interest, what's the sed(1) attempting to do?
(I know what it's doing.)

> > grotty(1) doesn't decide where to split the line, it happens earlier
> > than that, so you want to affect groff(1).
...
> > - If it's groff, then use ‘-rLL=80n’; see groff_man(7).
>
> Ahh, this is what I needed.  I sometimes struggle to understand how
> groff divides the implementation.

That's part of the problem with groff(1) existing instead of users
learning the pipeline of constituent parts, what they do, and how they
communicate, e.g.

pic foo.tr | tbl | eqn -Tutf8 | troff -man -Tutf8 | grotty

The data between troff(1) and grotty(1), or any other post-processor,
is in groff_out(5) format and what marks to put where has already been
decided, fixing the line length.

The man macros realise the man command might be on a terminal of varying
width or producing ‘cat’ man pages for storage.  To allow the man
command to specify the line length required, the man macros allow the LL
number register to override the length they would normally use for the
‘.ll’ request.  When the man command runs troff, it uses the -r option
to set the number register.

> It doesn't seem like a man(7)-specific thing

It is.

> I mean, when searching for an option that controls the line length,
> I expect it to be a generic option that will be applicable to groff as
> a whole

No, the page dimensions, etc., are set within the troff source and it's
up to that source to allow for external specification if required.  If a
document is written without specifying them then the defaults apply and
the user can override these by using one of the paper-size macro sets,
e.g. ‘-ma4’ for A4 paper from groff's a4.tmac.

troff -ma4 letter.tr

> I searched for /column /length /width in groff(1) and found nothing. :/

groff(1) is a confusing front-end program for the normal troff-based
pipeline so some more interesting options would be in troff(1), for
example.  But in this case, it's not a command-line option.  :-)

-- 
Cheers, Ralph.


