[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-30 Thread G. Branden Robinson
Update of bug #66481 (group groff):

  Status: In Progress => Fixed
  Open/Closed: Open => Closed
  Planned Release: None => 1.24.0

___

Follow-up Comment #10:


commit d052cb31d9982ef2ad1d776d828bd4370ce5e43e
Author: G. Branden Robinson 
Date:   Mon Nov 25 17:13:56 2024 -0600

[troff]: Fix Savannah #66481 and unfix #66099.

* src/roff/troff/input.cpp (is_char_usable_as_delimiter): Revert fix for
  bug #66009.  Unfortunately, `|` is in use in the wild as a delimiter,
  for instance in man pages for GNU awk, GNU grep, and GNU rcs.  Weaning
  people off of it (because it is a valid character in a numeric
  expression, and GNU troff has never accepted most other such
  characters as delimiters,{*} whereas AT&T troff accepted them all)
  looks to be a multi-stage, multi-year process.

Fixes .  Thanks to Paul Eggert for
the report.

{*} For distorted values of "most"--both GNU and AT&T troffs accept any
basic Latin letter ([A-Za-z]) as a delimiter, a collection of 52
exceptions that quantitatively swallows the rule.  Pragmatically,
few *roff document authors past or present seem to have been
adventurous enough to exercise this freedom.
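
For readers who don't have the source open, here is a minimal sketch of
what the check looks like once `|` is accepted again.  The function name
`usable_as_delimiter_sketch` is hypothetical, and the rejected set is
simply the `case` list quoted elsewhere in this thread; the real
`is_char_usable_as_delimiter` in input.cpp may differ in detail.

```cpp
#include <cassert>

// Hypothetical stand-in for is_char_usable_as_delimiter() after the
// revert; not copied from the groff source.
bool usable_as_delimiter_sketch(int c)
{
  switch (c) {
  // Characters valid in a numeric expression that GNU troff refuses
  // as delimiters.
  case '0': case '1': case '2': case '3': case '4':
  case '5': case '6': case '7': case '8': case '9':
  case '+': case '-': case '/': case '*': case '%':
  case '<': case '>': case '=': case '&': case ':':
  case '(': case ')': case '.':
    return false;
  // '|' is deliberately absent from the list above, so it falls
  // through to the default and is accepted, as are letters.
  default:
    return true;
  }
}
```

With this sketch, `usable_as_delimiter_sketch('|')` and
`usable_as_delimiter_sketch('A')` are true, while
`usable_as_delimiter_sketch('9')` is false.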




___

Reply to this item at:

  

___
Message sent via Savannah
https://savannah.gnu.org/


signature.asc
Description: PGP signature


[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-25 Thread G. Branden Robinson
Follow-up Comment #9, bug #66481 (group groff):

Hi Paul,

At 2024-11-25T15:09:02-0500, Paul Eggert wrote:
> Follow-up Comment #8, bug #66481 (group groff):
>> +  case '|':
>> +    error("support for '|' as an argument delimiter is deprecated and"
>> +          " will be withdrawn in a future release");
>
> A quick survey of what's installed on my desktop suggests that this
> will cause diagnostics to be issued for the man pages for gawk, grep,
> and rcs.

I'm glad to hear it's not *worse*!  This bug report has caused me some
disquiet.

> The uses I see are at the top level, so how about if groff issues the
> warning only when \w is nested inside some other quoted construct? I'd
> rather not have to go through those long-working man pages to change
> '|' to some other character. And there should be no problem with \w|X|
> when it's not nested.

As I understand the code, this may be possible--there is a member
function `input_stack::get_level()` that I think would facilitate this.

However, having interpolation-depth-dependent grammar seems almost as
bad to me as having ambiguous grammar.

Here's the solution I'm considering now, having slept on the problem:

1.  Go back to accepting `|` for _groff_ 1.24, without diagnostics.
2.  We're planning on adding a `-wstyle` option to GNU _troff_ for
    _groff_ 1.25 (bug #62776).  Use of `|` as a delimiter can become
    one of the things it warns about.  That way people can run _groff_
    1.25 with that option over corpora of man pages and see where this
    problem shows up.  I (and, fingers crossed, others) start
    submitting patches to affected man pages.
3.  Stick the above-quoted deprecation message in for _groff_ 1.26.
4.  Withdraw support for `|` as a delimiter in _groff_ 1.27 (but see
    next paragraph).
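
To make the staging concrete, the four steps can be encoded as a small
lookup keyed on the _groff_ 1.x minor version.  This is only a sketch of
the plan as stated; the enum, its member names, and the function
`planned_pipe_action` are all hypothetical, and nothing like this exists
in the source tree.

```cpp
#include <cassert>

// Hypothetical names throughout.  Each enumerator corresponds to one
// numbered step of the plan for '|' used as a delimiter.
enum class PipeDelimiterAction {
  AcceptSilently,        // step 1: groff 1.24, no diagnostic
  WarnUnderStyle,        // step 2: groff 1.25, only when -wstyle is given
  DeprecationDiagnostic, // step 3: groff 1.26, accepted but diagnosed
  Reject                 // step 4: groff 1.27, no longer a delimiter
};

// Map a groff 1.x minor version to the planned treatment of '|'.
// Versions before 1.24 are assumed to accept it silently as well.
PipeDelimiterAction planned_pipe_action(int minor_version)
{
  if (minor_version <= 24)
    return PipeDelimiterAction::AcceptSilently;
  if (minor_version == 25)
    return PipeDelimiterAction::WarnUnderStyle;
  if (minor_version == 26)
    return PipeDelimiterAction::DeprecationDiagnostic;
  return PipeDelimiterAction::Reject;
}
```

The sketch ignores compatibility mode, which may end up exempt from
step 4.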

Relatedly but distinguishably, it looks to me like I can make GNU
_troff_ *more* AT&T-compatible in compatibility mode (the `-C` option)
by skipping this entire `switch` statement when that mode is enabled.
I'm inclined to make that change adjacently to this one (item 1 above).
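
In outline, that compatibility-mode change amounts to a short-circuit in
front of the existing check.  The sketch below assumes a boolean flag
for `-C` and takes the strict check as a parameter; both the function
name `delimiter_ok_sketch` and its signature are hypothetical.

```cpp
#include <cassert>

// Hypothetical sketch: when AT&T compatibility mode is on, bypass the
// GNU-specific delimiter restrictions entirely; otherwise defer to
// whatever strict check is in force.
bool delimiter_ok_sketch(int c, bool compatibility_mode,
                         bool (*strict_check)(int))
{
  if (compatibility_mode)
    return true;           // assumed: be as permissive as AT&T troff
  return strict_check(c);  // normal GNU troff behavior
}
```

Passing the strict check in as a function pointer keeps the sketch
independent of which characters that check happens to reject.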

While I'm keen to reform *roff grammar in ways that sand down the warts
and sharp edges, I also want GNU _troff_ to render well documents
prepared with AT&T _troff_ in mind, as far as practicable.  (I believe I
am following in James Clark's tradition by not, in general, aiming for
bug-compatibility or identical indulgence of undefined behavior.)

> PS. I see that some traditional troff macro libraries in Solaris 10
> /usr/lib/tmac use control-G instead of ', under the theory that user
> strings never contain control-G.

Yes.  It was a stratagem that was (a) unergonomic, (b) esoteric, and (c)
inadequate to the purpose.  If a finite-state automaton could simulate a
pushdown automaton, Chomsky would have told us so.

> I'd hate to have to do that sort of thing.

Fear not--I don't wish to impose that sort of yuckiness on anyone.

Regards,
Branden





[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-25 Thread Paul Eggert
Follow-up Comment #8, bug #66481 (group groff):


> +  case '|':
> +    error("support for '|' as an argument delimiter is deprecated and"
> +          " will be withdrawn in a future release");

A quick survey of what's installed on my desktop suggests that this will cause
diagnostics to be issued for the man pages for gawk, grep, and rcs.

The uses I see are at the top level, so how about if groff issues the warning
only when \w is nested inside some other quoted construct? I'd rather not have
to go through those long-working man pages to change '|' to some other
character. And there should be no problem with \w|X| when it's not nested.

PS. I see that some traditional troff macro libraries in Solaris 10
/usr/lib/tmac use control-G instead of ', under the theory that user strings
never contain control-G. I'd hate to have to do that sort of thing.




[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-24 Thread Dave
Follow-up Comment #7, bug #66481 (group groff):

[comment #6:]
> In "fallbacks.tmac" we have the following.
> 

> .  fchar \[u2012] \^\v'-.3m'\l'\w"\0"u'\v'+.3m'\^\" figure dash


Bug #63354 proposes to overhaul this definition.

(which I note for completeness, not because it affects your point about
nesting escapes)




[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-24 Thread G. Branden Robinson
Follow-up Comment #6, bug #66481 (group groff):

My concern is a real-world one, and _groff_ itself trips over it.

Exhibit:

In "fallbacks.tmac" we have the following.


.  fchar \[u2012] \^\v'-.3m'\l'\w"\0"u'\v'+.3m'\^\" figure dash


That's a "general argument" (`\w`) inside a (partially) "numeric argument"
(`\l`).

Are people willing to live with this proposed patch?


diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 61029bab6..36eecdc3f 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -2609,8 +2609,11 @@ static bool is_char_usable_as_delimiter(int c)
   case '(':
   case ')':
   case '.':
-  case '|':
     return false;
+  case '|':
+    error("support for '|' as an argument delimiter is deprecated and"
+          " will be withdrawn in a future release");
+    // fall through
   default:
     return true;
   }






[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-24 Thread G. Branden Robinson
Follow-up Comment #5, bug #66481 (group groff):

Ugh.  Because escape sequences can be nested, to fix this I have to track more
state in the formatter: whether the current input level is "general" or
"numeric".
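
A rough sketch of the extra state this would take: one record per open
escape sequence, noting whether its argument is "general" or "numeric".
All names here are hypothetical, and the rule encoded below (refuse `|`
if the escape being opened, or any enclosing one, takes a numeric
argument) is only an assumption about how such tracking might be used.

```cpp
#include <cassert>
#include <vector>

enum class ArgumentKind { General, Numeric };

// Hypothetical tracker: one entry per currently open escape sequence.
class DelimiterContextSketch {
  std::vector<ArgumentKind> levels_;
public:
  // Call when the formatter starts reading an escape's argument.
  void enter(ArgumentKind kind) { levels_.push_back(kind); }
  // Call when that argument's closing delimiter has been read.
  void leave() { if (!levels_.empty()) levels_.pop_back(); }

  // Assumed rule: '|' is safe only if neither the escape about to be
  // opened nor any enclosing escape takes a numeric argument.
  bool pipe_is_safe(ArgumentKind opening) const
  {
    if (opening == ArgumentKind::Numeric)
      return false;
    for (ArgumentKind kind : levels_)
      if (kind == ArgumentKind::Numeric)
        return false;
    return true;
  }
};
```

Under that rule a top-level `\w|x|` passes, while the same `\w` nested
inside the argument of `\l` does not.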

It would be nice if we could wean people off of using "|" as a delimiter just
as we did with:


  case '0':
  case '1':
  case '2':
  case '3':
  case '4':
  case '5':
  case '6':
  case '7':
  case '8':
  case '9':
  case '+':
  case '-':
  case '/':
  case '*':
  case '%':
  case '<':
  case '>':
  case '=':
  case '&':
  case ':':
  case '(':
  case ')':
  case '.':






[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-24 Thread G. Branden Robinson
Update of bug #66481 (group groff):

  Status:   Confirmed => In Progress




[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-24 Thread Dave
Follow-up Comment #4, bug #66481 (group groff):

[comment #3:]

> $ printf '\\l95n\\&*9\n' | ~/groff-1.22.3/bin/nroff | cat -s
> :1: cannot use character `9' as a starting delimiter
> 5n*9


> 
> If I had to wager, I'd bet that the foregoing input has been
> rejected by GNU _troff_ since day one.

For over 19 years, at least.

$ nroff --version
GNU nroff (groff) version 1.19.2
$ printf '\\l95n\\&*9\n' | nroff | cat -s
:1: cannot use character `9' as a starting delimiter
5n*9







[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-24 Thread G. Branden Robinson
Follow-up Comment #3, bug #66481 (group groff):

I can fix this but GNU _troff_ is still going to be stricter than AT&T
_troff_, albeit not with respect to the `\w` escape sequence, which accepts
what we might call a "general" argument rather than a numeric expression
argument.

Consider use of the `\l` escape sequence, which accepts a numeric expression
argument, and optionally, a character to draw the line with.  If said
character would be valid in a numeric expression, it must be preceded by a
dummy character escape sequence. However, AT&T _troff_ still accepts
(apparently) _any_ delimiter that would be valid for more general input.


$ printf '\\l"5n\\&*"\n' | DWBHOME=~/dwb ~/dwb/bin/nroff | cat -s
*

$ printf '\\l|5n\\&*|\n' | DWBHOME=~/dwb ~/dwb/bin/nroff | cat -s
*

$ printf '\\l95n\\&*9\n' | DWBHOME=~/dwb ~/dwb/bin/nroff | cat -s
*

$ printf '\\l95n\\&*9\n' | nroff | cat -s
troff::1: error: character '9' is not allowed as a delimiter
5n*9
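
For anyone generating `\l` escape sequences from a program, the
dummy-character rule described above can be sidestepped by always
emitting `\&` before the drawing character.  The helper below is a
hypothetical illustration; it assumes the apostrophe as delimiter and
does not guard against a length or glyph that itself contains one.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a \l escape sequence that draws a line of
// the given length using `glyph`.  The dummy character \& is always
// placed before the glyph so it cannot be read as part of the numeric
// expression.
std::string line_escape_sketch(const std::string &length, char glyph)
{
  return "\\l'" + length + "\\&" + std::string(1, glyph) + "'";
}
```

For example, `line_escape_sketch("5n", '*')` produces the string
`\l'5n\&*'`.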


Moreover, in anticipation of gripes from certain quarters, I observe that this
behavior is not an instance of me being a jerkfaced fascist with respect to
input cleanliness.  Jackboot prints have long gouged the earth of our input
parser's hiking trail.


$ printf '\\l95n\\&*9\n' | ~/groff-1.22.3/bin/nroff | cat -s
:1: cannot use character `9' as a starting delimiter
5n*9


If I had to wager, I'd bet that the foregoing input has been rejected by GNU
_troff_ since day one.  If this sort of tomfoolery was once "commonly used" as
Paul said of `\w|`, 35 years of _groff_ influence have, I suspect, boiled it
away.

`printf '\\l95n\\&*9\n'` strikes me as too perverse a use case to support.
Even if people lean on me to support it, I feel confident that I'd stuff it
behind AT&T compatibility mode.  There are few, if any, reasons to use exotic
delimiters in GNU _troff_.[1]

So I will do as Paul suggests, but I will not go further at this time, and
maybe not ever.


5.6.5 Delimiters


[snip]

   Delimiter syntax is complex and flexible primarily for historical
reasons; the foregoing restrictions need be kept in mind mainly when
using 'groff' in AT&T compatibility mode.  GNU 'troff' keeps track of
the nesting depth of escape sequence interpolations, so the only
characters you need to avoid using as delimiters are those that appear
in the arguments you input, not any that result from interpolation.
Typically, ''' works fine.  *Note Implementation Differences::.


(The foregoing language also appears in section "Delimiters" of _groff_(7).)

[1] And if you want exotic, do what GNU _tbl_ does.  Select a bespoke special
character to serve as your delimiter.  (You don't even have to _define_ this
special character with the `char` request--since it is never used to format
anything, but only to delimit input, it never provokes a diagnostic.  Neat
trick!)

https://git.savannah.gnu.org/cgit/groff.git/tree/src/preproc/tbl/table.cpp?h=1.23.0#n29
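
A minimal sketch of the same trick for a program that generates *roff
input: wrap the text in a `\w` escape sequence delimited by a special
character that is never defined.  The function name and the default
special character name `mydelim` are made up for illustration; they are
not what GNU _tbl_ uses.

```cpp
#include <cassert>
#include <string>

// Hypothetical generator helper: emit a \w escape sequence whose
// delimiter is the special character \[name], so that no ordinary
// character in `text` can collide with it.
std::string width_escape_sketch(const std::string &text,
                                const std::string &name = "mydelim")
{
  const std::string delimiter = "\\[" + name + "]";
  return "\\w" + delimiter + text + delimiter;
}
```

Given the text `a|b'c`, the helper returns `\w\[mydelim]a|b'c\[mydelim]`,
leaving both `|` and `'` free for use inside the argument.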





[bug #66481] [troff] `\w|x|` no longer works on the bleeding edge of Git

2024-11-24 Thread G. Branden Robinson
Update of bug #66481 (group groff):

 Summary: \w|x| no longer works in bleeding-edge groff =>
[troff] `\w|x|` no longer works on the bleeding edge of Git

