
[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-16 Thread G. Branden Robinson

Follow-up Comment #29, bug#63074 (group groff):

Self-corrections...

> And since I changed `device` to stop readings its argument (there's really
> only one, scanned until the end of the line, line `tm`), here is the "NEWS"
> entry.

That was a botch.  It should say:

And since I changed `device` to stop reading its argument (there's really only
one, scanned until the end of the line, like `tm`) *in copy mode*, here is the
"NEWS" entry.

Also, there is goofery in the synopsis of `device` and `\X` in our Texinfo
manual (we need just @var{contents} and no ellipses), and I already have a
TODO screenshot on my tablet to remind me to fix that.


___

Reply to this item at:

  

___
Message sent via Savannah
https://savannah.gnu.org/

[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-16 Thread G. Branden Robinson

Follow-up Comment #28, bug#63074 (group groff):

Hi Deri,

[comment #27:]
> Yes, I want .device to pass everything exactly "as-is" just as it does now,
> all escapes left untouched.

Unfortunately, that is inconsistent with how \X works.


$ cat EXPERIMENTS/device-foolery.groff 
.sp
.device ps: \fB
.device ps: \s0
.device ps: \-
.device ps: \[*A]
.device ps: \[u0391]
$ ~/groff-1.22.3/bin/groff -Z EXPERIMENTS/device-foolery.groff | grep '^x X'
x X ps: \fB
x X ps: \s0
x X ps: \-
x X ps: \[*A]
x X ps: \[u0391]


The foregoing output is the same in _groff_ 1.22.4 and 1.23.0.  *But*...


$ cat EXPERIMENTS/backslash-X-foolery.roff 
.sp
\X'ps: \fB'\
\X'ps: \s0'\
\X'ps: \-'\
\X'ps: \[*A]'\
\X'ps: \[u0391]'
$ ~/groff-1.22.3/bin/groff -Z EXPERIMENTS/backslash-X-foolery.roff | grep '^x X'
EXPERIMENTS/backslash-X-foolery.roff:4: a special character is invalid within \X
EXPERIMENTS/backslash-X-foolery.roff:5: a special character is invalid within \X
EXPERIMENTS/backslash-X-foolery.roff:6: a special character is invalid within \X
x X ps: 
x X ps: 
x X ps: 
x X ps: 
x X ps: 


The foregoing output is the same in _groff_ 1.22.4.  In _groff_ 1.23.0, it
differs.


$ ~/groff-stable/bin/groff -Z EXPERIMENTS/backslash-X-foolery.roff | grep '^x X'
x X ps: 
x X ps: 
x X ps: -
x X ps: 
x X ps: 


This was the outcome of bug #61401 (October 2021).

I'd like to get `.device` and `\X` treating their arguments the same way,
hence bug #64484.  You can still pass things representing arbitrary byte
sequences to the output driver--but if you want to represent them with _groff_
escape sequences, you will need to further escape them.


$ cat EXPERIMENTS/backslash-X-escapery.roff
.sp
\X'ps: \\fB'\
\X'ps: \\s0'\
\X'ps: \\-'\
\X'ps: \\[*A]'\
\X'ps: \\[u0391]'
$ ~/groff-HEAD/bin/groff -Z EXPERIMENTS/backslash-X-escapery.roff | grep '^x X'
x X ps: \fB
x X ps: \s0
x X ps: \-
x X ps: \[*A]
x X ps: \[u0391]


And `device` request behavior is now (in Git) consistent with this.


$ cat ./EXPERIMENTS/device-escapery.groff
.sp
.device ps: \\fB
.device ps: \\s0
.device ps: \\-
.device ps: \\[*A]
.device ps: \\[u0391]
$ ~/groff-HEAD/bin/groff -Z EXPERIMENTS/device-escapery.groff | grep '^x X'
x X ps: \fB
x X ps: \s0
x X ps: \-
x X ps: \[*A]
x X ps: \[u0391]


Here is what our Texinfo manual now says (in Git) about `device` and `\X`.


5.34 Postprocessor Access
=========================

Two escape sequences and two requests enable documents to pass
information directly to an output driver or other postprocessor.  These
are useful for exercising device-specific capabilities that the 'groff'
language does not abstract or generalize; examples include the embedding
of hyperlinks and image files.  Device-specific functions are documented
in each output driver's man page, such as 'gropdf(1)', 'grops(1)', or
'grotty(1)'.

 -- Request: .device xxx ...
 -- Escape sequence: \X'''xxx ...'''
 Embed all XXX arguments into GNU 'troff' output as parameters to an
 'x X' device control command.(1)  (*note Postprocessor
 Access-Footnote-1::) The meaning and interpretation of such
 parameters is determined by the output driver or other
 postprocessor.

 The 'device' request strips an initial neutral double quote from
 CONTENTS to allow embedding of leading spaces.

 Within a device control command, the escape sequences '\&', '\)',
 '\%', and '\:' are ignored; '\<SP>' and '\~' are converted to
 single space characters; and '\\' has its escape character
 stripped.  So that the basic Latin subset of the Unicode character
 set(2) (*note Postprocessor Access-Footnote-2::) can be reliably
 encoded in device control commands, seven special character escape
 sequences ('\-', '\[aq]', '\[dq]', '\[ga]', '\[ha]', '\[rs]', and
 '\[ti]') are mapped to basic Latin characters; see the
 'groff_char(7)' man page.  For this transformation, character
 translations and special character definitions are ignored.(3)
 (*note Postprocessor Access-Footnote-3::)

 Escape sequences other than the foregoing in device control command
 may be ignored, or produce an error.

 A device control command issued with the 'device' request will not
 be reflected in the output unless a partially collected line exists
 at least once in the top-level diversion (recall *note
 Diversions::).  When experimenting with such device controls in
 minimal documents, a 'br' request will ensure this to be the case.

 If the 'use_charnames_in_special' directive appears in the output
 device's 'DESC' file, the use of special character escape sequences
 is _not_ an error; they are simply output verbatim (with the
 exception of the seven mapped to Unicode basic Latin characters,
 discussed above).  'use_charnames_in_special' is currently employed
 only by 'grohtml'.

[.devicem and \Y snipped]
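
The rules in the manual excerpt above can be modeled roughly as follows. This is an illustrative Python sketch, not GNU troff's implementation: the function name `encode_device_control` is hypothetical, the Basic Latin targets of the seven special characters are taken from groff_char(7), and raising `ValueError` for every other escape sequence is a simplification standing in for "may be ignored, or produce an error".

```python
# Model of the documented rules only; this is not troff's actual code.
# encode_device_control is a hypothetical name used for illustration.

# Six of the seven special characters mapped to Basic Latin ('\-' is
# handled separately below because it has no bracketed form here).
SPECIAL_TO_LATIN = {
    "aq": "'", "dq": '"', "ga": "`", "ha": "^", "rs": "\\", "ti": "~",
}
IGNORED = {"&", ")", "%", ":"}  # \&, \), \%, \: are dropped


def encode_device_control(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(text):
            raise ValueError("dangling escape character")
        nxt = text[i + 1]
        if nxt in IGNORED:
            i += 2
        elif nxt in (" ", "~"):
            out.append(" ")      # '\ ' and '\~' become one space
            i += 2
        elif nxt == "\\":
            out.append("\\")     # '\\' has its escape character stripped
            i += 2
        elif nxt == "-":
            out.append("-")      # '\-' maps to the ASCII hyphen-minus
            i += 2
        elif nxt == "[":
            end = text.find("]", i + 2)
            name = text[i + 2:end] if end != -1 else None
            if name not in SPECIAL_TO_LATIN:
                # Simplification: real behavior may be to ignore instead.
                raise ValueError(f"unsupported special character at {i}")
            out.append(SPECIAL_TO_LATIN[name])
            i = end + 1
        else:
            # Simplification: real behavior may be to ignore instead.
            raise ValueError(f"unsupported escape sequence at {i}")
    return "".join(out)


print(encode_device_control(r"ps: \\[u0391]"))
```

Under this model a doubled backslash is how something that looks like an escape sequence survives into the 'x X' command, which is the behavior the `device-escapery` and `backslash-X-escapery` experiments show.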


And since I changed `device` to stop readings its argument (there's really
only one, scanned until the end

[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-15 Thread Deri James

Follow-up Comment #27, bug#63074 (group groff):

[comment #25:]
> I hear your expression of urgency but I don't think "stringhex" is a good
> long-term solution to what ails us.  You are correct in comment #22 that I
> did not correctly apprehend at first what it was for.  I thought you
> developed it because we had no way to reliably transmit arbitrary byte
> sequences to device control commands.  But we did, sort of--it just needed
> to be made consistent and reliable.  That it wasn't is what my test case
> attempts to illustrate and what the fix to bug #64484 attempts to prove.

Yes, I want .device to pass everything exactly "as-is" just as it does now,
all escapes left untouched. I don't understand your opposition to stringhex.
If I remember correctly you didn't like "shouty" capitals in .SH/Sh and people
generally agreed, so you added .stringup/down to enforce the case change -
fair enough. Similarly people on the list have asked for PDF support for
non-Latin languages, and to support all the features currently supported for
Latin languages I need stringhex or a suitable alternative. So far none of
your suggestions will solve the problem, which means that perhaps I have not
explained it well enough. Let's look at a real example in the file mom-pdf.mom
in the mom/examples directory; it contains:-

.HEADING 2 NAMED internal "Creating internal links"

.PP
Furthermore, \*[cod]NAMED\|\*[codx] stores the text of the
heading for use later on when linking to it (see
.PDF_LINK internal SUFFIX ). +
If headings are being numbered, the heading number is prepended.

This produces what is shown in the attached PNG file. However, if this were
written by a Greek person (or Google Translate!) it might look more like:-

.HEADING 2 NAMED εσωτερικό "Δημιουργία εσωτερικών συνδέσεων"

.PP
Επιπλέον, \*[cod]NAMED\|\*[codx] αποθηκεύει το
κείμενο του επικεφαλίδα
για χρήση αργότερα κατά τη σύνδεση με
αυτήν (βλ
.PDF_LINK εσωτερικό SUFFIX ). +
Εάν οι επικεφαλίδες αριθμούνται,
προηγείται ο αριθμός της επικεφαλίδας.

In either case the string at the end of the HEADING line has to be available
(using the .PDF_LINK key) so that the "+" symbol can be replaced by the given
text. This means that whatever is given as the NAMED argument has to be used
as part of the array which holds the heading strings, and for text converted
to \[uXXXX] form this would not be a legal identifier name, hence the need to
munge it into something acceptable which can be reconstituted.
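
The "munge and reconstitute" idea can be sketched in Python as below. This is only an illustration of the general technique: `to_hex_name` and `from_hex_name` are hypothetical helpers, and encoding the UTF-8 bytes of the tag as hex digits is an assumption, not necessarily what the stringhex request on the branch does.

```python
# Illustrative only: to_hex_name/from_hex_name are hypothetical helpers, and
# hex-encoding the UTF-8 bytes is an assumption about the encoding used.

def to_hex_name(text: str) -> str:
    """Munge arbitrary text into a name made only of 0-9 and A-F."""
    return text.encode("utf-8").hex().upper()


def from_hex_name(name: str) -> str:
    """Reconstitute the original text from a munged name."""
    return bytes.fromhex(name).decode("utf-8")


# A dictionary stands in for the strings that hold the heading text.
headings = {}
tag = "εσωτερικό"
headings[to_hex_name(tag)] = "Δημιουργία εσωτερικών συνδέσεων"

key = to_hex_name(tag)
print(key.isascii() and key.isalnum())  # the munged key is plain ASCII
print(from_hex_name(key) == tag)        # and the tag can be recovered
```

The point of such an encoding is that the key contains nothing the formatter could interpret as an escape sequence, yet the original tag is not lost.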

Your example using \A'' is not helpful since it would reject any non-Latin
language being used as the NAMED argument.

> No, I accept your premise that the main driver behind "stringhex" was this:
>
> > The problem lies in the original pdfmark API, if you look at the
> > pdfmark.pdf you will see that in the sections describing .pdfhref M and
> > .pdfhref L which both refer to a "dest-name" and "descriptive text", it
> > says that if a dest-name is not given the first word in the description is
> > used as the dest-name.
>
> I appreciate your explanation.  If the problem was with the pdfmark API,
> then let's fix the pdfmark API.
>
> In particular, this:
>
> > if a dest-name is not given the first word in the description is used as
> > the dest-name
>
> ...strikes me as short-sighted, especially without any validation going on.
> A textual description of a hyperlink/bookmark might contain all sorts of
> crazy stuff.  (Like Cyrillic or CJK characters or, worse, motion or
> type-size or font-selection escape sequences.)

The pdfmark API was around well before I wrote gropdf, so I just had to
support it.

What about embedded calls to macros (i.e. \*[macro arg])? I deal with those as
well. Here's an example from mom-pdf.mom again:-

.AUTHOR "Deri James" "\*[UP .5p]and" "Peter Schaffter"

Mom's AUTHOR macro predates pdf integration so it made sense to just add:-

.pdfinfo /Author \\$*

(or similar) to the macro. Which produces in the grout file:-

x X ps:exec [/Author (Deri James, \v'-.5p'and, Peter Schaffter,) /DOCINFO pdfmark

(using the pdf.tmac on my branch) and if you look at the finished pdf you will
see a pristine author attribution in the document properties.

> Assuming that it was going to be a well-behaved sequence of ASCII bytes or
> even that one could "sanitize" or "cln" one's way through was a hopeless
> notion.  That won't be practical until we have a string iterator and more
> conditional expressions that enable the user of an iterator to identify the
> type of each item in an iterated string/macro/diversion.  But if I
> understand you correctly, we don't need that fancy new stuff to solve the
> present problem, with stringhex or without.

No such assumption is made; I expect the data to be dirty. I have removed
an*cln and asciify, and never used sanitize. Stringhex is there to make the
NAMED argument acceptable for use as a string name. I need none of your "fancy
new stuff".

> It would probably benefit me to look up Peter's documentation on _mom_'s
> "HEADING" macro.  It is a bit baffling to me that one has to repeat arguments
> like

[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-14 Thread G. Branden Robinson

Follow-up Comment #26, bug#63074 (group groff):

In my seat-of-the-pants example, I committed the same crime I complained of:
lack of input validation.  My point might be easier to understand if I add it
in.


.nr refno 1
.de DEFREF
.  nr refno +1
.  if !\A'\\$1' \
.ab bogus tag '\\$1' in reference definition; stick to ASCII
.  ds ref*id!\n[refno]!tag \\$1
.  ds ref*id!\n[refno]!author \\$2
.  ds ref*id!\n[refno]!desc \\$3
.  ds ref*id!\n[refno]!year \\$4
..
.DEFREF story "Dupr\[e aa]" "Best \%Story\%Book Ever" 1989
.DEFREF Гуляйпольщина "Махновщина" "весенне-летнего" 1919 \" gets rejected





[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-14 Thread G. Branden Robinson

Follow-up Comment #25, bug#63074 (group groff):

[comment #24:]
> Bug #64484 is marked as fixed.

Right, but I believe there was a relationship nevertheless.

> I already have a reliable way to pass byte sequences in device control
> commands, .stringhex.

Okay.  But it didn't do anything about this failing test case (which
admittedly didn't exist until I started to research this issue).

https://git.savannah.gnu.org/cgit/groff.git/diff/src/roff/groff/tests/device-control-special-character-handling.sh?id=974c063f0a9e1ef6c0d2cac4755a3b9d6e925b0d

Of which the salient part is the actual test input:


input='.nf
\X#bogus1: esc \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]#
.device bogus1: req \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
.ec @
@X#bogus2: esc @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]#
.device bogus2: req @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]'


...which looks pretty noisy but tests several things.

1.  Use of \X escape sequences versus `device` requests.
2.  Use of \% escape sequences in device control commands (do they get
removed?).
3.  Use of ordinary hyphens in device control commands (do they get converted
to some crazy Unicode thing?).
4.  Use of special character escape sequences to represent ASCII characters in
device control commands and which should therefore be passed through as
ASCII.
5.  Robustness in the face of a changed roff escape character.  This did *not*
work prior to the bug #64484 fix.

> This bug was previously named "warning messages when using special
> characters in TITLE or AUTHOR" and the attached cyrillic.pdf shows both the
> pdf title and author shown with cyrillics and no warnings. So I would say
> this one is dependent on bug #65098, i.e. merge the rest of my branch.

I hear your expression of urgency but I don't think "stringhex" is a good
long-term solution to what ails us.  You are correct in comment #22 that I did
not correctly apprehend at first what it was for.  I thought you developed it
because we had no way to reliably transmit arbitrary byte sequences to device
control commands.  But we did, sort of--it just needed to be made consistent
and reliable.  That it wasn't is what my test case attempts to illustrate and
what the fix to bug #64484 attempts to prove.

No, I accept your premise that the main driver behind "stringhex" was this:

> The problem lies in the original pdfmark API, if you look at the pdfmark.pdf
> you will see that in the sections describing .pdfhref M and .pdfhref L which
> both refer to a "dest-name" and "descriptive text", it says that if a
> dest-name is not given the first word in the description is used as the
> dest-name.

I appreciate your explanation.  If the problem was with the pdfmark API, then
let's fix the pdfmark API.

In particular, this:

> if a dest-name is not given the first word in the description is used as the
> dest-name

...strikes me as short-sighted, especially without any validation going on.  A
textual description of a hyperlink/bookmark might contain all sorts of crazy
stuff.  (Like Cyrillic or CJK characters or, worse, motion or type-size or
font-selection escape sequences.)  Assuming that it was going to be a
well-behaved sequence of ASCII bytes or even that one could "sanitize" or
"cln" one's way through was a hopeless notion.  That won't be practical until
we have a string iterator and more conditional expressions that enable the
user of an iterator to identify the type of each item in an iterated
string/macro/diversion.  But if I understand you correctly, we don't need that
fancy new stuff to solve the present problem, with stringhex or without.

It would probably benefit me to look up Peter's documentation on _mom_'s
"HEADING" macro.  It is a bit baffling to me that one has to repeat arguments
like this:


.HEADING 1 NAMED Гуляйпольщина "Гуляйпольщина"
...
.PDF_LINK Гуляйпольщина PREFIX ( SUFFIX ) "see: +"


> Where the "+" is replaced by the contents of the string register
> pdf:look(Гуляйпольщина), which would actually be a string of \[uXXXX] nodes,
> so would generate an error. This is what stringhex is for, to hide the
> contents so that groff does not see it as a sequence of nodes. The ideal
> solution would be to allow string registers to have an attribute (say
> "glass") which signals that groff should never try to interpret its
> contents, i.e. operate as if the escape mechanism was turned off just for
> the contents of that register, and have a way of turning that attribute
> on/off or an escape which sets the attribute for the enclosed string.

Right now I don't understand why we would need to elaborate a fairly
fundamental *roff language data type (the string) with a "glass" attribute
when, if you have a list indexed by a number or a _valid_ identifier, you can
simply define a string using a list item's index as a prefix.


.nr refno 1
.de DEFREF
.  nr refno +1
.  ds ref*id!\n[refno]!tag \\$1
.  ds ref*id!\n[refno]!author \\$2
.  ds

[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-14 Thread Deri James

Follow-up Comment #24, bug#63074 (group groff):

Bug #64484 is marked as fixed. I already have a reliable way to pass byte
sequences in device control commands, .stringhex. This bug was previously
named "warning messages when using special characters in TITLE or AUTHOR" and
the attached cyrillic.pdf shows both the pdf title and author shown with
cyrillics and no warnings. So I would say this one is dependent on bug #65098,
i.e. merge the rest of my branch.

(file #55567)

___

Additional Item Attachment:

File name: cyrillic.pdf   Size:27 KB






[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-14 Thread G. Branden Robinson

Update of bug#63074 (group groff):

  Depends on: => bugs #64484

___

Follow-up Comment #23:

Adding dependency on bug #64484, because to achieve the aim of this ticket, a
reliable means of transmitting encoded byte sequences to the output device is
necessary.  Not all devices need to use the same convention, though that would
be nice.

I still owe Deri a reply to comment #22.



[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-09 Thread Deri James

Follow-up Comment #22, bug#63074 (group groff):

Whew, rather a lot to cover!

First the original "bug" was "fixed" by including -f U-T in the command.

Next it became a wish to include non-Latin characters in the bookmarks. This
is now working on my branch, waiting for Branden's integration.

Then it became a discussion on Branden's for iterator being used as a
replacement for stringhex, and using it to send arbitrary bytes in device
control commands, and his recent discovery that you can already do this. My
statement in 2022 (see comment #11):-

"If I dropped the .asciify from pdf.tmac it would mean all the \[uXXXX]
strings would reach the post processor gropdf, which could then assemble a
UTF-16 string from the hex numbers."

Which is exactly what I have done in the new pdf.tmac/gropdf.
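
As a rough illustration of "assemble a UTF-16 string from the hex numbers", here is a Python sketch. It is not gropdf's code: the function name `utf16_from_u_escapes` is hypothetical, composite forms such as \[uXXXX_YYYY] are not handled, and the leading FE FF byte order mark follows the convention for UTF-16BE text strings in PDF.

```python
import re

# Matches \[uXXXX] with four to six hexadecimal digits.
_U_ESCAPE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def utf16_from_u_escapes(text: str) -> bytes:
    # Hypothetical helper: turn each \[uXXXX] into the character with that
    # code point, leave everything else alone, then encode as big-endian
    # UTF-16 with a leading byte order mark.
    decoded = _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    return b"\xfe\xff" + decoded.encode("utf-16-be")


# Mixed plain text and escapes: "Deri " followed by two Greek letters.
print(utf16_from_u_escapes(r"Deri \[u0393]\[u03C5]").hex(" "))
```

Code points above U+FFFF come out as surrogate pairs, which the `utf-16-be` codec takes care of; a value beyond U+10FFFF would make `chr` raise `ValueError`.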

I think Branden has not fully grasped the reason why stringhex is required.
The problem lies in the original pdfmark API, if you look at the pdfmark.pdf
you will see that in the sections describing .pdfhref M and .pdfhref L which
both refer to a "dest-name" and "descriptive text", it says that if a
dest-name is not given the first word in the description is used as the
dest-name.

The macros create a string like:-

.ds pdf:look(\\*[dest-name]) descriptive text

Since descriptive text can include any groff escape this means that dest-name
may also include any groff escape occurring in the first word. The reason it
creates these string registers is to support mom features such as:-

.HEADING 1 NAMED Гуляйпольщина "Гуляйпольщина"
Гуляйпольщина (укр. Гуляйпольщина) или
Махновщина, также Вольная
Территория — повстанческий район в
Северном Приазовье в период
Гражданской войны 1918—1921 гг.
.PP
And so it goes on.
.PDF_LINK Гуляйпольщина PREFIX ( SUFFIX ) "see: +"

Where the "+" is replaced by the contents of the string register
pdf:look(Гуляйпольщина), which would actually be a string of \[uXXXX] nodes,
so would generate an error. This is what stringhex is for, to
hide the contents so that groff does not see it as a sequence of nodes. The
ideal solution would be to allow string registers to have an attribute (say
"glass") which signals that groff should never try to interpret its contents,
i.e. operate as if the escape mechanism was turned off just for the contents
of that register, and have a way of turning that attribute on/off or an escape
which sets the attribute for the enclosed string.

I don't know if this is helpful, but I hope it helps you understand why
stringhex is being used.



[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-09 Thread G. Branden Robinson

Update of bug#63074 (group groff):

  Status: In Progress => Need Info  

___

Follow-up Comment #21:


[comment #20:]
> I'll bolster the unit tests I have pending to make sure this works okay for
> all output devices.

Bolstered and confirmed.

Setting ticket to "Need Info" status to solicit feedback from Deri in
particular, and anyone else who wants to comment.



[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-09 Thread G. Branden Robinson

Follow-up Comment #20, bug#63074 (group groff):

Sleeping on the problem provoked illumination, as it sometimes does.

In any other _roff_ context, how would we smuggle something that looks like an
escape sequence into a different interpretation context?

By escaping it.


$ cat EXPERIMENTS/device-control-request-with-double-escape-special-character-and-partially-collected-line.roff
.device ps: nop \\[u007E]
.br
$ groff -Z EXPERIMENTS/device-control-request-with-double-escape-special-character-and-partially-collected-line.roff
x T ps
x res 72000 1 1
x init
p1
V12000
H72000
x font 5 TR
f5
s1
V12000
H72000
md
DFd
x X ps: nop \[u007E]
n12000 0
x trailer
V792000
x stop
$ cat EXPERIMENTS/device-control-escape-sequence-with-double-escape-special-character.roff
\X'ps: nop \\[u007E]'
$ groff -Z EXPERIMENTS/device-control-escape-sequence-with-double-escape-special-character.roff
x T ps
x res 72000 1 1
x init
p1
V12000
H72000
x font 5 TR
f5
s1
V12000
H72000
md
DFd
x X ps: nop \[u007E]
n12000 0
x trailer
V792000
x stop


I get the same output from _groff_ 1.22.4, 1.23.0, and Git HEAD.

This feels like a win.  Needs some discussion in documentation as the
assumption of many users may be "you can't do that".  And to mention the fact
that it's up to the postprocessor to interpret the syntax (and even then
possibly only in the context of a specific device control command).

I'll bolster the unit tests I have pending to make sure this works okay for
all output devices.



[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-08 Thread G. Branden Robinson

Follow-up Comment #19, bug#63074 (group groff):

Tricky.  By the time `encode_char` sees the input it has already been
tokenized.  This means that what is of interest for my approach is a "token"
node, and that in turn means that input that looks like


\X'\[u007E]'


has already effectively been rewritten as if it were


\X'\[ti]'


...so that sucks.  (AIUI, "uniglyph.cpp" does this remapping.)

More thinking required.  A different escape notation might be required after
all, but I'm not resigned to that possibility YET.



[bug #63074] [troff] support construction of arbitrary byte sequences in device control commands

2024-01-08 Thread G. Branden Robinson

Update of bug#63074 (group groff):

  Status: None => In Progress
  Assigned to: None => gbranden
  Summary: [troff] need a way to embed non-Basic Latin glyphs in device
  control commands => [troff] support construction of arbitrary byte
  sequences in device control commands

___

Follow-up Comment #18:

This is a skeleton of my proposal, an alternative to Deri's new `stringhex`
request in his branch, "deri-gropdf-ng".

Originally posted in a recent duplicate bug #65137.


diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index a0b987634..f6e5b1279 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -5571,7 +5571,7 @@ static node *do_non_interpreted()
   return new non_interpreted_node(mac);
 }

-static void encode_char(macro *mac, char c)
+static void encode_char_for_device_control(macro *mac, char c)
 {
   if (c == '\0') {
 if (tok.is_stretchable_space()
@@ -5600,6 +5600,13 @@ static void encode_char(macro *mac, char c)
   else if (strcmp("ti", sc) == 0)
mac->append('~');
   else {
+   // TODO: Support '\[uXXXX]' for all devices to support
+   // transmission of arbitrary data to the output device.  It's a
+   // misnomer--this doesn't necessarily represent a Unicode code
+   // point, but this syntax beats inventing a new one for this
+   // esoteric purpose.  Whether one sends \[uAABB],
+   // \[u00AA]\[u00BB], or the latter's byte-swapped counterpart is
+   // an interface detail that the output device must specify.
if (font::use_charnames_in_special) {
  if (sc[0] != (char)0) {
mac->append('\\');
@@ -5612,9 +5619,14 @@ static void encode_char(macro *mac, char c)
mac->append(']');
  }
  else
- error("special character '%1' cannot be used within"
-   " device control escape sequence", sc);
+   error("special character '%1' cannot be used within a"
+ " device control escape sequence", sc);
}
+   else
+ // TODO: Put '\[uXXXX]' support here.  Don't allow
+ // '\[uXXXX_YYYY]'.
+ error("special character '%1' cannot be used within a device"
+   " control escape sequence", sc);
   }
 }
 else if (!(tok.is_hyphen_indicator()
@@ -5668,7 +5680,7 @@ static node *do_special()
   c = '\b';
 else
   c = tok.ch();
-encode_char(&mac, c);
+encode_char_for_device_control(&mac, c);
   }
   return new special_node(mac);
 }




