[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-12-30 Thread G. Branden Robinson
Follow-up Comment #12, bug #67735 (group groff):


commit 71ccefcf14694698ab8fbacf1219094eea24bdc9
Author: G. Branden Robinson 
Date:   Tue Dec 30 02:09:37 2025 -0600

[troff]: Work on Savannah #67735.

* src/roff/troff/node.h (struct node):
* src/roff/troff/node.cpp (class break_char_node, node::get_break_code):
  Demote type of hyphenation codes from `int` to `unsigned char`, since
  the range of the latter is as much as has ever been stored in them
  anyway.  (This was discovered the hard way in Savannah #66919.)
  Retype `break_code` and `prev_break_code` private member variables and
  `get_break_code()` public member function accordingly.
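A minimal sketch of the retyping in isolation. The class `break_char_node_sketch`, its constructor, and the accessor `get_prev_break_code()` are hypothetical; only the member names `break_code`, `prev_break_code`, and `get_break_code()` come from the commit message above.

```cpp
#include <cassert>
#include <limits>

// UCHAR_MAX is at least 255 on every conforming implementation, so an
// unsigned char can hold every break/hyphenation code value in 0..255.
static_assert(std::numeric_limits<unsigned char>::max() >= 255,
              "unsigned char must hold at least 0..255");

// Hypothetical, heavily simplified stand-in for troff's break_char_node.
class break_char_node_sketch {
  unsigned char break_code;       // formerly int
  unsigned char prev_break_code;  // formerly int
public:
  break_char_node_sketch(unsigned char bc, unsigned char pbc)
    : break_code(bc), prev_break_code(pbc) {}
  // Return type demoted to match the member it exposes.
  unsigned char get_break_code() const { return break_code; }
  // Hypothetical accessor, present only so the sketch uses both members.
  unsigned char get_prev_break_code() const { return prev_break_code; }
};
```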




___

Reply to this item at:

  

___
Message sent via Savannah
https://savannah.gnu.org/




[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-12-29 Thread G. Branden Robinson
Follow-up Comment #11, bug #67735 (group groff):


commit 7b85efe63117bf9d6d33911dc962bd33a611ea50
Author: G. Branden Robinson 
Date:   Thu Dec 25 18:58:23 2025 -0600

src/roff/troff/input.cpp: Work on Savannah #67735.

* src/roff/troff/input.cpp (do_define_macro): Assign `char`-valued
  literals to temporary `int` storing value we got from
  `read_char_in_copy_mode()` (which calls `input_iterator::fill()`,
  which in turn calls getc(3)), since it is destined to become an input
  character--if we don't have to handle EOF.  Since we must, drop
  C-style type casts to `unsigned char` while still scanning input.
  Absence of a premature EOF established, use `static_cast` operator to
  demote this `int` to `unsigned char` before appending it to a macro.
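The pattern this commit describes can be sketched outside troff: keep the value returned by getc(3) in an `int` until EOF has been ruled out, and only then narrow it with `static_cast`. The helper `read_until()` and its `std::vector<unsigned char>` buffer are hypothetical; `do_define_macro()` works with troff's own input-iterator and macro classes, and ends on `..`, not a single terminator byte.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper: append bytes from `fp` to `body` until `terminator`
// or end of file.  Returns false if EOF arrived before the terminator.
bool read_until(std::FILE *fp, unsigned char terminator,
                std::vector<unsigned char> &body)
{
  for (;;) {
    // getc() returns int: either a byte value in 0..UCHAR_MAX or the
    // negative sentinel EOF.  Narrowing first would make EOF (typically
    // -1) indistinguishable from the byte 0xFF.
    int c = std::getc(fp);
    if (c == EOF)
      return false;  // premature end of input
    // EOF ruled out; c holds a valid byte, so narrow it explicitly.
    unsigned char uc = static_cast<unsigned char>(c);
    if (uc == terminator)
      return true;
    body.push_back(uc);
  }
}
```

Reading the bytes `a`, `b`, `0xFF`, `c` followed by a newline yields four bytes in `body`, with the 0xFF byte preserved rather than mistaken for EOF.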








[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-12-19 Thread G. Branden Robinson
Follow-up Comment #10, bug #67735 (group groff):


commit 7af005dc71c525dec84967f0eca0696de4dfd769
Author: G. Branden Robinson 
Date:   Fri Dec 19 14:00:23 2025 -0600

src/roff/troff/token.h: Work on Savannah #67735.

* src/roff/troff/token.h (token::ch): Compare member variable `c` to
  literal of `unsigned char` type, not a character literal of
  implementation-defined signedness.

Continues the long process of fixing Savannah #67735.
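Why the literal's type matters: where plain `char` is signed (as on common x86 ABIs), a character literal with the high bit set, such as `'\xE9'`, has a negative value, while the same byte stored in an `unsigned char` compares as 233. The two functions below are hypothetical illustrations, not troff code; the first gives the intended answer only where `char` is unsigned.

```cpp
#include <cassert>
#include <limits>

// True where plain char is signed (e.g., common x86 ABIs).
constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;

// Fragile: the literal has type char.  Where char is signed, '\xE9' is
// negative and can never equal an unsigned char promoted to int.
bool is_e_acute_fragile(unsigned char c)
{
  return c == '\xE9';
}

// Robust: both operands are unsigned char, so both promote to the same
// non-negative int value on every platform.
bool is_e_acute(unsigned char c)
{
  return c == static_cast<unsigned char>(0xE9);
}
```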






[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-12-19 Thread G. Branden Robinson
Follow-up Comment #9, bug #67735 (group groff):


commit 6dcbaa0a7a9bf86370b22d604a71c6b5f2215898
Author: G. Branden Robinson 
Date:   Fri Dec 19 14:57:45 2025 -0600

[troff]: Refactor.

...to use the C++ default argument force.  Merge function
`get_number_rigidly()` into `read_measurement()`, which differed by one
line out of a dozen, by extending the latter's signature with a default
argument `is_mandatory` of type `bool`.  This coincidentally made GCC's
overload resolver more sensitive to arguments of ambiguous integral
type, so take this opportunity to stop punning between character
literals and `unsigned char`, in favor of the explicitness we'll need
anyway for GNU troff's planned wider fundamental character type.

* src/roff/troff/token.h: Drop declaration of `get_number_rigidly()`.
  Update declaration of `read_measurement()` with `bool`-valued 3rd
  argument defaulting `false`.

* src/roff/troff/input.cpp (read_color_channel_value)
  (do_expr_test, read_size, do_register)
  (is_conditional_expression_true, evaluate_expression):
* src/roff/troff/reg.cpp (define_register_request): Explicitly construct
  literal of `unsigned char` type as argument to `read_measurement()`.

* src/roff/troff/input.cpp (do_expr_test): Migrate only call site of
  `get_number_rigidly()` to `read_measurement()` with an explicit `true`
  3rd argument.

* src/roff/troff/number.cpp (get_number_rigidly): Delete.

  (read_measurement): Accept third argument, `is_mandatory`, of type
  `bool`.  Pass it to `is_valid_expression()`, which is already prepared
  for same.

Continues the long process of fixing Savannah #67735.
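The refactoring pattern, merging two functions that differ by one line through a defaulted `bool` parameter, can be sketched with a hypothetical stand-in. `read_count()` and its parsing rules are invented for illustration; they are not GNU troff's `read_measurement()` or `is_valid_expression()`, though the out-parameter-plus-`bool` shape is similar in spirit.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for two functions that once differed by a single
// line: one accepted empty input (storing 0), the other rejected it.  The
// defaulted parameter folds them into one function.  Existing callers,
// which omit the third argument, keep the lenient behavior; only the
// former caller of the strict variant passes `true`.
bool read_count(const std::string &s, int *result, bool is_mandatory = false)
{
  if (s.empty()) {
    if (is_mandatory)
      return false;  // the one line that used to differ
    *result = 0;
    return true;
  }
  int value = 0;
  for (char ch : s) {
    // Convert before calling isdigit(), whose argument must be
    // representable as unsigned char (or be EOF).
    unsigned char uc = static_cast<unsigned char>(ch);
    if (!std::isdigit(uc))
      return false;
    value = value * 10 + (uc - '0');
  }
  *result = value;
  return true;
}
```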






[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-12-17 Thread G. Branden Robinson
Follow-up Comment #8, bug #67735 (group groff):


commit c0030ad7d141eefb875b0fe4fcd7e32dd9051724
Author: G. Branden Robinson 
Date:   Sat Dec 13 15:26:24 2025 -0600

[troff]: Continue fixing Savannah #67735.

* src/roff/troff/input.cpp
  (non_interpreted_char_node::non_interpreted_char_node): Fix code style
  nit.  Migrate input character handling constructor to deal in the type
  `unsigned char` rather than `char`.  Use explicitly unsigned literal
  in comparison.

Continues the long process of fixing Savannah #67735.






[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-12-12 Thread G. Branden Robinson
Follow-up Comment #7, bug #67735 (group groff):


commit c74bf804f79d6f176120a42c757701a34fc489c6
Author: G. Branden Robinson 
Date:   Mon Dec 8 04:21:14 2025 -0600

[troff]: Continue working on bug #67735.

* src/roff/troff/input.cpp: Migrate more input character reading
  functions to deal in the type `unsigned char` rather than `char`.

  (do_get_long_name): Update declaration.

  (read_rgb, read_cmy, read_cmyk, read_gray, do_get_long_name): Update
  definitions.

  (get_long_name): Update `do_get_long_name()` call site to use literal
  of `unsigned char` rather than `char` type.

  (do_get_long_name): Change type of local variable `buf` from `char` to
  `unsigned char`, since it is directly populated by reads of bytes from
  the input stream.  This function converts said input into an object of
  the groff class `symbol` and returns it, but `symbol` has no
  constructor accepting a pointer to `const unsigned char`.
  Consequently, once we have successfully populated `buf`, create a new
  buffer `chbuf`, a heap-allocated array of `char` type.  Free this
  array after constructing a `symbol` on the stack.  (If all this seems
  like rigmarole, consider that it's going to be necessary anyway when
  we read bytes from the input stream, confirm that they're valid UTF-8
  sequences, apply Normalization Form D decomposition, and then store
  them as one or more 32-bit code points in GNU troff's planned future
  internal character data type.  See Savannah #40720.)

  (read_drawing_command_color_arguments): Change type of `end` local
  variable from `int` to `unsigned char`, since that is what the
  `read_{rgb,cmy,cmyk,gray}()` functions now expect as arguments.

Continues the long process of fixing Savannah #67735.
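A sketch of the conversion step, assuming a minimal hypothetical `symbol_sketch` class that, like groff's `symbol`, can be constructed only from `const char *`. The commit frees a heap-allocated `char` array after constructing the `symbol`; this sketch substitutes `std::vector<char>` so the temporary copy is released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for groff's `symbol`: it accepts only const char *.
// (The real class interns names; this one just keeps a copy.)
class symbol_sketch {
  std::string name;
public:
  explicit symbol_sketch(const char *s) : name(s) {}
  const char *contents() const { return name.c_str(); }
};

// Convert `len` bytes read from input into a symbol.
symbol_sketch make_symbol(const unsigned char *buf, std::size_t len)
{
  std::vector<char> chbuf(len + 1);        // +1 for the terminating NUL
  for (std::size_t i = 0; i < len; i++)
    // Explicit unsigned char -> char copy.  Bytes above CHAR_MAX become
    // negative where char is signed (implementation-defined before
    // C++20, modular from C++20 on).
    chbuf[i] = static_cast<char>(buf[i]);
  chbuf[len] = '\0';
  return symbol_sketch(chbuf.data());      // chbuf is freed on return
}
```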






[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-12-01 Thread G. Branden Robinson
Follow-up Comment #6, bug #67735 (group groff):


commit 9da08996d15768b00d47913f97130522c474573a
Author: G. Branden Robinson 
Date:   Sun Nov 30 04:48:46 2025 -0600

[troff]: Adjust type of TAB_REPEAT_CHAR constant.

* src/roff/troff/env.cpp: Explicitly construct the global constant
  `TAB_REPEAT_CHAR` as type `unsigned char`, because it is destined for
  comparison to input characters, instead of inheriting the type `char`
  from its literal initializer.
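A minimal illustration of an explicitly typed constant. The value `'T'` is an assumed placeholder for the literal initializer in env.cpp, and `is_tab_repeat_marker()` is a hypothetical helper.

```cpp
#include <cassert>
#include <type_traits>

// Explicitly constructed as unsigned char rather than inheriting char
// from the literal.  'T' is an assumed placeholder value.
const unsigned char TAB_REPEAT_CHAR = static_cast<unsigned char>('T');
static_assert(std::is_same<decltype(TAB_REPEAT_CHAR),
                           const unsigned char>::value,
              "constant has the same type as input characters");

// Hypothetical helper: input characters are unsigned char too, so both
// operands of the comparison have the same type.
bool is_tab_repeat_marker(unsigned char c)
{
  return c == TAB_REPEAT_CHAR;
}
```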






[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-11-28 Thread G. Branden Robinson
Follow-up Comment #5, bug #67735 (group groff):

Some progress toward this goal is seen in my push today.


commit 3699f5298485274162514d3e8ea1fb332113458d
Author: G. Branden Robinson 
Date:   Thu Nov 27 02:21:34 2025 -0600

[troff]: Modestly refactor (1/8).

Perform more careful comparisons of the `unsigned char` values the
formatter reads from input (or a token sequence) by promoting to `int`
either the token or the character literal to which it is compared, to (A)
explicitly avoid issues with the implementation-defined signedness of
unqualified `char`, and (B) lay foundation for future migration of GNU
troff's internal character type to a custom, wider type.

* src/roff/troff/div.cpp (return_request): Construct integer from
  character literal.
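A sketch of the comparison style this series adopts: store the result of a `ch()`-like accessor in an `int` local and construct an `int` from each character literal before comparing. `token_sketch` and `sign_of()` are hypothetical stand-ins, not troff's `token` class or any function in it.

```cpp
#include <cassert>

// Hypothetical stand-in for troff's token class.
struct token_sketch {
  unsigned char c;
  unsigned char ch() const { return c; }
};

// Both operands of each comparison are ints, so the implementation-defined
// signedness of plain char never enters into it.  Caching ch() in a local
// also avoids repeated member function calls.
int sign_of(const token_sketch &tok)
{
  int c = tok.ch();
  if (c == int('+'))
    return 1;
  if (c == int('-'))
    return -1;
  return 0;
}
```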

commit 437c81577ea9cfba5ce81958d82370f21921fc27
Author: G. Branden Robinson 
Date:   Thu Nov 27 02:22:18 2025 -0600

[troff]: Modestly refactor (2/8).

* src/roff/troff/reg.cpp (assign_register_format_request):
  Construct integer from character literal.  Store return value of
  `tok.ch()` in local variable of `int`, not `char`, type.

commit 9700152b56e59c193078091baa0552867616d149
Author: G. Branden Robinson 
Date:   Thu Nov 27 02:25:38 2025 -0600

[troff]: Modestly refactor (3/8).

* src/roff/troff/env.cpp: Retype global `TAB_REPEAT_CHAR` from (`const`)
  `char` to `unsigned char`.

  (configure_tab_stops_request): Construct integer from character
  literal.

  (adjust): Store return value of `tok.ch()` in temporary local variable
  of `int`, not `char`, type.

commit f0bb39a6962f7b7882b1eb74d9973a3adf9b55f8
Author: G. Branden Robinson 
Date:   Fri Nov 28 19:24:21 2025 -0600

[troff]: Modestly refactor (4/8).

* src/roff/troff/number.cpp (get_incr_number)
  (is_valid_expression, is_valid_term):
  Construct integer from character literal.

  (is_valid_term): Store return value of `tok.ch()` in temporary local
  variable of `int`, not `char`, type.

commit e1aae13b1b1f247c99e2d065c3b18ae0f081a95c
Author: G. Branden Robinson 
Date:   Fri Nov 28 19:25:16 2025 -0600

[troff]: Modestly refactor (5/8).

* src/roff/troff/input.cpp (get_line_arg): Construct integer from
  character literal.

commit cfd134fe50644f4aff41cbd37ebe8fa51689484d
Author: G. Branden Robinson 
Date:   Thu Nov 27 00:41:53 2025 -0600

[troff]: Modestly refactor (6/8).

* src/roff/troff/input.cpp (read_size):
  Construct integer from character literal.

* src/roff/troff/input.cpp (read_size): Use local variables to avoid
  repeated member function calls.  (Presumably a smart optimizer would
  do the equivalent on its own, but this change also makes a lengthy
  compound conditional expression shorter.)

commit a761a966ad16d5377540edfa22240e78975240f8
Author: G. Branden Robinson 
Date:   Thu Nov 27 01:05:08 2025 -0600

[troff]: Modestly refactor (7/8).

* src/roff/troff/input.cpp (is_conditional_expression_true):
  Construct integer from character literal.  Store return value of
  `tok.ch()` in temporary local variable of `int`, not `unsigned char`,
  type.

commit 7825b80f5fc0a4efd7e2b2fb45a5284f81c97ee4
Author: G. Branden Robinson 
Date:   Thu Nov 27 02:16:17 2025 -0600

[troff]: Modestly refactor (8/8).

* src/roff/troff/input.cpp (read_drawing_command): Store return value of
  `tok.ch()` in local variable of `int`, not `unsigned char`, type.






[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-11-25 Thread G. Branden Robinson
Follow-up Comment #4, bug #67735 (group groff):

At 2025-11-25T16:44:55-0500, Collin Funk wrote:
> Follow-up Comment #3, bug #67735 (group groff):
>
> +1, unsigned char or uint8_t is best if you just want to represent
> bytes of data.

That's not what we want.  We want to read input bytes and encode them
into a much more semantically rich data type.

https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.h?h=1.23.0
https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/token.h?h=1.23.0

A factor that clarifies some things and obfuscates others is that
sometimes (but less often than people suppose; see _groff_char_(7))
these tokens represent Unicode Basic Latin characters that happen to have
identical code point assignments in ISO 10646 and ISO 646.





[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-11-25 Thread Collin Funk
Follow-up Comment #3, bug #67735 (group groff):

+1, unsigned char or uint8_t is best if you just want to represent bytes of
data.




[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-11-25 Thread G. Branden Robinson
Follow-up Comment #2, bug #67735 (group groff):

[comment #1 comment #1:]
> [comment #0 original submission:]
>> And that in turn is necessary for resolution of bug #40720.
> 
> Not if that resolution is implemented as Ingo and I discuss in comments 4-6
> there.
> 
> (Not an argument against fixing this, just its relation to #40720.)

I'll respond to this point in bug #40720, since it has no bearing on the
resolution of _this_ ticket: militant consistency in the type GNU _troff_ uses
for its internal representation of a character code (be that `unsigned char`
or `char *`, or something else) is desirable for consistency and avoidance of
tricky bugs like signedness mismatch, and necessary for any revision of that
type.




[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-11-25 Thread Dave
Follow-up Comment #1, bug #67735 (group groff):

[comment #0 original submission:]
> And that in turn is necessary for resolution of bug #40720.

Not if that resolution is implemented as Ingo and I discuss in comments 4-6
there.

(Not an argument against fixing this, just its relation to #40720.)




[bug #67735] [long-term] use "unsigned char" for troff's internal character type with militant consistency

2025-11-25 Thread G. Branden Robinson
URL:
  

        Summary: [long-term] use "unsigned char" for troff's internal
                 character type with militant consistency
          Group: GNU roff
      Submitter: gbranden
      Submitted: Tue 25 Nov 2025 05:08:24 PM UTC
       Category: Core
       Severity: 3 - Normal
     Item Group: Lint
         Status: In Progress
        Privacy: Public
    Assigned to: gbranden
    Open/Closed: Open
Discussion Lock: Any
Planned Release: None


___

Follow-up Comments:


---
Date: Tue 25 Nov 2025 05:08:24 PM UTC By: G. Branden Robinson 
"unsigned char" appears to already be used in a preponderance of cases, and
increasingly with recent code changes of mine, but there's still a lot of
punning to the C "char"s of implementation-defined signedness.

It's necessary to nail this down to migrate the underlying representation type
to something wide enough to hold Unicode code points.  And that in turn is
necessary for resolution of bug #40720.

Perhaps we could pivot through `typedef unsigned char grochar;`.
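A sketch of that pivot. In a C or C++ `typedef` the new name comes last, so the alias is spelled `typedef unsigned char grochar;` (or, in C++11, `using grochar = unsigned char;`). The helper `count_char()` is hypothetical; the point is that code written consistently against `grochar` needs only the alias changed if the representation is later widened, for instance to `char32_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single point of definition for the formatter's character type.
// Widening it later (e.g., to char32_t for Unicode code points) should
// require changing only this line, provided other code uses grochar
// consistently rather than punning to plain char.
typedef unsigned char grochar;

// Hypothetical helper written purely in terms of grochar.
std::size_t count_char(const std::vector<grochar> &text, grochar wanted)
{
  std::size_t n = 0;
  for (grochar g : text)
    if (g == wanted)
      ++n;
  return n;
}
```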






