Re: menu and sectioning consistency warning too strict?

2024-04-10 Thread Gavin Smith
On Wed, Apr 10, 2024 at 09:57:19PM +0200, Patrice Dumas wrote:
> Hello,
> 
> With CHECK_NORMAL_MENU_STRUCTURE set to 1, there is a warning by
> texi2any:
> 
> a.texi:10: warning: node `node after chap1' is next for `chap1' in menu but 
> not in sectioning
> 
> for the following code:
> 
> @node Top
> @top top
> 
> @menu
> * chap1::
> * node after chap1::
> @end menu
> 
> @node chap1
> @chapter Chapter 1
> 
> @node node after chap1,, chap1, Top
> 
> 
> 
> 
> I am not sure that this warning is warranted, this code seems ok to
> me, the lone node is not fully consistent with the sectioning structure,
> but not that inconsistent either.
> 
> If there is another chapter after the lone node, there are two warnings,
> but this seems ok to me, as in that case, there is a clearer
> inconsistency, since with sectioning there is this time a different next:

I agree that the warning is not really necessary.   I don't mind
either way.  It's up to you if you want to try to remove the warning.
It's questionable whether lone @node without a following sectioning
command is proper Texinfo. or what these constructs mean or how they should
be output.




Re: Further reducing special case code for XS structuring

2024-04-05 Thread Gavin Smith
On Fri, Apr 05, 2024 at 11:02:14PM +0200, Patrice Dumas wrote:
> > I've committed a change to remove the $no_build parameter but not the
> > $no_store parameter.
> 
> It seems to me that the changes are no complete, to me the _get_parser_info
> $no_build argument should be removed too, and the build_document should
> only be called if $no_store is set.  It seems to me that is would be
> better for parse_texi_line, it seems to me that if $no_store is not set,
> the tree should not be built in _get_parser_info, but in
> parse_texi_line, such that the decision to get a tree or only a handler
> is possible in future changes of parse_texi_line.

I've made this change.  I agree that it is better this way.



Re: Further reducing special case code for XS structuring

2024-04-05 Thread Gavin Smith
On Fri, Apr 05, 2024 at 12:20:19AM +0200, Patrice Dumas wrote:
> I do not think that having the same interface for XS and pure Perl is
> important.  It could even make clearer what is going on in XS.  What is
> important is to have a code that is easy to understand as a whole in my
> opinion.
> 
> > The current state of the change is below.  I'd also plan on reviewing
> > the other parse_texi_* functions in Parsetexi.pm (such as parse_texi_piece)
> > to see if the $no_build and $no_store arguments are necessary.
> 
> I expect no_build not to be relevant anymore, as with your changes
> building Perl structures is never done by the Parser if the XS Parser is
> used, building Perl structures is always done later on.
> 
> I am not sure that no_store is used, but I think that it should be
> considered completely separately from no_build and the on-demand Perl
> structures build, as it is a completely different issue.

parse_texi_piece is declared in Parsetexi.pm as
"sub parse_texi_piece($$;$$$)" while it is declared in ParserNonXS.pm as
"sub parse_texi_piece($$;$)", taking two fewer optional arguments.  (I
believe this hasn't led to an error message from Perl as the function
signatures are not checked when a sub is called with method call syntax.)
The $no_store argument doesn't appear to be used anywhere.

I've committed a change to remove the $no_build parameter but not the
$no_store parameter.



Re: Further reducing special case code for XS structuring

2024-04-04 Thread Gavin Smith
On Wed, Apr 03, 2024 at 11:53:09PM +0200, Patrice Dumas wrote:
> To avoid that, you could change the special call to $document->tree
> with an argument, like $document->tree(1) to be $document->tree(), as
> the argument, which is only taken in into account in XS is such that an
> handler only is returned, similar to calling get_document instead of
> build_document, while it is not what you want, you want the tree to be
> built at that point such that Texinfo::Transformations has a real tree
> to work with and not hash reference containing only the information to
> retrieve the tree in XS/C.
> 
> Only doing that is likely to lead to the Perl tree being incorrectly
> built if TEXINFO_XS_CONVERT is set to 1, while we want the tree not
> to be built at that point in that case.  So probably, $XS_convert need to be
> determined as in Texinfo/Convert/HTML.pm and be passed to
> $document->tree where a 1 is passed currently, so something like l 1528
> of texi2any.pl (and similar in test_utils.pl)
> 
>   my $tree = $document->tree($XS_convert);
> 
> I have not tested that part, so I may be wrong.

I am working on changes for this, but it still doesn't seem right to me,
in the case of XS structuring being on, and XS conversion being off.

As of commit 5f6800edc (2024-04-03 13:29:19 +0100), in texi2any.pl the
tree is retrieved on line 1526:

  goto NEXT;
}
  
my $tree = $document->tree(1);
if ($tree_transformations{'fill_gaps_in_sectioning'}) {
  Texinfo::Transformations::fill_gaps_in_sectioning($tree);
}


The changes involved changes to this part of the code:


@@ -1523,7 +1529,14 @@ while(@input_files) {
 goto NEXT;
   }
 
-  my $tree = $document->tree(1);
+  my $tree;
+  if (!$XS_convert) {
+$tree = $document->tree();
+  } else {
+# The argument prevents the tree being built as a Perl structure in
+# case XS code is being used all the way through to conversion.
+$tree = $document->tree(1);
+  }
   if ($tree_transformations{'fill_gaps_in_sectioning'}) {
 Texinfo::Transformations::fill_gaps_in_sectioning($tree);
   }


However, this is before several transformations take place on the
structure.  So if XS structuring is on, these transformations will (or
should) take place on the C tree, and the building of the Perl tree at
this stage seems needless.

For example, in the first transformation:

  if ($tree_transformations{'fill_gaps_in_sectioning'}) {
Texinfo::Transformations::fill_gaps_in_sectioning($tree);
  }

'fill_gaps_in_sectioning' in structuring_transfo/StructuringTransfoXS.xs
calls 'get_sv_tree_document' in main/get_perl_info.c, looking for a
"tree_document_descriptor" hash value, which is indeed set (even though the
rest of the Perl tree is there as well), so it proceeds correctly on the
C version of the tree.  So it does appear to be done correctly, in spite
of some unnecessary processing.

(It also seems a bit confusing that some transformations use a "tree"
object and some use a "document" object.)

So I thought it should depend on "XS structuring" being on, rather than
"XS conversion":

my $tree;
if (!$XS_structuring) {
  $tree = $document->tree();
} else {
  # The argument prevents the tree being built as a Perl structure at
  # this stage; only a "handle" is returned.
  $tree = $document->tree(1);
}

Does this look right?

(Admittedly, this is still special-casing for XS, which I was trying to
eliminate, although it's still simpler overall, in my opinion.)

"make check" passed with this, both with TEXINFO_XS_STRUCTURE=0 and with
TEXINFO_XS_STRUCTURE=1.

Then it should be possible to simplify parse_texi_file in Parsetexi.pm
not to take the extra argument ($no_build).  This would be good, as it
would match parse_texi_file in ParserNonXS.pm, which doesn't have the
extra argument.  I feel that XS and pure Perl functions should present
the same interface as far as is practicable.  (Anyone reading a call
to parse_texi_file in the Perl code might find it harder to delve into
the XS code to understand what the extra argument does.  There's an
immediate barrier in working out that Texinfo::Parser::parse_texi_file
is actually implemented in Texinfo/XS/parsetexi/Parsetexi.pm - not that
this is avoidable, of course.  Then they may have to understand the
role of *.xs files in defining XS subroutines, like 'get_document' or
'build_document' if they want to understand what these do.)

The current state of the change is below.  I'd also plan on reviewing
the other parse_texi_* functions in Parsetexi.pm (such as parse_texi_piece)
to see if the $no_build and $no_store arguments are necessary.  I expect
I'll do this in the next few days.

(I had to change t/protect_character_in_texinfo.t to keep the call to
Texinfo::Document::rebuild_tree working - this is the only place in the
code base this function is called.)

diff --git a/tp/Texinfo/Document.pm b/tp/Texinfo/Document.pm
index 4b8ceb1c99..b5335dcfb6 100644
--- a/tp/Texinfo/Document.pm

Re: Further reducing special case code for XS structuring

2024-04-04 Thread Gavin Smith
On Wed, Apr 03, 2024 at 11:53:09PM +0200, Patrice Dumas wrote:
> On Wed, Apr 03, 2024 at 09:23:28PM +0100, Gavin Smith wrote:
> > This does appear to make texi2any work with TEXINFO_XS_STRUCTURE=0, although
> > there are various test failures that I haven't investigated properly
> > It appears that the failures are to do with @item; e.g. for
> > "converters_tests.t indices_in_begin_tables_lists_entries_after_item", the
> > diff includes
> 
> To avoid that, you could change the special call to $document->tree
> with an argument, like $document->tree(1) to be $document->tree(), as
> the argument, which is only taken in into account in XS is such that an
> handler only is returned, similar to calling get_document instead of
> build_document, while it is not what you want, you want the tree to be
> built at that point such that Texinfo::Transformations has a real tree
> to work with and not hash reference containing only the information to
> retrieve the tree in XS/C.

Thanks I will work on it.



Re: Further reducing special case code for XS structuring

2024-04-03 Thread Gavin Smith
On Wed, Apr 03, 2024 at 09:23:29PM +0100, Gavin Smith wrote:
> We previously discussed the interface of the XS modules and trying to
> improve their abstraction so that the same interface was used for both
> XS and non-XS code.  (Emails from 2024-02-23 "Better abstraction of
> Texinfo::Document interface regardless of XS".)  We made changes in
> this direction.

Sorry, for the point of avoiding confusion, this was private mail between
me and Patrice.

The changes that were made at the time were to do with getting the
errors and warnings that were the result of the "structuring" step.
See changes in the ChangeLog referring to the "registrar" object around
2024-03-03, e.g. "Separate Parser and Document errors and registrars"
(Patrice Dumas, git commit c5b74c50ab), and earlier around 2024-02-24
("Reference Texinfo::Report in Texinfo::Document object.", git commit
9cccf0a6505).




Further reducing special case code for XS structuring

2024-04-03 Thread Gavin Smith
We previously discussed the interface of the XS modules and trying to
improve their abstraction so that the same interface was used for both
XS and non-XS code.  (Emails from 2024-02-23 "Better abstraction of
Texinfo::Document interface regardless of XS".)  We made changes in
this direction.

There is still code like this in multiple places in the code, e.g. in
texi2any.pl:

# XS parser and not explicitely unset
my $XS_structuring = ((not defined($ENV{TEXINFO_XS})
or $ENV{TEXINFO_XS} ne 'omit')
   and (not defined($ENV{TEXINFO_XS_PARSER})
or $ENV{TEXINFO_XS_PARSER} eq '1')
   and (not defined($ENV{TEXINFO_XS_STRUCTURE})
or $ENV{TEXINFO_XS_STRUCTURE} ne '0'));

I think this makes the code harder to understand and is probably not
necessary.  It will make the code harder to maintain in the future.
(E.g., it could be something that makes new contributors give up.)

I tried to eliminate this, but haven't succeeded so far in the time I've
spent on it.

I found the $XS_structuring variable was being passed to
Texinfo::Parser::parse_texi_file, and the difference that this made was
whether Texinfo::Document::build_document or Texinfo::Document::get_document
was called (in 'parse_texi_file', in Texinfo/XS/parsetexi/Parsetexi.pm).
If $XS_structuring was false, then it was 'build_document' that was called,
which builds more of the document data in Perl structures.

I first tried fixing the value of this variable so that get_document would
always be called:


diff --git a/tp/t/test_utils.pl b/tp/t/test_utils.pl
index de26a822e0..78238537df 100644
--- a/tp/t/test_utils.pl
+++ b/tp/t/test_utils.pl
@@ -119,6 +119,7 @@ my $XS_structuring = ((not defined($ENV{TEXINFO_XS})
 or $ENV{TEXINFO_XS_PARSER} eq '1')
and (not defined($ENV{TEXINFO_XS_STRUCTURE})
 or $ENV{TEXINFO_XS_STRUCTURE} ne '0'));
+$XS_structuring = 1;
 
 my $generated_texis_dir = 't_texis';
 
diff --git a/tp/texi2any.pl b/tp/texi2any.pl
index 7f6327f504..d4154ca91b 100755
--- a/tp/texi2any.pl
+++ b/tp/texi2any.pl
@@ -1430,6 +1430,8 @@ my $XS_structuring = ((not defined($ENV{TEXINFO_XS})
and (not defined($ENV{TEXINFO_XS_STRUCTURE})
 or $ENV{TEXINFO_XS_STRUCTURE} ne '0'));
 
+$XS_structuring = 1;
+
 if (defined($ENV{TEXINFO_XS_EXTERNAL_CONVERSION})
 and $ENV{TEXINFO_XS_EXTERNAL_CONVERSION}) {
   set_from_cmdline('XS_EXTERNAL_CONVERSION', 1);


I thought that the extra Perl structures were needed by the pure Perl
module Texinfo/Structuring.pm.  Access to these structures is through
accessor functions in Texinfo/Document.pm; (e.g. 'nodes_list') however,
the Perl structures are not built on demand in the pure Perl accessor
functions.  To try to compensate for this, I enabled the XS overrides in
Texinfo/Document.pm:

diff --git a/tp/Texinfo/Document.pm b/tp/Texinfo/Document.pm
index 4b8ceb1c99..76479499b1 100644
--- a/tp/Texinfo/Document.pm
+++ b/tp/Texinfo/Document.pm
@@ -95,7 +95,7 @@ sub import {
   for my $sub (keys %XS_overrides) {
 Texinfo::XSLoader::override ($sub, $XS_overrides{$sub});
   }
-  if ($XS_structuring) {
+  if ($XS_parser or $XS_structuring) {
 for my $sub (keys %XS_structure_overrides) {
   Texinfo::XSLoader::override ($sub, $XS_structure_overrides{$sub});
 }

(Another reason that this seems like a good change to make is that there
seems to be no reason that Texinfo/Document.pm should have a different
implementation depending on the value of TEXINFO_XS_STRUCTURE.)

This does appear to make texi2any work with TEXINFO_XS_STRUCTURE=0, although
there are various test failures that I haven't investigated properly
It appears that the failures are to do with @item; e.g. for
"converters_tests.t indices_in_begin_tables_lists_entries_after_item", the
diff includes

@@ -3848,15 +3846,13 @@ 
$result_texis{'indices_in_begin_tables_lists_entries_after_item'} = '\\input tex
 
 @itemize @minus
 @c comment in itemize
-@item 
 @cindex also a cindex in itemize
-e--mph item
+@item e--mph item
 @end itemize
 
 @itemize @bullet
-@item 
 @cindex index entry within itemize
-i--tem 1
+@item i--tem 1
 @item @cindex index entry right after @@item
 i--tem 2
 @end itemize

- which appears to be the result of some tree transformation not being
reflected.

I guess the main issue is with having a tree in both Perl and C and the
Perl transformation not being reflected in the C tree structure which
is then used for the output.

So I think this approach is close to working if I can work out what
is going on with the right version of the tree being used.  Hopefully,
then much of the special-casing for TEXINFO_XS_STRUCTURE throughout the
code for texi2any would no longer be needed.  Do you think this is right,
or have I missed something?

Another option is to get rid of 

Re: renaming BODYTEXT as BODY_ELEMENT_ATTRIBUTES

2024-04-03 Thread Gavin Smith
On Wed, Apr 03, 2024 at 02:21:22PM +0200, Patrice Dumas wrote:
> Hello,
> 
> I propose to rename BODYTEXT as BODY_ELEMENT_ATTRIBUTES, I find BODYTEXT
> not specific enough and confusing.  It would also be more consistent
> with the HTML_ROOT_ELEMENT_ATTRIBUTES customization variable name, which
> is good in my opinion.
> 
> The description of BODYTEXT is:
>   The  attributes text.
> 

Renaming it makes sense to me.



Re: removing NO_USE_SETFILENAME and USE_UP_NODE_FOR_ELEMENT_UP

2024-04-03 Thread Gavin Smith
On Wed, Apr 03, 2024 at 05:09:28PM +0200, Patrice Dumas wrote:
> Hello,
> 
> I would like to remove two customization options:
> 
> NO_USE_SETFILENAME
> If set, do not use @setfilename to set the document name; instead, base the
> output document name only on the input file name. The default is false.
> 
> USE_UP_NODE_FOR_ELEMENT_UP
> Fill in up sectioning direction with node direction when there is no 
> sectioning
> up direction. In practice this can only happen when there is no @top section.
> Not set by default.
> 
> 
> These customization options are set when TEXI2HTML is set.  I think that
> they were added for the purpose of compatibility with texi2html.
> However, beside strict compatibility with texi2html (which was an
> important feature to have in 2010, but is not relevant today), I do not
> think that these customizations options are useful, even for users who
> would want compatibility with texi2html, as there are many ways to set
> setfilename and the USE_UP_NODE_FOR_ELEMENT_UP option is not clearly
> useful (or even confusing).
> 
> Unless there is some feedback, I will remove those variables soon.

I'm happy with these being removed.



Re: renaming COMPLEX_FORMAT_IN_TABLE

2024-04-03 Thread Gavin Smith
On Sun, Mar 31, 2024 at 10:58:08PM +0200, Patrice Dumas wrote:
> Hello,
> 
> If the customization variable COMPLEX_FORMAT_IN_TABLE is set,
> @indentedblock and indented fixed-width font @-commands, @example,
> @display and @lisp are indented with a table (instead of using ).
> 
> I think that the name is not good, because 'COMPLEX' does not fit well
> with the list of @-commands, relating to indentation would be better, in
> my opinion.
> 
> I propose to change to
> INDENTED_BLOCK_COMMANDS_IN_TABLE
> 
> Any comments, other ideas?

I like this as the words "complex format" are close to meaningless in this
context.



Re: Some links to external manuals not followable in Info output

2024-04-03 Thread Gavin Smith
On Wed, Apr 03, 2024 at 12:09:04PM +0200, Patrice Dumas wrote:
> On Tue, Apr 02, 2024 at 09:56:40PM +0100, Gavin Smith wrote:
> > If a cross-reference appears like
> > 
> > @xref{,,, texi2any_api, GNU Texinfo @command{texi2any} Output 
> > Customization}.
> > 
> > with no first argument, then this is output in Info like
> > 
> > *Note (texi2any_api)::.
> > 
> > - with no node argument following the manual.  The Info browser (at least
> > recent versions of it; I checked 6.8 and 7.1) cannot follow such links
> > (I didn't check with Emacs).  However, texi2any does not warn about
> > such constructs.
> 
> Here is a patch for the Info reader.  It passes the tests and in a test
> with a minimum info file, followed a cross-reference as above.  But I
> may have overlooked interactions with other parts of the code.

I've committed this in your name.  I found it hard to understand too
(even though I was responsible for much of this code).  scan_reference_marker
and scan_reference_target both find the targets of a reference.  However,
I don't feel like attempting to do a larger rewrite.


> 
> -- 
> Pat

> diff --git a/info/scan.c b/info/scan.c
> index 4b7dfe8754..007498dabc 100644
> --- a/info/scan.c
> +++ b/info/scan.c
> @@ -1135,7 +1135,7 @@ scan_reference_label (REFERENCE *entry, int in_index)
>len = read_quoted_string (inptr + label_len, ":", max_lines,
>  >nodename);
>canonicalize_whitespace (entry->nodename);
> -  if (!len)
> +  if (!len && !entry->filename)
>  return 0; /* Input invalid. */
>label_len += len;
>  }




Some links to external manuals not followable in Info output

2024-04-02 Thread Gavin Smith
If a cross-reference appears like

@xref{,,, texi2any_api, GNU Texinfo @command{texi2any} Output Customization}.

with no first argument, then this is output in Info like

*Note (texi2any_api)::.

- with no node argument following the manual.  The Info browser (at least
recent versions of it; I checked 6.8 and 7.1) cannot follow such links
(I didn't check with Emacs).  However, texi2any does not warn about
such constructs.

The Texinfo manual condones omitting the first argument:

 So, with cross-references to other manuals (*note Four and Five
  Arguments::), if the first argument is either ‘Top’ (capitalized just
  that way) or omitted entirely, and the third argument is omitted, the
  printed output includes no node or section name.  (The Info output
  includes ‘Top’ if it was given.)

(Info node (texinfo)Referring to a Manual as a Whole.)

I suggest that texi2any should supply the Top node name for links to
external manuals when it is left blank.





Re: proposal to remove AVOID_MENU_REDUNDANCY

2024-04-01 Thread Gavin Smith
On Sun, Mar 31, 2024 at 10:13:03PM +0200, Patrice Dumas wrote:
> Hello,
> 
> I propose to remove the HTML customization variable
> AVOID_MENU_REDUNDANCY:
> 
>  If set, and the menu entry and menu description are the
>  same, then do not print the menu description; default false.
> 
> It seems to me to be for a case that should never happen, is not really
> important, and my wild guess is that nobody uses it.
> 
> Comments?

I'm happy for it to be removed.  It seems that the user can easily
avoid this by leaving the menu description blank.



Re: organization of the documentation of customization variables

2024-03-17 Thread Gavin Smith
On Sat, Mar 16, 2024 at 03:53:45PM +0100, Patrice Dumas wrote:
> Hello,
> 
> In a recent thread, on help-texinfo, Eli raised valid concerns on the
> documentation of customization variables of texi2any, see for example:
> https://lists.gnu.org/archive/html/help-texinfo/2024-03/msg00014.html

Quoting from that message:

  What I was looking for is some kind of combination, in a single
  manual, where one could find both the general background and
  description of the conversion process (as texi2any_api does), and
  description of the customizations via variables in the context of that
  description of the conversion process.  Customization by writing Perl
  function should be considered a whole different level of
  customizations, which I believe very few will go to, especially given
  the large number of variables which presumably should already allow to
  do a lot, and should therefore be described in separate subsections.

> The current situation is that in the main Texinfo manual most of the
> customization variables are simply presented in a list, like a
> reference, not in a way that makes it easy to use them, as a tutorial
> with logical groupings and background information.  In the texi2any_api
> manual, the customization variables are presented in a more pedagogical
> manner, but, for now, only for some variables, and the explanation of
> the variables use appear in sections that may also cover some advanced
> way of customizating texi2any HTML output that require knowledge of
> Perl.

The first thing I would say is that the more the number of customization
variables can be minimized, the less problematic it is simply presenting
them in the manual as a simple list.

The relevant list in "(texinfo)HTML Customization Variables" is not
easy to read through from beginning to end.  I feel if the documentation
could be made more clear, users would be less likely to give up trying
to read through it.

Even the second variable in the list is obscure:

‘AFTER_BODY_OPEN’
 If set, the corresponding text will appear at the beginning of each
 HTML file; default unset.

It is not clear what "corresponding text" means - corresponding to what?
I think it means the value of the variable but this is not clear.

‘BODYTEXT’
 The text appearing in ‘’.  By default, sets the HTML ‘lang’
 attribute to the document language (*note @documentlanguage::).

I first thought this meant the text occuring between  and
, which would be most of the file, rather than text for the
attributes.

I do not personally know what all the customization variables are for
and have always found them overwhelming.  There may be variables which
aren't very useful or which are rarely used.


> I think that we should decide on how the customization variables
> documentation should be organized.  I think that we should keep some
> reference lists in the main Texinfo manual.  But I also think that we
> should make a choice regarding a pedagogical/tutorial-oriented
> documentation of the customization variable, with two options (there may
> be more, please add some if you have ideas):
> 1) document customization variables in the texi2any_api manual even
>though it will be mixed with more advanced customization which
>requires perl code
> 2) add sections presenting customization of texi2any output with
>customization variables in the main Texinfo manual presented in a
>tutorial fashion and remove the corresponding existing documentation
>from the texi2any_api manual.
> 
> What do you think?

It's of secondary importance what is done to the texi2any_api manual.
The more important thing, in my opinion, is to document the customization
variables in the texinfo manual as well as possible.  So if there is content
in the texi2any_api manual on customization variables which would improved
the texinfo manual, it would be good to use it.

I think that the main texinfo manual should be kept clear from discussion
of the Perl API, which may mean duplicating documentation of customization
variables between the two manuals in some places.

I suggest the the list in "(texinfo)HTML Customization Variables" could
be split up into smaller nodes, potentially with some more narrative
content in each node.  (For example, other non-HTML-specific variables
could be discussed, such as USE_NODES or NUMBER_FOOTNOTES.)  This would
hopefully be kept brief and not reach the length or complication of the
corresponding documentation in texi2any_api.

I took the list and tried to sort it into sections.  I may not have
done an especially good job of this, and there will likely be misplaced
variables.  I suggest this could be taken as a starting point for
reorganising the manual.


HTML Version control

 To allow for some variation in the version of HTML output.

 NO_CUSTOM_HTML_ATTRIBUTE
 USE_XML_SYNTAX

CSS

 Options relating to CSS.

 INLINE_CSS_STYLE
 NO_CSS
 SHOW_BUILTIN_CSS_RULES

Output structure

 Options affecting the overall structure of 

Re: @itemize @asis is not well supported

2024-03-17 Thread Gavin Smith
On Thu, Mar 14, 2024 at 11:36:49PM +0100, Patrice Dumas wrote:
> > I didn't really know what this part meant:
> > 
> > Index entries and comments that are given before an @code{@@item}
> > including the first, are automatically moved (internally) to after the
> > @code{@@item}, so the output is as expected.  Historically this has
> > been a common practice.
> > 
> > Methinks comments don't affect the output so moving them is irrelevant.
> > I thought this could be rewritten to something like
> > 
> > Index entries that occur immediately before an @code{@@item}
> > are associated with the @code{@@item}.
> 
> Even though the information given on comments is relevant, I agree that
> it is expected and does not need to be mentionned.  I also agree with
> avoiding to mention some implementation details.

I've committed this.

> 
> There is also some text in @node Other Customization Variables
> TREE_TRANSFORMATIONS description that provides a similar information
> that you may consider changing:
> 
> @item move_index_entries_after_items
> In @code{@@enumerate} and @code{@@itemize}, move index entries
> appearing just before an @code{@@item} to just after the
> @code{@@item}.  Comment lines between index entries are moved too.  As
> mentioned, this is always done for HTML and DocBook output.

This is more of an internal setting that happens to be documented in the
manual.  It's possible that the user would never want to set this.



Re: @itemize @asis is not well supported

2024-03-17 Thread Gavin Smith
On Thu, Mar 14, 2024 at 11:36:49PM +0100, Patrice Dumas wrote:
> On Tue, Mar 12, 2024 at 06:39:46PM +0000, Gavin Smith wrote:
> > I think both should be documented as being valid.  I don't mind which one
> > is presented as more normal.  I have edited the node in the documentation
> > slightly.
> > 
> > It would be ok to change the example to show "@itemize @bullet{}" instead
> > of "@itemize @bullet" as long as the possibility of omitting the braces
> > was still shown somewhere.  This might potentially make the documentation
> > easier to read, as the discussion of braces could be given less prominence.
> 
> Here is a proposal, with also removal of some text that imply that
> braces always need to be omitted (and, in my opinion should be
> removed idependentely of the other issue).  Would that be ok to commit?

I've committed it in your name.



Re: @itemize @asis is not well supported

2024-03-14 Thread Gavin Smith
On Wed, Mar 13, 2024 at 03:42:19PM -0700, Raymond Toy wrote:
> > I think both should be documented as being valid.  I don't mind which one
> > is presented as more normal.  I have edited the node in the documentation
> > slightly.
> >
> > It would be ok to change the example to show "@itemize @bullet{}" instead
> > of "@itemize @bullet" as long as the possibility of omitting the braces
> > was still shown somewhere.  This might potentially make the documentation
> > easier to read, as the discussion of braces could be given less prominence.
> >
> 
> I haven't been following too closely, but I think the manual should only
> have examples of recommended practice.  If @bullet{} is the recommended
> way, then examples should only have that.  If the parser is lax and not
> enforcing this, then that's ok.  Just don't confuse people that @bullet and
> @bullet{} are both valid ways.  (Well, unless you are really saying both
> are valid now and in the probable future, in which case, ignore me.)

It's not that the parser is "lax" in accepting "@itemize @bullet" - it was
the valid, documented and usual practice.



Re: @itemize @asis is not well supported

2024-03-12 Thread Gavin Smith
On Thu, Mar 07, 2024 at 09:49:24AM +0100, Patrice Dumas wrote:
> On Wed, Mar 06, 2024 at 09:54:18PM +0100, Bruno Haible wrote:
> > Gavin Smith wrote:
> > > it is not worth changing and making practically every use of
> > > @itemize in a Texinfo manual being flagged as incorrect, in my opinion.
> > 
> > I agree. Counting the number of existing usages in Debian [1][2]:
> >   - 8753 times '@itemize @bullet' without braces,
> >   - 288 times '@itemize @bullet{}' with braces.
> 
> I understand that it is the current practice, and I do not advocate
> those uses to be flagged as incorrect.  I propose to change the
> documentation such that this use is not proposed as a valid use anymore,
> such that there are braces in future manuals, and also when people
> change manuals they use braces.

I think both should be documented as being valid.  I don't mind which one
is presented as more normal.  I have edited the node in the documentation
slightly.

It would be ok to change the example to show "@itemize @bullet{}" instead
of "@itemize @bullet" as long as the possibility of omitting the braces
was still shown somewhere.  This might potentially make the documentation
easier to read, as the discussion of braces could be given less prominence.

I didn't really know what this part meant:

Index entries and comments that are given before an @code{@@item}
including the first, are automatically moved (internally) to after the
@code{@@item}, so the output is as expected.  Historically this has
been a common practice.

Methinks comments don't affect the output so moving them is irrelevant.
I thought this could be rewritten to something like

Index entries that occur immediately before an @code{@@item}
are associated with the @code{@@item}.




Re: @itemize @asis is not well supported

2024-03-06 Thread Gavin Smith
On Tue, Mar 05, 2024 at 10:45:54PM +0100, Patrice Dumas wrote:
> On Tue, Mar 05, 2024 at 07:44:33PM +0000, Gavin Smith wrote:
> > I don't agree with changing the language to require braces.  There is
> > not a good reason to change it.  Although writing "@itemize @bullet{}"
> > may be more regular than "@itemize @bullet", it is a core and stable
> > part of the Texinfo lanugage, regardless of which one you might prefer.
> 
> It is not about which one I prefer, it is about consistency and
> simplicity of the language.

Even so, it is not worth changing and making practically every use of
@itemize in a Texinfo manual being flagged as incorrect, in my opinion.
If you were designing a new markup language from scratch that had no
existing users it would be a different matter.

We should try to accommodate both usages as best as we can in the source
code, as well as providing error messages or warnings, even if this makes
the implementation in texi2any slightly more complicated.



Re: @itemize @asis is not well supported

2024-03-06 Thread Gavin Smith
On Wed, Mar 06, 2024 at 03:05:06PM +0100, pertu...@free.fr wrote:
> > I suggest
> > 
> > ```
> > '@asis' needs braces here; say '@itemize @asis{}'.
> > ```
> 
> I do not like much that option, because we do not have any idea why the
> user ended up using '@itemize @asis'.  Maybe with @asis, it will be in
> general the case that @asis{} is correct, but with other @-command for
> example @strong, it is unlikely to be the case.  It could be an error,
> for instance the user wanted to use @table, it could be that the user
> actually wanted to use
> @itemize @strong{@bullet{}}
> it could be that user wanted something like
> @itemize @asis{} (or @itemize @w{}).
> 
> Saying that the user should say '@itemize @strong{}' is likely to be
> misleading in some cases, while saying that braces are needed
> forces the user to think again about what she/he wanted to do.  Another
> option could be to just say
> 
>   '@asis' needs braces here
>   '@strong' needs braces here
> 
> What do other think on that possibility?

It could just be the same error message as when used outwith @itemize:

@asis foo

- which gives an error message like:

test.texi:11: @asis expected braces

I agree that @itemize @strong is certainly a mistake and the user is very
unlikely to have wanted to have said "@itemize @strong{}" instead.

Although it is very unlikely input, I notice with the current code, there
is also a warning for the following:

@itemize A @bullet
@item QQbar
Q foo man
@end itemize

giving the warning:

test.texi:11: warning: command `@bullet' must not be followed by new line

I don't know how easy it would be to make the error messages consistent or
if it is worth doing.



Re: @itemize @asis is not well supported

2024-03-05 Thread Gavin Smith
On Tue, Mar 05, 2024 at 11:59:23AM +0100, Patrice Dumas wrote:
> On Mon, Mar 04, 2024 at 09:16:44PM +0000, Gavin Smith wrote:
> > \itemcontents expands to \asis and then TeX tries to take the } following
> > as the argument to \asis, which is invalid.
> > 
> > Basically @asis takes an argument in the TeX implementation, whereas
> > commands like @bullet or @minus don't, even though you usually should
> > write them as @bullet{} and @minus{}.
> 
> To me this supports even more changing the Texinfo language to have
> braces for @bullet or @minus on the @itemize line, as not having them
> seems to be permitted by an implementation detail of the TeX
> implementation.
> 
> I attach a diff for the Texinfo manual to remove the permission not to
> have braces for mark commands on the @itemize line.
> 
> Would that be ok to apply?

"@itemize @asis" is definitely wrong and could give a warning if not
an error.  That seems the main thing to change.

I don't agree with changing the language to require braces.  There is
not a good reason to change it.  Although writing "@itemize @bullet{}"
may be more regular than "@itemize @bullet", it is a core and stable
part of the Texinfo lanugage, regardless of which one you might prefer.

I would rather say that "@itemize @w{}" is the usage that is permiitted
by details of the implementation.  "@itemize @bullet" is the more typical
usage.

It seems like there are two permitted types of argument
* @itemize with a glyph command as an argument, without braces 
* @itemize with any valid Texinfo argument.  For example, "@itemize A"
to use the letter "A" as the bullet.

The second is hardly encouraged or used at all, except for "@itemize @w{}".
The last could be checked for as a special case.  We could try to check
in existing manuals how @itemize is used.  There may not be a benefit in
trying to be more general. 



Re: @itemize @asis is not well supported

2024-03-04 Thread Gavin Smith
On Mon, Mar 04, 2024 at 03:59:26PM +0100, Bruno Haible wrote:
> @itemize @asis
> seems to work in HTML and info mode only, not in TeX mode.
> 
> 
> How to reproduce (with texinfo-7.1):

Patrice covered most of it.

It reproduces with any test file with "@itemize @asis" in it.

> /tmp/gnulib/doc/gnulib-tool.texi:472: Argument of @asis has an extra }.

It appears to be a deliberate failure:

  % Try typesetting the item mark so that if the document erroneously says
  % something like @itemize @samp (intending @table), there's an error
  % right away at the @itemize.  It's not the best error message in the
  % world, but it's better than leaving it to the @item.  This means if
  % the user wants an empty mark, they have to say @w{} not just @w.
  \def\itemcontents{#1}%
  \setbox0 = \hbox{\itemcontents}%

\itemcontents expands to \asis and then TeX tries to take the } following
as the argument to \asis, which is invalid.

Basically @asis takes an argument in the TeX implementation, whereas
commands like @bullet or @minus don't, even though you usually should
write them as @bullet{} and @minus{}.
>   * The workaround, which is to use @w{} instead of @asis, is hard to find 
> [3].
> 

It's documented in the Texinfo manual:

>  If you don't want any mark at all, but still want logical
> items, use ‘@w{}’ (in this case the braces are required).

Info node '(texinfo)@itemize'.
https://www.gnu.org/software/texinfo/manual/texinfo/html_node/_0040itemize.html



Re: how to mix or avoid perl and gnulib headers and object files together?

2024-03-03 Thread Gavin Smith
On Tue, Feb 27, 2024 at 10:38:37PM +0100, Patrice Dumas wrote:
> On Tue, Feb 27, 2024 at 08:10:31PM +0000, Gavin Smith wrote:
> > On Mon, Feb 26, 2024 at 01:15:47AM +0100, Patrice Dumas wrote:
> > > I did wrappers wrappers around functions to avoid mixing Perl and non-Perl
> > > memory allocation related functions in commit
> > 
> > I have a couple of queries about this:
> > 
> > * The comment in main/build_perl_info.c says that you can use a wrapper
> > perl_only_xasprintf to be sure that a Perl defined function is used.  This
> > refers to "vasprintf".  Did you check if Perl is actually guaranteed to
> > override vasprintf if it overrides malloc and new?  (Likewise for the
> > other perl_only_* functions.)  It could be more reliable to actually
> 
> No, and I am not sure that we can know beforehand if it is platform
> dependent.
> 
> > implement these functions ourselves.  E.g., replace asprintf with
> > malloc followed by sprintf, or strdup with malloc followed by memcpy.
> 
> I'll let you do that.

I've made some recent changes.

> > * Defining perl_only_* as wrappers in a different source file may make
> > it harder for the function calls to be inlined and optimised.  Could
> > using macros in header files be better, e.g. writing
> > "#define perl_only_malloc malloc"?  That is assuming that we need to
> > make it explicit that we intended to use any definition from the Perl
> > header files.
> 
> I am probably missing something, but this seems to undo the certainty that
> the malloc function comes from a specific file, which includes specific
> headers, as the malloc of the place where the macro is called would be
> used, which could depend on where it is used.

It would work by just using malloc instead of perl_only_malloc if the
source file included the Perl headers.  However, it is probably not a a
significant problem with the current usage of these functions.  I checked
this by running callgrind on "perl texi2any.pl --html" on an input file,
with TEXINFO_XS_CONVERT=1, and the perl_only_* functions were hardly
called (only once each, in fact).



Re: how to mix or avoid perl and gnulib headers and object files together?

2024-02-27 Thread Gavin Smith
On Mon, Feb 26, 2024 at 01:15:47AM +0100, Patrice Dumas wrote:
> On Fri, Feb 23, 2024 at 07:27:55PM +0000, Gavin Smith wrote:
> > It is bad at any time.  I'm not sure what situation you refer to by "link
> > time".  As far as I understand, the Perl headers redefine (or can redefine)
> > the symbol "free" to something else, e.g. "Perl_free".  This means the
> > ensuing object files will reference a different symbol, instead of just
> > "free".  That reference will persist and remain a problem at link time.
> 
> I did wrappers wrappers around functions to avoid mixing Perl and non-Perl
> memory allocation related functions in commit
> 
> https://git.savannah.gnu.org/cgit/texinfo.git/commit/?id=ba169cd7a8a8e67f280b305e19412fb63d23c71c
> 
> Hopefully it should solve this issue.
> 
> Do not hesitate to modify the comments for more clarity, especially in
> tp/Texinfo/XS/main/build_perl_info.c

I have a couple of queries about this:

* The comment in main/build_perl_info.c says that you can use a wrapper
perl_only_xasprintf to be sure that a Perl defined function is used.  This
refers to "vasprintf".  Did you check if Perl is actually guaranteed to
override vasprintf if it overrides malloc and new?  (Likewise for the
other perl_only_* functions.)  It could be more reliable to actually
implement these functions ourselves.  E.g., replace asprintf with
malloc followed by sprintf, or strdup with malloc followed by memcpy.

* Defining perl_only_* as wrappers in a different source file may make
it harder for the function calls to be inlined and optimised.  Could
using macros in header files be better, e.g. writing
"#define perl_only_malloc malloc"?  That is assuming that we need to
make it explicit that we intended to use any definition from the Perl
header files.



Re: Intermittent test failure in t/03coverage_braces.t definfoenclose_texinfo_commands

2024-02-25 Thread Gavin Smith
On Sun, Feb 25, 2024 at 09:00:55PM +0100, Patrice Dumas wrote:
> On Sun, Feb 25, 2024 at 07:43:42PM +0000, Gavin Smith wrote:
> > It appears that there was no effort made to reset the counters at all.
> > If a parser run finished with one of the counters active, it would
> > remain active for the next run.
> 
> Shouldn't there be an error message for the counters that should be
> reset 'naturally' by virtue of closing commands/stacks, resetting them
> unconditionnaly could hide some bugs maybe?

Yes, it would normally be the case that all counters are empty by the time
the parser finishes running.  Otherwise it could be a sign of a bug, or
of malformed input.



Re: Intermittent test failure in t/03coverage_braces.t definfoenclose_texinfo_commands

2024-02-25 Thread Gavin Smith
On Sun, Feb 25, 2024 at 07:01:22PM +, Gavin Smith wrote:
> So it appears there is a problem with the counter not being reset properly
> before the test runs, or sometime early on in the run.
> 
> I'll try and do more investigation.

It appears that there was no effort made to reset the counters at all.
If a parser run finished with one of the counters active, it would
remain active for the next run.  Then if the memory address of an element
assigned to a counter happened to be reused, by pure chance, the counter
would appear to refer to the new element, incorrectly.  This bug (possibly
only in the test suite, not in "real" runs of the program) has likely
always been present (since I wrote the original code for parsetexi);
the recently added test only allowed the bug to surface.

I've committed a fix in commit 23da32ceb93ef7495 (2024-02-25).



Re: Intermittent test failure in t/03coverage_braces.t definfoenclose_texinfo_commands

2024-02-25 Thread Gavin Smith
On Sun, Feb 25, 2024 at 04:47:46PM +, Gavin Smith wrote:
> I occasionally get test failures in the
> "perl -w t/03coverage_braces.t definfoenclose_texinfo_commands" test,
> but cannot usually repeat it reliably.  I suspect some issue with memory
> corruption or issue with uninitialised memory, although I've not managed
> to get anything to show up with valgrind so far.
> 
> To get it to fail, I need to do
> "while perl -w t/03coverage_braces.t  ; do true ; done"
> and then wait a while.

It's not particulary easy to debug but by inserting debugging print
statements it appears to be a problem with the
count_remaining_args counter:

diff --git a/tp/Texinfo/XS/parsetexi/parser.c b/tp/Texinfo/XS/parsetexi/parser.c
index 5d22df810c..fe4cf65baf 100644
--- a/tp/Texinfo/XS/parsetexi/parser.c
+++ b/tp/Texinfo/XS/parsetexi/parser.c
@@ -2387,6 +2387,9 @@ process_remaining_on_line (ELEMENT **current_inout, char 
**line_inout)
 {
   line++;
   /* comma as a command argument separator */
+  fprintf (stderr, "REMAINING ARGS %d (NVALUES %d)\n",
+   counter_value (_remaining_args, current->parent),
+   count_remaining_args.nvalues);
   if (counter_value (_remaining_args, current->parent) > 0)
 current = handle_comma (current, );
   else if (current->type == ET_line_arg && current->parent->cmd == CM_node)

Normally, with this test case, it prints

REMAINING ARGS -1 (NVALUES 0)

This shows that this counter is not being used.

However, on the occasions it fails, it prints

REMAINING ARGS -1 (NVALUES 1)

a few times before failing, and on the occasion it fails, it prints

REMAINING ARGS 1 (NVALUES 1)

So it appears there is a problem with the counter not being reset properly
before the test runs, or sometime early on in the run.

I'll try and do more investigation.



Intermittent test failure in t/03coverage_braces.t definfoenclose_texinfo_commands

2024-02-25 Thread Gavin Smith
I occasionally get test failures in the
"perl -w t/03coverage_braces.t definfoenclose_texinfo_commands" test,
but cannot usually repeat it reliably.  I suspect some issue with memory
corruption or issue with uninitialised memory, although I've not managed
to get anything to show up with valgrind so far.

To get it to fail, I need to do
"while perl -w t/03coverage_braces.t  ; do true ; done"
and then wait a while.

Here is the result of
"diff t/results/coverage_braces/definfoenclose_texinfo_commands.pl*"
on one of the occasions I was able to get it to fail after running
"TEXINFO_XS=require perl -w t/03coverage_braces.t".

The code was changed recently (2024-02-18, commit b447793774).


85c85,93
<   'text' => 'strong,(,)'
---
>   'text' => 'strong'
> }
>   ],
>   'type' => 'line_arg'
> },
> {
>   'contents' => [
> {
>   'text' => '(,)'
98,104d105
<   'extra' => {
< 'misc_args' => [
<   'strong',
<   '(',
<   ')'
< ]
<   },
215,221d215
<   'extra' => {
< 'begin' => '(',
< 'end' => ')'
<   },
<   'info' => {
< 'command_name' => 'strong'
<   },
224,225c218
<   },
<   'type' => 'definfoenclose_command'
---
>   }
316c309
< @definfoenclose strong,(,)
---
> @definfoenclose strong(,)
374a368,374
> 'error_line' => 'bad argument to @definfoenclose
> ',
> 'line_nr' => 3,
> 'text' => 'bad argument to @definfoenclose',
> 'type' => 'error'
>   },
>   {
397c397
TeX.  *in strong*.
406c406
< TeX. (in strong).
---
> TeX. in strong.


On another occasion, the diff was different:

$ diff t/results/coverage_braces/definfoenclose_texinfo_commands.pl*
119c119,127
<   'text' => 'quotation,q,e'
---
>   'text' => 'quotation'
> }
>   ],
>   'type' => 'line_arg'
> },
> {
>   'contents' => [
> {
>   'text' => 'q,e'
132,138d139
<   'extra' => {
< 'misc_args' => [
<   'quotation',
<   'q',
<   'e'
< ]
<   },
317c318
< @definfoenclose quotation,q,e
---
> @definfoenclose quotationq,e
382c383
< 'error_line' => 'cannot redefine with @definfoenclose: quotation
---
> 'error_line' => 'bad argument to @definfoenclose
385c386
< 'text' => 'cannot redefine with @definfoenclose: quotation',
---
> 'text' => 'bad argument to @definfoenclose',

Here is the contents of the test case:

['definfoenclose_texinfo_commands',
'@definfoenclose verb,;;,!!
@definfoenclose TeX,aa,bb
@definfoenclose strong,(,)
@definfoenclose quotation,q,e

@verb{*aaa*}.

@TeX{}. @strong{in strong}.

@quotation important
in quotation
@end quotation
'],





Re: how to mix or avoid perl and gnulib headers and object files together?

2024-02-23 Thread Gavin Smith
On Fri, Feb 23, 2024 at 05:33:53PM +0100, Patrice Dumas wrote:
> Hello,
> 
> There are notes in C code warning against calling malloc or free and
> functions potentially redefined by gnulib in files which include the
> perl headers.  It may be related to the discussion here:
> 
> https://lists.gnu.org/archive/html/bug-texinfo/2022-10/msg00381.html
> 
> The NOTE is
>   /* NOTE: Do not call 'malloc' or 'free' in any function called in this file.
>  Since this file (build_perl_info.c) includes the Perl headers,
>  we get the Perl redefinitions, which we do not want, as we don't use
>  them throughout the rest of the program. */
> 
>   /* Can't use asprintf here, because it might come from Gnulib, and
>  will then use malloc that is different from Perl's malloc, whereas
>  free below is redirected to Perl's implementation.  This could
>  cause crashes if the two malloc/free implementations were different.  */
> 
> There have been lots of reorganization of C code and some of these
> issues may be obsolete, while new issues may have been introduced (by
> me).
> 
> 
> First a question.  Is the mixing of perl and gnulib functions bad at
> compile time only or also at link time?

It is bad at any time.  I'm not sure what situation you refer to by "link
time".  As far as I understand, the Perl headers redefine (or can redefine)
the symbol "free" to something else, e.g. "Perl_free".  This means the
ensuing object files will reference a different symbol, instead of just
"free".  That reference will persist and remain a problem at link time.

>  A somewhat related question
> is can code like
> #if defined _WIN32 && !defined __CYGWIN__
>   #undef free
> #endif
> be removed if the C file does not include the gnulib headers but is
> eventually part of a binary object including binary objects compiled
> from gnulib code?

I would think that it can be.  The comment above that code is

/* Avoid warnings about Perl headers redefining symbols that gnulib
   redefined already. */

so the whole point of it is to cancel a redefinition from the gnulib
headers.  If it doesn't include the gnulib headers, there is no issue.
The #undef free only affects the meaning of the symbol "free" used
subsequently in the source file.  The binary objects compiled from
gnulib code may have gnulib's redefinition of "free" (e.g. "rpl_free")
but the code compiled from that particular file wouldn't reference it.
Does that make sense?

(Incidentally I complained on the gnulib lists about the redefinition
of "free", but don't believe it ever stopped happening:
.)

> This is the case of tp/Texinfo/XS/convert/main/build_perl_info.c, which
> is compiled and linked in libtexinfoxs.la, without any use of gnulib
> CPPFLAGS.  The compilation of the other library, libtexinfo.la uses
> gnulib.  Most XS objects link against both libtexinfoxs.la and
> libtexinfo.la.
> 
> 
> There are files which include the perl header and also include gnulib
> headers and call functions such as strndup or free, for example
> tp/Texinfo/XS/convert/build_html_perl_state.c
> tp/Texinfo/XS/convert/get_html_perl_info.c
> 
> Will this setup lead to problems?  I do not think that it is possible in
> general to avoid calling strdup/malloc/free in the file.  However it is
> possible to add #undef or use wrappers or segregate some code.

It may or may not cause a problem depending on how allocated memory is
passed between different parts of the code.  The use of strdup in
convert/get_html_perl_info.c could potentially be problematic, depending
on whether this is also redefined by Perl headers.

Here are some guidelines:

* The number of source files including Perl headers should be kept to
  a minimum.

* In source files that include Perl headers, memory can be allocated with
  "malloc", but needs to be freed with "free" in the same source file, and
  not passed to code in other source files that might free it.  (It would
  be possible to free it in code in other source files if those source files
  also included the Perl headers, perhaps.)  Use of other heap allocation
  functions, such as asprintf or strdup, need to be treated with caution,
  especially if they may be overridden by gnulib, and may need to be
  avoided.

* Likewise, code in source files that include Perl headers should not
  attempt to free memory that was allocated in code in source files that
  did _not_ include the Perl headers.

* Code in source files including Perl headers can still free memory by
  calling helper functions defined in other source files - they just can't
  call "free" directly.

As far as I remember, the reason for this requirement was that on
MS-Windows (not sure if it's for other platforms too), the Perl headers
redefine malloc and free.  If memory is allocated with non-redefined free,
and then freed with redefined free (or vice versa), then an error occurs
like "Free to wrong pool".


Re: Code from installed libtexinfo.so.0 run for non-installed texi2any

2024-02-21 Thread Gavin Smith
On Mon, Nov 06, 2023 at 07:52:40PM +0100, Andreas Schwab wrote:
> On Nov 06 2023, Gavin Smith wrote:
> 
> > $ readelf -d /usr/local/lib/texinfo/ConvertXS.so | grep RUNPATH
> >  0x001d (RUNPATH)Library runpath: 
> > [/usr/local/lib/texinfo]
> >
> > It still seems to be checking the other locations first, though.
> 
> Unlike DT_RPATH, DT_RUNPATH does not override LD_LIBRARY_PATH.

(Link to archived message:
<https://lists.gnu.org/archive/html/bug-texinfo/2023-11/msg00047.html>.)

If I unset LD_LIBRARY_PATH on my system, I don't have the issue any more.

The "ld.so" man page says that DT_RPATH is deprecated.  (I also
found various old mailing list messages saying the same thing.)

This may be a PEBKAC on my side, as I shouldn't have LD_LIBRARY_PATH
set as a matter of course.

https://lists.gnu.org/archive/html/libtool/2017-05/msg1.html
https://lists.gnu.org/archive/html/libtool/2007-10/msg00056.html




Re: not using OUTPUT_PERL_ENCODING to determine if _stream_encode should encode

2024-02-19 Thread Gavin Smith
On Sun, Feb 18, 2024 at 10:57:22PM +0100, Patrice Dumas wrote:
> On Sun, Feb 18, 2024 at 06:09:23PM +0000, Gavin Smith wrote:
> > If this is ok, then "convert" could set $self->{'encoding_disabled'}.
> 
> Should I do that part?

I tried making the change myself.  Do you think the following is correct?
It changes the conversion for "info" output in the test suite to use 'convert'
instead of 'output' ("file_info" still uses 'output').  It leads to quite
a few test suite result changes, which I haven't sent.

diff --git a/ChangeLog b/ChangeLog
index 5a0166b732..d23c77279c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2024-02-19  Gavin Smith 
+
+   * tp/Texinfo/Convert/Plaintext.pm (convert):
+   Set $self->{'encoding_disabled'} to 1 so that 'convert' will
+   always return an unencoded character string.
+   * tp/t/test_utils.pl (convert_to_plaintext, convert_to_info):
+   Do not set OUTPUT_PERL_ENCODING to prevent encoding.
+   (convert_to_info): Check if there is an output file the same
+   way as in 'convert_to_plaintext', so that 'convert' is called
+   instead of 'output'.
+
 2024-02-18  Patrice Dumas  
 
Only allow highlighting commands to be redefined with @definfoenclose
diff --git a/tp/Texinfo/Convert/Plaintext.pm b/tp/Texinfo/Convert/Plaintext.pm
index 19df4a8912..98a4e514d1 100644
--- a/tp/Texinfo/Convert/Plaintext.pm
+++ b/tp/Texinfo/Convert/Plaintext.pm
@@ -619,6 +619,7 @@ sub convert($$)
   my ($self, $document) = @_;
 
   $self->conversion_initialization($document);
+  $self->{'encoding_disabled'} = 1;
 
   my $root = $document->tree();
 
diff --git a/tp/t/test_utils.pl b/tp/t/test_utils.pl
index 681ae6154e..b3daefed74 100644
--- a/tp/t/test_utils.pl
+++ b/tp/t/test_utils.pl
@@ -532,14 +532,6 @@ sub convert_to_plaintext($)
 }
   }
 
-  # If not outputing to a file, do not encode.  Return value from
-  # 'output' is a character string.  It will be encoded to
-  # UTF-8 in the results file.
-  if (defined($converter_options->{'OUTFILE'})
-  and $converter_options->{'OUTFILE'} eq '') {
-$converter_options->{'OUTPUT_PERL_ENCODING'} = '';
-  }
-
   my $converter = Texinfo::Convert::Plaintext->converter($converter_options);
 
   my $result;
@@ -574,19 +566,18 @@ sub convert_to_info($)
 = set_converter_option_defaults($converter_options, 'info',
 $self->{'DEBUG'});
 
-  # If not outputing to a file, do not encode.  Return value from
-  # 'output' is a character string.  This will be encoded to
-  # UTF-8 in the results file.  This may make byte offsets in the tag table
-  # incorrect, so if those needed to be tested, an separate output file
-  # would have to be used instead.
+  my $converter = Texinfo::Convert::Info->converter($converter_options);
+
+  my $result;
   if (defined($converter_options->{'OUTFILE'})
   and $converter_options->{'OUTFILE'} eq '') {
-$converter_options->{'OUTPUT_PERL_ENCODING'} = '';
+$result = $converter->convert($document);
+  } else {
+$result = $converter->output($document);
+close_files($converter);
+$result = undef if (defined($result) and ($result eq ''));
   }
 
-  my $converter = Texinfo::Convert::Info->converter($converter_options);
-  my $result = $converter->output($document);
-  close_files($converter);
   die if (!defined($converter_options->{'SUBDIR'}) and !defined($result));
 
   my $converter_errors = $converter->get_converter_errors();




Re: Patch: stable sort for /usr/share/info/dir

2024-02-18 Thread Gavin Smith
On Sun, Feb 18, 2024 at 01:27:12PM +0100, Roland Clobus wrote:
> Hello texinfo maintainers,
> 
> While working on reproducible live images [1], I noticed that the sort order
> in the file /usr/share/info/dir is unstable when nearly identical entries
> are added.
> 
> The order of the pairs 'diff', 'diff3' and 'who', 'whoami' was not
> guaranteed to be in dictionary order (i.e. shorter first)
> 
> Attached you find a diff based on the git checkout for version 7.1, which
> includes a test case that demonstrates the issue.
> 
> With kind regards,
> Roland Clobus
> 
> [1] https://reproducible-builds.org/

Thanks for explaining the problem and sending the fix.  I've applied
it in commit 01b5a4b9c33bef (2024-02-18).



Re: not using OUTPUT_PERL_ENCODING to determine if _stream_encode should encode

2024-02-18 Thread Gavin Smith
On Sun, Feb 18, 2024 at 02:13:13PM +0100, Patrice Dumas wrote:
> Hello,
> 
> This is in the new encode once only stream code, so I ask here before
> making a change, I could be missing something, Gavin you probably
> will have comments/advices.
> 
> I do not think that it is a good thing to set OUTPUT_PERL_ENCODING to an
> empty string to prevent _stream_encode to encode if a character string
> is wanted, as is done in test_utils.pl in convert_to_plaintext if
> convert() is called on a Plaintext converter (same for Info).  I think
> that it should be another information, set in the converter if called
> through convert.  For two reasons, first OUTPUT_PERL_ENCODING
> information could still be needed later on when outputting to a file
> (this is the case in test_utils.pl) and second because convert() should
> always return a character strings independentely of any customization
> information.

What you propose sounds reasonable.  I understand the idea of 'convert'
always returning an unencoded string regardless of output format, although
the Info converter will have incorrect tag tables, so it is not a widely
useful function.  It seems that 'convert' is only used for the test suite,
am I right?

> 
> A more minor issue, if $self->{'encoding_object'} is not set, it will
> be retested for 'output_perl_encoding' value each time _stream_encode is
> called.  Maybe it would be better to set it to 0 or to undef and test if
> (!exists($self->{'encoding_object'})).

I don't like the possibility of having $self->{'encoding_object'} being
either an integer (0) or the result of Encode::find_encoding.  I also
don't like relying on the distinction between "defined" and "true"
(0 is defined but not true).  The semantics of "exists" but undefined
are easy to forget, and relying on this would make the code more obscure.

Does the following look ok?

diff --git a/tp/Texinfo/Convert/Plaintext.pm b/tp/Texinfo/Convert/Plaintext.pm
index bd7308f783..d944708c86 100644
--- a/tp/Texinfo/Convert/Plaintext.pm
+++ b/tp/Texinfo/Convert/Plaintext.pm
@@ -956,23 +956,24 @@ sub _stream_encode($$)
   my $self = shift;
   my $string = shift;
 
+  if ($self->{'encoding_disabled'}) {
+return $string;
+  }
+
   if (!defined($self->{'encoding_object'})) {
 my $encoding = $self->{'output_perl_encoding'};
-if ($encoding and $encoding ne 'ascii') {
-  my $Encode_encoding_object = Encode::find_encoding($encoding);
-  if (!defined($Encode_encoding_object)) {
-Carp::croak "Unknown encoding '$encoding'";
-  }
-  $self->{'encoding_object'} = $Encode_encoding_object;
+if (!$encoding or $encoding eq 'ascii') {
+  $self->{'encoding_disabled'} = 1;
+  return $string;
 }
+my $Encode_encoding_object = Encode::find_encoding($encoding);
+if (!defined($Encode_encoding_object)) {
+  Carp::croak "Unknown encoding '$encoding'";
+}
+$self->{'encoding_object'} = $Encode_encoding_object;
   }
 
-  if ($self->{'encoding_object'}) {
-my $encoded = $self->{'encoding_object'}->encode($string);
-return $encoded;
-  } else {
-return $string;
-  }
+  return $self->{'encoding_object'}->encode($string);
 }
 


If this is ok, then "convert" could set $self->{'encoding_disabled'}.



Re: index sorting in texi2any in C issue with spaces

2024-02-14 Thread Gavin Smith
On Tue, Feb 13, 2024 at 05:37:03PM +0100, Patrice Dumas wrote:
> On Mon, Feb 05, 2024 at 06:14:16PM +0000, Gavin Smith wrote:
> > On Sun, Feb 04, 2024 at 10:27:00PM +0100, Patrice Dumas wrote:
> > > removed at any time.  Also calling it something else, like
> > > XS_STRXFRM_COLLATION_LOCALE.
> >
> > That's fine by me, either accessing it with an obscure testing variable
> > or not offering it at all.
> 
> This is implemented (only in XS), but not documented.  Should it be?

I am happy not documenting it at all, as that makes it less likely
that people rely on it, and then it is easier for us to remove it in
the future.

> As a side note, there is a test in optional tests
> other/index_collation_test_collation_locale_sv that tests
> XS_STRXFRM_COLLATION_LOCALE with sv_SE.utf8, which seems to exist on
> my debian, as /usr/lib/locale/sv_SE.utf8/LC_COLLATE, but the linguistic
> rules implemented in Unicode::Collate::Locale for sv do not seem to be
> used.  I also think that it does not matter as this interface is
> unlikely to be used.

Maybe a bug in the Swedish locale (from glibc?) if it is not using
correct collation rules for that language?

> As another side note, it should not be too difficult to add other
> collation possibilities in the XS/C code for other platforms in
> tp/Texinfo/XS/main/manipulate_indices.c by modifying get_sort_key and
> setup_collator, possibly adding another collation type/another
> customization variable, if people wanted to contribute code to support
> their platform specific interfaces that were described in the thread by
> Eli.  I do not think that it is important, though, as the current default
> of using Perl modules leads to correct output, even if it is probably
> slower than what could be obtained with a full C implementation.
> 
> > > I think that we should decide it now in order to have a fully specified
> > > interface, even if it is not fully implemented.  To me it would seems
> > > much more logical to have it be similar to COLLATION_LANGUAGE as it is
> > > the correct one.
> > 
> > Makes sense, that way we can avoid being stuck with bad decisions.  Could
> > it be DOCUMENTLANGUAGE_COLLATION, so
> > 
> > texi2any -c DOCUMENTLANGUAGE_COLLATION=1
> 
> This is implemented

Thanks for doing this.

> , added to NEWS (could be revised for language), not
> in the manual.  I think that it would be better if you added it to the
> manual as I will not write it well.

Done.

> Regarding the alternative names, no problem with changing the names now,
> just tell me.

I think the names are fine as they are now but we could change them if
there was a better proposal before we make a release.



Re: Segmentation fault with GNU texinfo 7.1

2024-02-11 Thread Gavin Smith
On Fri, Feb 09, 2024 at 04:56:36PM +, Francesco Turco wrote:
> Hello.
> 
> I have GNU texinfo 7.1 on a Gentoo Linux system.
> 
> I can make texinfo reliably crash with the following simple steps:
> 1. M-x set-variable
> 2. Type "link-style" (without quotes)
> 3. Type "foo" (without quotes)
> 
> I can reproduce this bug on Void Linux as well, but I don't remember
> which version of texinfo I have on that system.

Thanks for the report and to Andreas for the investigation and fix.



Re: `@unmacro` regression

2024-02-09 Thread Gavin Smith
On Sun, Feb 04, 2024 at 04:18:27PM +, Werner LEMBERG wrote:
> Indeed, because it is not too exotic a feature.  In LilyPond, we have
> a bunch of smaller files that get included in a master document; every
> file comprises a section of the document and essentially starts with
> 
> ```
> @unmacro foo
> @macro foo
> ...
> @end macro
> ```
> 
> Doing this ensures that we can easily change the order of the sections
> if ever necessary, without the need to adjust the macro code.
> 
> > However, since the error message has existed in texinfo.tex for so
> > long I will leave it a few days to see if anybody has any other
> > commments on the matter.
> 
> OK.

Committed today and it will be updated on ftp.gnu.org shortly.



Re: Build from git broken - missing gperf?

2024-02-07 Thread Gavin Smith
On Tue, Feb 06, 2024 at 07:13:09PM +0100, Patrice Dumas wrote:
> On Mon, Feb 05, 2024 at 07:35:59PM +0000, Gavin Smith wrote:
> > I don't know if uniconv/u8-conv-from-enc is a necessary module.  It's
> > not easy to find out how the module is used as the documentation is
> > lacking, but it appears to match libunistring.  The documentation is
> > here:
> > https://www.gnu.org/software/libunistring/manual/html_node/uniconv_002eh.html
> > 
> > I found uses of "u8_strconv_from_encoding" throughout the XS code,
> > although most of the uses (I didn't check them all) have "UTF-8" as one
> > of the arguments, making it appear that we are converting from UTF-8
> > to UTF-8.
> 
> It is the case.  We actually already discussed that issue previously, in
> the codes I did, and in order to follow what I understood from the
> libunistring documentation, char * is converted to uint8_t by calling
> u8_strconv_from_encoding even though the string is already UTF-8.  In
> your code in xspara.c you simply cast to uint8_t.  It could also be done
> like that in other codes, I do not know what is best.

The immediate solution is to require gperf as a tool for developers, just
like automake, autoconf, etc.

Getting away from u8_strconv_from_encoding could take some more effort
and isn't immediately necessary, but would be nice to do to reduce bloat.
Since we only use it for UTF-8 validation, we could do this in some other
function that is simpler and doesn't pull in as much from gnulib.

I saw your private email from November 2023.  Here's part of what
I wrote in my response (for the benefit of the mailing list):

  We can assume the text strings coming out of Perl are encoded already
  in UTF-8, so running a conversion on them is pointless and confusing.

  According to the libunistring manual:

The five types char *, uint8_t *, uint16_t *, uint32_t *, and wchar_t
* are incompatible types at the C level. Therefore, ‘gcc -Wall’
will produce a warning if, by mistake, your code contains a mismatch
between these types. In the context of using GNU libunistring, even
a warning about a mismatch between char * and uint8_t * is a sign of
a bug in your code that you should not try to silence through a cast.

  
https://www.gnu.org/software/libunistring/manual/libunistring.html#In_002dmemory-representation

  However, I don't understand how this can possibly be avoided, other than
  by running pointless conversions.  SvPV, which we use in XSParagraph.xs
  to get the pointer, returns a char * value.  Unless the Perl API can
  give a value with a type of uint8_t * to represent a UTF-8 string,
  then we can only avoid such warnings with a cast.

I can see the appeal of not fully trusting Perl's API to provide correct
values for use in our own XS code.  I suggest that if we do use a cast
we can do it in one single place in the code along with any validation
we do on the UTF-8.  We could start with a wrapper around
u8_strconv_from_encoding.  I'm happy to work on this myself when I have
time to.
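
For what it's worth, here is a minimal sketch of what such a single-place
wrapper could look like.  It assumes we would be happy to rely on u8_check
(from the small unistr/u8-check gnulib module) for the validation, and the
function name is invented for illustration:

  /* Hypothetical helper: do the char * -> uint8_t * cast in one single
     place for strings coming from SvPV that are already supposed to be
     UTF-8, with an optional validity check. */
  #include <stddef.h>
  #include <stdint.h>
  #include <unistr.h>   /* u8_check */

  static const uint8_t *
  utf8_from_perl_string (const char *s, size_t len)
  {
    const uint8_t *u = (const uint8_t *) s;

    /* u8_check returns NULL if the whole buffer is well-formed UTF-8,
       otherwise a pointer to the first invalid unit. */
    if (u8_check (u, len) != NULL)
      return NULL;   /* let the caller decide how to handle bad input */

    return u;
  }

Whether the u8_check call is worth its cost on every string is a separate
question; the main point is that the cast (and any validation) would then
live in exactly one function.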

> That being said, we also directly use gnulib iconv, so I think that
> iconv_open would still be brought in anyway.

We'd have to see if this module was still worth using for the platforms
it supports and the problems it solves.



Re: Build from git broken - missing gperf?

2024-02-05 Thread Gavin Smith
On Sun, Feb 04, 2024 at 07:59:37PM +, Gavin Smith wrote:
> I can't build from commit 5154587461e (2024-02-04).  It ends with the
> errors:
> 
> make[2]: Entering directory '/home/g/src/texinfo/GIT/tp/Texinfo/XS/gnulib/lib'
> gperf -m 10 ./iconv_open-aix.gperf > ./iconv_open-aix.h-t && \
> mv ./iconv_open-aix.h-t ./iconv_open-aix.h
> /bin/bash: line 1: gperf: command not found
> make[2]: *** [Makefile:3550: iconv_open-aix.h] Error 127
> make[2]: Leaving directory '/home/g/src/texinfo/GIT/tp/Texinfo/XS/gnulib/lib'
> make[1]: *** [Makefile:2849: all-recursive] Error 1
> make[1]: Leaving directory '/home/g/src/texinfo/GIT/tp/Texinfo/XS'
> make: *** [Makefile:1970: all] Error 2
> 
> It appears that some of the gnulib modules use gperf.  Any idea why this
> would have stopped working?

By running "git log" on the offending file (gnulib/lib/iconv_open-aix.h)
I found the commit 0bfbc74287b9b8 (2023-01-30 21:52:45 +0100).

The commit message there is

Update tp/Texinfo/XS/gnulib/, add u8-strconv-from-enc and u8-mbsnlen

It's also in the ChangeLog for 2023-01-30:

in tp/Texinfo/XS gnulib-tool --add-import uniconv/u8-strconv-from-enc 
unistr/u8-mbsnlen.

I found it was being rebuilt by "make" because a dependency was updated:

$ ls -l gnulib/lib/iconv_open-aix.gperf 
-rw-rw-r-- 1 g g 1.8k Jan 31 18:24 gnulib/lib/iconv_open-aix.gperf

which came from a gnulib update to another module (all that happened
was that copyright years changed from 2023 to 2024).

Gnulib documents that "gperf" is a required tool for using the "iconv_open"
module.  It's not especially easy, I find, to find why a particular gnulib
module was brought in, but looking at the files under the "modules" directory
of a gnulib checkout, I found the chain of dependencies

uniconv/u8-strconv-from-enc -> uniconv/u8-conv-from-enc -> striconveha
  -> striconveh -> iconv_open

(Of course, there could other dependency chains that also brought this module
in.)

Short of extirpating this dependency, the only solution appears to
be to require anyone building from git to have gperf installed, which
doesn't seem like a good situation, as it was never required before.

I don't know if uniconv/u8-conv-from-enc is a necessary module.  It's
not easy to find out how the module is used as the documentation is
lacking, but it appears to match libunistring.  The documentation is
here:
https://www.gnu.org/software/libunistring/manual/html_node/uniconv_002eh.html

I found uses of "u8_strconv_from_encoding" throughout the XS code,
although most of the uses (I didn't check them all) have "UTF-8" as one
of the arguments, making it appear that we are converting from UTF-8
to UTF-8.



Re: index sorting in texi2any in C issue with spaces

2024-02-05 Thread Gavin Smith
On Sun, Feb 04, 2024 at 10:27:00PM +0100, Patrice Dumas wrote:
> > > However, if there is a
> > > possibility to get variable elements set to "non-ignorable" in C,
> > > possibly by using an hardcoded locale of en_US, it will not possible to
> > > get automatically both the correct and more rapid option.  The user
> > > would still have to set COLLATION_LOCALE to get it.
> > 
> > If this is possible, then we silently switch to using the C sorting 
> > if we can detect that we can treat variable elements in such a way.
> > This would be the "default", non-tailored collation.
> 
> Ok.  In that case, it is even clearer to me that COLLATION_LOCALE is not
> really useful, I think that it will be quite challenging to understand
> what it really does and will probably be unused.  I think that we should
> not propose COLLATION_LOCALE at all to users.  It seems that we should
> consider it as a short-term way to be able to use collation in C, that
> may not be worth having in the long term, and keep it as a feature
> documented as being only for XS and only for development/testing to
> allow developers to test what output it leads to and that it could be
> removed at any time.  Also calling it something else, like
> XS_STRXFRM_COLLATION_LOCALE.
> 

That's fine by me, either accessing it with an obscure testing variable
or not offering it at all.

> > There wouldn't any harm in implementing it as an option.  We'd have to
> > decide if it went via strxfrm_l, Unicode::Collate::Locale, or configurable
> > for either.
> 
> I think that we should decide it now in order to have a fully specified
> interface, even if it is not fully implemented.  To me it would seems
> much more logical to have it be similar to COLLATION_LANGUAGE as it is
> the correct one.

Makes sense, that way we can avoid being stuck with bad decisions.  Could
it be DOCUMENTLANGUAGE_COLLATION, so

texi2any -c DOCUMENTLANGUAGE_COLLATION=1

Is there not a customization variable called "documentlanguage" as well?
One possibility is to get rid of COLLATION_LANGUAGE and do it through
"documentlanguage", like

texi2any -c DOCUMENTLANGUAGE_COLLATION=1 -c documentlanguage=se

instead of

texi2any -c COLLATION_LANGUAGE=se

This would not be as flexible, though, as documentlanguage affects
other things (e.g. translations, what language the output is marked
as in some output formats).

Can we also think if there is an alternative to the word "language"
in the name of COLLATION_LANGUAGE?  The manual for Unicode::Collate::Locale
gives a possibility of "de__phonebook" and "German phonebook" does not
count as a language.  COLLATE_RULES?  COLLATE_TAILORING?  Maybe
COLLATE_LOCALE?  (The last matches the usage of Unicode::Collate::Locale.
This is disregarding the meaning of COLLATION_LOCALE that I previously
suggested).



Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Gavin Smith
On Sun, Feb 04, 2024 at 08:38:45PM +0100, Patrice Dumas wrote:
> Thanks.  This is very confusing to me, then, as it is not told that way
> in perllocale, especially the section: 
> https://perldoc.perl.org/perllocale#Category-LC_COLLATE%3A-Collation%3A-Text-Comparisons-and-Sorting
> There is more information in the end of the page that may correspond
> better to the perlop information.  Not important at all anyway
> since we agree that using the user locale is not a good idea in any case.

Yes, it appears to say the opposite:

   Perl uses the platform's C library collation functions "strcoll()" and
   "strxfrm()".  That means you get whatever they give.  On some
   platforms, these functions work well on UTF-8 locales, giving a
   reasonable default collation for the code points that are important in
   that locale.  (And if they aren't working well, the problem may only be
   that the locale definition is deficient, so can be fixed by using a
   better definition file.  Unicode's definitions (see "Freely available
   locale definitions") provide reasonable UTF-8 locale collation
   definitions.)  Starting in Perl v5.26, Perl's use of these functions
   has been made more seamless.  This may be sufficient for your needs.
   For more control, and to make sure strings containing any code point
   (not just the ones important in the locale) collate properly, the
   Unicode::Collate module is suggested.

So COLLATE_LOCALE (if we go with that naming) could potentially be
implemented in Perl as well, if we are able to temporarily switch
the locale.  Speed could be an issue, though.  (Although the documentation
says the result of strxfrm is cached, so maybe not.)

I guess that the other documentation is either out of date, or they
were mandating Unicode::Collate as more portable than relying on the
platform's C library.



Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Gavin Smith
On Sun, Feb 04, 2024 at 08:38:45PM +0100, Patrice Dumas wrote:
> >offer much more powerful solutions to collation issues.
> > 
> > - from "man perlop".)
> 
> Thanks.  This is very confusing to me, then, as it is not told that way
> in perllocale, especially the section: 
> https://perldoc.perl.org/perllocale#Category-LC_COLLATE%3A-Collation%3A-Text-Comparisons-and-Sorting
> There is more information in the end of the page that may correspond
> better to the perlop information.  Not important at all anyway
> since we agree that using the user locale is not a good idea in any case.
> 
> > > Here is my updated thinking on the possibilities
> > > 
> > > 1) lexicographic sorting on unicode strings (corresponds to
> > >  USE_UNICODE_COLLATION=0 currently)
> > > 2) unicode default sorting obtained by Unicode::Collate in Perl and
> > >strxfrm_l in C with "en_US.utf-8", the current default ("en_US.utf-8"
> > >could be different on different platforms, a list instead of only one
> > >possibility if "en_US.utf-8" is not always available...)
> > > 3) sorting based on @documentlanguage using, in perl
> > >Unicode::Collate::Locale with locale @documentlanguage and in C
> > >strxfrm_l with "@documentlanguage.utf-8" (at least on GNU/Linux,
> > >the locale name setup for strxfrm_l could be different on other 
> > > platforms).
> > > 4) sorting based on a customization variable, such as COLLATION_LANGUAGE.
> > >it would be the same as the previous one, with @documentlanguage
> > >replaced by COLLATION_LANGUAGE.
> > > 5) sorting based on the user locale, using strxfrm in C and
> > >"use locale" and regular sorting on unicode (internal perl encoded) 
> > > strings
> > >in Perl.
> > > 
> > 
> > My concern here is that there are far too many options for the user to
> > decide between.  They also interact with whether XS or pure Perl modules
> > are being used (which depends on environment variables such as
> > TEXINFO_XS_STRUCTURE and other things).  As far as possible the
> > interface should not specify whether the sorting is done in C or Perl.
> 
> Ok, in principle, but I am not sure that it is really possible given the
> differences.
> 
> > What's of interest to the user are the following three things: speed,
> > correctness, and language-specific tailoring.
> 
> Point is that C is better for speed, but Perl is better for correctness.
> 
> > I think many possibilities can be covered with three customization
> > variables, USE_UNICODE_COLLATION, COLLATION_LOCALE and COLLATION_LANGUAGE:
> > 
> > 1) would be done with USE_UNICODE_COLLATION=0 as you say.  This could
> > also be implemented in C with strcmp (as Andreas pointed out).
> 
> strcmp is always used as a transformation on the string is done with
> strxfrm_l for the collation in C.  If USE_UNICODE_COLLATION=0 the string
> is not transformed, which amounts to using strcmp on the original
> string.  Therefore it is already implemented that way in C, as can be
> seen in tp/Texinfo/XS/main/manipulate_indices.c.

Does this always happen with "texi2any -c USE_UNICODE_COLLATION=0" if
the XS modules are available or are there more restrictions?

I noticed a potential problem:

static void
set_sort_key (locale_t collation_locale, const char *input_string,
  char **result_key)
{
  if (collation_locale)
{
  #ifdef HAVE_STRXFRM_L
  size_t len = strxfrm_l (0, input_string, 0, collation_locale);
  size_t check_len;

  *result_key
= (char *) malloc ((len +1) * sizeof (char));
  check_len = strxfrm_l (*result_key, input_string, len+1,
 collation_locale);
  if (check_len != len)
fatal ("strxfrm_l returns a different length");
  #endif
}
  else
*result_key = strdup (input_string);
}

It looks like *result_key is not set if HAVE_STRXFRM_L is not defined.
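
A possible rearrangement (untested, just to illustrate always setting
*result_key, and dropping the length check for brevity) could be:

static void
set_sort_key (locale_t collation_locale, const char *input_string,
              char **result_key)
{
#ifdef HAVE_STRXFRM_L
  if (collation_locale)
    {
      size_t len = strxfrm_l (0, input_string, 0, collation_locale);
      *result_key = (char *) malloc ((len + 1) * sizeof (char));
      strxfrm_l (*result_key, input_string, len + 1, collation_locale);
      return;
    }
#endif
  /* Fall back to the raw input string, both when no collation locale
     was set up and when strxfrm_l is not available at all. */
  *result_key = strdup (input_string);
}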

> > 2) is two different types of sorting; as you said earlier, the
> > sorting in C may have a different treatment of "variable elements".
> > The first would be accessed with USE_UNICODE_COLLATION=1 and the second
> > with COLLATION_LOCALE=en_US.UTF-8 or possibly COLLATION_LOCALE=en_US.
> > Using strxfrm with en_US would not be the default because of the handling
> > of spaces and also because the interface isn't very portable.
> 
> So, what you mean here is that with USE_UNICODE_COLLATION=1 and no
> COLLATION_LOCALE, C code should call the perl code that sorts indices using
> Unicode::Locale instead of doing the sorting in C.  Did I get it right?

Yes, that's right.  Since there's no C implementation of Unicode
collation that matches how we use Unicode::Collate to do it,
Unicode::Collate should be the default.


> If COLLATION_LOCALE is set, in C strxfrm_l would be used to do the
> string transformation and sorting.
> 
> If COLLATION_LOCALE is set in Perl, it is not clear to me what would be
> the output.  Would it be ignored?

If by "set in Perl" you mean in an 

Build from git broken - missing gperf?

2024-02-04 Thread Gavin Smith
I can't build from commit 5154587461e (2024-02-04).  It ends with the
errors:

make[2]: Entering directory '/home/g/src/texinfo/GIT/tp/Texinfo/XS/gnulib/lib'
gperf -m 10 ./iconv_open-aix.gperf > ./iconv_open-aix.h-t && \
mv ./iconv_open-aix.h-t ./iconv_open-aix.h
/bin/bash: line 1: gperf: command not found
make[2]: *** [Makefile:3550: iconv_open-aix.h] Error 127
make[2]: Leaving directory '/home/g/src/texinfo/GIT/tp/Texinfo/XS/gnulib/lib'
make[1]: *** [Makefile:2849: all-recursive] Error 1
make[1]: Leaving directory '/home/g/src/texinfo/GIT/tp/Texinfo/XS'
make: *** [Makefile:1970: all] Error 2

It appears that some of the gnulib modules use gperf.  Any idea why this
would have stopped working?



Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Gavin Smith
On Fri, Feb 02, 2024 at 08:57:01AM +0200, Eli Zaretskii wrote:
> > An alternative is not to have such a variable but just to have an option
> > to collate according to the user's locale.  Then the user would run e.g.
> > "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
> > locale.  They would have to have the locale installed that was appropriate
> > for whichever manual they were processing (assuming the "variable weighting"
> > option is appropriate.)
> 
> What would be the default then, though?  AFAIR, we decided by default
> to use en_US.utf-8 for collation, with the purpose of making the
> sorting locale-independent by default, so that Info manuals produced
> with the default settings are identical regardless of the user's
> locale.

I agree that sorting should be locale-independent by default.



Re: index sorting in texi2any in C issue with spaces

2024-02-04 Thread Gavin Smith
On Sun, Feb 04, 2024 at 12:17:16PM +0100, Patrice Dumas wrote:
> On Thu, Feb 01, 2024 at 10:16:07PM +0000, Gavin Smith wrote:
> > An alternative is not to have such a variable but just to have an option
> > to collate according to the user's locale.  Then the user would run e.g.
> > "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
> > locale.  They would have to have the locale installed that was appropriate
> > for whichever manual they were processing (assuming the "variable weighting"
> > option is appropriate.)
> 
> I do not like that possibility, I think that we should avoid using the user
> locales when it comes to document output in general.  If we use the user
> locale I think that it should be by using strxfrm in C and "use locale" in
> Perl, not by checking a specific LC_COLLATE value in the environment.

(Note that "cmp" is documented not to work with "use locale" for UTF-8
strings:

   "lt", "le", "ge", "gt" and "cmp" use the collation (sort) order
   specified by the current "LC_COLLATE" locale if a "use locale" form
   that includes collation is in effect.  See perllocale.  Do not mix
   these with Unicode, only use them with legacy 8-bit locale encodings.
   The standard "Unicode::Collate" and "Unicode::Collate::Locale" modules
   offer much more powerful solutions to collation issues.

- from "man perlop".)

> Here is my updated thinking on the possibilities
> 
> 1) lexicographic sorting on unicode strings (corresponds to
>  USE_UNICODE_COLLATION=0 currently)
> 2) unicode default sorting obtained by Unicode::Collate in Perl and
>strxfrm_l in C with "en_US.utf-8", the current default ("en_US.utf-8"
>could be different on different platforms, a list instead of only one
>possibility if "en_US.utf-8" is not always available...)
> 3) sorting based on @documentlanguage using, in perl
>Unicode::Collate::Locale with locale @documentlanguage and in C
>strxfrm_l with "@documentlanguage.utf-8" (at least on GNU/Linux,
>the locale name setup for strxfrm_l could be different on other platforms).
> 4) sorting based on a customization variable, such as COLLATION_LANGUAGE.
>it would be the same as the previous one, with @documentlanguage
>replaced by COLLATION_LANGUAGE.
> 5) sorting based on the user locale, using strxfrm in C and
>"use locale" and regular sorting on unicode (internal perl encoded) strings
>in Perl.
> 

My concern here is that there are far too many options for the user to
decide between.  They also interact with whether XS or pure Perl modules
are being used (which depends on environment variables such as
TEXINFO_XS_STRUCTURE and other things).  As far as possible the
interface should not specify whether the sorting is done in C or Perl.

What's of interest to the user are the following three things: speed,
correctness, and language-specific tailoring.

I think many possibilities can be covered with three customization
variables, USE_UNICODE_COLLATION, COLLATION_LOCALE and COLLATION_LANGUAGE:

1) would be done with USE_UNICODE_COLLATION=0 as you say.  This could
also be implemented in C with strcmp (as Andreas pointed out).

2) is two different types of sorting; as you said earlier, the
sorting in C may have a different treatment of "variable elements".
The first would be accessed with USE_UNICODE_COLLATION=1 and the second
with COLLATION_LOCALE=en_US.UTF-8 or possibly COLLATION_LOCALE=en_US.
Using strxfrm with en_US would not be the default because of the handling
of spaces and also because the interface isn't very portable.

3) and 4) are again potentially different between C and pure
Perl.  I propose that COLLATION_LOCALE would be used for accessing
system locales (with strxfrm or strcoll in C, but in theory this is
language-independent.)  COLLATION_LANGUAGE would be an argument to use
for Unicode::Collate::Locale to get language-specific tailoring, which
in language-independent terms means to use the UCA with tailoring, with
variable collation elements treated as "non-ignorable".  If there is
ever a separate implementation of the UCA in texi2any with access to
tailoring, COLLATION_LANGUAGE would govern it as well.

For 3), accessing @documentlanguage seems like an unnecessary extra
at the moment.  Again, there would be the problem of strxfrm_l and
Unicode::Collate::Locale doing different things with variable collation
elements.  There is no guarantee that the user has the appropriate
locale installed either (for use with strxfrm_l) or that the language
is supported by Unicode::Collate::Locale.

I'm ok with 5) not being implemented (using L

Re: `@unmacro` regression

2024-02-02 Thread Gavin Smith
On Thu, Feb 01, 2024 at 12:37:29PM +, Werner LEMBERG wrote:
> 
> In the texinfo manual of version 7.1 I can find the following about
> `@unmacro`:
> 
> 
>You can undefine a macro FOO with ‘@unmacro FOO’.  It is not an
>error to undefine a macro that is already undefined.  For example:
> 
>  @unmacro foo
> 
> 
> However, this doesn't work.

I don't know if this is a regression as the code for this error message
existed as far back as I could see in the git repository, back to 2002
(commit fc8e4fc5f364ca4f, 2002-09-03).  The text in the manual has also
existed for a very long time (commit 16440098c622, 2002-08-25).  It's
odd that nobody seems to have noticed this before.

It does not appear to cause any error with texi2any.  It would be
straightforward to remove the error message from texinfo.tex, and resolve
the inconsistency with both texi2any and the documentation.  However, since
the error message has existed in texinfo.tex for so long I will leave it
a few days to see if anybody has any other comments on the matter.

> ./u.texi:3: Macro foo not defined.
> @\unmacro ...se @errmessage {Macro #1 not defined}
>   @fi 
> l.3 @unmacro foo
> ```
> 



Re: index sorting in texi2any in C issue with spaces

2024-02-01 Thread Gavin Smith
On Thu, Feb 01, 2024 at 09:01:42AM +0200, Eli Zaretskii wrote:
> > Date: Wed, 31 Jan 2024 23:11:02 +0100
> > From: Patrice Dumas 
> > 
> > > Moreover, en_US.utf-8 will use collation appropriate for (US) English.
> > > There may be language-specific "tailoring" for other languages (e.g.
> > > Swedish) that the user may wish to use instead.  Hence, it may be
> > > a good idea to allow use of a user-specified locale for collation through
> > > the C code.
> > 
> > That would not be difficult to implement as a customization variable.
> > What about COLLATION_LANGUAGE?
> 
> What would be the possible values of this variable, and in what format
> will those values be specified?

I imagine it would be a locale name for passing to newlocale and thence
to strxfrm_l.  What Patrice implemented hardcodes the name "en_US.utf-8"
but this would be a possible value.
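
Just to make the idea concrete, a sketch of how such a value might be
used (the function and variable names here are invented, and it assumes
newlocale is available, which it is not on all platforms):

#include <locale.h>
#include <stddef.h>

/* Turn a COLLATION_LANGUAGE value such as "sv_SE.UTF-8" into a locale
   object usable with strxfrm_l.  Returns (locale_t) 0 if the locale is
   not installed on the system, in which case the caller would fall back
   to comparing the raw strings. */
static locale_t
setup_collation_locale (const char *collation_language)
{
  if (!collation_language || !*collation_language)
    collation_language = "en_US.utf-8";   /* current hard-coded default */

  return newlocale (LC_COLLATE_MASK, collation_language, (locale_t) 0);
}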

(If there are locale names on MS-Windows that are different, it would
be fine to support them the same way, only the invocation of texi2any
would vary to use a different locale name.)

An alternative is not to have such a variable but just to have an option
to collate according to the user's locale.  Then the user would run e.g.
"LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
locale.  They would have to have the locale installed that was appropriate
for whichever manual they were processing (assuming the "variable weighting"
option is appropriate.)

It is probably not justified to provide an interface to the flags of
CompareStringW on MS-Windows if we can't provide the same functionality
with strcoll/strxfrm/strxfrm_l.

It seems not very important to provide more of these collation options
for indices as it is not something users are complaining about.



Re: Customization variable to disable Unicode collation?

2024-01-31 Thread Gavin Smith
On Wed, Jan 31, 2024 at 10:38:36AM +0100, Patrice Dumas wrote:
> With collation also possible with XS/C, but with a different result than
> in perl, I think that there should be a way to use perl unicode
> collation from C too, in addition to using a unicode collation or not.
> 
> Should it be a separate customization variable, or should
> USE_UNICODE_COLLATION be replaced by a variable with a textual value
> taking more possibilities, for example USE_COLLATION with possible
> values:
> unicode
> unicodeperl
> basic

Hi, I think we should wait until we decide exactly what we are doing
with collation in C first.  If we are aiming to get exactly the same
collation in C as in Perl, then I don't think we should have a separate
option for this.  If collation with C allows something different, then
we could have an option.  There is the issue of language-specific tailoring
which could be potentially achieved with locale-based sorting in C.



Re: index sorting in texi2any in C issue with spaces

2024-01-31 Thread Gavin Smith
On Wed, Jan 31, 2024 at 10:15:08AM +0100, Patrice Dumas wrote:
> Hello,
> 
> I implemented index sorting in C with XS interface in texi2any.
> When unicode collation is wanted, based on my understanding of
> Eli suggestions, a collation locale is set to "en_US.utf-8", by
>   newlocale (LC_COLLATE_MASK, "en_US.utf-8", 0)
> and then strxfrm_l is used (which should be the same as using
> strcoll_l).  With conversion in C/with XS set with environment variable
> TEXINFO_XS_CONVERT=1 and for now only for HTML, if TEST customization
> variable is not set.

It seems like a pretty obscure interface.  It is barely
documented - newlocale is in the Linux Man Pages but not the
glibc manual, and strxfrm_l is only in the POSIX standard
(https://pubs.opengroup.org/onlinepubs/9699919799/functions/strxfrm.html).
I don't know of any other way of accessing the collation functionality.

Do you know how portable it is?  The documentation for the corresponding
Gnulib module says the following:

  Portability problems not fixed by Gnulib:
  
  This function is missing on many platforms: FreeBSD 6.0, NetBSD 5.0,
  OpenBSD 6.0, Minix 3.1.8, AIX 5.1, HP-UX 11, IRIX 6.5, Solaris 11.3,
  Cygwin 1.7.x, mingw, MSVC 14, Android 4.4.



Could it be possible to have an option of "current locale" collation
which could use more standard interfaces?

Moreover, en_US.utf-8 will use collation appropriate for (US) English.
There may be language-specific "tailoring" for other languages (e.g.
Swedish) that the user may wish to use instead.  Hence, it may be
a good idea to allow use of a user-specified locale for collation through
the C code.

> On my debian GNU/Linux, the result is good except for the treatment of
> spaces.  Indeed, spaces (and non alphanumeric characters, but it is
> not really an issue) are ignored when sorting, which sticks to the Unicode
> collation standard, but leads to an awkward sorting for indices, for
> example 'H r' is sorted after 'Ha'.  In perl, it is possible to
> customize the Unicode::Collate collation, we use 'variable' => 
> 'Non-Ignorable'.

I think either way is in accordance with the collation standard.  The
standard gives four options and "Non-ignorable" is one of them:

http://www.unicode.org/reports/tr10/#Variable_Weighting

I doubt it is possible to customize the collation of a locale with
a function such as newlocale.  I expect the collation order is fixed
when the locale is defined.

I found some locale definition files on my system under
/usr/share/i18n/locales (the location is mentioned in the man page of the "locale"
command) and there is a file iso14651_t1_common which appears to be
based on the Unicode Collation tables.  I have only skimmed this file
and don't understand the file format well (it's supposed to be documented
in the output of "man 5 locale"), but it is really part of glibc internals.

In that file, space has a line

<U0020> IGNORE;IGNORE;IGNORE;<U0020> % SPACE

which appears to define space as a fourth-level collation element,
corresponding to the Shifted option at the link above:

  "Shifted: Variable collation elements are reset to zero at levels one
  through three. In addition, a new fourth-level weight is appended..."

In the Default Unicode Collation Element Table (DUCET), space has the line

0020  ; [*0209.0020.0002] # SPACE

with the "*" character denoting it as a "variable" collation element.

I expect it would require creating a glibc locale to change the collation
order, which is not something we can do.



Re: rendering of singlequote

2024-01-31 Thread Gavin Smith
On Tue, Jan 30, 2024 at 10:34:43PM -0500, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> 
> When I format the GNU C Intro and Reference, singlequote inside @code
> comes out as a comma-like singlequote, rather that the upright singlequote
> I would expect.  This is made with Texinfo 2018-01-09, which is installed
> in Trisquel 11.

Indeed, we did change the default for this.  In Texinfo 7.1 (18 October 2023)
we changed the glyphs for ` and ' inside @code:

 . in @code, ` and ' output by default with backtick and undirected
   single quote glyphs in the typewriter font.  you can still configure
   this using the @codequoteundirected/@codequotebacktick commands.

- as this is the output people are more likely to expect.

If you are using an older version of Texinfo you can get this output by
putting lines in the file:

@codequoteundirected on
@codequotebacktick on

See "Inserting Quote Characters" in the Texinfo manual.



Re: texinfo does not compile without libintl.h even though configure checks for it

2024-01-28 Thread Gavin Smith
On Mon, Jan 29, 2024 at 01:10:52AM +, Erik A Johnson wrote:
> Without libintl.h, tp/Texinfo/XS/parsetexi/api.c does not compile -- even 
> though configure tests for libintl.h.  In api.c, is the line
> 
>   #include <libintl.h>
> 
> actually needed?  (Commenting that line out in texinfo-7.1 allows it to 
> compile, complete tests, etc. on macOS Sonoma 14.2.1.)
> 
> 

Possibly not.  In the development code some of the code from api.c was
moved to tp/Texinfo/XS/main/build_perl_info.c and there, the #include line
is surrounded by an #ifdef block:

#ifdef ENABLE_NLS
#include <libintl.h>
#endif

This change was made on 2023-08-14.  ChangeLog entry:


2023-08-14  Patrice Dumas  

* tp/Texinfo/XS/parsetexi/build_perl_info.c: include libintl.h only
if ENABLE_NLS.

* tp/Texinfo/XS/parsetexi/build_perl_info.h: do not include errors.h.   



Re: Customization variable to disable Unicode collation?

2024-01-28 Thread Gavin Smith
On Sun, Jan 28, 2024 at 09:25:38PM +0100, Patrice Dumas wrote:
> On Sun, Jan 28, 2024 at 08:15:33PM +0000, Gavin Smith wrote:
> > Below is a patch to introduce a new variable to avoid using the
> > Unicode::Collate module.  Turning the module off shortens run times by
> > about 5% (depending on the size of indices in the document).
> > 
> > Users can use this if they don't find texi2any fast enough, if they
> > don't care about having the indices sorted correctly, if they don't
> > have many non-ASCII characters in index entry text, or just for working
> > on a manual.
> > 
> > I propose that USE_UNICODE_COLLATE is on by default, as is currently
> > the case, to have correct index sorting by default, as the impact is
> > relatively small.
> 
> With not unicode collation, the index sorting could be done in C, and my
> guess is that the impact would be important.

It might be - would I be right in saying that this code isn't written?

> > Patrice, is this the correct way to add a customization variable (as
> > I am not familiar with the new Texinfo/options_data.txt file).
> 
> Yes, it looks good.  As a side note, shouldn't the variable name rather
> be USE_UNICODE_COLLATION?

I was referring to the Unicode::Collate module but USE_UNICODE_COLLATION
is just as good a name if not better (as it is grammatical English and
not referring so much to the implementation details of which module is
used to accomplish it).



Re: ffmpeg documentation no longer builds after Texinfo 7.1 (due to --init-file=)

2024-01-28 Thread Gavin Smith
On Sun, Jan 28, 2024 at 09:13:21PM +0100, Patrice Dumas wrote:
> On Sat, Jan 27, 2024 at 05:55:24PM +0000, Gavin Smith wrote:
> > The get_constant function has been recently added but I think the name of
> > this is too generic.
> 
> Any idea on another name?

It reminds me of the output of perl -V.  We can't call them
"configuration variables" as we already use that name for something
else.  But it is a kind of "configuration".

Some ideas:
get_build_constant
get_build_conf - similar to get_conf
get_build_configuration - avoids looking like get_conf
get_build_configuration_constant - quite long
get_build_configuration_variable
get_build_configuration_value - avoids question of whether it is a variable
or a constant
get_compiled_constant - but not everything in Texinfo is compiled
get_constant_conf

I like get_build_constant or get_build_configuration_value.  I couldn't
think of an alternative to the word "build" to represent a certain
installed version of Texinfo but there might be one.



Customization variable to disable Unicode collation?

2024-01-28 Thread Gavin Smith
Below is a patch to introduce a new variable to avoid using the
Unicode::Collate module.  Turning the module off shortens run times by
about 5% (depending on the size of indices in the document).

Users can use this if they don't find texi2any fast enough, if they
don't care about having the indices sorted correctly, if they don't
have many non-ASCII characters in index entry text, or just for working
on a manual.

I propose that USE_UNICODE_COLLATE is on by default, as is currently
the case, to have correct index sorting by default, as the impact is
relatively small.

Patrice, is this the correct way to add a customization variable (as
I am not familiar with the new Texinfo/options_data.txt file).

Sample timings:

$ time ../tp/texi2any.pl ../../libc/libc.texinfo -c USE_UNICODE_COLLATE=1
creature.texi:309: warning: `.' or `,' must follow @xref, not f

real0m6.009s
user0m5.586s
sys 0m0.421s


$ time ../tp/texi2any.pl ../../libc/libc.texinfo -c USE_UNICODE_COLLATE=0
creature.texi:309: warning: `.' or `,' must follow @xref, not f

real0m5.821s
user0m5.460s
sys 0m0.360s


$ time ../tp/texi2any.pl ../../emacs-lispref-27.2/elisp.texi -c USE_UNICODE_COLLATE=1
functions.texi:2390: warning: @inforef is obsolete
errors.texi:226: warning: unexpected argument on @ignore line: The following 
seem to be unused now.

real0m5.383s
user0m5.146s
sys 0m0.237s

$ time ../tp/texi2any.pl ../../emacs-lispref-27.2/elisp.texi -c USE_UNICODE_COLLATE=0
functions.texi:2390: warning: @inforef is obsolete
errors.texi:226: warning: unexpected argument on @ignore line: The following 
seem to be unused now.

real0m4.960s
user0m4.739s
sys 0m0.221s


diff --git a/tp/Texinfo/Indices.pm b/tp/Texinfo/Indices.pm
index a9c31b2d24..c1032bb199 100644
--- a/tp/Texinfo/Indices.pm
+++ b/tp/Texinfo/Indices.pm
@@ -392,7 +392,8 @@ sub setup_sortable_index_entries($)
   my $collator;
   eval { require Unicode::Collate; Unicode::Collate->import; };
   my $unicode_collate_loading_error = $@;
-  if ($unicode_collate_loading_error eq '') {
+  if ($unicode_collate_loading_error eq ''
+and $customization_information->get_conf('USE_UNICODE_COLLATE')) {
 $collator = Unicode::Collate->new(%collate_options);
   } else {
 $collator = Texinfo::CollateStub->new();
diff --git a/tp/Texinfo/options_data.txt b/tp/Texinfo/options_data.txt
index e972d75f6b..d0788b1eed 100644
--- a/tp/Texinfo/options_data.txt
+++ b/tp/Texinfo/options_data.txt
@@ -185,6 +185,7 @@ TEXTCONTENT_COMMENT converter_customization undef integer
 # be good to update from time to time to avoid test results that are not
 # valid against their reported DTD.
 TEXINFO_DTD_VERSION converter_customization 7.1 char
+USE_UNICODE_COLLATE converter_customization 1 integer
 
 # Some are for all converters, EXTENSION for instance, some for
 # some converters, for example CLOSE_QUOTE_SYMBOL and many




Re: ffmpeg documentation no longer builds after Texinfo 7.1 (due to --init-file=)

2024-01-27 Thread Gavin Smith
On Sat, Jan 27, 2024 at 06:47:24PM +, Gavin Smith wrote:
> On Sat, Jan 27, 2024 at 07:20:05PM +0100, Patrice Dumas wrote:
> > > I don't know how ffmpeg or other packages are supposed to adapt to the
> > > new interface.  Perhaps we could put special case code in get_conf to
> > > check for "PACKAGE_VERSION" and redirect to the new interface?
> > 
> > That seems complicated.  To adapt to the change in name of the variable,
> > they can try both PACKAGE_VERSION and PACKAGE_VERSION_OPTION with
> > get_conf, that seems ok to me?
> 
> If there is still a customization variable to access for this, then
> I don't see why the name has to be changed.  I haven't had any new thoughts
> on this since my previous emails on the topic, I'm afraid.  I remember
> that not using PACKAGE_VERSION was mainly due to a clash with a symbol
> defined by automake or autoconf, which is an implementation problem that
> we could try to work around, rather than changing our public interface.
> I'll try to propose a change once I've had time to look at it.

Here's the patch to change it back to PACKAGE_VERSION.  I defined macros
with the original values using a _CONFIG suffix as you suggested, which
is needed in one place in the code (in the argument to bindtextdomain).

diff --git a/tp/Texinfo/Convert/Converter.pm b/tp/Texinfo/Convert/Converter.pm
index 5d271a37c5..a96e353c72 100644
--- a/tp/Texinfo/Convert/Converter.pm
+++ b/tp/Texinfo/Convert/Converter.pm
@@ -162,9 +162,9 @@ my %common_converters_defaults = (
 # values are what is used in tests of the Converters.  These variables are
 # customization options, set in the main program when a converter is
 # called from the main program.
-  'PACKAGE_AND_VERSION_OPTION'  => 'texinfo',
-  'PACKAGE_VERSION_OPTION'  => '',
-  'PACKAGE_URL_OPTION'  => 'http://www.gnu.org/software/texinfo/',
+  'PACKAGE_AND_VERSION'  => 'texinfo',
+  'PACKAGE_VERSION'  => '',
+  'PACKAGE_URL'  => 'http://www.gnu.org/software/texinfo/',
   'PROGRAM'  => '',
 );
 
diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index 4153982a41..5688bf9c25 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -10828,11 +10828,11 @@ sub _default_format_program_string($)
   my $self = shift;
   if (defined($self->get_conf('PROGRAM'))
   and $self->get_conf('PROGRAM') ne ''
-  and defined($self->get_conf('PACKAGE_URL_OPTION'))) {
+  and defined($self->get_conf('PACKAGE_URL'))) {
 return $self->convert_tree(
   $self->gdt('This document was generated on @emph{@today{}} using 
@uref{{program_homepage}, @emph{{program}}}.',
  { 'program_homepage' => {'text'
-   => $self->get_conf('PACKAGE_URL_OPTION')},
+   => $self->get_conf('PACKAGE_URL')},
'program' => {'text' => $self->get_conf('PROGRAM')} }));
   } else {
 return $self->convert_tree(
@@ -10974,8 +10974,8 @@ sub _file_header_information($$;$)
   my $after_body_open = '';
   $after_body_open = $self->get_conf('AFTER_BODY_OPEN')
 if (defined($self->get_conf('AFTER_BODY_OPEN')));
-  my $program_and_version = $self->get_conf('PACKAGE_AND_VERSION_OPTION');
-  my $program_homepage = $self->get_conf('PACKAGE_URL_OPTION');
+  my $program_and_version = $self->get_conf('PACKAGE_AND_VERSION');
+  my $program_homepage = $self->get_conf('PACKAGE_URL');
   my $program = $self->get_conf('PROGRAM');
   my $generator = '';
   if (defined($program) and $program ne '') {
diff --git a/tp/Texinfo/Convert/Info.pm b/tp/Texinfo/Convert/Info.pm
index f26b3397cd..9bface112d 100644
--- a/tp/Texinfo/Convert/Info.pm
+++ b/tp/Texinfo/Convert/Info.pm
@@ -416,7 +416,7 @@ sub _info_header($$$)
   # This ensures that spaces in file are kept.
   $result .= add_next($paragraph, $output_filename);
   my $program = $self->get_conf('PROGRAM');
-  my $version = $self->get_conf('PACKAGE_VERSION_OPTION');
+  my $version = $self->get_conf('PACKAGE_VERSION');
   if (defined($program) and $program ne '') {
 $result .=
 add_text($paragraph, ", produced by $program version $version from ");
diff --git a/tp/Texinfo/XS/configure.ac b/tp/Texinfo/XS/configure.ac
index 8b5573968f..9e271fda20 100644
--- a/tp/Texinfo/XS/configure.ac
+++ b/tp/Texinfo/XS/configure.ac
@@ -175,5 +175,16 @@ esac
 AC_SUBST([perl_conf_CFLAGS], [$perl_conf_CFLAGS])
 AC_SUBST([perl_conf_LDFLAGS], [$perl_conf_LDFLAGS])
 
+# Output these with the _CONFIG suffix as we use the originals as names
+# of customization variables.
+AC_DEFINE_UNQUOTED([PACKAGE_CONFIG], ["$PACKAGE"],
+   [autoconf PACKAGE value])
+AC_DEFINE_UNQUOTED([PACKAGE_NAME_CONFIG], ["$PACKAGE_NAME"],
+   [autoconf PACKAGE_NAME value])
+AC_DEFINE_UNQU

Re: ffmpeg documentation no longer builds after Texinfo 7.1 (due to --init-file=)

2024-01-27 Thread Gavin Smith
On Sat, Jan 27, 2024 at 07:20:05PM +0100, Patrice Dumas wrote:
> > I don't know how ffmpeg or other packages are supposed to adapt to the
> > new interface.  Perhaps we could put special case code in get_conf to
> > check for "PACKAGE_VERSION" and redirect to the new interface?
> 
> That seems complicated.  To adapt to the change in name of the variable,
> they can try both PACKAGE_VERSION and PACKAGE_VERSION_OPTION with
> get_conf, that seems ok to me?

If there is still a customization variable to access for this, then
I don't see why the name has to be changed.  I haven't had any new thoughts
on this since my previous emails on the topic, I'm afraid.  I remember
that not using PACKAGE_VERSION was mainly due to a clash with a symbol
defined by automake or autoconf, which is an implementation problem that
we could try to work around, rather than changing our public interface.
I'll try to propose a change once I've had time to look at it.



Re: ffmpeg documentation no longer builds after Texinfo 7.1 (due to --init-file=)

2024-01-27 Thread Gavin Smith
On Sat, Jan 27, 2024 at 04:26:10PM +0100, Patrice Dumas wrote:
> Hello,
> 
> On Sun, Nov 05, 2023 at 12:13:00PM +0100, Arsen Arsenović wrote:
> > 
> > Patrice Dumas  writes:
> > 
> > > my $texinfo_version = texinfo_get_conf('PACKAGE_VERSION');
> > > $texinfo_version =~ s/\+dev$//;
> > > # determine if texinfo is at least version 6.8
> > > my $program_version_num = version->declare($texinfo_version)->numify;
> > 
> > Oh, interesting.  On my system, I'm using texinfo built from master, and
> > it completely lacks PACKAGE_VERSION!  (if I rebuild from releases/7.1,
> > it's present).  It looks like it got removed in
> > 6a1a78b941a832033b8939ab4cdce1646dd56904.
> 
> I have added a new interface in the version in development to get the
> PACKAGE_VERSION and other PACKAGE* variables as set by configure:
> 
> my $texinfo_version = Texinfo::Common::get_constant('PACKAGE_AND_VERSION');
> 
> However, since you need to know the version to know if you can call that
> function, this is probably be useful for you only when you do not
> support versions older than the first version with that function anymore.

We should think carefully about the best interface for this as it is
breaking backwards compatibility again.  We should have a good interface
that we won't need to change in the future.  So I would not encourage
users of the interface to change their compatibility code until we have
finalised it.

The get_constant function has been recently added but I think the name of
this is too generic.  It only seems to be used for version information.
Since the use of this is very limited, why not have special functions
like, for example, Texinfo::Common::get_version which don't take any
parameter?

I notice there is still use of PACKAGE_VERSION_OPTION in the source code.
Is your plan to replace this?

I don't know how ffmpeg or other packages are supposed to adapt to the
new interface.  Perhaps we could put special case code in get_conf to
check for "PACKAGE_VERSION" and redirect to the new interface?



Re: getting included texinfo files interface

2024-01-26 Thread Gavin Smith
On Fri, Jan 26, 2024 at 03:57:24PM +0100, Patrice Dumas wrote:
> Hello,
> 
> With an objective to improve gendoc.sh texinfo sources tarball
> generation code, I implemented, in the developpement version in git
> the possibility with texi2any to output the @include files included
> from a manual, without doing anything else.  A bit like the -M option of
> cpp.  For now this feature is set through the customization option
> SHOW_INCLUDE_FILE_PATHS.

Is this so that gendoc.sh (a gnulib script, I believe) can get the list
of files so it can include them in a Texinfo source tarball?   How do
projects currently get the right files included, or do they not manage
to do this?  Have you made changes to this script to process the list
of included files?


> For example, if mymanual.texi contains in itself or in an included
> file
> 
> @include toto/aa.texi
> @include version
> 
> Then calling texi2any like:
> 
> texi2any -c SHOW_INCLUDE_FILE_PATHS=1 mymanual.texi
> 
> outputs on stdout as sole effect:
> 
> toto/aa.texi
> version
> 
> 
> Is that interface to the feature ok?  Would there be another way to
> specify included files that is common across diverse applications?
> Should we try to do something like -M?

I don't like the idea of outputting Makefile rules but a list of included
files makes sense.  I feel that a command-line argument (--trace-includes?)
would be better as the interface for this.



Re: makeinfo 7.1 misses menu errors

2024-01-23 Thread Gavin Smith
On Mon, Jan 22, 2024 at 10:50:07PM +0100, pertu...@free.fr wrote:
> On Sun, Jan 21, 2024 at 03:13:25PM -0700, Karl Berry wrote:
> > Hi Patrice,
> > 
> > I understand the principle, but for me the lossage in practice is even
> > more unfortunate (by far). It sure seems to me that the "rare" case
> > should be the one to have to make the config file setting. Indeed, the
> > very fact of making that config file setting would helpfully alert
> > contributors and builders that "this is not a normally-structured
> > manual".
> > 
> > I dearly hope you will reconsider. --thanks, karl.
> 
> I am not sure that I can change my mind, to me, when there is a doubt,
> correctness in principle is more important than usability.  However, I
> understand the arguments, which are sound.
> 
> Maybe Gavin, you could decide?

I had in mind at least to set CHECK_NORMAL_MENU_STRUCTURE before the
next release to get the warnings that were output before, if the warnings
cannot be improved.  Perhaps this could change in the future if more manuals
stop using @menu blocks following the "normal" structure.



Re: makeinfo 7.1 misses menu errors

2024-01-20 Thread Gavin Smith
On Sat, Jan 20, 2024 at 06:45:06PM +0100, Patrice Dumas wrote:
> > That reply pertained to the case of a missing menu entry.  Your case
> > is the opposite: a superfluous menu entry.
> 
> To me, the manual with an entry leading to a node that do not
> corresponds to the sectioning structure is perfectly acceptable.  I
> still do not see any issue in having the need to set
> CHECK_NORMAL_MENU_STRUCTURE to get warnings.  I can't see why the user
> would not want to have a menu that do not follow the sectioning
> structure.

I have difficulty following the last sentence here.  Users may want
to have menus that follow the sectioning structure if they have existing
Texinfo documents that are structured that way, and they are not
ready or willing to delete all of their @menu blocks to generate all
menus automatically.

> To me it was true before, menus were/are as much a list of links as
> structuring commands, and it is even more so today, as now fully
> automatic menus can be obtained with descriptions with @nodedescription.
> Explicit menus are now needed only if one want a structure not following
> the sectioning structure.

This goes against the practice of the vast majority of existing Texinfo
manuals, so this existing practice should be well supported.

It seems to me that we should warn about the extra menu entry in some
way, either by turning CHECK_NORMAL_MENU_STRUCTURE on, or by devising
better warnings.

> I think that it could be possible to change the error location to be the
> menu, but it is not clear that it indeed is the menu, the error could be
> in the choice of the sectioning command, I think that we cannot really
> know where the real error is, the user need to have a look and decide if
> the menu or the sectioning is right.

I suppose it's possible the choice of section command could be wrong,
e.g. @section instead of @subsection etc. generating errors.  If we
change the error messages we should think about what are some common
mistakes a user might make in editing a document.




Re: makeinfo 7.1 misses menu errors

2024-01-19 Thread Gavin Smith
On Thu, Jan 18, 2024 at 04:57:15PM -0700, Karl Berry wrote:
> I believe this is an intentional feature in recent Texinfo versions.
> To get the warnings back, you need to run makeinfo with the
> command-line option "-c CHECK_NORMAL_MENU_STRUCTURE=1".
> 
> Thanks for the hint. I reported a similar thing in July 2023,
> https://lists.gnu.org/archive/html/help-texinfo/2023-07/msg4.html
> 
> and my understanding of Patrice's reply is that the config setting
> was no longer intended to be needed in 7.1:
> https://lists.gnu.org/archive/html/help-texinfo/2023-07/msg5.html

That reply pertained to the case of a missing menu entry.  Your case
is the opposite: a superfluous menu entry.

Here's the documentation, for reference:

‘CHECK_MISSING_MENU_ENTRY’
 When a ‘@menu’ block occurs in a node, check if there is a menu
 entry for all subordinate nodes in the document sectioning
 structure.  On by default.

‘CHECK_NORMAL_MENU_STRUCTURE’
 Warn if the node pointers (either explicitly or automatically set)
 are not consistent with the order of node menu entries.  This is a
 more thorough structure check than that provided by
 ‘CHECK_MISSING_MENU_ENTRY’.  Off by default.

In 7.1, the CHECK_MISSING_MENU_ENTRY setting was introduced and turned
on by default, but this doesn't check for extra entries, i.e. if all
the menu entries are to nodes that have the current node as "Up".
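
To illustrate the kind of check that is missing, here is a simplified,
hypothetical sketch (not the actual texi2any code or data structures):
every menu entry should point to a node whose "Up" is the node containing
the menu, otherwise the entry is superfluous.

    use strict; use warnings;

    # Inferred "Up" for each node (from the sectioning structure).
    my %node_up = (
      'Errors with distclean' => 'Checking the Distribution',
      'GNU Build System'      => 'Autotools Introduction',
    );
    my $menu_node    = 'FAQ';
    my @menu_entries = ('Errors with distclean');

    for my $entry (@menu_entries) {
      warn "extra entry `$entry': up is `$node_up{$entry}', not `$menu_node'\n"
        if defined $node_up{$entry} and $node_up{$entry} ne $menu_node;
    }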

The problem as I remember it was that the error messages are awful:

amdisorder.texi:8791: warning: node `Flag Variables Ordering' is next for 
`Errors with distclean' in menu but not in sectioning
amdisorder.texi:8791: warning: node prev pointer for `Errors with distclean' is 
`distuninstallcheck' but prev is `Limitations on File Names' in menu
amdisorder.texi:8791: warning: node up pointer for `Errors with distclean' is 
`Checking the Distribution' but up is `FAQ' in menu
amdisorder.texi:12435: warning: node next pointer for `Limitations on File 
Names' is `Flag Variables Ordering' but next is `Errors with distclean' in menu
amdisorder.texi:12498: warning: node prev pointer for `Flag Variables Ordering' 
is `Limitations on File Names' but prev is `Errors with distclean' in menu

(with "texi2any amdisorder.texi -c CHECK_NORMAL_MENU_STRUCTURE=1").

There are five error messages here for one mistake.  They are not very
easy to make sense of, in my opinion.

The most relevant warning of these is this one:

amdisorder.texi:8791: warning: node up pointer for `Errors with distclean' is 
`Checking the Distribution' but up is `FAQ' in menu

but this could be better expressed.  The message refers to a "node up
pointer" for the "Errors with distclean" node, but the user has not
provided such a "node up pointer" - it is something that is inferred by
the program.  The line number does not refer to the real location of the
error: 8791 is the location of "@node Errors with distclean", whereas
the true error in the document is elsewhere, in the @menu block in the
"FAQ" node (line 12072).

Moreover, if a node is incorrectly referenced in multiple menus, this
warning about the "node up pointer" is not given for them all.  If I
add the entry to the menu in another node, only errors about "node next"
and "node prev" appear.  For example, adding it to the menu in the node
"Autotools Introduction",

@menu
* GNU Build System::Introducing the GNU Build System
* Use Cases::   Use Cases for the GNU Build System
* Why Autotools::   How Autotools Help
* Hello World:: A Small Hello World Package
* Errors with distclean::   Files left in build directory after distclean   
@end menu

the warnings output are

amdisorder.texi:1129: warning: node `Errors with distclean' is next for `Hello 
World' in menu but not in sectioning
amdisorder.texi:8792: warning: node `Flag Variables Ordering' is next for 
`Errors with distclean' in menu but not in sectioning
amdisorder.texi:8792: warning: node prev pointer for `Errors with distclean' is 
`distuninstallcheck' but prev is `Limitations on File Names' in menu
amdisorder.texi:8792: warning: node up pointer for `Errors with distclean' is 
`Checking the Distribution' but up is `FAQ' in menu
amdisorder.texi:12436: warning: node next pointer for `Limitations on File 
Names' is `Flag Variables Ordering' but next is `Errors with distclean' in menu
amdisorder.texi:12499: warning: node prev pointer for `Flag Variables Ordering' 
is `Limitations on File Names' but prev is `Errors with distclean' in menu

Again, the line number given in the error (1129) is nowhere near where
the real problem is (line 196).

Looking at the code in 'complete_node_tree_with_menus' in Structuring.pm,
it appears that the CHECK_NORMAL_MENU_STRUCTURE errors only have one
possibility for each direction (Next, Prev, Up) in the "menu directions".
However, this seems wrong, as a node could be referenced in more than
one menu.
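
A simplified, hypothetical sketch of the alternative (not the real
Structuring.pm code): keep every menu referencing a node, so that a
warning can be reported against each referencing menu rather than
against a single inferred direction.

    use strict; use warnings;

    my @menus = (
      { node => 'FAQ',
        entries => [ 'Errors with distclean' ] },
      { node => 'Autotools Introduction',
        entries => [ 'Errors with distclean' ] },
    );

    # Collect, for each referenced node, the nodes whose menus mention it.
    my %referencing_menus;
    for my $menu (@menus) {
      push @{ $referencing_menus{$_} }, $menu->{node} for @{ $menu->{entries} };
    }

    # Each referencing menu could then be checked against the sectioning
    # structure and warned about separately.
    for my $node (sort keys %referencing_menus) {
      print "$node appears in menus of: @{ $referencing_menus{$node} }\n";
    }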

It seems to me that the issue is with error 

Re: Info: hide-note-references=On disables find-menu

2024-01-17 Thread Gavin Smith
On Tue, Jan 16, 2024 at 08:50:29AM +, Bugsy Abatantuono wrote:
> With `hide-note-references=On` (“hide some Info file syntax in the text
> of nodes”) it’s more readable; namely, “see X” reads better than
> `*note X::`.  However, with that, `find-menu` will always report “No
> menu in this node” evidently because the text `* Menu` is hidden.
> 
> I would love to have my cake and eat it too ;)
> 

Thanks for the report; it should be fixed in commit d5c0f6a (2024-01-17).

I'm surprised anybody is using these obscure "info" features.



Re: Info: segfault on get-help-window with ‘invalid’ infokey

2024-01-17 Thread Gavin Smith
On Tue, Jan 16, 2024 at 08:30:14AM +, Bugsy Abatantuono wrote:
> Steps to reproduce bug:
> 
> 1. Content of the file ./keys:
> 
> #info
> a invalid
> 
> 2. Run: info --init-file ./keys
> 3. Type: H (segfault)
> 
> --version: info (GNU texinfo) 7.0.2
> 

Thanks for the report; it should be fixed in commit 179b5a55 (2024-01-17).



Re: Info: segfault on delete-window after visit-menu

2024-01-16 Thread Gavin Smith
On Tue, Jan 16, 2024 at 08:32:37AM +, Bugsy Abatantuono wrote:
> Steps to reproduce bug:
> 
> 1. Run: info --init-file /dev/null '(info-stnd)Xref Commands'
> 2. Type: M-x visit-menu
> 3. Type: x (segfault)
> 
> --version: info (GNU texinfo) 7.0.2
> 

Thanks for the report.  It should be fixed in commit 4f7bda3236 (2024-01-16).

I had actually been unaware of the existence of 'visit-menu', although I
am sure I have touched the code for this before.  It does not appear
to be documented anywhere, apart from the list of commands in the built-in
help.



Re: tiny typo in page

2024-01-03 Thread Gavin Smith
On Wed, Jan 03, 2024 at 04:17:02AM +, D Chate wrote:
> Hi
> 
> Sorry to even mention: the 2nd link near the end of
> this page:
>   https://www.gnu.org/software/texinfo
> currently reads:
>   "Info brower introduction"
> not:
>   "Info browser introduction"
> 

Thanks for reporting it, it's no trouble at all to fix it.

> Anyway thanks so much _for_everything_ and best wishes to
> all - rms especially.  Get well soon Richard.
> 
> D Chate
> 
> 



Re: [PATCH] use https with texinfo homepage

2023-12-27 Thread Gavin Smith
On Tue, Dec 26, 2023 at 08:17:32PM -0500, Mike Frysinger wrote:
> There are other http:// URIs that could be updated, but I wanted to focus
> on the texinfo homepage to start with.  I also left the DTD URI alone as
> those act more as permanent unique ids rather than exact URIs that people
> visit, and typically existing ones stay as http://.

Thanks for sending this.  I've worked through the files and made the changes
as appropriate.  Comments below.

>  contrib/txipsfonts-bronger.tex |  4 ++--

I've removed this file from the repository as it is not maintained and
I don't want the responsibility of making any more changes to it.

>  man/pod2texi.1 |  2 +-

This file is automatically generated (although tracked in git for some
reason, unlike the other files man/*.1).

>  po/ca.po   |  8 

All of the po files are provided by a third party (the Translation Project)
and then automatically processed by Makefile rules, so we rarely make changes
to these files directly.



Re: INFO command erro

2023-12-23 Thread Gavin Smith
On Fri, Dec 22, 2023 at 03:49:26PM -0300, Rammon Resende wrote:
>  INFO command in linux ubuntu is installed and is giving the following
> error in the terminal: aborted (core image saved)
> 
> I try to use the command but it always shows this error, I have already
> reinstalled and installed it and it persists.Info gnome-shell[2571]:
> [2565:2594:1222/143805.051407:ERROR:connection_factory_impl.cc(428)] Failed
> to connect to MCS endpoint with error -106
> 
> error image in attachment

(Sending a screenshot of an error message is not the best way to report
an error.  It takes up too much disk space so people receiving the email
will likely delete the attachment.)

I am not familiar with this way of error reporting in Ubuntu.  I don't
know what an MCS endpoint is.

It is possibly an old bug, as your screenshot says the version is
6.7.  There was a bug when Info was run in a Brazilian Portuguese locale
(which may be the case here; I can't tell from the language in the
screenshot).  It was fixed in the 7.0 release and was due to a
faulty translation file for that language.

To check if it is locale-related, you could try running "LC_ALL=C info".
You could also try to install a newer version of Info than 6.7.

This was fixed in Debian for at least some of their Texinfo packages
(see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1013283) (relevant
as Ubuntu is based on Debian, I believe).  If it turns out to be this same
problem, you could raise a bug with your distribution (Ubuntu) to ask them
to apply the fix for your version of the package.



Re: "make distclean" does not bring back build tree to previous state

2023-12-17 Thread Gavin Smith
On Wed, Dec 13, 2023 at 01:41:01PM +0200, Eli Zaretskii wrote:
> > We'll probably have to investigate each of these changes separately to
> > see where they occur.  I am probably not going to fix them all in one
> > go.
> > 
> > First, could you confirm which version of Texinfo you got these results
> > with.
> > 
> > I tested with Texinfo 7.1, ran configure with the same configure line
> > as you, then ran "make distclean".
> 
> Let me point out that "make distclean" is NOT supposed to revert the
> tree to a clean state as far as Git is concerned.  "make distclean" is
> supposed to remove any files built or modified as part of building a
> release tarball, and release tarballs can legitimately include files
> that are not versioned, and therefore are not managed by Git.  So
> looking at the results of "git status" is not the correct way of
> finding files that "make distclean" is supposed to remove.  Instead,
> one should:
> 
>   . create a release tarball
>   . unpack and build the release tarball in a separate directory
>   . run "make distclean" in the directory where the tarball was built
>   . unpack the release tarball in another directory, and then compare
> that other directory with the one where you run "make distclean"
> 

I've just done this on the release/7.1 branch and the results look
good.

I think we have investigated everything we can now based on the original
report, without getting more information.  I suspect there is some more
Debian-specific context about how the package is being built that we are
missing.

Regarding these changes:

modified:   man/makeinfo.1
modified:   man/texi2dvi.1
modified:   man/texindex.1

These files aren't tracked in the official git repository.  Since they
are distributed files that can also be rebuilt, it is always possible
that they are modified if the rules to rebuild them trigger.  This
seems unavoidable.



Non-ASCII file names in git

2023-12-17 Thread Gavin Smith
On Sat, Oct 14, 2023 at 09:41:46AM +0300, Eli Zaretskii wrote:
> > Eli, are you able to test this from git or do you need me to make another
> > pretest release?
> 
> Git is a bit problematic, as some of the file names include non-ASCII
> characters.  For this reason, and also for others (e.g., I have
> already made too many changes to 7.0.93 sources), I'd prefer another
> pretest.

The test suite tests file names with non-ASCII characters for the
purpose of testing encoding issues and better supporting use of different
character encodings.

However, this has the ironic effect of making the Texinfo source code
*less* accessible, especially on MS-Windows, or any system where non-ASCII
bytes in file names downloaded from git cause an issue.  It seems that
these systems are the ones where character encoding problems would most
likely occur, so we would want users to be testing and reporting any
such problems.

Running

$ LC_ALL=C find . -name '*[^ -~]*'

to find files with non-ASCII names, all the results are under tp/tests.
Hence, it would be straightforward to eliminate such file names, with
the cost of also eliminating these tests.

Since these tests contain these characters in reference output file names,
there would seem to be no simple way to adjust them to work with fully-ASCII
file names.  (The only possibility to me appears to be to store them in
git in some kind of coded form, and then to decode them when the tests run.
For example, as a tar archive file with an ASCII file name.  This would
be some work to implement, of course.)

For example:

$ find ./tp/tests/encoded/res_parser/non_ascii_test_epub
./tp/tests/encoded/res_parser/non_ascii_test_epub
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8.2
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8.1
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/xhtml
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/xhtml/osé_utf8.xhtml
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/xhtml/Chapteur.xhtml
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/xhtml/nav_toc.xhtml
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/images
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/images/2-an_image.png
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/images/1-an_image.png
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/osé_utf8.opf
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/mimetype
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/META-INF
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/META-INF/container.xml

The question is how beneficial it would be to have wholly ASCII file names
for all files tracked in git.  By default, we will keep these tests, but
could decide otherwise if there is feedback from users of systems where
they cause an issue.






Re: "make distclean" does not bring back build tree to previous state

2023-12-17 Thread Gavin Smith
On Tue, Dec 12, 2023 at 08:21:29PM +, Gavin Smith wrote:
> > modified:   tp/Texinfo/XS/TestXS.c
> 
> This file was modified.  It is probably because xsubpp ran to regenerate
> TestXS.c.
> 
> This issue was previously reported:
> 
> https://lists.gnu.org/archive/html/bug-texinfo/2021-02/msg00160.html
> https://lists.gnu.org/archive/html/bug-texinfo/2021-03/msg00030.html
> https://lists.gnu.org/archive/html/bug-texinfo/2021-08/msg00021.html
> 
> - but I did not want to make any changes at that time, due to how
> fragile the XS build code was and the risk of breaking something.
> 
> Obviously clobbering the pristine TestXS.c with no way to get back the
> original version (different because of varying output for different
> versions of xsubpp) is not acceptable.
> 
> I am going to make a change not to distribute TestXS.c, so that xsubpp
> will always be required for building XS modules.

I've made this change.  Arguably it was not necessary, as the same issue
occurs with distributed Info files, which could be overwritten.  However,
it's potentially less of a distraction this way, as distributing a
generated file is the exceptional case.

We will have to wait and see if not distributing the *.c files causes
any build issues for users without a working xsubpp.



Re: "make distclean" does not bring back build tree to previous state

2023-12-17 Thread Gavin Smith
On Tue, Dec 12, 2023 at 08:21:29PM +, Gavin Smith wrote:
> > modified:   po_document/ca.po
> 
> Likewise.  po files being updated is a perennial nuisance that I thought
> we had got right with documenting release procedures exactly, but evidently
> it still isn't right.

I think we needed to run "make update-po" an extra time after adjusting
some of the translation files.  For me, it wasn't just ca.po that changed
but several others too.



Re: Unable to build from master (cbeb65df962)

2023-12-13 Thread Gavin Smith
On Wed, Dec 13, 2023 at 09:31:16AM +0100, Patrice Dumas wrote:
> On Tue, Dec 12, 2023 at 09:01:30PM +0000, Gavin Smith wrote:
> > I cannot currently build from master.  It gets this far:
> 
> There was a missing dTHX. Here is a patch:
> https://git.savannah.gnu.org/cgit/texinfo.git/commit/?id=202e32dd3a932cec7c6b3efa13de68f85cf863fe
> 
> It did not produce an error for me. I probably missed it because I now
> use perl 5.38 to avoid valgrind issues. There are indeed many more
> valgrind false positive from perl getenv/setenv when running with
> TEXINFO_XS_CONVERT=1.

Thanks that makes sense!



Unable to build from master (cbeb65df962)

2023-12-12 Thread Gavin Smith
I cannot currently build from master.  It gets this far:


libtool: compile:  x86_64-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I./main 
-I./structuring_transfo -I./convert -I. -DDATADIR=\"/usr/local/share\" 
-I./gnulib/lib -I./gnulib/lib -DVERSION=\"0\" -DXS_VERSION=\"0\" 
-I/usr/lib/x86_64-linux-gnu/perl/5.34/CORE -D_REENTRANT -D_GNU_SOURCE -DDEBIAN 
-fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64 -O2 -g -fPIC -MT 
convert/ConvertXS_la-build_html_perl_state.lo -MD -MP -MF 
convert/.deps/ConvertXS_la-build_html_perl_state.Tpo -c 
convert/build_html_perl_state.c  -fPIC -DPIC -o 
convert/.libs/ConvertXS_la-build_html_perl_state.o
In file included from convert/build_html_perl_state.c:23:
convert/build_html_perl_state.c: In function ‘build_simpletitle’:
/usr/lib/x86_64-linux-gnu/perl/5.34/CORE/perl.h:155:17: error: ‘my_perl’ 
undeclared (first use in this function)
  155 | #  define aTHX  my_perl
  | ^~~
/usr/lib/x86_64-linux-gnu/perl/5.34/CORE/perl.h:160:25: note: in expansion of 
macro ‘aTHX’
  160 | #  define aTHX_ aTHX,
  | ^~~~
/usr/lib/x86_64-linux-gnu/perl/5.34/CORE/embed.h:244:64: note: in expansion of 
macro ‘aTHX_’
  244 | fine hv_common_key_len(a,b,c,d,e,f)  Perl_hv_common_key_len(aTHX_ 
a,b,c,d,e,f)
  | ^

/usr/lib/x86_64-linux-gnu/perl/5.34/CORE/hv.h:486:13: note: in expansion of 
macro ‘hv_common_key_len’
  486 | ((SV**) hv_common_key_len((hv), (key), (klen),  
\
  | ^
convert/build_html_perl_state.c:1012:3: note: in expansion of macro ‘hv_store’
 1012 |   hv_store (converter_hv, "simpletitle_tree",
  |   ^~~~
/usr/lib/x86_64-linux-gnu/perl/5.34/CORE/perl.h:155:17: note: each undeclared 
identifier is reported only once for each function it appears in
  155 | #  define aTHX  my_perl
  | ^~~
/usr/lib/x86_64-linux-gnu/perl/5.34/CORE/perl.h:160:25: note: in expansion of 
macro ‘aTHX’
  160 | #  define aTHX_ aTHX,
  | ^~~~
/usr/lib/x86_64-linux-gnu/perl/5.34/CORE/embed.h:244:64: note: in expansion of 
macro ‘aTHX_’
  244 | fine hv_common_key_len(a,b,c,d,e,f)  Perl_hv_common_key_len(aTHX_ 
a,b,c,d,e,f)
  | ^

/usr/lib/x86_64-linux-gnu/perl/5.34/CORE/hv.h:486:13: note: in expansion of 
macro ‘hv_common_key_len’
  486 | ((SV**) hv_common_key_len((hv), (key), (klen),  
\
  | ^
convert/build_html_perl_state.c:1012:3: note: in expansion of macro ‘hv_store’
 1012 |   hv_store (converter_hv, "simpletitle_tree",
  |   ^~~~
make[2]: *** [Makefile:2302: convert/ConvertXS_la-build_html_perl_state.lo] 
Error 1
make[2]: Leaving directory '/home/g/src/texinfo/GIT/tp/Texinfo/XS'
make[1]: *** [Makefile:2790: all-recursive] Error 1
make[1]: Leaving directory '/home/g/src/texinfo/GIT/tp/Texinfo/XS'
make: *** [Makefile:1924: all] Error 2

This is after running a plain "./configure" followed by "make".  Any idea
why my_perl is not defined?



Re: "make distclean" does not bring back build tree to previous state

2023-12-12 Thread Gavin Smith
On Sun, Dec 10, 2023 at 04:00:56PM +0100, Preuße, Hilmar wrote:
> Hello,
> 
> I got a report telling that "make distclean" does not bring back the build
> tree into the original state. After running a build (configure line below)
> and calling "make distclean", we have a few differences. Some files were
> deleted, some files were re-generated and hence show a different time stamp
> inside the file:

We'll probably have to investigate each of these changes separately to
see where they occur.  I am probably not going to fix them all in one
go.

First, could you confirm which version of Texinfo you got these results
with.

I tested with Texinfo 7.1, ran configure with the same configure line
as you, then ran "make distclean".

> 
> Changes not staged for commit:
>   (use "git add/rm ..." to update what will be committed)
>   (use "git restore ..." to discard changes in working directory)
> deleted:INSTALL

This file was not deleted for me.  But it definitely should not be
deleted and if it is, there is a bug.  Please explain exactly what you
did to make this happen.

> deleted:build-aux/compile
> deleted:build-aux/config.rpath
> deleted:build-aux/depcomp
> deleted:build-aux/mdate-sh

I checked that all these files were still present.

> modified:   doc/stamp-1
> modified:   doc/stamp-2
> modified:   doc/stamp-vti
> modified:   doc/version-stnd.texi
> modified:   doc/version-texi2any_api.texi
> modified:   doc/version.texi

These files were not modified, although it is possible that this was because
I did not run "make" first.

> modified:   man/makeinfo.1
> modified:   man/texi2dvi.1
> modified:   man/texindex.1

Likewise.

> modified:   po_document/ca.po

Likewise.  po files being updated is a perennial nuisance that I thought
we had got right with documenting release procedures exactly, but evidently
it still isn't right.

> modified:   tp/Texinfo/XS/TestXS.c

This file was modified.  It is probably because xsubpp ran to regenerate
TestXS.c.

This issue was previously reported:

https://lists.gnu.org/archive/html/bug-texinfo/2021-02/msg00160.html
https://lists.gnu.org/archive/html/bug-texinfo/2021-03/msg00030.html
https://lists.gnu.org/archive/html/bug-texinfo/2021-08/msg00021.html

- but I did not want to make any changes at that time, due to how
fragile the XS build code was and the risk of breaking something.

Obviously clobbering the pristine TestXS.c with no way to get back the
original version (different because of varying output for different
versions of xsubpp) is not acceptable.

I am going to make a change not to distribute TestXS.c, so that xsubpp
will always be required for building XS modules.

Later then I plan to add this and other fixes to the release branch and
make a new Texinfo 7.1.1 release where "make distclean" works correctly.

> (I do the test inside a git repo to make the difference easily visible).
> 
> Would be nice if you could have a look at the issue. Are the deleted /
> re-generated files really needed for building? Feel free to contact for
> questions, please keep me in Cc, I'm not subscribed.
> 
> AWK=awk ./configure \
>   --with-external-Text-Unidecode=yes \
>   --prefix=/usr \
>   --libexecdir='$${prefix}/lib' \
>   --infodir='$${prefix}/share/info' --mandir='$${prefix}/share/man'
> 
> Many thanks,
>   Hilmar
> -- 
> sigfault
> 






Re: Texinfo.tex, problem with too-long table inside @float

2023-12-03 Thread Gavin Smith
On Sat, Dec 02, 2023 at 01:28:49PM -0700, arn...@skeeve.com wrote:
> Hi.
> 
> Thanks for the response. I do have a number of floats (it's a good-
> sized book) but this may be the only table that is so large.
> It is important that I refer to the floats from the text with an xref.
> 
> Since the \vbox is never actually moved around, maybe just don't
> put the enclosed stuff inside one?  That would suit me fine for both
> texinfo.tex and makeinfo --latex (which I will use to submit the
> book for production).

The \vbox is not moved around for texinfo.tex but it is possible that
the @float may be moved around in other output formats: LaTeX or DocBook,
for example.

(For LaTeX, there is the 'longtable' package
(https://ctan.org/pkg/longtable?lang=en) which appears to allow splitting
tables across pages, although using this would require you to edit
the LaTeX output manually.)

The solution that occurs to me is to recognise a third argument for
@float.  @float was introduced in Texinfo 4.7, in 2004.  From NEWS:

  . new commands @float, @caption, @shortcaption, @listoffloats for
initial implementation of floating material (figures, tables, etc).
Ironically, they do not yet actually float anywhere.

The comments in texinfo.tex state:

% @float FLOATTYPE,LABEL,LOC ... @end float for displayed figures, tables,
% etc.  We don't actually implement floating yet, we always include the
% float "here".  But it seemed the best name for the future.

[...]

% #1 is the optional FLOATTYPE, the text label for this float, typically
% "Figure", "Table", "Example", etc.  Can't contain commas.  If omitted,
% this float will not be numbered and cannot be referred to.
%
% #2 is the optional xref label.  Also must be present for the float to
% be referable.
%
% #3 is the optional positioning argument; for now, it is ignored.  It
% will somehow specify the positions allowed to float to (here, top, bottom).
%

(I can't find any discussions from the time about the new feature
in the mailing list archives.) 

If the position is given as "here" (or whatever we decide), we could
then omit the \vtop wrapping.  So in your example, the @float line
would become

@float Table,table-errno-values,here

This would also need modifications to texi2any to recognise the additional
argument.

As I said before, with the label at the bottom of the table, if the
table is split across pages, then it does not function properly as
a label.  So simply removing the \vtop wrapping may not produce
satisfactory output.



Re: Adding emacs info-mode to https://www.gnu.org/software/texinfo/links.html

2023-12-03 Thread Gavin Smith
On Thu, Nov 30, 2023 at 05:03:21PM -0600, axelbo...@yahoo.com wrote:
> Hello,
> 
>  
> 
> please consider adding emacs info-mode to the list of info readers at
> https://www.gnu.org/software/texinfo/links.html.
> 

Emacs Info mode was the main reason for the sentence, "The Texinfo system
is well-integrated with GNU Emacs" on the main page.

Emacs Info mode is more important than anything else on the links
page, so I don't think it belongs there.

I have added a link to
https://www.gnu.org/software/emacs/manual/html_node/info/index.html
on the main page in case people are using the Texinfo landing page
to look for it, although much of that manual is more appropriate to read
inside Info itself, rather than a web browser.



Re: Texinfo.tex, problem with too-long table inside @float

2023-12-01 Thread Gavin Smith
On Fri, Dec 01, 2023 at 10:45:14AM +0200, Aharon Robbins wrote:
> Hi.
> 
> Using version 2023-10-19.19 of texinfo.tex, there is a problem if
> a table inside @float goes over one page.  See the attached file.
> 
> If the table is not inside @float, it formats just fine across
> multiple pages.

Unfortunately, it's inherent in the meaning of the @float command that
the enclosed material will fit on a single page.  As long as it is output
inside a \vbox or \vtop, it can't be split across pages.

If I convert your test file to LaTeX with 'texi2any --latex', and then
process the output with pdflatex, the output PDF has exactly the same
problem, with the table disappearing off the bottom of the page.

You could use an @anchor to refer to the table, although without
the automatic numbering of the table, unfortunately.  @float has
the dual purpose of providing numbering and cross-referencing,
as well as controlling the formatting of the block.

(It seems that the @float command at one time was intended to change
the position of the block on the page, but this was never implemented
in texinfo.tex.)

The float caption would not be satisfactorily placed even if the
table did split across pages, as it is placed at the end of the
float, so would not be visible on the first page where the table
appears.

How important is it to you that you can refer to tables with text
like "See Table 1"?  Do you have many numbered tables in the document?

Could it be appropriate to place such long tables as these in a section
of their own which then could be cross-referenced as usual?

> This example is culled from a book I'm working on. I need the @float
> in order to be able to use an @ref to the table from the inline prose.
> 
> Much thanks!
> 
> Arnold





Re: Use "simple parsers" for translations again?

2023-11-25 Thread Gavin Smith
On Sat, Nov 25, 2023 at 09:35:28AM +0100, Patrice Dumas wrote:
> > It seems that we could stop these structures being modified with the
> > addition of a flag on the parser (e.g. $parser->{'locked'} or
> > $parser->{'simple'}).  Then check this flag any time:
> > 
> > * A new Texinfo command is to be defined (@macro, @alias, @definfoenclose).
> > * A new index is to be added (@defindex, @defcodeindex).
> > 
> > Then the change to the state wouldn't be made if the flag was set, and
> > possibly a warning message could be given too.
> > 
> > I can try to work on a patch for this if this approach sounds like a
> > good idea.
> 

I've sent a series of three patches to bug-texinfo@gnu.org.



[PATCH 3/3] * tp/Texinfo/ParserNonXS.pm (_new_macro), (_parse_line_command_args) <@alias, @definfoenclose> <@defindex, @defcodeindex>: Do nothing if 'restricted' flag set.

2023-11-25 Thread Gavin Smith
* tp/t/init_files_tests.t
(macro_defined_txiinternalvalue_in_translation): Remove test,
as @macro no longer works inside a translated string.
---
 ChangeLog |  10 ++
 tp/Makefile.tres  |   1 -
 tp/Texinfo/ParserNonXS.pm |  14 +-
 .../translate_txiinternalvalue_macro.init |  30 
 tp/t/init_files_tests.t   |   7 -
 ...defined_txiinternalvalue_in_translation.pl | 155 --
 6 files changed, 21 insertions(+), 196 deletions(-)
 delete mode 100644 tp/t/init/translate_txiinternalvalue_macro.init
 delete mode 100644 
tp/t/results/init_files_tests/macro_defined_txiinternalvalue_in_translation.pl

diff --git a/ChangeLog b/ChangeLog
index fec8ab7980..0c336d739c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2023-11-25  Gavin Smith 
+
+   * tp/Texinfo/ParserNonXS.pm (_new_macro),
+   (_parse_line_command_args) <@alias, @definfoenclose>
+   <@defindex, @defcodeindex>: Do nothing if 'restricted' flag set.
+
+   * tp/t/init_files_tests.t
+   (macro_defined_txiinternalvalue_in_translation): Remove test,
+   as @macro no longer works inside a translated string.
+
 2023-11-25  Gavin Smith 
 
Parser 'restricted' flag
diff --git a/tp/Makefile.tres b/tp/Makefile.tres
index 98c1121336..11f0bf806b 100644
--- a/tp/Makefile.tres
+++ b/tp/Makefile.tres
@@ -987,7 +987,6 @@ test_files_generated_list = 
$(test_tap_files_generated_list) \
   t/results/init_files_tests/customize_translations/res_html \
   t/results/init_files_tests/documentation_examples.pl \
   t/results/init_files_tests/documentation_examples/res_html \
-  t/results/init_files_tests/macro_defined_txiinternalvalue_in_translation.pl \
   t/results/init_files_tests/modified_translation.pl \
   t/results/init_files_tests/redefined_need.pl \
   t/results/init_files_tests/sc_formatting_with_css.pl \
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index 1c5bac1320..feaee9620c 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -6556,6 +6556,8 @@ sub _new_macro($$$)
   my $name = shift;
   my $current = shift;
 
+  return if $self->{'restricted'};
+
   my $macrobody;
   if (defined($current->{'contents'})) {
 $macrobody =
@@ -7601,7 +7603,9 @@ sub _parse_line_command_args($$$)
 
   if ($command eq 'alias') {
 # REMACRO
-if ($line =~ 
s/^([[:alnum:]][[:alnum:]-]*)(\s*=\s*)([[:alnum:]][[:alnum:]-]*)$//) {
+if ($self->{'restricted'}) {
+  # do nothing
+} elsif ($line =~ 
s/^([[:alnum:]][[:alnum:]-]*)(\s*=\s*)([[:alnum:]][[:alnum:]-]*)$//) {
   my $new_command = $1;
   my $existing_command = $3;
   $args = [$1, $3];
@@ -7631,7 +7635,9 @@ sub _parse_line_command_args($$$)
   } elsif ($command eq 'definfoenclose') {
 # REMACRO
 # FIXME how to handle non ascii space?  As space or in argument?
-if ($line =~ 
s/^([[:alnum:]][[:alnum:]\-]*)\s*,\s*([^\s,]*)\s*,\s*([^\s,]*)$//) {
+if ($self->{'restricted'}) {
+  # do nothing
+} elsif ($line =~ 
s/^([[:alnum:]][[:alnum:]\-]*)\s*,\s*([^\s,]*)\s*,\s*([^\s,]*)$//) {
   $args = [$1, $2, $3 ];
   my ($cmd_name, $begin, $end) = ($1, $2, $3);
   $self->{'definfoenclose'}->{$cmd_name} = [ $begin, $end ];
@@ -7681,7 +7687,9 @@ sub _parse_line_command_args($$$)
 }
   } elsif ($command eq 'defindex' || $command eq 'defcodeindex') {
 # REMACRO
-if ($line =~ /^([[:alnum:]][[:alnum:]\-]*)$/) {
+if ($self->{'restricted'}) {
+  # do nothing
+} elsif ($line =~ /^([[:alnum:]][[:alnum:]\-]*)$/) {
   my $name = $1;
   if ($forbidden_index_name{$name}) {
 $self->_line_error(sprintf(
diff --git a/tp/t/init/translate_txiinternalvalue_macro.init 
b/tp/t/init/translate_txiinternalvalue_macro.init
deleted file mode 100644
index 8269fd1a7f..00
--- a/tp/t/init/translate_txiinternalvalue_macro.init
+++ /dev/null
@@ -1,30 +0,0 @@
-
-texinfo_register_command_formatting('sp', \_sp_formatting);
-
-sub my_sp_formatting($)
-{
-  my $self = shift;
-  my $cmdname = shift;
-  my $command = shift;
-  my $args = shift;
-  my $content = shift;
-
-  if (defined($command->{'extra'})
-  and defined($command->{'extra'}->{'misc_args'})
-  and defined($command->{'extra'}->{'misc_args'}->[0])) {
-my $sp_nr = $command->{'extra'}->{'misc_args'}->[0];
-my $translated_tree = $self->gdt('@macro txiinternalvalue
-user internalvalue
-@end macro
-{myarg} @TeX{}', {'myarg' => {'contents' => [{'text' => $sp_nr}]}});
-#print STDERR "T 
".Texinfo::Common::debug_print_tree($translated_tree)."\n";
-$result = $self->convert_tree($translated_tree);
-return $result;
-  }
-
-  return &{$self->default_command_conversion($cmdname)}($self,
-$cmdname, $command, $args, $content);
-}
-
-
-1;
diff --git a/tp/t/init_files_tests.t b/tp/t/init_file

[PATCH 2/3] Parser 'restricted' flag

2023-11-25 Thread Gavin Smith
* tp/Texinfo/ParserNonXS.pm (simple_parser): Add 'restricted'
flag on parser.
(_enter_index_entry): Do nothing if flag is set.  This prevents
index information leaking among simple_parser objects.
(simple_parser): Re-use indices information.

* tp/t/init/translation_in_parser_in_translation.pm:
Do not access index information from translation parser, as it
isn't there.
---
 ChangeLog | 14 +++
 tp/Texinfo/ParserNonXS.pm |  9 -
 .../translation_in_parser_in_translation.pm   | 39 ---
 .../res_html/chap.html|  2 +-
 .../res_html/index.html   |  8 ++--
 5 files changed, 26 insertions(+), 46 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index b50d9056e8..fec8ab7980 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2023-11-25  Gavin Smith 
+
+   Parser 'restricted' flag
+
+   * tp/Texinfo/ParserNonXS.pm (simple_parser): Add 'restricted'
+   flag on parser.
+   (_enter_index_entry): Do nothing if flag is set.  This prevents
+   index information leaking among simple_parser objects.
+   (simple_parser): Re-use indices information.
+
+   * tp/t/init/translation_in_parser_in_translation.pm:
+   Do not access index information from translation parser, as it
+   isn't there.
+
 2023-11-25  Gavin Smith 
 
Re-add simple_parser
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index 6a1a7ac2df..1c5bac1320 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -635,6 +635,10 @@ sub simple_parser(;$)
   my $parser = dclone(\%parser_default_configuration);
   bless $parser;
 
+  # Flag to say that some parts of the parser should not be modified,
+  # as they are reused among all parsers returned from this function.
+  $parser->{'restricted'} = 1;
+
   _setup_conf($parser, $conf);
   # This is not very useful in perl, but mimics the XS parser
   print STDERR " RESETTING THE PARSER !\n"
@@ -644,8 +648,7 @@ sub simple_parser(;$)
   $parser->{'brace_commands'} = $simple_parser_brace_commands;
   $parser->{'valid_nestings'} = $simple_parser_valid_nestings;
   $parser->{'no_paragraph_commands'} = $simple_parser_no_paragraph_commands;
-  #$parser->{'index_names'} = $simple_parser_index_names;
-  $parser->{'index_names'} = dclone(\%index_names);
+  $parser->{'index_names'} = $simple_parser_index_names;
   $parser->{'command_index'} = $simple_parser_command_index;
   $parser->{'close_paragraph_commands'} = 
$simple_parser_close_paragraph_commands;
   $parser->{'close_preformatted_commands'} = 
$simple_parser_close_preformatted_commands;
@@ -3339,6 +3342,8 @@ sub _enter_index_entry()
 {
   my ($self, $command_container, $element, $source_info) = @_;
 
+  return if $self->{'restricted'};
+
   my $index_name = $self->{'command_index'}->{$command_container};
   my $index = $self->{'index_names'}->{$index_name};
 
diff --git a/tp/t/init/translation_in_parser_in_translation.pm 
b/tp/t/init/translation_in_parser_in_translation.pm
index 16b7d26a5a..737da26d4e 100644
--- a/tp/t/init/translation_in_parser_in_translation.pm
+++ b/tp/t/init/translation_in_parser_in_translation.pm
@@ -56,43 +56,4 @@ sub 
_texi2any_test_translation_in_parser_format_translate_message($$$;$)
 texinfo_register_formatting_function('format_translate_message',
\&_texi2any_test_translation_in_parser_format_translate_message);
 
-# there are no indices id output for the @def* commands used in Next
-# button translation, as their index information is with the tree used
-# in gdt, not with the main tree.  Output an id in any case directly
-# when formatting the definition line to check if it is translated.
-sub _texi2any_tests_def_line_show_id()
-{
-  my $self = shift;
-  my $type = shift;
-  my $element = shift;
-  my $content = shift;
-
-  my $no_unidecode;
-  $no_unidecode = 1 if (defined($self->get_conf('USE_UNIDECODE'))
-and !$self->get_conf('USE_UNIDECODE'));
-
-
-  my $region = '';
-  $region = "$element->{'extra'}->{'element_region'}-"
-if (defined($element->{'extra'}->{'element_region'}));
-
-  my $entry_reference_content_element
-= Texinfo::Common::index_content_element($element);
-
-  my $normalized_index = '';
-  if ($entry_reference_content_element) {
-$normalized_index =
- Texinfo::Convert::NodeNameNormalization::normalize_transliterate_texinfo(
-$entry_reference_content_element, $no_unidecode);
-  }
-  my $target_base = "index-" . $region .$normalized_index;
-
-
-  return "INDEXID could be: $target_base"
-   . &{$self->default_type_conversion($type)}($self, $type,
-$element, $content);
-}
-
-texinfo_register_type_formatting('def_line', 
\&_texi2any_tests_def_line_show_id);
-
 1

[PATCH 1/3] Re-add simple_parser

2023-11-25 Thread Gavin Smith
* tp/Texinfo/ParserNonXS.pm, tp/Texinfo/XS/parsetexi/Parsetexi.pm
(parser, simple_parser): Re-add simple_parser, used for document
translations.  The only difference from before is that we
do not share the indices between parsers.
* tp/Texinfo/Translations.pm (replace_convert_substrings):
Call simple_parser again.

This reverses the change on 2023-08-10.
---
 ChangeLog|  13 +++
 tp/Texinfo/ParserNonXS.pm| 114 +--
 tp/Texinfo/Translations.pm   |   2 +-
 tp/Texinfo/XS/parsetexi/Parsetexi.pm |   4 +
 4 files changed, 107 insertions(+), 26 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 318e84f2e5..b50d9056e8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,16 @@
+2023-11-25  Gavin Smith 
+
+   Re-add simple_parser
+
+   * tp/Texinfo/ParserNonXS.pm, tp/Texinfo/XS/parsetexi/Parsetexi.pm
+   (parser, simple_parser): Re-add simple_parser, used for document
+   translations.  The only difference from before is that we
+   do not share the indices between parsers.
+   * tp/Texinfo/Translations.pm (replace_convert_substrings):
+   Call simple_parser again.
+
+   This reverses the change on 2023-08-10.
+
 2023-11-24  Patrice Dumas  
 
* tp/Texinfo/Common.pm (parse_node_manual), tp/Texinfo/Convert/HTML.pm
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index 85b9a165bd..6a1a7ac2df 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -572,31 +572,8 @@ sub parser(;$$)
   my $parser = dclone(\%parser_default_configuration);
   bless $parser;
 
-  $parser->{'set'} = {};
-  if (defined($conf)) {
-foreach my $key (keys(%$conf)) {
-  if (exists($parser_settable_configuration{$key})) {
-# we keep registrar instead of copying on purpose, to reuse the object
-if ($key ne 'values' and $key ne 'registrar' and ref($conf->{$key})) {
-  $parser->{$key} = dclone($conf->{$key});
-} else {
-  $parser->{$key} = $conf->{$key};
-}
-if ($initialization_overrides{$key}) {
-  $parser->{'set'}->{$key} = $parser->{$key};
-}
-  } else {
-warn "ignoring parser configuration value \"$key\"\n";
-  }
-}
-  }
-  # restrict variables found by get_conf, and set the values to the
-  # parser initialization values only.  What is found in the document
-  # has no effect.
-  foreach my $key 
(keys(%Texinfo::Common::default_parser_customization_values)) {
-$parser->{'conf'}->{$key} = $parser->{$key};
-  }
-
+  _setup_conf($parser, $conf);
+  # This is not very useful in perl, but mimics the XS parser
   print STDERR " RESETTING THE PARSER !\n"
 if ($parser->{'DEBUG'});
 
@@ -611,6 +588,7 @@ sub parser(;$$)
   $parser->{'close_paragraph_commands'} = {%default_close_paragraph_commands};
   $parser->{'close_preformatted_commands'} = {%close_preformatted_commands};
 
+  # following is common with simple_parser
   # other initializations
   $parser->{'definfoenclose'} = {};
   $parser->{'source_mark_counters'} = {};
@@ -637,6 +615,66 @@ sub parser(;$$)
   return $parser;
 }
 
+# simple parser initialization.  The only difference with a regular parser
+# is that the dynamical @-commands groups and indices information references
+# that are initialized in each regular parser are initialized once for all
+# and shared among simple parsers.  It is used in gdt() and this has a sizable
+# effect on performance.
+my $simple_parser_line_commands = dclone(\%line_commands);
+my $simple_parser_brace_commands = dclone(\%brace_commands);
+my $simple_parser_valid_nestings = dclone(\%default_valid_nestings);
+my $simple_parser_no_paragraph_commands = {%default_no_paragraph_commands};
+my $simple_parser_index_names = dclone(\%index_names);
+my $simple_parser_command_index = {%command_index};
+my $simple_parser_close_paragraph_commands = 
{%default_close_paragraph_commands};
+my $simple_parser_close_preformatted_commands = {%close_preformatted_commands};
+sub simple_parser(;$)
+{
+  my $conf = shift;
+
+  my $parser = dclone(\%parser_default_configuration);
+  bless $parser;
+
+  _setup_conf($parser, $conf);
+  # This is not very useful in perl, but mimics the XS parser
+  print STDERR " RESETTING THE PARSER !\n"
+if ($parser->{'DEBUG'});
+
+  $parser->{'line_commands'} = $simple_parser_line_commands;
+  $parser->{'brace_commands'} = $simple_parser_brace_commands;
+  $parser->{'valid_nestings'} = $simple_parser_valid_nestings;
+  $parser->{'no_paragraph_commands'} = $simple_parser_no_paragraph_commands;
+  #$parser->{'index_names'} = $simple_parser_index_names;
+  $parser->{'index_names'} = dclone(\%index_names);
+  $parser->{'command_index'} = $simple_parser_command_index;
+  $parser-

Re: Use "simple parsers" for translations again?

2023-11-25 Thread Gavin Smith
On Sat, Nov 25, 2023 at 09:35:28AM +0100, Patrice Dumas wrote:
> On Fri, Nov 24, 2023 at 11:24:55PM +0000, Gavin Smith wrote:
> > As I understand it, it is not any part of the state that is problematic
> > to modify for simple_parser.  It is only the structures that are referred
> > to when creating the simple parser:
> 
> I think so too.
> 
> > Many of these relate to the creation of new Texinfo commands, or indices.
> > These structures should be the same for all of the parser objects used
> > for translations so copying them (with dclone) for every single translated
> > string is a waste of time.
> 
> I am not sure about that.  The underlying question is what limit we set
> on the users defined translations.  This is very unclear to me, we could
> imagine large/complex Texinfo code to be used in user defined
> translations if they have, for example, repetitive parts that they want
> to generate ahd have them different for different languages.

Should the result of customized translations be able to create index
entries?  That is what broke the "translation_in_parser_in_translation"
test with the "simple parser" patch: the customized translation entered
an index entry (with @deftypeop), which persisted into following translations. 

This then led to an infinite loop in complete_indices, as this itself
created a new parser to translate "{name} on {class}", but this new parser
referenced the same index entries which also needed translating, and so on.

On the previous patch I posted, all that is needed is the following change:

-  $parser->{'index_names'} = $simple_parser_index_names;
+  $parser->{'index_names'} = dclone(\%index_names);

- then the test suite passes.  It is still not ideal, but still worthwhile,
as execution time is still saved on the other dclone calls.

(Even without the "simple parsers", it seems that a customization of the
translations could generate such an infinite loop by making the output of
"{name} on {class}" itself create an index entry that then needed
"{name} on {class}" to be translated.  If the translation of this string
actually worked in t/init/translation_in_parser_in_translation.pm then
that would lead to an infinite loop too.)

The example in t/init/translation_in_parser_in_translation.pm is contrived
as it would make no sense to translate the string "Next" as something using
@deftypeop or @deftypemethod.

Can we simply forbid, or disable, creating index entries in parsers for
translations?  (I doubt it makes a difference as I expect that such index
entries are not actually used.)

> I think that we should not put important restrictions on translated
> strings, maybe we could forbid using @node and sectioning commands, but
> it is not clear to me that there couldn't be use cases for those.

It seems to me that it's likely that people run texi2any with XS modules
disabled (although I have no information on how prevalent this is), but
use of customization files is much rarer, and use of customization files
that customize translated strings to give bizarre translations involving
index entries (or sectioning commands...) is not done and is not likely
to be done.  We should prioritize the existing use cases rather than
rare or hypothetical use cases.



Use "simple parsers" for translations again?

2023-11-24 Thread Gavin Smith
On Sun, Nov 05, 2023 at 12:53:19PM +0100, Patrice Dumas wrote:
> Indeed, I don't know why I want it to have an effect, as it should
> not...
> 
> > If there is a real problem, then we could fix it another way, maybe
> > by "locking" the parser configuration at certain points, making any
> > changes ineffective?
> 
> There is a real problem, the state of the parsers is shared among calls
> to simple_parser.  As long as in gdt code there is only simple code it
> is not an issue, though, it only becomes an issue if the users put
> @-commands that use the conversion state in gdt.  In that regard, the
> translation_in_parser_in_translation test is quite extreme, but
> @-commands modifying and needing the state are not so uncommon, it
> includes commands that could be in translations, such as @acronym,
> @anchor and others.

As I understand it, it is not any part of the state that is problematic
to modify for simple_parser.  It is only the structures that are referred
to when creating the simple parser:

  $parser->{'line_commands'} = $simple_parser_line_commands;
  $parser->{'brace_commands'} = $simple_parser_brace_commands;
  $parser->{'valid_nestings'} = $simple_parser_valid_nestings;
  $parser->{'no_paragraph_commands'} = $simple_parser_no_paragraph_commands;
  $parser->{'index_names'} = $simple_parser_index_names;
  $parser->{'command_index'} = $simple_parser_command_index;
  $parser->{'close_paragraph_commands'} = 
$simple_parser_close_paragraph_commands;
  $parser->{'close_preformatted_commands'} = 
$simple_parser_close_preformatted_commands;

Many of these relate to the creation of new Texinfo commands, or indices.
These structures should be the same for all of the parser objects used
for translations so copying them (with dclone) for every single translated
string is a waste of time.

It seems that we could stop these structures being modified with the
addition of a flag on the parser (e.g. $parser->{'locked'} or
$parser->{'simple'}).  Then check this flag any time:

* A new Texinfo command is to be defined (@macro, @alias, @definfoenclose).
* A new index is to be added (@defindex, @defcodeindex).

Then the change to the state wouldn't be made if the flag was set, and
possibly a warning message could be given too.

I can try to work on a patch for this if this approach sounds like a
good idea.

> I think that the approach of having shared state is both unusual and not
> correct. Another possibility, which could be better and allow some gains
> too would be to reuse a parser instead without initializing it.  It
> would still maybe need some more cleanup of state between runs, maybe
> with a separate function that would do nothing in XS, such that it does
> not need to be run when the parser is not reused.  This would also
> probably be more similar to what should be the final XS implementation.

Reusing the same parser object, with cleanup in between, seems to me to
be even less correct as it means that the whole parser object is being
shared across translations, rather than just certain parts of it, making
it more likely that something persists between translations (e.g. @anchor
locations).

> 
> > I would like to commit this, except that the
> > "perl -w t/init_files_tests.t translation_in_parser_in_translation"
> > test breaks.
> 
> It will break other redefinitions of translated strings.

I've still not had the time to investigate what is going on with this
test.  Hopefully I'll get to it this weekend.

In general, it seems like we're making the program significantly slower
for a theoretical and very unlikely use case.



Re: extra_integer associated type: long or int?

2023-11-24 Thread Gavin Smith
On Fri, Nov 24, 2023 at 10:07:30PM +0100, Patrice Dumas wrote:
> Hello,
> 
> The extra_integer associated type is used for extra information in
> Texinfo tree elements in C that correspond to integers.  Currently the
> associated type is long, but in most cases, it is used as an int.  I
> think that it should be homogeneous, both storage and use should use the
> same type.  Should it be better to use long or int? 
> 
> (Note that previously there were casts from/to intptr_t, so homogeneity
> was not possible anyway but it is not like that anymore).

I think an int is better (not that it matters that much).  It is only
used for very small integers, usually no more than about 10, as you know.



Re: CC and CFLAGS are ignored by part of the build

2023-11-22 Thread Gavin Smith
On Tue, Nov 14, 2023 at 08:40:54AM +0100, Patrice Dumas wrote:
> The other is that PERL_CONF_optimize is used for AM_CFLAGS
> (XSLIBS_CFLAGS in the development version), the objective being that the
> same flags as those used to build perl are used for perl XS modules.  I
> guess that these flags could be overriden by setting another flag, but
> will still be on the command line.

I checked that the flags are overridden.  $(CFLAGS) appears on the command
line after $(AM_CFLAGS) (or the equivalent target-specific variable).
Optimisation flags given in PERL_EXT_CFLAGS will override any in e.g.
XSParagraph_la_CFLAGS.  (See Info node (automake)Flag Variables Ordering.)

When running "make", it is CFLAGS instead, which was set from PERL_EXT_CFLAGS
in configure, so you can do 'make XSParagraph_la-XSParagraph.lo CFLAGS=-g'.

(The GNU Coding Standards say that overriding CFLAGS when running make
should be possible.)



Re: CC and CFLAGS are ignored by part of the build

2023-11-22 Thread Gavin Smith
On Tue, Nov 14, 2023 at 02:22:00PM +0200, Eli Zaretskii wrote:
> AFAIU, that's impossible in general, because CFLAGS could include
> flags that cannot be applied to both CC and PERL_CONF_cc due to
> compatibility issues, since Perl could have been built using a very
> different compiler.
> 
> IMNSHO, it isn't a catastrophe that compiling Perl extensions needs a
> separate C flags variable.  It is basically similar to CFLAGS and
> CXXFLAGS being separate for building the same project (which happens
> inj practice, for example, in GDB, which is part C and part C++).  And
> if the GCS doesn't cater for these (relatively rare and specialized)
> situations, then I think the GCS needs to be amended.  There's no need
> to be dogmatic about this.
> 

These variables were mentioned in the output of ./configure --help
but we never documented anywhere why they were needed, as far as I could
tell.  I've added text to the INSTALL file:

+  When `configure' is running in the XS subdirectory, instead of the
+  standard CC, CFLAGS, LDFLAGS etc., it uses special variables with a
+  PERL_EXT_ prefix.  These are all listed in the output of `configure
+  --help'.  This is necessary because it is possible that the C compiler
+  being used to compile Perl extension modules is a different compiler
+  to that used for the rest of the package.  (However, if you need to
+  override these variables when running `make', use the unprefixed
+  variants, e.g. CFLAGS instead of PERL_EXT_CFLAGS.  Exception: you
+  cannot override CPPFLAGS this way as gnulib uses this variable.)




XS build flags

2023-11-20 Thread Gavin Smith
On Thu, Sep 21, 2023 at 06:06:34PM +0200, Patrice Dumas wrote:
> On Wed, Sep 20, 2023 at 08:44:34PM +0100, Gavin Smith wrote:
> > One change at a time please.  Any gnulib tests should be run
> > with the same compiler flags as used to compile the shared library.
> 
> On that subject, and independently of the main issue of CPPFLAGS, this
> is not what is actually done.  In configure.ac, we have
> 
> LDFLAGS="$PERL_EXT_LDFLAGS $PERL_CONF_ccdlflags $PERL_CONF_cccdlflags"
> 
> In Makefile.am, the equivalent part is (rewritten for brevity)
> 
> AM_LDFLAGS = -avoid-version -module $(PERL_CONF_ccdlflags)
> if HOST_NEEDS_NO_UNDEFINED
>   AM_LDFLAGS += -no-undefined -L$(PERL_INC) $(PERL_CONF_libperl)
> endif
> 
> So, there are additional flags if HOST_NEEDS_NO_UNDEFINED, but there is
> no use of $PERL_CONF_cccdlflags.  According to my reading of the
> documentation, it is actually ok not to have $PERL_CONF_cccdlflags
> in AM_LDFLAGS as it is a CFLAGS, it should be in AM_CFLAGS if it is
> kept (and I think it should), and should in any case be removed from
> configure.ac LDFLAGS.

I see you made some changes to this (on 2023-09-24).

I've just made a further change (121d5104101) to make it clearer that
the same flags are used in both configure.ac and in Makefile.am:

Author: Gavin Smith 
Date:   Mon Nov 20 20:52:51 2023 +

XS build flags

* tp/Texinfo/XS/configure.ac (perl_conf_CFLAGS, perl_conf_LDFLAGS):
Set and output with AC_SUBST.  Append to CFLAGS and LDFLAGS for
checks only.
* tp/Texinfo/XS/Makefile.am: Reuse perl_conf_CFLAGS and
perl_conf_LDFLAGS.  This makes it clearer that we are adding the
same flags for building as for configure checks.

We could check if -avoid-version -module should be used for configure
checks too (likewise -no-undefined etc.) and if so, put them in
perl_conf_LDFLAGS the same way.

> There is another inconsistency between Makefile.am and configure.ac,
> in configure.ac CFLAGS is:
> CFLAGS="$PERL_EXT_CFLAGS $PERL_CONF_ccflags"
> 
> In Makefile.am, we have
> AM_CFLAGS = $(PERL_CONF_ccflags) $(PERL_CONF_optimize)
> Such that, to be consistent, in configure.ac CFLAGS should be
> CFLAGS="$PERL_EXT_CFLAGS $PERL_CONF_ccflags $PERL_CONF_optimize"

I think you did this.

> (and also, but it does not seems to me to be important)
> AM_CFLAGS += -DVERSION=\"$(VERSION)\" -DXS_VERSION=\"$(VERSION)\"
> AM_CFLAGS += -I$(PERL_INC)

Possibly not important for the checks, but it could potentially be added
as well if needed (the -I flag may make a difference even if the -D flags
don't).  I see you put this in XSLIBS_CPPFLAGS.  We could have a similar
approach and add an extra variable to pass the extra flags from
configure.ac to Makefile.am.  But I would want to understand whether these
flags might be important for configure checks first.




Re: c32width gives incorrect return values in C locale

2023-11-18 Thread Gavin Smith
On Wed, Nov 15, 2023 at 09:42:21AM +0100, Patrice Dumas wrote:
> On Wed, Nov 15, 2023 at 12:22:24AM -0800, Paul Eggert wrote:
> > On 2023-11-13 01:28, Patrice Dumas wrote:
> > > According to your mail
> > > https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html
> > > 
> > > char32_t is less portable
> > 
> > That should be OK, if Gnulib provides a char32_t substitute that works well
> > enough. The mail you refer to merely says that literals like U'x' don't
> > work, but this is not a show-stopper for char32_t.
> 
> Indeed, the solaris10 automatic build now compiles ok with char32_t with
> Gnulib uchar after the changes Gavin made.

Is this the OpenCSW buildbot?  How are you checking this?

Every time I have checked

https://buildfarm.opencsw.org/buildbot/waterfall?tag=texinfo

recently, I have a page with a bunch of error messages on it:


web.Server Traceback (most recent call last):
twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: 
: (OperationalError) unable to open 
database file None None /opt/csw/lib/python2.7/threading.py:774:__bootstrap 
/opt/csw/lib/python2.7/threading.py:801:__bootstrap_inner 
/opt/csw/lib/python2.7/threading.py:754:run ---  --- 
/opt/csw/lib/python2.7/site-packages/twisted/python/threadpool.py:191:_worker 
/opt/csw/lib/python2.7/site-packages/twisted/python/context.py:118:callWithContext
 
/opt/csw/lib/python2.7/site-packages/twisted/python/context.py:81:callWithContext
 /opt/csw/lib/python2.7/site-packages/buildbot/db/pool.py:185:__thd 
/opt/csw/lib/python2.7/site-packages/sqlalchemy/engine/threadlocal.py:61:contextual_connect
 /opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:272:connect 
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:431:__init__ 
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:867:_do_get 
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:225:_create_connection 
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:318:__init__ 
/opt/csw/lib/python2.7/site-packages/sqlalchemy/pool.py:379:__connect 
/opt/csw/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py:80:connect 
/opt/csw/lib/python2.7/site-packages/sqlalchemy/engine/default.py:283:connect ]]







Apparent memory leak from call to switch_to_global_locale

2023-11-18 Thread Gavin Smith
valgrind was reporting a memory leak in texi2any, like:

==3404772== 364 bytes in 1 blocks are definitely lost in loss record 128 of 151
==3404772==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3404772==    by 0x498D11F: newlocale (newlocale.c:199)
==3404772==    by 0x2B6D32: ??? (in /usr/bin/perl)
==3404772==    by 0x2B79BF: Perl_set_numeric_standard (in /usr/bin/perl)
==3404772==    by 0x2BA4DE: Perl_setlocale (in /usr/bin/perl)
==3404772==    by 0x61BF01B: ??? (in /usr/lib/x86_64-linux-gnu/perl-base/auto/POSIX/POSIX.so)
==3404772==    by 0x21BAD6: Perl_pp_entersub (in /usr/bin/perl)
==3404772==    by 0x203AC5: Perl_runops_standard (in /usr/bin/perl)
==3404772==    by 0x17B918: Perl_call_sv (in /usr/bin/perl)
==3404772==    by 0x17CA81: Perl_call_list (in /usr/bin/perl)
==3404772==    by 0x167018: ??? (in /usr/bin/perl)
==3404772==    by 0x168BEA: Perl_newATTRSUB_x (in /usr/bin/perl)

This was with perl 5.34.0 for x86_64-linux-gnu-thread-multi (on
Linux Mint 21.2).

This backtrace didn't show where the problem was, but by a process of
trial and error, inserting croak statements into the XS code at various
places, I tracked it down to the call to switch_to_global_locale in
translate_string in main/translations.c.

diff --git a/tp/Texinfo/XS/main/translations.c b/tp/Texinfo/XS/main/translations.c
index de4ed82e03..4659da9afa 100644
--- a/tp/Texinfo/XS/main/translations.c
+++ b/tp/Texinfo/XS/main/translations.c
@@ -185,6 +185,8 @@ translate_string (OPTIONS *options, const char * string,
  uses per thread locale, the result is unpredictable.  So we switch to
  global locales as we use setlocale */
   call_switch_to_global_locale ();
+  croak_message ("after switch_to_global_locale");
+
 
   /*  
   We need to set LC_MESSAGES to a valid locale other than "C" or "POSIX"

Adding the line there gives the leak; adding it before the call to
call_switch_to_global_locale doesn't.

(croak rather than exit, etc. as it allows Perl to clean up its allocations,
especially with a "destruct level" set)

There may have been a similar leak report made when xspara.c used to
call switch_to_global_locale.

I haven't tested it with any newer versions of Perl, but at least this
doesn't appear to be a bug in texi2any's code.  I am going to try to
add this to the suppressions file tp/texi2any.supp because we have
investigated it, so valgrind remains useful for checking for memory
leaks that we do have to fix.

(It could have been easier to track down if I compiled Perl with some
debugging information, I don't know.)
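
For the record, the suppression entry would look something like the
following (a sketch only; the frame list is abbreviated with "..."
wildcards and would need to be adjusted against the actual valgrind
report):

  {
     perl-setlocale-newlocale-leak
     Memcheck:Leak
     match-leak-kinds: definite
     fun:malloc
     fun:newlocale
     ...
     fun:Perl_setlocale
     ...
  }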




Re: obstack module has alignment issues on sparc? (Re: set_labels_identifiers_target -fsanitize=undefined error)

2023-11-15 Thread Gavin Smith
On Tue, Nov 14, 2023 at 10:16:33PM +0100, Bruno Haible wrote:
> Sam James wrote:
> > > It appears that the obstack gnulib module is the culprit.
> 
> I replied:
> > Therefore, if it is a bug in gnulib, it is also a bug in glibc.
> 
> Sam was right. I was wrong. It is a bug in the 'obstack' gnulib module.

It is surprising it worked as well as it did with mixing the two different
struct definitions.

Thanks for the detailed investigation.  Will this be fixed in gnulib at
some point?

>  The root cause
>  --
> 
> Gnulib generally uses idioms for overriding functions that are safe to use
> in shared libraries and will avoid collisions. This is the business with
>   REPLACE_FOO=1
> and
>   #define foo rpl_foo
> and so on.
> 
> But the Gnulib module 'obstack' has never been updated to use these idioms.
> It is still at the state of 1997 and uses a clunky _OBSTACK_INTERFACE_VERSION
> mechanism.

I get the impression it's not a very frequently used part of glibc.

> --- texinfo-7.1/tp/Texinfo/XS/gnulib/lib/obstack.h.bak2023-08-13 
> 22:10:03.0 +0200
> +++ texinfo-7.1/tp/Texinfo/XS/gnulib/lib/obstack.h2023-11-14 
> 20:30:55.584463250 +0100
> @@ -164,6 +164,12 @@
>  # endif
>  #endif
>  
> +#define _obstack_begin rpl_obstack_begin
> +#define _obstack_newchunk rpl_obstack_newchunk
> +#define _obstack_allocated_p rpl_obstack_allocated_p
> +#define _obstack_free rpl_obstack_free
> +#define _obstack_memory_used rpl_obstack_memory_used
> +
>  #ifdef __cplusplus
>  extern "C" {
>  #endif

That's great, I'll apply this patch to the release branch and revert
the other change (using obstack_alignment_mask).




Re: set_labels_identifiers_target -fsanitize=undefined error

2023-11-13 Thread Gavin Smith
On Sun, Nov 12, 2023 at 12:41:58PM +0100, John Paul Adrian Glaubitz wrote:

> > diff tree.c.old tree.c -u
> > --- tree.c.old  2023-11-04 16:15:13.632755680 +
> > +++ tree.c  2023-11-04 16:16:36.211072521 +
> > @@ -43,7 +43,10 @@
> >if (obs_element_first)
> >  obstack_free (_element, obs_element_first);
> >else
> > -obstack_init (_element);
> > +{
> > +  obstack_alignment_mask (_element) = 7; /* 8-byte alignment */
> > +  obstack_init (_element);
> > +}
> >  
> >obs_element_first = obstack_alloc (_element, sizeof (int));
> > 
> > 
> > Can you check if that works?
> 
> Yes, I can confirm that this patch fixes the crash for me.
> 
> Would be great if this fix could be included for the next release!

I've added it to the release branch so if there is ever a Texinfo 7.1.1
release, it will be included.  In the current development code, there
are significant changes and obstacks don't appear to be used at all,
making it a moot point.

The problem should probably be reported to the gnulib developers to
investigate why incorrectly aligned memory was being returned.



Re: c32width gives incorrect return values in C locale

2023-11-13 Thread Gavin Smith
On Mon, Nov 13, 2023 at 10:28:35AM +0100, Patrice Dumas wrote:
> According to your mail
> https://lists.gnu.org/archive/html/bug-libunistring/2023-11/msg0.html
> 
> char32_t is less portable, and indeed, the solaris automatic build seems
> to fail because char32_t is not available, while uint32_t and ucs4_t
> are necessarily present with libunistring unless I missed something.  So
> it would probably be more portable to use uint32_t/ucs4_t, though it may
> be less practical if a conversion is needed before they can be used.

I've imported the "uchar" gnulib module so that the char32_t type will
be defined in uchar.h.  "uchar" seems significantly less bloated than
"uchar-c23", and I am not sure whether "uchar-c23" is necessary for our purposes.



Re: c32width gives incorrect return values in C locale

2023-11-12 Thread Gavin Smith
On Sat, Nov 11, 2023 at 11:54:52PM +0100, Bruno Haible wrote:
> [CCing bug-libunistring]
> Gavin Smith wrote:
> > I did not understand why uc_width was said to be "locale dependent":
> > 
> >   "These functions are locale dependent."
> > 
> > - from 
> > <https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html#index-uc_005fwidth>.
> 
> That's because some Unicode characters have "ambiguous width" — width 1 in
> Western locales, width 2 in East Asian locales (for historic and font choice
> reasons).
> 
> > I also don't understand the purpose of the "encoding" argument -- can this
> > always be "UTF-8"?
> 
> Yes, it can always be "UTF-8"; then uc_width will always choose width 1 for
> these characters.
> 
> > I'm also unclear on the exact relationship between the types char32_t,
> > ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
> > but u8_mbtouc writes to a char32_t variable.  In the code I committed,
> > I used a cast to ucs4_t when calling uc_width.
> 
> These types are all identical. Therefore you don't even need to cast.
> 
>   - char32_t comes from  (ISO C 11 or newer).
>   - ucs4_t comes from GNU libunistring.
>   - uint32_t comes from .

Thanks for the advice.



PACKAGE_VERSION should not be renamed

2023-11-12 Thread Gavin Smith
On Sun, Nov 05, 2023 at 12:13:00PM +0100, Arsen Arsenović wrote:
> Oh, interesting.  On my system, I'm using texinfo built from master, and
> it completely lacks PACKAGE_VERSION!  (if I rebuild from releases/7.1,
> it's present).  It looks like it got removed in
> 6a1a78b941a832033b8939ab4cdce1646dd56904.

It's still stated as PACKAGE_VERSION in doc/texinfo.texi, even though it's
mentioned in NEWS now.  If the customization variable is changing name,
then the manual should be updated as well.

However, I doubt that it is justified to rename it.  We know it is an
option, so what is the point of putting _OPTION on the end to get
PACKAGE_VERSION_OPTION?

Especially since it is used as part of the customization API, it
seems that this should be a stable option.  Here is a link to a patch
to fix building ffmpeg docs:

http://ffmpeg.org/pipermail/ffmpeg-devel/2023-November/316605.html

Including version checks using PACKAGE_VERSION:

   # determine if texinfo is at least version 6.8
   -my $program_version_num = version->declare(get_conf('PACKAGE_VERSION'))->numify;
   +my $program_version_num = version->declare(ff_get_conf('PACKAGE_VERSION'))->numify;
   my $program_version_6_8 = $program_version_num >= 6.008000;
   
   # print the TOC where @contents is used
   if ($program_version_6_8) {
  -set_from_init_file('CONTENTS_OUTPUT_LOCATION', 'inline');
  +ff_set_from_init_file('CONTENTS_OUTPUT_LOCATION', 'inline');
   } else {
  -set_from_init_file('INLINE_CONTENTS', 1);
  +ff_set_from_init_file('INLINE_CONTENTS', 1);
   }

If the customization API is likely to break easily (and speaking generally,
it seems to me that any API like this is likely to, and the more it is
integrated into the "guts" of the program, the more likely it is to, although
I am not very familiar with the texi2any customization API), then at
least PACKAGE_VERSION gives packages a chance to adapt, so should be stable
if nothing else is.

In commit 6a1a78b941a832033b (2023-09-13) (*), the rename is said to be to
avoid clashing with macros defined in config.h.  This is a detail of the
implementation and something we could work around, rather than changing
the user-visible interface.  It's currently used in the definition of
the OPTIONS type in tp/Texinfo/XS/main/options_types.h.

This could be dealt with in a variety of ways.  The most straightforward
would be to "#undef PACKAGE_VERSION" at the beginning of option_types.h.
That way, any source code file using the OPTIONS type and including
this file will not have that preprocessor definition active throughout
the file.
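
Roughly, as a sketch only (the member type and surrounding contents here
are placeholders, not the actual option_types.h):

  /* Undefine the config.h macro so that the structure member named
     PACKAGE_VERSION is left alone by the preprocessor in files that
     include both config.h and this header. */
  #ifdef PACKAGE_VERSION
  #undef PACKAGE_VERSION
  #endif

  typedef struct OPTIONS {
      /* ... */
      char *PACKAGE_VERSION;   /* the customization variable */
      /* ... */
  } OPTIONS;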

We do not use the PACKAGE_VERSION from the build system for the XS code
and it simply has the value "0".  (In the top-level build system, it is
the version of the package, currently "7.1dev".)

It's also not necessary that we use the symbol PACKAGE_VERSION in the XS/C
code to refer to this option, so could be renamed to o_PACKAGE_VERSION
as a special case if the #undef option does not work or is not appropriate
for some reason.

(*) My preference is to always state at least the date as well as the
git commit ID -- that way, if we ever move to a version control system
other than git, then somebody reading the mailing list archives will
find it easier to find which commit it is.  It's useful information to
have, in any case.



Re: c32width gives incorrect return values in C locale

2023-11-11 Thread Gavin Smith
On Sat, Nov 11, 2023 at 09:06:41PM +0100, Bruno Haible wrote:
> [CCing bug-gnulib]
> Indeed, the c32* functions by design work only on those Unicode characters
> that can be represented as multibyte sequences in the current locale.
> 
> I'll document this better in the Gnulib manual.
> 
> Since you want texinfo to work on UTF-8 encoded text with characters outside
> the repertoire of the current locale, you'll need the libunistring functions,
> documented in
> .
> Namely, replace c32width with uc_width.

Thanks, that seems to work perfectly.

I also changed c32isupper to uc_is_upper.  The gnulib manual stated
(node "isupper"):

  ‘c32isupper’
   This function operates in a locale dependent way, on 32-bit wide
   characters.  In order to use it, you first have to convert from
   multibyte to 32-bit wide characters, using the ‘mbrtoc32’ function.
   It is provided by the Gnulib module ‘c32isupper’.
  
  ...
  
  ‘uc_is_upper’
   This function operates in a locale independent way, on Unicode
   characters.  It is provided by the Gnulib module
   ‘unictype/ctype-upper’.

- and we wanted the "locale independent way".

I did not understand why uc_width was said to be "locale dependent":

  "These functions are locale dependent."

- from
<https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html#index-uc_005fwidth>.

I also don't understand the purpose of the "encoding" argument -- can this
always be "UTF-8"?

I'm also unclear on the exact relationship between the types char32_t,
ucs4_t and uint32_t.  For example, uc_width takes a ucs4_t argument
but u8_mbtouc writes to a char32_t variable.  In the code I committed,
I used a cast to ucs4_t when calling uc_width.
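
For reference, the kind of width computation involved looks something like
the following (a rough sketch using the libunistring functions, not
necessarily the code as committed):

  #include <stddef.h>
  #include <stdint.h>
  #include <unistr.h>     /* u8_mbtouc */
  #include <uniwidth.h>   /* uc_width */

  /* Display width of a UTF-8 string. */
  static int
  utf8_string_width (const uint8_t *s, size_t len)
  {
    int width = 0;
    while (len > 0)
      {
        ucs4_t uc;
        int nbytes = u8_mbtouc (&uc, s, len); /* decode one code point */
        int w = uc_width (uc, "UTF-8");       /* width 1 for "ambiguous" chars */
        if (w > 0)
          width += w;
        s += nbytes;
        len -= nbytes;
      }
    return width;
  }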



c32width gives incorrect return values in C locale

2023-11-11 Thread Gavin Smith
On Fri, Nov 10, 2023 at 07:39:43PM +, Gavin Smith wrote:
> Is the expected output
> 
>å å (å) Å Å (Å) æ æ (æ) œ œ (œ) Æ Æ (Æ) Œ Œ (Œ) ø ø (ø) Ø Ø (Ø) ß ß (ß)
> 
> (width 74) or
> 
>@aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø) @O Ø 
> (Ø) @ss ß (ß)
> 
> (width 90)?
> 
> I guess you will need to look at the Unicode characters that you pass to 
> c32width,
> and whether you get return values < 1 for some of them.

It is locale-dependent!

It looks like c32width is simply being redirected to wcwidth which then
doesn't work properly with LC_ALL=C.  This is from the gnulib module
c32width.

I don't know if there is an easy way to make a self-contained example
to show the difference, because it needs all the gnulib Makefile machinery,
but the difference shows up for any non-ASCII character.  If I add a line
like

 fprintf (stderr, "width of [%4.0lx] is %d (remaining %s)\n",
(long) wc, width, q);

in the right place in the code, where width is the result of c32width,
then the output looks like

width of [  40] is 1 (remaining @)
width of [  4f] is 1 (remaining OE )
width of [  45] is 1 (remaining E )
width of [ 152] is -1 (remaining Œ)
width of [  28] is 1 (remaining (Œ)

for LC_ALL=C, but

width of [  40] is 1 (remaining @)
width of [  4f] is 1 (remaining OE )
width of [  45] is 1 (remaining E )
width of [ 152] is 1 (remaining Œ)
width of [  28] is 1 (remaining (Œ)

otherwise (LC_ALL=en_GB.UTF-8).
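
The effect can be reproduced stand-alone with plain wcwidth (which is what
c32width falls back to here).  A sketch, assuming glibc behaviour and that
the UTF-8 locale is installed:

  #define _XOPEN_SOURCE 700
  #include <locale.h>
  #include <stdio.h>
  #include <wchar.h>

  int
  main (void)
  {
    /* U+0152 LATIN CAPITAL LIGATURE OE */
    setlocale (LC_ALL, "C");
    printf ("C locale: %d\n", wcwidth (L'\x152'));       /* -1 on glibc */
    if (setlocale (LC_ALL, "en_GB.UTF-8"))
      printf ("UTF-8 locale: %d\n", wcwidth (L'\x152')); /* 1 */
    return 0;
  }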

Should this be reported as a bug to bug-gnulib or bug-libunistring?

In the context of the input from the test, the following is the contents
of a simplified test file "test.texi":

@@aa @aa{} (å)
@@AA @AA{} (Å)
@@ae @ae{} (æ)
@@oe @oe{} (œ)
@@AE @AE{} (Æ)
@@OE @OE{} (Œ)
@@o @o{} (ø)
@@O @O{} (Ø)
@@ss @ss{} (ß)
@@l @l{} (ł)
@@L @L{} (Ł)
@@DH @DH{} (Ð)
@@TH @TH{} (Þ)
@@dh @dh{} (ð)
@@th @th{} (þ)


Then, in a UTF-8 locale:

$ ../tp/texi2any.pl test.texi && cat test.info
test.texi: warning: document without nodes
This is test.info, produced by texi2any version 7.1dev+dev from
test.texi.

@aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø) @O
Ø (Ø) @ss ß (ß) @l ł (ł) @L Ł (Ł) @DH Ð (Ð) @TH Þ (Þ) @dh ð (ð) @th þ
(þ)



Tag Table:

End Tag Table


Local Variables:
coding: utf-8
End:

However:

$ LC_ALL=C ../tp/texi2any.pl test.texi && cat test.info
test.texi: warning: document without nodes
This is test.info, produced by texi2any version 7.1dev+dev from
test.texi.

@aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø) @O Ø (Ø) 
@ss ß (ß) @l
ł(ł) @L Ł (Ł) @DH Ð (Ð) @TH Þ (Þ) @dh ð (ð) @th þ (þ)



Tag Table:

End Tag Table


Local Variables:
coding: utf-8
End:

In the latter case, the line is much longer.




Re: Locale-independent paragraph formatting

2023-11-10 Thread Gavin Smith
On Fri, Nov 10, 2023 at 08:47:10AM +0200, Eli Zaretskii wrote:
> > Does anybody know if we could just write 'a' instead of U'a' and rely
> > on it being converted?
> > 
> > E.g. if you do
> > 
> > char32_t c = 'a';
> > 
> > then afterwards, c should be equal to 97 (ASCII value of 'a').
> 
> Why not?  What could be the problems with using this?

I think what was confusing me was the statement that char32_t held a UTF-32
encoded Unicode character.  I then thought it would have a certain byte
order, so if the UTF-32 was big endian, the bytes would have the order
00 00 00 61, whereas the value 97 on a little endian machine would have
the order 61 00 00 00.  However, it seems that UTF-32 just means the
codepoint is encoded as a 32-bit integer, and the endianness of the
UTF-32 sequence can be assumed to match the endianness of the machine.
The standard C integer conversions can be assumed to work when assigning
to/from char32_t because it is just an integer type, I assume.
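
A small self-contained check (assuming a C11 compiler with <uchar.h>):

  #include <assert.h>
  #include <uchar.h>

  int
  main (void)
  {
    char32_t c1 = 'a';   /* ordinary character constant, converted on assignment */
    char32_t c2 = U'a';  /* char32_t constant (C11) */
    assert (c1 == 97 && c1 == c2);   /* 97 on ASCII-based execution charsets */
    return 0;
  }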



Fwd: Locale-independent paragraph formatting

2023-11-10 Thread Gavin Smith
Forwarding reply from Bruno as I accidentally dropped the list when
replying.

- Forwarded message from Bruno Haible  -

Date: Fri, 10 Nov 2023 03:27:14 +0100
From: Bruno Haible 
To: Gavin Smith 
Subject: Re: Locale-independent paragraph formatting [was Re: Texinfo 7.0.93
pretest available]

Hi Gavin,

> Some of the changes look incorrect, e.g.
> 
> --- a/tp/tests/coverage/res_parser/formatting_info/formatting.info
> +++ b/tp/tests/coverage/res_parser/formatting_info/formatting.info
> @@ -55,16 +55,14 @@ Now !  !@ but , ,
>  
> @point ⋆ (⋆) @print ⊣ (⊣) @result ⇒ (⇒) @today a sunny day
>  
> -   @aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø)
> -@O Ø (Ø) @ss ß (ß) @l ł (ł) @L Ł (Ł) @DH Ð (Ð) @TH Þ (Þ) @dh ð (ð) @th þ
> -(þ)
> +   @aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø) @O Ø 
> (Ø
> ) @ss ß (ß)
> +@l ł (ł) @L Ł (Ł) @DH Ð (Ð) @TH Þ (Þ) @dh ð (ð) @th þ (þ)
>  
> 
> where the line in the new output is too long.

Is the expected output

   å å (å) Å Å (Å) æ æ (æ) œ œ (œ) Æ Æ (Æ) Œ Œ (Œ) ø ø (ø) Ø Ø (Ø) ß ß (ß)

(width 74) or

   @aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø) @O Ø 
(Ø) @ss ß (ß)

(width 90)?

I guess you will need to look at the Unicode characters that you pass to 
c32width,
and whether you get return values < 1 for some of them.

Bruno

- End forwarded message -



Re: Task-completion reporting does not reflect -o argument

2023-11-10 Thread Gavin Smith
On Thu, Nov 09, 2023 at 04:12:21PM -0500, Jonathan Dushoff wrote:
> Thanks for responding!
> 
> etex, pdftex and pdflatex all seem to take the jobname argument, which
> may be what's wanted. Alternatively, you could consider mentioning
> briefly in the documentation that -o simply moves the output file.

I've added an extra message in a recent commit.  I don't think it's
worth mentioning in the documentation how it works.  -jobname would
be possible but I don't want to mess with this any more than I need
to.  I don't find texi2dvi a particularly easy program to work on (I'm
probably partially to blame here) and I don't want to risk breaking
some rare use case.



Re: Locale-independent paragraph formatting

2023-11-09 Thread Gavin Smith
(Reply accidentally not sent to the list.)

- Forwarded message from Gavin Smith  -

Date: Thu, 9 Nov 2023 21:25:28 +
From: Gavin Smith 
To: Bruno Haible 
Subject: Re: Locale-independent paragraph formatting [was Re: Texinfo 7.0.93
pretest available]

On Thu, Nov 09, 2023 at 08:11:59PM +, Gavin Smith wrote:
> On Tue, Oct 10, 2023 at 07:29:15PM +0200, Bruno Haible wrote:
> > Given that the only encoding you want to deal with is UTF-8, Eli's 
> > suggestion
> > to use GNU libunistring is better than my iconv() suggestion. It has 
> > functions
> > for width determination:
> > https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html
> > 
> > > but I doubt it is urgent to do before the release, as the current 
> > > approach,
> > > however flawed, has been in place and worked fairly well for a long time
> > > (since the XS paragraph module was written).
> > 
> > Well, it does not work on Windows.
> > 
> > I agree with you that it's not urgent to do before the 7.1 release, since
> > the Windows port is work-in-progress.
> 
> I have just pushed a commit (e3a28cc9bf) to use gnulib/libunistring
> functions instead of the locale-dependent functions mbrtowc and wcwidth.
> This allows for a significant simplification as we do not have to try
> to switch to a UTF-8 encoded locale.

I have realised this changes the test results under tp/tests.  I haven't
updated those test results yet, but am happy for this commit to be reverted
until I investigate the changes.  Some of the changes look incorrect, e.g.

--- a/tp/tests/coverage/res_parser/formatting_info/formatting.info
+++ b/tp/tests/coverage/res_parser/formatting_info/formatting.info
@@ -55,16 +55,14 @@ Now !  !@ but , ,
 
@point ⋆ (⋆) @print ⊣ (⊣) @result ⇒ (⇒) @today a sunny day
 
-   @aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø)
-@O Ø (Ø) @ss ß (ß) @l ł (ł) @L Ł (Ł) @DH Ð (Ð) @TH Þ (Þ) @dh ð (ð) @th þ
-(þ)
+   @aa å (å) @AA Å (Å) @ae æ (æ) @oe œ (œ) @AE Æ (Æ) @OE Œ (Œ) @o ø (ø) @O Ø (Ø
) @ss ß (ß)
+@l ł (ł) @L Ł (Ł) @DH Ð (Ð) @TH Þ (Þ) @dh ð (ð) @th þ (þ)
 

where the line in the new output is too long.

- End forwarded message -



Locale-independent paragraph formatting [was Re: Texinfo 7.0.93 pretest available]

2023-11-09 Thread Gavin Smith
(Reply accidentally not sent to list.)

- Forwarded message from Gavin Smith  -

Date: Thu, 9 Nov 2023 20:11:59 +
From: Gavin Smith 
To: Bruno Haible 
Subject: Locale-independent paragraph formatting [was Re: Texinfo 7.0.93
pretest available]

On Tue, Oct 10, 2023 at 07:29:15PM +0200, Bruno Haible wrote:
> Given that the only encoding you want to deal with is UTF-8, Eli's suggestion
> to use GNU libunistring is better than my iconv() suggestion. It has functions
> for width determination:
> https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html
> 
> > but I doubt it is urgent to do before the release, as the current approach,
> > however flawed, has been in place and worked fairly well for a long time
> > (since the XS paragraph module was written).
> 
> Well, it does not work on Windows.
> 
> I agree with you that it's not urgent to do before the 7.1 release, since
> the Windows port is work-in-progress.

I have just pushed a commit (e3a28cc9bf) to use gnulib/libunistring
functions instead of the locale-dependent functions mbrtowc and wcwidth.
This allows for a significant simplification as we do not have to try
to switch to a UTF-8 encoded locale.

I was not sure about how to put a char32_t literal in the source code.
For example, where we previously had L'a' as a literal wchar_t letter 'a',
I changed this to U'a'.  I could not find very much information about this
online or whether this would be widely supported by C compilers.  The U prefix
for char32_t is mentioned in a copy of the C11 standard I found online and
also in a C23 draft.

Section 6.4.4.4 "Character constants" (page 67):

A wide character constant prefixed by the letter L has type wchar_t,
an integer type defined in the  header; a wide character
constant prefixed by the letter u or U has type char16_t or char32_t,
respectively, unsigned integer types defined in the  header.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf

The "uchar" or "uchar-c23" modules are supposed to provide uchar.h on
platforms where it doesn't exist, but I highly doubt there is any
mechanism for providing a new character literal syntax if not supported
by the compiler.

Does anybody know if we could just write 'a' instead of U'a' and rely
on it being converted?

E.g. if you do

char32_t c = 'a';

then afterwards, c should be equal to 97 (ASCII value of 'a').

- End forwarded message -


