Hi,

Thank you for looking into this!

On Mon, 16 Feb 2026 at 23:38, Andres Freund <[email protected]> wrote:
>
> Hi,
>
> On 2026-02-16 10:39:06 +0300, Nazir Bilal Yavuz wrote:
> >  # Run validation only once, common to all subsequent targets.  While
> >  # we're at it, also resolve all entities (that is, copy all included
> >  # files into one big file).  This helps tools that don't understand
> > @@ -108,12 +128,15 @@ postgres_full_xml = custom_target('postgres-full.xml',
> >    depfile: 'postgres-full.xml.d',
> >    command: [xmllint, '--nonet', '--noent', '--valid',
> >              '--path', '@OUTDIR@', '-o', '@OUTPUT@', '@INPUT@'],
> > -  depends: doc_generated,
> > +  depends: [doc_generated, sgml_check_stamp],
> >    build_by_default: false,
> >  )
> >  docs += postgres_full_xml
> >  alldocs += postgres_full_xml
>
> Not sure I love that now the entire docs build fails if there is a tab
> anywhere. It's probably ok, but ...
>
> Given you add it as a test for meson and a check for make, why not just make
> sgml_check_stamp a dependency of the test (meson) and check targets (make)?

 I followed what Peter suggested here [1]. The idea was basically
running these checks while building the 'postgres.full.xml', not only
running the test.


> > +postgres_full_xml_target = 'postgres_full_xml'
> > +alias_target(postgres_full_xml_target, postgres_full_xml)
>
> Why would we want a top-level postgres_full_xml target?

The idea was using it with the 'meson compile $target' command [2]
which I will explain below.


> >  if xsltproc_bin.found()
> >    xsltproc_flags = [
> > @@ -308,3 +331,13 @@ endif
> >  if alldocs.length() != 0
> >    alias_target('alldocs', alldocs)
> >  endif
> > +
> > +# Test creation of 'postgres_full_xml'.
> > +test(
> > +  'doc' / 'sgml_syntax_check',
> > +  meson_bin,
> > +  args: ['compile', postgres_full_xml_target],
> > +  suite: 'doc'
> > +)
>
> I'm confused about this 'compile' bit here. What is that intending to do? I
> don't have such a 'compile' command in my PATH, so the test fails here...

My intent was to have a syntax check that will run while we build the
'postgres_full_xml'. And since we already have that, the test just can
try rebuilding the target with the 'meson compile $target' command
[2]. I converted this to:

# The actual test is whether postgres_full_xml can be built, as it has a
# xmllint check and also depends on sgml_check_stamp which performs the syntax
# check.
test(
  'doc' / 'sgml_syntax_check',
  python,
  args: ['-c', ''],
  depends: [postgres_full_xml],
  suite: 'doc',
)

and removed the top-level postgres_full_xml target. This should work
on any platform now. Does that work for you?


> > diff --git a/doc/src/sgml/sgml_syntax_check.pl 
> > b/doc/src/sgml/sgml_syntax_check.pl
> > new file mode 100755
> > index 00000000000..61b4cd203c0
> > --- /dev/null
> > +++ b/doc/src/sgml/sgml_syntax_check.pl
> > @@ -0,0 +1,115 @@
> > +#!/usr/bin/perl
> > +#
> > +# Perl script that checks SGML/XSL syntax and formatting issues.
> > +#
> > +# doc/src/sgml/sgml_syntax_check.pl
> > +#
> > +# Copyright (c) 2026, PostgreSQL Global Development Group
> > +
> > +use strict;
> > +use warnings FATAL => 'all';
> > +use Getopt::Long;
> > +
> > +use File::Find;
> > +
> > +my $srcdir;
> > +my $builddir;
> > +my $dep_file;
> > +my $stamp_file;
> > +
> > +GetOptions(
> > +     'srcdir:s' => \$srcdir,
> > +     'builddir:s' => \$builddir,
> > +     'dep_file:s' => \$dep_file,
> > +     'stamp_file:s' => \$stamp_file) or die "$0: wrong arguments";
>
> I don't think there's a point in having a dep file here. Dependency files are
> only useful if you have dependency on *additional* files to the ones
> explicitly passed in (e.g. a .c file may depend on headers, but the headers
> aren't listed as arguments to the compiler).

We had a chat with Andres off-list about this. Andres thought the
sgml_syntax_check.pl script accepts files as in input rather than
directory. That comment was true if we directly send the file as an
input argument since meson will automatically append this file to the
dependency file. However, we rather send a directory as an input and
manually create a dependency file with the *.sgml and *.xsl files in
the directory.


> > +die "$0: --srcdir must be specified\n" unless defined $srcdir;
> > +
> > +# Find files to process - all the sgml and xsl files (including in 
> > subdirectories)
> > +my @files_to_process;
> > +my @dirs_to_search = ($srcdir);
> > +push @dirs_to_search, $builddir if defined $builddir;
> > +find(
> > +     sub {
> > +             return unless -f $_;
> > +             return if $_ !~ /\.(sgml|xsl)$/;
> > +             push @files_to_process, $File::Find::name;
> > +     },
> > +     @dirs_to_search,);
>
> Why is this any more complicated than just opening the file exactly in the
> passed in source dir?

Same as above, sgml_syntax_check.pl script accepts directory as in
input rather than files.


> > +# tabs and non-breaking spaces are harmless, but it is best to avoid them 
> > in SGML files
> > +sub check_tabs_and_nbsp
> > +{
> > +     my $errors = 0;
> > +     for my $f (@files_to_process)
> > +     {
> > +             open my $fh, "<:encoding(UTF-8)", $f or die "Can't open $f: 
> > $!";
> > +             my $line_no = 0;
> > +             while (<$fh>)
> > +             {
> > +                     $line_no++;
> > +                     if (/\t/)
> > +                     {
> > +                             print STDERR "Tab found in $f:$line_no\n";
> > +                             $errors++;
> > +                     }
> > +                     if (/\xC2\xA0/)
> > +                     {
> > +                             print STDERR "$f:$line_no: contains 
> > non-breaking space\n";
> > +                             $errors++;
> > +                     }
> > +             }
> > +             close($fh);
> > +     }
> > +
> > +     if ($errors)
> > +     {
> > +             remove_stamp_file();
>
> I don't think that's needed, the stamp file needs to be newer than the inputs
> to have an effect, and here you're just removing an old one, no?

Yes, done.


> > +sub create_stamp_file
> > +{
> > +     # Avoid touching existing stamp file to prevent unnecessary rebuilds
> > +     if (!(-f $stamp_file))
> > +     {
> > +             open my $fh, '>', $stamp_file
> > +               or die "can't open $stamp_file: $!";
> > +             close $fh;
> > +     }
> > +}
>
> Won't that *cause* rebuilds? With a stamp file you normally want the stamp
> file to be *newer* than the inputs. Which it won't be, if you don't touch it
> here.
>
> The only reason it doesn't cause quick rebuilds with meson is that ninja
> remembers the timestamps of files an avoids rebuilds if they haven't changed.

Yes, you are right. Done.


[1] https://postgr.es/m/02d46e55-418e-4cb2-ad36-c43d6172a010%40eisentraut.org
[2] https://mesonbuild.com/Commands.html#compile

--
Regards,
Nazir Bilal Yavuz
Microsoft
From 4363b25ec38de2f098af4cf481e2a6016ed42aa4 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Tue, 17 Feb 2026 13:43:07 +0300
Subject: [PATCH v10] Improve docs syntax checking

Move the checks out of the Makefile into a perl script that can be
called from both the Makefile and meson.build. The set of files checked
is simplified, so it is just all the sgml and xsl files found in
docs/src/sgml directory tree.

Along the way make some adjustments to .cirrus.tasks.yml to support this
better in CI.

Also ensure that the checks are run while generating the
postgres-full.xml.

Author: Nazir Bilal Yavuz <[email protected]>
Co-Author: Andrew Dunstan <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Hayato Kuroda <[email protected]>
Discussion: https://postgr.es/m/CAN55FZ1qzoDcaKqsR3DwE%3DX6FL%2Bwpm%2B%3DKLvH6ahrRXNhjU53DQ%40mail.gmail.com
---
 .cirrus.tasks.yml                 |   3 +
 doc/src/sgml/Makefile             |  16 +----
 doc/src/sgml/meson.build          |  36 ++++++++++-
 doc/src/sgml/sgml_syntax_check.pl | 102 ++++++++++++++++++++++++++++++
 4 files changed, 141 insertions(+), 16 deletions(-)
 create mode 100755 doc/src/sgml/sgml_syntax_check.pl

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 2a821593ce5..78f167656f8 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -640,6 +640,8 @@ task:
     TEST_JOBS: 8
     IMAGE: ghcr.io/cirruslabs/macos-runner:sequoia
 
+    XML_CATALOG_FILES: /opt/local/share/xml/docbook/4.5/catalog.xml
+
     CIRRUS_WORKING_DIR: ${HOME}/pgsql/
     CCACHE_DIR: ${HOME}/ccache
     MACPORTS_CACHE: ${HOME}/macports-cache
@@ -654,6 +656,7 @@ task:
 
     MACOS_PACKAGE_LIST: >-
       ccache
+      docbook-xml-4.5
       icu
       kerberos5
       lz4
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index b53b2694a6b..50de8fab931 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -69,6 +69,7 @@ ALL_IMAGES := $(wildcard $(srcdir)/images/*.svg)
 # files into one big file).  This helps tools that don't understand
 # vpath builds (such as dbtoepub).
 postgres-full.xml: postgres.sgml $(ALL_SGML)
+	$(PERL) $(srcdir)/sgml_syntax_check.pl --srcdir $(srcdir)
 	$(XMLLINT) $(XMLINCLUDE) --output $@ --noent --valid $<
 
 
@@ -200,8 +201,7 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
-	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
+check: postgres-full.xml
 
 
 ##
@@ -261,18 +261,6 @@ clean-man:
 
 endif # sqlmansectnum != 7
 
-# tabs are harmless, but it is best to avoid them in SGML files
-check-tabs:
-	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
-	(echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
-
-# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
-# Use perl command because non-GNU grep or sed could not have hex escape sequence.
-check-nbsp:
-	@ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
-	  $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
-	(echo "Non-breaking spaces appear in SGML/XML files" 1>&2;  exit 1)
-
 ##
 ## Clean
 ##
diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index a1ae5c54ed6..0f2244d7e09 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -98,6 +98,26 @@ xmllint = xmltools_wrapper + [
   '--tool', xmllint_bin, '--',
 ]
 
+sgml_syntax_check = files(
+  'sgml_syntax_check.pl'
+)
+
+# Create a stamp file for sgml syntax check and make postgres_full_xml
+# depend on it.  So, this check will run before the generation of
+# postgres_full_xml.
+sgml_check_stamp = custom_target('sgml-check-stamp',
+  output: 'sgml-check-stamp',
+  depfile: 'sgml-check-stamp.d',
+  command: [
+    perl, sgml_syntax_check,
+    '--srcdir', meson.current_source_dir(),
+    '--builddir', meson.current_build_dir(),
+    '--dep_file', '@DEPFILE@',
+    '--stamp_file', '@OUTPUT@',
+  ],
+  build_by_default: false,
+)
+
 # Run validation only once, common to all subsequent targets.  While
 # we're at it, also resolve all entities (that is, copy all included
 # files into one big file).  This helps tools that don't understand
@@ -108,13 +128,12 @@ postgres_full_xml = custom_target('postgres-full.xml',
   depfile: 'postgres-full.xml.d',
   command: [xmllint, '--nonet', '--noent', '--valid',
             '--path', '@OUTDIR@', '-o', '@OUTPUT@', '@INPUT@'],
-  depends: doc_generated,
+  depends: [doc_generated, sgml_check_stamp],
   build_by_default: false,
 )
 docs += postgres_full_xml
 alldocs += postgres_full_xml
 
-
 if xsltproc_bin.found()
   xsltproc_flags = [
     '--nonet',
@@ -308,3 +327,16 @@ endif
 if alldocs.length() != 0
   alias_target('alldocs', alldocs)
 endif
+
+# The actual test is whether postgres_full_xml can be built, as it has a
+# xmllint check and also depends on sgml_check_stamp which performs the syntax
+# check.
+test(
+  'doc' / 'sgml_syntax_check',
+  python,
+  args: ['-c', ''],
+  depends: [postgres_full_xml],
+  suite: 'doc',
+)
+
+testprep_targets += doc_generated
diff --git a/doc/src/sgml/sgml_syntax_check.pl b/doc/src/sgml/sgml_syntax_check.pl
new file mode 100755
index 00000000000..5ebb7edd37a
--- /dev/null
+++ b/doc/src/sgml/sgml_syntax_check.pl
@@ -0,0 +1,102 @@
+#!/usr/bin/perl
+#
+# Perl script that checks SGML/XSL syntax and formatting issues.
+#
+# doc/src/sgml/sgml_syntax_check.pl
+#
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use Getopt::Long;
+
+use File::Find;
+
+my $srcdir;
+my $builddir;
+my $dep_file;
+my $stamp_file;
+
+GetOptions(
+	'srcdir:s' => \$srcdir,
+	'builddir:s' => \$builddir,
+	'dep_file:s' => \$dep_file,
+	'stamp_file:s' => \$stamp_file) or die "$0: wrong arguments";
+
+die "$0: --srcdir must be specified\n" unless defined $srcdir;
+
+# Find files to process - all the sgml and xsl files (including in subdirectories)
+my @files_to_process;
+my @dirs_to_search = ($srcdir);
+push @dirs_to_search, $builddir if defined $builddir;
+find(
+	sub {
+		return unless -f $_;
+		return if $_ !~ /\.(sgml|xsl)$/;
+		push @files_to_process, $File::Find::name;
+	},
+	@dirs_to_search,);
+
+# tabs and non-breaking spaces are harmless, but it is best to avoid them in SGML files
+sub check_tabs_and_nbsp
+{
+	my $errors = 0;
+	for my $f (@files_to_process)
+	{
+		open my $fh, "<:encoding(UTF-8)", $f or die "Can't open $f: $!";
+		my $line_no = 0;
+		while (<$fh>)
+		{
+			$line_no++;
+			if (/\t/)
+			{
+				print STDERR "Tab found in $f:$line_no\n";
+				$errors++;
+			}
+			if (/\xC2\xA0/)
+			{
+				print STDERR "$f:$line_no: contains non-breaking space\n";
+				$errors++;
+			}
+		}
+		close($fh);
+	}
+
+	if ($errors)
+	{
+		die "Tabs and/or non-breaking spaces appear in SGML/XML files\n";
+	}
+}
+
+sub write_dep_file()
+{
+	open my $fh, '>', $dep_file or die "can't open $dep_file: $!";
+
+	foreach my $file (@files_to_process)
+	{
+		print $fh "$dep_file : $file\n";
+	}
+
+	close $fh;
+}
+
+sub create_stamp_file
+{
+	open my $fh, '>', $stamp_file
+	  or die "can't open $stamp_file: $!";
+	close $fh;
+}
+
+# Create a dependency file for meson build
+if ($dep_file)
+{
+	write_dep_file();
+}
+
+check_tabs_and_nbsp();
+
+# All checks are passed, we can create a stamp file meson build now
+if ($stamp_file)
+{
+	create_stamp_file();
+}
-- 
2.47.3

Reply via email to