On Tue, Feb 18, 2025 at 3:23 PM Noah Misch <n...@leadboat.com> wrote:
>
> Apart from two doc issues, this is ready:
>
> On Tue, Feb 18, 2025 at 01:23:20PM -0800, Masahiko Sawada wrote:
> > On Mon, Feb 17, 2025 at 2:57 PM Noah Misch <n...@leadboat.com> wrote:
> > > On Fri, Jan 17, 2025 at 05:11:41PM -0800, Masahiko Sawada wrote:
>
> > +        However, when upgrading from <productname>PostgreSQL</productname> 
> > 17 or,
> > +        earlier <application>pg_upgrade</application> adopts the char 
> > signedness
>
> s/or, earlier/or earlier,/
>
> > --- a/doc/src/sgml/ref/pg_resetwal.sgml
> > +++ b/doc/src/sgml/ref/pg_resetwal.sgml
> > @@ -171,6 +171,22 @@ PostgreSQL documentation
> >    </para>
> >
> >    <variablelist>
> > +   <varlistentry>
> > +    <term><option>--char-signedness=<replaceable 
> > class="parameter">option</replaceable></option></term>
> > +    <listitem>
> > +     <para>
> > +      Manually set the default char signedness. Possible values are
> > +      <literal>signed</literal> and <literal>unsigned</literal>.
> > +     </para>
> > +     <para>
> > +      A safe value for this option is, if known, the default char 
> > signedness
> > +      of the platform where the database cluster was initialized. However,
>
> Only if initialized on v17 or earlier.  I recommend this edit:
>
> diff --git a/doc/src/sgml/ref/pg_resetwal.sgml 
> b/doc/src/sgml/ref/pg_resetwal.sgml
> index a72678d..dd011d2 100644
> --- a/doc/src/sgml/ref/pg_resetwal.sgml
> +++ b/doc/src/sgml/ref/pg_resetwal.sgml
> @@ -179,8 +179,11 @@ PostgreSQL documentation
>        <literal>signed</literal> and <literal>unsigned</literal>.
>       </para>
>       <para>
> -      A safe value for this option is, if known, the default char signedness
> -      of the platform where the database cluster was initialized. However,
> +      For a database cluster that <command>pg_upgrade</command> upgraded from
> +      a <productname>PostgreSQL</productname> version before 18, the safe
> +      value would be the default <type>char</type> signedness of the platform
> +      that ran the cluster before that upgrade. For all other
> +      clusters, <literal>signed</literal> would be the safe value. However,
>        this option is exclusively for use with <command>pg_upgrade</command>
>        and should not normally be used manually.
>       </para>

Thank you for reviewing the patches. I've fixed these issues and
attached the updated patches.

I have one question about the 0001 patch; since we add
'default_char_signedness' field to ControlFileData do we need to bump
PG_CONTROL_VERSION? We have comments about bumping PG_CONTROL_VERSION
when changing CheckPoint struct or DBState enum so it seems likely but
I'd like to confirm just in case that we need to bump
PG_CONTROL_VERSION also when changing ControlFileData. If we need, can
we bump it to 1800? or 1701?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From 55d365dd4e1b33e55834b00007b8b580f2db0686 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 2 Oct 2024 15:08:26 -0700
Subject: [PATCH v5 2/5] pg_resetwal: Add --char-signedness option to change
 the default char signedness.

With the newly added option --char-signedness, pg_resetwal updates the
default char signedness flag in the controlfile. This option is
primarily intended for an upcoming patch that pg_upgrade supports
preserving the default char signedness during upgrades, and is not
meant for manual operation.

Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
---
 doc/src/sgml/ref/pg_resetwal.sgml  | 19 +++++++++++++++++++
 src/bin/pg_resetwal/pg_resetwal.c  | 25 +++++++++++++++++++++++++
 src/bin/pg_resetwal/t/001_basic.pl |  6 ++++++
 3 files changed, 50 insertions(+)

diff --git a/doc/src/sgml/ref/pg_resetwal.sgml b/doc/src/sgml/ref/pg_resetwal.sgml
index cf9c7e70f27..dd011d246c1 100644
--- a/doc/src/sgml/ref/pg_resetwal.sgml
+++ b/doc/src/sgml/ref/pg_resetwal.sgml
@@ -171,6 +171,25 @@ PostgreSQL documentation
   </para>
 
   <variablelist>
+   <varlistentry>
+    <term><option>--char-signedness=<replaceable class="parameter">option</replaceable></option></term>
+    <listitem>
+     <para>
+      Manually set the default char signedness. Possible values are
+      <literal>signed</literal> and <literal>unsigned</literal>.
+     </para>
+     <para>
+      For a database cluster that <command>pg_upgrade</command> upgraded from
+      a <productname>PostgreSQL</productname> version before 18, the safe
+      value would be the default <type>char</type> signedness of the platform
+      that ran the cluster before that upgrade. For all other
+      clusters, <literal>signed</literal> would be the safe value. However,
+      this option is exclusively for use with <command>pg_upgrade</command>
+      and should not normally be used manually.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><option>-c <replaceable class="parameter">xid</replaceable>,<replaceable class="parameter">xid</replaceable></option></term>
     <term><option>--commit-timestamp-ids=<replaceable class="parameter">xid</replaceable>,<replaceable class="parameter">xid</replaceable></option></term>
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index ed73607a46f..31bc0abff16 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -75,6 +75,7 @@ static TimeLineID minXlogTli = 0;
 static XLogSegNo minXlogSegNo = 0;
 static int	WalSegSz;
 static int	set_wal_segsize;
+static int	set_char_signedness = -1;
 
 static void CheckDataVersion(void);
 static bool read_controlfile(void);
@@ -106,6 +107,7 @@ main(int argc, char *argv[])
 		{"oldest-transaction-id", required_argument, NULL, 'u'},
 		{"next-transaction-id", required_argument, NULL, 'x'},
 		{"wal-segsize", required_argument, NULL, 1},
+		{"char-signedness", required_argument, NULL, 2},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -302,6 +304,23 @@ main(int argc, char *argv[])
 					break;
 				}
 
+			case 2:
+				{
+					errno = 0;
+
+					if (pg_strcasecmp(optarg, "signed") == 0)
+						set_char_signedness = 1;
+					else if (pg_strcasecmp(optarg, "unsigned") == 0)
+						set_char_signedness = 0;
+					else
+					{
+						pg_log_error("invalid argument for option %s", "--char-signedness");
+						pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+						exit(1);
+					}
+					break;
+				}
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -456,6 +475,9 @@ main(int argc, char *argv[])
 	if (set_wal_segsize != 0)
 		ControlFile.xlog_seg_size = WalSegSz;
 
+	if (set_char_signedness != -1)
+		ControlFile.default_char_signedness = (set_char_signedness == 1);
+
 	if (minXlogSegNo > newXlogSegNo)
 		newXlogSegNo = minXlogSegNo;
 
@@ -779,6 +801,8 @@ PrintControlValues(bool guessed)
 		   (ControlFile.float8ByVal ? _("by value") : _("by reference")));
 	printf(_("Data page checksum version:           %u\n"),
 		   ControlFile.data_checksum_version);
+	printf(_("Default char data signedness:         %s\n"),
+		   (ControlFile.default_char_signedness ? _("signed") : _("unsigned")));
 }
 
 
@@ -1188,6 +1212,7 @@ usage(void)
 	printf(_("  -O, --multixact-offset=OFFSET    set next multitransaction offset\n"));
 	printf(_("  -u, --oldest-transaction-id=XID  set oldest transaction ID\n"));
 	printf(_("  -x, --next-transaction-id=XID    set next transaction ID\n"));
+	printf(_("      --char-signedness=OPTION     set char signedness to \"signed\"  or \"unsigned\"\n"));
 	printf(_("      --wal-segsize=SIZE           size of WAL segments, in megabytes\n"));
 
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_resetwal/t/001_basic.pl b/src/bin/pg_resetwal/t/001_basic.pl
index 323cd483cf3..d6bbbd0ceda 100644
--- a/src/bin/pg_resetwal/t/001_basic.pl
+++ b/src/bin/pg_resetwal/t/001_basic.pl
@@ -173,6 +173,12 @@ command_fails_like(
 	qr/must be greater than/,
 	'fails with -x value too small');
 
+# --char-signedness
+command_fails_like(
+	[ 'pg_resetwal', '--char-signedness', 'foo', $node->data_dir ],
+	qr/error: invalid argument for option --char-signedness/,
+	'fails with incorrect --char-signedness option');
+
 # run with control override options
 
 my $out = (run_command([ 'pg_resetwal', '--dry-run', $node->data_dir ]))[0];
-- 
2.43.5

From fd9d95eac0ed5c64815e07d27ab892be4f0aad11 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Sep 2024 15:17:02 -0700
Subject: [PATCH v5 5/5] Fix an issue with index scan using pg_trgm due to char
 signedness on different architectures.

GIN and GiST indexes utilizing pg_trgm's opclasses store sorted
trigrams within index tuples. When comparing and sorting each trigram,
pg_trgm treats each character as a 'char[3]' type in C. However, the
char type in C can be interpreted as either signed char or unsigned
char, depending on the platform, if the signedness is not explicitly
specified. Consequently, during replication between different CPU
architectures, there was an issue where index scans on standby servers
could not locate matching index tuples due to the differing treatment
of character signedness.

This change introduces comparison functions for trgm that explicitly
handle signed char and unsigned char. The appropriate comparison
function will be dynamically selected based on the character
signedness stored in the control file. Therefore, upgraded clusters
can utilize the indexes without rebuilding, provided the cluster
upgrade occurs on platforms with the same character signedness as the
original cluster initialization.

The default char signedness information was introduced in XXXX, so no
backpatch.

Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
---
 contrib/pg_trgm/trgm.h    |  5 +----
 contrib/pg_trgm/trgm_op.c | 44 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index 10827563694..ca017585369 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -40,15 +40,12 @@
 
 typedef char trgm[3];
 
-#define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )
-#define CMPPCHAR(a,b,i)  CMPCHAR( *(((const char*)(a))+i), *(((const char*)(b))+i) )
-#define CMPTRGM(a,b) ( CMPPCHAR(a,b,0) ? CMPPCHAR(a,b,0) : ( CMPPCHAR(a,b,1) ? CMPPCHAR(a,b,1) : CMPPCHAR(a,b,2) ) )
-
 #define CPTRGM(a,b) do {				\
 	*(((char*)(a))+0) = *(((char*)(b))+0);	\
 	*(((char*)(a))+1) = *(((char*)(b))+1);	\
 	*(((char*)(a))+2) = *(((char*)(b))+2);	\
 } while(0)
+extern int	(*CMPTRGM) (const void *a, const void *b);
 
 #define ISWORDCHR(c)	(t_isalnum(c))
 #define ISPRINTABLECHAR(a)	( isascii( *(unsigned char*)(a) ) && (isalnum( *(unsigned char*)(a) ) || *(unsigned char*)(a)==' ') )
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index d0833b3e4a1..94b9015fd67 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -42,6 +42,9 @@ PG_FUNCTION_INFO_V1(strict_word_similarity_commutator_op);
 PG_FUNCTION_INFO_V1(strict_word_similarity_dist_op);
 PG_FUNCTION_INFO_V1(strict_word_similarity_dist_commutator_op);
 
+static int	CMPTRGM_CHOOSE(const void *a, const void *b);
+int			(*CMPTRGM) (const void *a, const void *b) = CMPTRGM_CHOOSE;
+
 /* Trigram with position */
 typedef struct
 {
@@ -107,6 +110,47 @@ _PG_init(void)
 	MarkGUCPrefixReserved("pg_trgm");
 }
 
+#define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )
+
+/*
+ * Functions for comparing two trgms while treating each char as "signed char" or
+ * "unsigned char".
+ */
+static inline int
+CMPTRGM_SIGNED(const void *a, const void *b)
+{
+#define CMPPCHAR_S(a,b,i)  CMPCHAR( *(((const signed char*)(a))+i), *(((const signed char*)(b))+i) )
+
+	return CMPPCHAR_S(a, b, 0) ? CMPPCHAR_S(a, b, 0)
+		: (CMPPCHAR_S(a, b, 1) ? CMPPCHAR_S(a, b, 1)
+		   : CMPPCHAR_S(a, b, 2));
+}
+
+static inline int
+CMPTRGM_UNSIGNED(const void *a, const void *b)
+{
+#define CMPPCHAR_UNS(a,b,i)  CMPCHAR( *(((const unsigned char*)(a))+i), *(((const unsigned char*)(b))+i) )
+
+	return CMPPCHAR_UNS(a, b, 0) ? CMPPCHAR_UNS(a, b, 0)
+		: (CMPPCHAR_UNS(a, b, 1) ? CMPPCHAR_UNS(a, b, 1)
+		   : CMPPCHAR_UNS(a, b, 2));
+}
+
+/*
+ * This gets called on the first call. It replaces the function pointer so
+ * that subsequent calls are routed directly to the chosen implementation.
+ */
+static int
+CMPTRGM_CHOOSE(const void *a, const void *b)
+{
+	if (GetDefaultCharSignedness())
+		CMPTRGM = CMPTRGM_SIGNED;
+	else
+		CMPTRGM = CMPTRGM_UNSIGNED;
+
+	return CMPTRGM(a, b);
+}
+
 /*
  * Deprecated function.
  * Use "pg_trgm.similarity_threshold" GUC variable instead of this function.
-- 
2.43.5

From 5296822018301c5b6d4d59981cbf4a6bd78b4633 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 2 Oct 2024 15:11:23 -0700
Subject: [PATCH v5 3/5] pg_upgrade: Preserve default char signedness value
 from old cluster.

Commit XXX introduced the 'default_char_signedness' field in
controlfile. Newly created database clusters always set this field to
'signed'.

This change ensures that pg_upgrade updates the
'default_char_signedness' to 'unsigned' if the source database cluster
has signedness=false. For source clusters from v17 or earlier, which
lack the 'default_char_signedness' information, pg_upgrade assumes the
source cluster was initialized on the same platform where pg_upgrade
is running. It then sets the 'default_char_signedness' value according
to the current platform's default character signedness.

XXX: need to adjust DEFAULT_CHAR_SIGNEDNESS_CAT_VER.

Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
---
 src/bin/pg_upgrade/controldata.c            | 44 +++++++++++++-
 src/bin/pg_upgrade/pg_upgrade.c             | 28 +++++++++
 src/bin/pg_upgrade/pg_upgrade.h             |  7 +++
 src/bin/pg_upgrade/t/005_char_signedness.pl | 65 +++++++++++++++++++++
 4 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/pg_upgrade/t/005_char_signedness.pl

diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index ab16716f319..bd49ea867bf 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -10,6 +10,7 @@
 #include "postgres_fe.h"
 
 #include <ctype.h>
+#include <limits.h>				/* for CHAR_MIN */
 
 #include "common/string.h"
 #include "pg_upgrade.h"
@@ -62,6 +63,7 @@ get_control_data(ClusterInfo *cluster)
 	bool		got_date_is_int = false;
 	bool		got_data_checksum_version = false;
 	bool		got_cluster_state = false;
+	bool		got_default_char_signedness = false;
 	char	   *lc_collate = NULL;
 	char	   *lc_ctype = NULL;
 	char	   *lc_monetary = NULL;
@@ -501,6 +503,25 @@ get_control_data(ClusterInfo *cluster)
 			cluster->controldata.data_checksum_version = str2uint(p);
 			got_data_checksum_version = true;
 		}
+		else if ((p = strstr(bufin, "Default char data signedness:")) != NULL)
+		{
+			p = strchr(p, ':');
+
+			if (p == NULL || strlen(p) <= 1)
+				pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+			/* Skip the colon and any whitespace after it */
+			p++;
+			while (isspace((unsigned char) *p))
+				p++;
+
+			/* The value should be either 'signed' or 'unsigned' */
+			if (strcmp(p, "signed") != 0 && strcmp(p, "unsigned") != 0)
+				pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+			cluster->controldata.default_char_signedness = strcmp(p, "signed") == 0;
+			got_default_char_signedness = true;
+		}
 	}
 
 	rc = pclose(output);
@@ -561,6 +582,21 @@ get_control_data(ClusterInfo *cluster)
 		}
 	}
 
+	/*
+	 * Pre-v18 database clusters don't have the default char signedness
+	 * information. We use the char signedness of the platform where
+	 * pg_upgrade was built.
+	 */
+	if (cluster->controldata.cat_ver < DEFAULT_CHAR_SIGNEDNESS_CAT_VER)
+	{
+		Assert(!got_default_char_signedness);
+#if CHAR_MIN != 0
+		cluster->controldata.default_char_signedness = true;
+#else
+		cluster->controldata.default_char_signedness = false;
+#endif
+	}
+
 	/* verify that we got all the mandatory pg_control data */
 	if (!got_xid || !got_oid ||
 		!got_multi || !got_oldestxid ||
@@ -572,7 +608,9 @@ get_control_data(ClusterInfo *cluster)
 		!got_index || !got_toast ||
 		(!got_large_object &&
 		 cluster->controldata.ctrl_ver >= LARGE_OBJECT_SIZE_PG_CONTROL_VER) ||
-		!got_date_is_int || !got_data_checksum_version)
+		!got_date_is_int || !got_data_checksum_version ||
+		(!got_default_char_signedness &&
+		 cluster->controldata.cat_ver >= DEFAULT_CHAR_SIGNEDNESS_CAT_VER))
 	{
 		if (cluster == &old_cluster)
 			pg_log(PG_REPORT,
@@ -641,6 +679,10 @@ get_control_data(ClusterInfo *cluster)
 		if (!got_data_checksum_version)
 			pg_log(PG_REPORT, "  data checksum version");
 
+		/* value added in Postgres 18 */
+		if (!got_default_char_signedness)
+			pg_log(PG_REPORT, "  default char signedness");
+
 		pg_fatal("Cannot continue without required control information, terminating");
 	}
 }
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 36c7f3879d5..cc7357b5599 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -54,6 +54,7 @@
  */
 #define RESTORE_TRANSACTION_SIZE 1000
 
+static void set_new_cluster_char_signedness(void);
 static void set_locale_and_encoding(void);
 static void prepare_new_cluster(void);
 static void prepare_new_globals(void);
@@ -154,6 +155,7 @@ main(int argc, char **argv)
 	 */
 
 	copy_xact_xlog_xid();
+	set_new_cluster_char_signedness();
 
 	/* New now using xids of the old system */
 
@@ -388,6 +390,32 @@ setup(char *argv0)
 	}
 }
 
+/*
+ * Set the new cluster's default char signedness using the old cluster's
+ * value.
+ */
+static void
+set_new_cluster_char_signedness(void)
+{
+	bool		new_char_signedness;
+
+	/* Inherit the source database's signedness */
+	new_char_signedness = old_cluster.controldata.default_char_signedness;
+
+	/* Change the char signedness of the new cluster, if necessary */
+	if (new_cluster.controldata.default_char_signedness != new_char_signedness)
+	{
+		prep_status("Setting the default char signedness for new cluster");
+
+		exec_prog(UTILITY_LOG_FILE, NULL, true, true,
+				  "\"%s/pg_resetwal\" --char-signedness %s \"%s\"",
+				  new_cluster.bindir,
+				  new_char_signedness ? "signed" : "unsigned",
+				  new_cluster.pgdata);
+
+		check_ok();
+	}
+}
 
 /*
  * Copy locale and encoding information into the new cluster's template0.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 0cdd675e4f1..26991e71009 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -125,6 +125,12 @@ extern char *output_files[];
  */
 #define JSONB_FORMAT_CHANGE_CAT_VER 201409291
 
+/*
+ * The control file was changed to have the default char signedness,
+ * commit XXXXX.
+ */
+#define DEFAULT_CHAR_SIGNEDNESS_CAT_VER 202501161
+
 
 /*
  * Each relation is represented by a relinfo structure.
@@ -245,6 +251,7 @@ typedef struct
 	bool		date_is_int;
 	bool		float8_pass_by_value;
 	uint32		data_checksum_version;
+	bool		default_char_signedness;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_upgrade/t/005_char_signedness.pl b/src/bin/pg_upgrade/t/005_char_signedness.pl
new file mode 100644
index 00000000000..05c3014a27d
--- /dev/null
+++ b/src/bin/pg_upgrade/t/005_char_signedness.pl
@@ -0,0 +1,65 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# Tests for handling the default char signedness during upgrade.
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Can be changed to test the other modes
+my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
+
+# Initialize old and new clusters
+my $old = PostgreSQL::Test::Cluster->new('old');
+my $new = PostgreSQL::Test::Cluster->new('new');
+$old->init();
+$new->init();
+
+# Check the default char signedness of both the old and the new clusters.
+# Newly created clusters unconditionally use 'signed'.
+command_like(
+	[ 'pg_controldata', $old->data_dir ],
+	qr/Default char data signedness:\s+signed/,
+	'default char signedness of old cluster is signed in control file');
+command_like(
+	[ 'pg_controldata', $new->data_dir ],
+	qr/Default char data signedness:\s+signed/,
+	'default char signedness of new cluster is signed in control file');
+
+# Set the old cluster's default char signedness to unsigned for test.
+command_ok(
+	[ 'pg_resetwal', '--char-signedness', 'unsigned', '-f', $old->data_dir ],
+	"set old cluster's default char signedness to unsigned");
+
+# Check if the value is successfully updated.
+command_like(
+	[ 'pg_controldata', $old->data_dir ],
+	qr/Default char data signedness:\s+unsigned/,
+	'updated default char signedness is unsigned in control file');
+
+# pg_upgrade should be successful.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old->data_dir,
+		'-D', $new->data_dir,
+		'-b', $old->config_data('--bindir'),
+		'-B', $new->config_data('--bindir'),
+		'-s', $new->host,
+		'-p', $old->port,
+		'-P', $new->port,
+		$mode
+	],
+	'run of pg_upgrade');
+
+# Check if the default char signedness of the new cluster inherited
+# the old cluster's value.
+command_like(
+	[ 'pg_controldata', $new->data_dir ],
+	qr/Default char data signedness:\s+unsigned/,
+	'the default char signedness is updated during pg_upgrade');
+
+done_testing();
-- 
2.43.5

From a064ae71ee5d4990acff8eef60eb90de80141f9c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 25 Sep 2024 11:08:57 -0700
Subject: [PATCH v5 1/5] Add default_char_signedness field to ControlFileData.

The signedness of the 'char' type in C is
implementation-dependent. For instance, 'signed char' is used by
default on x86 CPUs, while 'unsigned char' is used on aarch
CPUs. Previously, we accidentally let C implementation signedness
affect persistent data. This led to inconsistent results when
comparing char data across different platforms.

This commit introduces a new 'default_char_signedness' field in
ControlFileData to store the signedness of the 'char' type. While this
change does not encourage the use of 'char' without explicitly
specifying its signedness, this field can be used as a hint to ensure
consistent behavior for pre-v18 data files that store data sorted by
the 'char' type on disk (e.g., GIN and GiST indexes), especially in
cross-platform replication scenarios.

Newly created database clusters unconditionally set the default char
signedness to true. pg_upgrade (with an upcoming commit) changes this
flag for clusters if the source database cluster has
signedness=false. As a result, signedness=false setting will become
rare over time. If we had known about the problem during the last
development cycle that forced initdb (v8.3), we would have made all
clusters signed or all clusters unsigned. Making pg_upgrade the only
source of signedness=false will cause the population of database
clusters to converge toward that retrospective ideal.

XXX bump catversion.
XXX bump PG_CONTROL_VERSION too?

Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
---
 doc/src/sgml/func.sgml                  |  5 ++++
 src/backend/access/transam/xlog.c       | 40 +++++++++++++++++++++++++
 src/backend/utils/misc/pg_controldata.c |  7 +++--
 src/bin/pg_controldata/pg_controldata.c |  2 ++
 src/include/access/xlog.h               |  1 +
 src/include/catalog/pg_control.h        |  6 ++++
 src/include/catalog/pg_proc.dat         |  6 ++--
 7 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7efc81936ab..21c666af2d2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27986,6 +27986,11 @@ acl      | {postgres=arwdDxtm/postgres,foo=r/postgres}
        <entry><type>integer</type></entry>
       </row>
 
+      <row>
+       <entry><structfield>default_char_signedness</structfield></entry>
+       <entry><type>boolean</type></entry>
+      </row>
+
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 25a5c605404..90832c37a53 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4284,6 +4284,33 @@ WriteControlFile(void)
 
 	ControlFile->float8ByVal = FLOAT8PASSBYVAL;
 
+	/*
+	 * Initialize the default 'char' signedness.
+	 *
+	 * The signedness of the char type is implementation-defined. For instance
+	 * on x86 architecture CPUs, the char data type is typically treated as
+	 * signed by default, whereas on aarch architecture CPUs, it is typically
+	 * treated as unsigned by default. In v17 or earlier, we accidentally let
+	 * C implementation signedness affect persistent data. This led to
+	 * inconsistent results when comparing char data across different
+	 * platforms.
+	 *
+	 * This flag can be used as a hint to ensure consistent behavior for
+	 * pre-v18 data files that store data sorted by the 'char' type on disk,
+	 * especially in cross-platform replication scenarios.
+	 *
+	 * Newly created database clusters unconditionally set the default char
+	 * signedness to true. pg_upgrade changes this flag for clusters that were
+	 * initialized on signedness=false platforms. As a result,
+	 * signedness=false setting will become rare over time. If we had known
+	 * about this problem during the last development cycle that forced initdb
+	 * (v8.3), we would have made all clusters signed or all clusters
+	 * unsigned. Making pg_upgrade the only source of signedness=false will
+	 * cause the population of database clusters to converge toward that
+	 * retrospective ideal.
+	 */
+	ControlFile->default_char_signedness = true;
+
 	/* Contents are protected with a CRC */
 	INIT_CRC32C(ControlFile->crc);
 	COMP_CRC32C(ControlFile->crc,
@@ -4612,6 +4639,19 @@ DataChecksumsEnabled(void)
 	return (ControlFile->data_checksum_version > 0);
 }
 
+/*
+ * Return true if the cluster was initialized on a platform where the
+ * default signedness of char is "signed". This function exists for code
+ * that deals with pre-v18 data files that store data sorted by the 'char'
+ * type on disk (e.g., GIN and GiST indexes). See the comments in
+ * WriteControlFile() for details.
+ */
+bool
+GetDefaultCharSignedness(void)
+{
+	return ControlFile->default_char_signedness;
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c
index 9dfba499c13..6d036e3bf32 100644
--- a/src/backend/utils/misc/pg_controldata.c
+++ b/src/backend/utils/misc/pg_controldata.c
@@ -203,8 +203,8 @@ pg_control_recovery(PG_FUNCTION_ARGS)
 Datum
 pg_control_init(PG_FUNCTION_ARGS)
 {
-	Datum		values[11];
-	bool		nulls[11];
+	Datum		values[12];
+	bool		nulls[12];
 	TupleDesc	tupdesc;
 	HeapTuple	htup;
 	ControlFileData *ControlFile;
@@ -254,6 +254,9 @@ pg_control_init(PG_FUNCTION_ARGS)
 	values[10] = Int32GetDatum(ControlFile->data_checksum_version);
 	nulls[10] = false;
 
+	values[11] = BoolGetDatum(ControlFile->default_char_signedness);
+	nulls[11] = false;
+
 	htup = heap_form_tuple(tupdesc, values, nulls);
 
 	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index cf11ab3f2ee..bea779eef94 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -336,6 +336,8 @@ main(int argc, char *argv[])
 		   (ControlFile->float8ByVal ? _("by value") : _("by reference")));
 	printf(_("Data page checksum version:           %u\n"),
 		   ControlFile->data_checksum_version);
+	printf(_("Default char data signedness:         %s\n"),
+		   (ControlFile->default_char_signedness ? _("signed") : _("unsigned")));
 	printf(_("Mock authentication nonce:            %s\n"),
 		   mock_auth_nonce_str);
 	return 0;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4411c1468ac..d313099c027 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -231,6 +231,7 @@ extern XLogRecPtr GetXLogWriteRecPtr(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
 extern bool DataChecksumsEnabled(void);
+extern bool GetDefaultCharSignedness(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 3797f25b306..0cb86faa43b 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -221,6 +221,12 @@ typedef struct ControlFileData
 	/* Are data pages protected by checksums? Zero if no checksum version */
 	uint32		data_checksum_version;
 
+	/*
+	 * True if the default signedness of char is "signed" on a platform where
+	 * the cluster is initialized.
+	 */
+	bool		default_char_signedness;
+
 	/*
 	 * Random nonce, used in authentication requests that need to proceed
 	 * based on values that are cluster-unique, like a SASL exchange that
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d7..e2d5c0d0886 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12206,9 +12206,9 @@
   descr => 'pg_controldata init state information as a function',
   proname => 'pg_control_init', provolatile => 'v', prorettype => 'record',
   proargtypes => '',
-  proallargtypes => '{int4,int4,int4,int4,int4,int4,int4,int4,int4,bool,int4}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
+  proallargtypes => '{int4,int4,int4,int4,int4,int4,int4,int4,int4,bool,int4,bool}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version,default_char_signedness}',
   prosrc => 'pg_control_init' },
 
 # subscripting support for built-in types
-- 
2.43.5

From 25344fad6f925e328dc80b139e2ca80ae6117dca Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 2 Oct 2024 15:12:27 -0700
Subject: [PATCH v5 4/5] pg_upgrade: Add --set-char-signedness to set the
 default char signedness of new cluster.

This change adds a new option --set-char-signedness to pg_upgrade. It
enables user to set arbitrary signedness during pg_upgrade. This helps
cases where user who knew they copied the v17 source cluster from
x86 (signedness=true) to ARM (signedness=false) can pg_upgrade
properly without the prerequisite of acquiring an x86 VM.

Reviewed-by: Noah Misch
Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
---
 doc/src/sgml/ref/pgupgrade.sgml             | 53 +++++++++++++++++++++
 src/bin/pg_upgrade/check.c                  | 12 +++++
 src/bin/pg_upgrade/option.c                 | 12 +++++
 src/bin/pg_upgrade/pg_upgrade.c             | 10 +++-
 src/bin/pg_upgrade/pg_upgrade.h             |  3 ++
 src/bin/pg_upgrade/t/005_char_signedness.pl | 17 +++++++
 6 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4777381dac2..27d0b668deb 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -276,6 +276,59 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--set-char-signedness=</option><replaceable>option</replaceable></term>
+      <listitem>
+       <para>
+        Manually set the default char signedness of new clusters. Possible values
+        are <literal>signed</literal> and <literal>unsigned</literal>.
+       </para>
+       <para>
+        In the C language, the default signedness of the <type>char</type> type
+        (when not explicitly specified) varies across platforms. For example,
+        <type>char</type> defaults to <type>signed char</type> on x86 CPUs but
+        to <type>unsigned char</type> on ARM CPUs.
+       </para>
+       <para>
+        Starting from <productname>PostgreSQL</productname> 18, database clusters
+        maintain their own default char signedness setting, which can be used to
+        ensure consistent behavior across platforms with different default char
+        signedness. By default, <application>pg_upgrade</application> preserves
+        the char signedness setting when upgrading from an existing cluster.
+        However, when upgrading from <productname>PostgreSQL</productname> 17 or
+        earlier, <application>pg_upgrade</application> adopts the char signedness
+        of the platform on which it was built.
+       </para>
+       <para>
+        This option allows you to explicitly set the default char signedness for
+        the new cluster, overriding any inherited values. There are two specific
+        scenarios where this option is relevant:
+        <itemizedlist>
+         <listitem>
+          <para>
+           If you are planning to migrate to a different platform after the upgrade,
+           you should not use this option. The default behavior is right in this case.
+           Instead, perform the upgrade on the original platform without this flag,
+           and then migrate the cluster afterward. This is the recommended and safest
+           approach.
+          </para>
+         </listitem>
+         <listitem>
+          <para>
+           If you have already migrated the cluster to a platform with different
+           char signedness (for example, from an x86-based system to an ARM-based
+           system), you should use this option to specify the signedness matching
+           the original platform's default char signedness. Additionally, it's
+           essential not to modify any data files between migrating data files and
+           running <command>pg_upgrade</command>. <command>pg_upgrade</command>
+           should be the first operation that starts the cluster on the new platform.
+          </para>
+         </listitem>
+        </itemizedlist>
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 7ca1d8fffc9..d6f629dd3a2 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -838,6 +838,18 @@ check_cluster_versions(void)
 		GET_MAJOR_VERSION(new_cluster.bin_version))
 		pg_fatal("New cluster data and binary directories are from different major versions.");
 
+	/*
+	 * Since from version 18, newly created database clusters always have
+	 * 'signed' default char-signedness, it makes less sense to use
+	 * --set-char-signedness option for upgrading from version 18 or later.
+	 * Users who want to change the default char signedness of the new
+	 * cluster, they can use pg_resetwal manually before the upgrade.
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1800 &&
+		user_opts.char_signedness != -1)
+		pg_fatal("%s option cannot be used to upgrade from PostgreSQL %s and later.",
+				 "--set-char-signedness", "18");
+
 	check_ok();
 }
 
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 108eb7a1ba4..1a580d656bb 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -60,6 +60,7 @@ parseCommandLine(int argc, char *argv[])
 		{"copy", no_argument, NULL, 2},
 		{"copy-file-range", no_argument, NULL, 3},
 		{"sync-method", required_argument, NULL, 4},
+		{"set-char-signedness", required_argument, NULL, 5},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -70,6 +71,7 @@ parseCommandLine(int argc, char *argv[])
 
 	user_opts.do_sync = true;
 	user_opts.transfer_mode = TRANSFER_MODE_COPY;
+	user_opts.char_signedness = -1;
 
 	os_info.progname = get_progname(argv[0]);
 
@@ -212,6 +214,14 @@ parseCommandLine(int argc, char *argv[])
 				user_opts.sync_method = pg_strdup(optarg);
 				break;
 
+			case 5:
+				if (pg_strcasecmp(optarg, "signed") == 0)
+					user_opts.char_signedness = 1;
+				else if (pg_strcasecmp(optarg, "unsigned") == 0)
+					user_opts.char_signedness = 0;
+				else
+					pg_fatal("invalid argument for option %s", "--set-char-signedness");
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
 						os_info.progname);
@@ -306,6 +316,8 @@ usage(void)
 	printf(_("  --clone                       clone instead of copying files to new cluster\n"));
 	printf(_("  --copy                        copy files to new cluster (default)\n"));
 	printf(_("  --copy-file-range             copy files to new cluster with copy_file_range\n"));
+	printf(_("  --set-char-signedness=OPTION  set new cluster char signedness to \"signed\" or\n"));
+	printf(_("                                \"unsigned\"\n"));
 	printf(_("  --sync-method=METHOD          set method for syncing files to disk\n"));
 	printf(_("  -?, --help                    show this help, then exit\n"));
 	printf(_("\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index cc7357b5599..e95be8b459d 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -399,8 +399,14 @@ set_new_cluster_char_signedness(void)
 {
 	bool		new_char_signedness;
 
-	/* Inherit the source database's signedness */
-	new_char_signedness = old_cluster.controldata.default_char_signedness;
+	/*
+	 * Use the specified char signedness if specified. Otherwise we inherit
+	 * inherit the source database's signedness.
+	 */
+	if (user_opts.char_signedness != -1)
+		new_char_signedness = (user_opts.char_signedness == 1);
+	else
+		new_char_signedness = old_cluster.controldata.default_char_signedness;
 
 	/* Change the char signedness of the new cluster, if necessary */
 	if (new_cluster.controldata.default_char_signedness != new_char_signedness)
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 26991e71009..7d50c83d0bf 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -334,6 +334,9 @@ typedef struct
 	int			jobs;			/* number of processes/threads to use */
 	char	   *socketdir;		/* directory to use for Unix sockets */
 	char	   *sync_method;
+	int			char_signedness;	/* default char signedness: -1 for initial
+									 * value, 1 for "signed" and 0 for
+									 * "unsigned" */
 } UserOpts;
 
 typedef struct
diff --git a/src/bin/pg_upgrade/t/005_char_signedness.pl b/src/bin/pg_upgrade/t/005_char_signedness.pl
index 05c3014a27d..c024106863e 100644
--- a/src/bin/pg_upgrade/t/005_char_signedness.pl
+++ b/src/bin/pg_upgrade/t/005_char_signedness.pl
@@ -40,6 +40,23 @@ command_like(
 	qr/Default char data signedness:\s+unsigned/,
 	'updated default char signedness is unsigned in control file');
 
+# Cannot use --set-char-signedness option for upgrading from v18+
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d', $old->data_dir,
+		'-D', $new->data_dir,
+		'-b', $old->config_data('--bindir'),
+		'-B', $new->config_data('--bindir'),
+		'-s', $new->host,
+		'-p', $old->port,
+		'-P', $new->port,
+		'-set-char-signedness', 'signed',
+		$mode
+	],
+	'--set-char-signedness option cannot be used for upgrading from v18 or later'
+);
+
 # pg_upgrade should be successful.
 command_ok(
 	[
-- 
2.43.5

Reply via email to