On Thu, Jul 17, 2025 at 09:24:02AM -0400, Robert Haas wrote:
> On Mon, Jul 7, 2025 at 3:27 PM Noah Misch <n...@leadboat.com> wrote:
> > Let's get rid of pg_dump's need to sort by OID, apart from catalog
> > corruption scenarios.
> 
> +1. I had at one point believed that sorting by OID was a good way to
> make dumps stable, but this disproves that theory. Sorting by logical
> properties of the object is better.

Sorting by OID was a reasonable approximation, for its time.

> > Adding an assert found a total of seven affected object types.
> > See the second attached patch.  The drawback is storing five more fields in
> > pg_dump memory: oprleft, oprright, opcmethod, opfmethod, and collencoding.
> > That seems minor relative to existing pg_dump memory efficiency.  Since this
> > is a source of test flakes in v18, I'd like to back-patch to v18.  I'm not
> > sure why the buildfarm hasn't seen the above diff, but I expect the diff
> > could happen there.  This is another nice win for the new test from commit
> > 172259afb563d35001410dc6daad78b250924038.  The order instability was always
> > bad for users, but the test brought it to the forefront.  One might argue
> > for back-patching $SUBJECT further, too.
> 
> I agree with back-patching it at least as far as v18. I think it
> probably wouldn't hurt anything to back-patch further, and it might
> avoid future buildfarm failures. Against that, there's a remote
> possibility that someone who is currently saving pg_dump output for
> later comparison, say in a case where OIDs are always stable in
> practice, could be displeased to see the pg_dump order change in a
> minor release. But that seems like a very weak argument against
> back-patching. I can't see us ever deciding to put up with buildfarm
> instability on such grounds.

Thanks for reviewing.  I agree with those merits of back-patching further.  A
back-patch to v13 has no pg_dump_sort.c conflicts, while pg_dump.c has
mechanical conflicts around retrieving the extra sort inputs.  If there are no
objections in the next 72h, I'll likely back-patch.

> Reviewing:
> 
> + * Sort by name.  This differs from "Name:" in plain format output, which
> + * is a _tocEntry.tag.  For example, DumpableObject.name of a constraint
> + * is pg_constraint.conname, but _tocEntry.tag of a constraint is relname
> + * and conname joined with a space.
> 
> This comment is useful, but if I were to be critical, it does a better
> job saying what this field isn't than what it is.

True.  I've changed it to this:

         * Sort by name.  With a few exceptions, names here are single catalog
         * columns.  To get a fuller picture, grep pg_dump.c for "dobj.name = ".
         * Names here don't match "Name:" in plain format output, which is a
         * _tocEntry.tag.  For example, DumpableObject.name of a constraint is
         * pg_constraint.conname, but _tocEntry.tag of a constraint is relname
         * and conname joined with a space.

The patch's original change to the comment was a reaction to my own surprise.
Reading "pg_dump regression|grep Name:|sort|grep CONSTRAINT" I saw unique
"Name:" output for constraints, which felt at odds with the instability in
DOTypeNameCompare() sorting them.  But it wasn't the name I was looking for:

- getConstraints() sets DumpableObject.name = conname
- DOTypeNameCompare() sorts by DumpableObject.name
- dumpConstraint() sets _tocEntry.tag = "relname conname"
- _tocEntry.tag becomes the "Name:" in pg_dump output

Long-term, in a post-scarcity world, I'd do one of these or similar:

a. Change what we store in DumpableObject.name so it matches _tocEntry.tag.
b. Rename field DumpableObject.name, so there's no longer a field called
   "name" with contents different from the "Name:" values in pg_dump output.

> + * Sort by encoding, per pg_collation_name_enc_nsp_index.  This is
> + * mostly academic, because CREATE COLLATION has restrictions to make
> + * (nspname, collname) uniquely identify a collation within a given
> + * DatabaseEncoding.  pg_import_system_collations() bypasses those
> + * restrictions, but pg_dump+restore fails after a
> + * pg_import_system_collations('my_schema') that creates collations
> + * for a blend of encodings.
> 
> This comment is also useful, but if I were to be critical again, it
> does a better job saying why we shouldn't do what the code then does
> than why we should.

I've tried the following further refinement.  If it's worse, I can go back to
the last version.

diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index ffae7b3..f7d6a03 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -248,7 +250,19 @@ DOTypeNameCompare(const void *p1, const void *p2)
        if (cmpval != 0)
                return cmpval;
 
-       /* To have a stable sort order, break ties for some object types */
+       /*
+        * To have a stable sort order, break ties for some object types.  Most
+        * catalogs have a natural key, e.g. pg_proc_proname_args_nsp_index.
+        * Where the above "namespace" and "name" comparisons don't cover all
+        * natural key columns, compare the rest here.
+        *
+        * The natural key usually refers to other catalogs by surrogate keys.
+        * Hence, this translates each of those references to the natural key of
+        * the referenced catalog.  That may descend through multiple levels of
+        * catalog references.  For example, to sort by pg_proc.proargtypes,
+        * descend to each pg_type and then further to its pg_namespace, for an
+        * overall sort by (nspname, typname).
+        */
        if (obj1->objType == DO_FUNC || obj1->objType == DO_AGG)
        {
                FuncInfo   *fobj1 = *(FuncInfo *const *) p1;
@@ -312,13 +326,16 @@ DOTypeNameCompare(const void *p1, const void *p2)
                CollInfo   *cobj2 = *(CollInfo *const *) p2;
 
                /*
-                * Sort by encoding, per pg_collation_name_enc_nsp_index.  This is
-                * mostly academic, because CREATE COLLATION has restrictions to make
-                * (nspname, collname) uniquely identify a collation within a given
-                * DatabaseEncoding.  pg_import_system_collations() bypasses those
-                * restrictions, but pg_dump+restore fails after a
-                * pg_import_system_collations('my_schema') that creates collations
-                * for a blend of encodings.
+                * Sort by encoding, per pg_collation_name_enc_nsp_index.  Wherever
+                * this changes dump order, restoring the dump fails anyway.  CREATE
+                * COLLATION can't create a tie for this to break, because it imposes
+                * restrictions to make (nspname, collname) uniquely identify a
+                * collation within a given DatabaseEncoding.  While
+                * pg_import_system_collations() can create a tie, pg_dump+restore
+                * fails after pg_import_system_collations('my_schema') does so.
+                * There's little to gain by ignoring one natural key column on the
+                * basis of those limitations elsewhere, so respect the full natural
+                * key like we do for other object types.
                 */
                cmpval = cobj1->collencoding - cobj2->collencoding;
                if (cmpval != 0)

