Re: Potential ABI breakage in upcoming minor releases

Noah Misch Fri, 29 Nov 2024 12:40:01 -0800

On Mon, Nov 25, 2024 at 08:56:50PM -0500, Tom Lane wrote:
> [ getting back to the document-ABI-breakage-rules-better topic ... ]
> 
> I wrote:
> > That text says exactly nothing about what specific code changes to
> > make or not make.  I'm not sure offhand where (or if) we have this
> > documented, but there's an idea that adding fields at the end of
> > a struct is safer ABI-wise than putting them in the middle.  Which
> > is true if you can't squeeze them into padding space.  Here, that
> > could have been done and probably should have.
> 
> I remembered where that's documented:
> 
> https://wiki.postgresql.org/wiki/Committing_checklist#Maintaining_ABI_compatibility_while_backpatching


I'd say the source tree's <sect2 id="xfunc-api-abi-stability-guidance">, which
David Wheeler mentioned, is authoritative.  It currently matches
postgr.es/c/e54a42a.  Let's update both or change that wiki heading to point
to the authoritative doc section.

> I propose rewriting and expanding that:
> 
> 
> * Don't change the signature of non-static functions.  One common
>   workaround is to change the existing function into a wrapper that
>   calls a new function with more/different arguments.

Sounds fine.

> * Don't change the contents of globally-visible structs, specifically
>   not the offsets of existing fields.  If you must add a new field,
>   the very best way is to put it into existing alignment padding
>   between fields.  (But consider both 32-bit and 64-bit cases when
>   deciding what is "padding".)  If that's not possible, it may be okay
>   to add the field at the end of the struct; but you have to worry
>   about whether any extensions depend on the size of the struct.
>   This will not work well if extensions allocate new instances of the
>   struct.  One particular gotcha is that it's not okay to change
>   sizeof(ResultRelInfo), as there are executor APIs that require
>   extensions to index into arrays of ResultRelInfos.

We should be able to avoid ResultRelInfo length changes, since there are other
executor structs to choose from.

I think this thread implies an open question about assert builds.  Whenever we
add to the end of a struct that extensions heap-allocate (makeNode() or
otherwise), assert builds will get "write past chunk end" warnings
(postgr.es/m/[email protected]).  Should we say that (a) assert
builds need to rebuild with every minor release, that (b) PostgreSQL will
treat this like it treats non-assert rebuild requirements, or something else?
If (a), we committers also need to confirm that each lengthening of an
extension-allocated struct doesn't change the resulting palloc chunk size.

Corresponding doc paragraphs:

      <para>
       When a change <emphasis>is</emphasis> required,
       <productname>PostgreSQL</productname> will choose the least invasive
       change possible, for example by squeezing a new field into padding
       space or appending it to the end of a struct. These sorts of changes
       should not impact extensions unless they use very unusual code
       patterns.
      </para>

      <para>
       In rare cases, however, even such non-invasive changes may be
       impractical or impossible. In such an event, the change will be
       carefully managed, taking the requirements of extensions into account.
       Such changes will also be documented in the release notes (<xref
       linkend="release"/>).
      </para>

I think those paragraphs aren't practical for packager needs, or they leave
too much undefined.  There's no available algorithm for classifying extensions
as "unusual".  timescaledb is objectively unusual, being the one extension
known to break in response to a ResultRelInfo size change in v17.  That
observation requires knowing what will change, so it wouldn't have helped a
packager classify in advance.  For non-unusual extensions, the second
paragraph implies packagers should check release notes on Monday, being ready
to rebuild by Thursday.  Assuming nobody wants to create a rebuild procedure
in 3 days, the packager will have a procedure already.  If they have a rebuild
procedure, they're better off rebuilding every time.  That way, they don't
need to scrutinize the release notes under time pressure, and they're
protected even if hackers overlook the incompatibility.  Those are the
incentives our docs create today, right or wrong.

I gather we want packagers to feel comfortable not rebuilding every time.  One
way to achieve that could be text like:

  If a change is suspected to require extension rebuilds in well-known
  extensions, a release at least 3 months before the breaking release will
  include "<specific phrase>" in its release notes.

Then packagers who are taking the risk of not rebuilding every time will have
3 months to prepare, not the 3 days we're currently giving.  The point about
"well-known extensions" is based on my practice of grepping PGXN.  That would
not have found timescaledb.  Should we name PGXN explicitly, or should we be
vague like that draft?  I'd be comfortable naming any number of repositories
that make it equally easy to bulk-download the whole repository.

How else might we make it safe for packagers to skip rebuilding extensions for
the majority of minor releases?

> * Add new enum entries at the end of the enum's list, so that existing
>   entries don't change value.
> 
> * These rules only apply to released branches.  In master, or in a new
>   branch that has not yet reached RC status, it's better to place new
>   fields and enum values in natural positions.  Changing function
>   signatures is fine too, unless there are so many callers that
>   a compatibility wrapper seems advisable.

Those two sound fine.

> > The other bit of documentation we probably need is some annotation in
> > struct ResultRelInfo saying "do not change the sizeof() this struct
> > in back branches, period".
> 
> Something like
> 
>     /*
>      * DO NOT change sizeof(ResultRelInfo) in a released branch.  This
>      * generally makes it unsafe to add fields here in a released branch.
>      */
> 
> at the end of struct ResultRelInfo's definition.

I agree with adding a comment to the end of the struct.  Thanks for writing
all this.  I'd also have that comment refer to ModifyTableState.resultRelInfo
specifically.  That way, if the ModifyTable implementation changes to remove
this hazard, it's easier to trace the ability to remove this comment.

> None of this is a substitute for installing some kind of ABI-checking
> infrastructure; but that project doesn't seem to be moving fast.

Right, they're complementary.  The documentation is about what ABI checker
complaints we choose to ignore.  An extension buildfarm is another
complementary idea that comes up occasionally.

Re: Potential ABI breakage in upcoming minor releases

Reply via email to