On 19/03/2026 15:36, Robert Haas wrote:
Without taking a strong position on this particular design idea, I
kind of wonder if we should be going the opposite direction: instead
of bundling more and more things into descriptors, try to let people
write imperative code that does whatever it is that they want to do.
I think descriptors are pretty limiting as a concept, because they can
only do the exact things that they're designed to do. For instance, I
find the change to use LWLockPadded rather than LWLock * in
pgssSharedState to be a clear improvement, because I'd rather have
fewer objects and less pointer-chasing. Now, that LWLockPadded is
going to need to be initialized. I would rather do that by writing
LWLockInitialize(&pgss->lock.lock, tranche_id) as you did than by
adding something to a descriptor, since doing the latter is almost
certainly going to be a less intuitive syntax for the same thing and
I'm going to have to spend time verifying that whatever locution I'm
compelled to write actually does what I want. And if somebody adds
"really light weight locks" to the system then we'll have to add
RLWLockInitialize() to the things that the descriptor system knows how
to do, and if for some reason I want to do
LWLockInitialize(&mythingy->lock.lock, some_random_condition ?
this_tranche_id : that_tranche_id), the descriptor system will
probably become an annoying straitjacket. Or if that example isn't
compelling enough, imagine that I have an array of structs each of
which contains an LWLockPadded and I need to go loop over the array
and do all the initializations. Or maybe space is at a premium and I
want to use LWLock rather than LWLockPadded. Or maybe something else.
Code is just more flexible than having to go through descriptors,
which is why a lot of modern languages go to a great deal of trouble
to make closures a first-class concept.
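(To make that loop case concrete, here is a sketch; MyThing, my_tranche_id, and init_my_things are hypothetical names, while LWLockPadded and LWLockInitialize are the existing APIs:)

```c
/* Hypothetical struct holding a padded lock; my_tranche_id would come
 * from LWLockNewTrancheId() at startup. */
typedef struct MyThing
{
	LWLockPadded lock;
	int			counter;
} MyThing;

static void
init_my_things(MyThing *things, int n, int my_tranche_id)
{
	/* Plain imperative code: just loop and initialize each lock. */
	for (int i = 0; i < n; i++)
		LWLockInitialize(&things[i].lock.lock, my_tranche_id);
}
```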
Sure, descriptors are restricting. I'm not proposing to move everything
into descriptors.
Let me take a step back and say what I think the problems in this area
are, to see whether we agree on the basics. I suspect the reason
you've undertaken this project is the fact that, currently, requesting
shared memory and allocating it are totally decoupled. The number of
bytes you request and the number of bytes you actually allocate could
be totally different, and then some completely unrelated subsystem can
break because it's allocating last, and even though it requested the
bytes it wanted to allocate, some other subsystem under-requested and
now the bytes this subsystem wants are not actually available.
Tracking this kind of thing down can be a giant pain in the rear end,
bordering on impossible IME. Also, if we want to be able to resize
stuff in shared memory in some happy future, the need for precise
tracking of these sorts of things presumably goes way up, although the
exact details of that are not altogether clear to me. Furthermore, as
you point out, even if everyone behaves themselves and requests and
allocates the same number of bytes, that's still annoying if it means
redoing some computation.
Agreed, yes, that's what I'm trying to address.
I think the answer to this problem is to make requests into named
objects. You're not allowed to request a number of BYTES of shared
memory any more; you have to request "the shared memory bytes for the
object named XYZ". So instead of
RequestAddinShmemSpace(pgss_memsize()), you would say something like
RequestAddinShmemSpace("pg_stat_statements", pgss_memsize()) and then
later instead of saying pgss = ShmemInitStruct("pg_stat_statements",
sizeof(pgssSharedState), &found), you say pgss =
ShmemInitStruct("pg_stat_statements", &size, &found).
Yeah, that's a little better than what we have today.
The other big problem that I think we have in this area is that it's
unclear what you're allowed to do in _PG_init() vs. some other
callback, and sometimes you need IsUnderPostmaster checks or
IsPostmasterEnvironment checks or
process_shared_preload_libraries_in_progress checks. From my point of
view, good goals would include (1) moving as much logic as possible
into _PG_init() vs. having to put it elsewhere and (2) removing as
many conditional checks as possible from it and aiming for _PG_init()
functions that just run from start to finish in all cases.
Agreed.
What _PG_init() already does pretty well is allow you to do
per-backend setup. For instance, pg_plan_advice needs no shared
resources, so _PG_init() was easy to write and, IMHO, easy to read.
It's requesting diverse types of resources -- GUCs, an EXPLAIN
extension ID, an EXPLAIN option, and hooks, but it can just do all of
those things one after another with no conditional logic and, IMHO,
life is great. We fall down a little bit because of the fact that
PGC_POSTMASTER GUCs can't be added after startup -- see autoprewarm.c,
for instance, which calls out that problem implicitly; and ...
Agreed.
I suspect
that issue is also why pg_stat_statements has the
process_shared_preload_libraries_in_progress check at the top, because
it looks to me like everything else that the function does would be
completely fine to do later.
I think a bigger problem is loading and saving the statistics file. The
file needs to be saved on postmaster exit; where do you do that if the
library was not in shared_preload_libraries?
Shared resources do require some split-up of initialization: as you
point out, if _PG_init() is called before we know MaxBackends, then we
can't even size data structures whose size depends on that quantity,
and we certainly can't initialize anything, because shared memory
might not have been created yet. I don't think we can completely avoid
the need for callbacks here, but... just spitballing, how about
something like this:
extern void DefineShmemRegion(char *name, size_t size, void **localptr,
							  void (*init_callback)(void *), int flags);
extern void DefineShmemRegionDynamic(char *name,
									 size_t (*sizing_callback)(void *),
									 void **localptr,
									 void (*init_callback)(void *),
									 int flags);
extern void *GetShmemRegion(char *name);
#define DSR_FLAGS_NO_SLOP 0x01
#define DSR_FLAGS_DSA_OK 0x02
If DefineShmemRegion() or DefineShmemRegionDynamic() is called at
shared_preload_libraries time, it arranges to increase the size of the
main shared memory segment by the given amount, or the computed amount
(for things that depend on MaxBackends). Then, once the main shared
memory segment is created, it invokes the init_callback and sets
*localptr. If either of these functions is called after the main
shared memory segment has been created, they check for an existing
allocation, and if one is found, they just set *localptr. (They can
actually probably exit quickly if *localptr is already set.)
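Under that sketch, an extension's _PG_init() might boil down to something like this (all of the names here are hypothetical, following the declarations above; only LWLockInitialize is an existing function):

```c
static pgssSharedState *pgss = NULL;
static int	pgss_tranche_id;	/* hypothetical; assigned at startup */

static void
pgss_shmem_init(void *ptr)
{
	pgssSharedState *state = (pgssSharedState *) ptr;

	/* Runs once, in whichever process first creates the region. */
	LWLockInitialize(&state->lock.lock, pgss_tranche_id);
	/* ... create hash table, load saved statistics, etc. ... */
}

void
_PG_init(void)
{
	DefineShmemRegion("pg_stat_statements", sizeof(pgssSharedState),
					  (void **) &pgss, pgss_shmem_init, 0);
}
```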
If you squint a little, this is pretty much the same as my descriptor
design. Let's start from your DefineShmemRegion function, but in order
to have some flexibility to add optional arguments in the future, without
having to create DefineShmemRegionNew(), DefineShmemRegionExt() or
similar functions, let's pass the arguments in a struct. So instead of:
DefineShmemRegion("pg_stat_statements", sizeof(pgssSharedState),
                  (void **) &pgss, pgss_shmem_init, 0);
you would call it like this:
DefineShmemRegion(&(ShmemStructDesc) {
    .name = "pg_stat_statements",
    .size = sizeof(pgssSharedState),
    .ptr = (void **) &pgss,
    .init_fn = pgss_shmem_init,
    .flags = 0,
});
This flexibility will come in handy as soon as we add the ability to resize.
Now if you rename DefineShmemRegion() to ShmemRegisterStruct(), this is
equivalent to my descriptor design. (One detail is whether you require
the descriptor struct to stay around after the call, or if
DefineShmemRegion/ShmemRegisterStruct will copy all the information.)
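(If the callee doesn't copy, the caller can always put the descriptor in static storage; a sketch, with all names hypothetical as above:)

```c
/* Sketch only: if DefineShmemRegion() keeps the pointer rather than
 * copying, the descriptor must outlive the call, e.g.: */
static pgssSharedState *pgss = NULL;

static ShmemStructDesc pgss_desc = {
	.name = "pg_stat_statements",
	.size = sizeof(pgssSharedState),
	.ptr = (void **) &pgss,
	.init_fn = pgss_shmem_init,
	.flags = 0,
};

void
_PG_init(void)
{
	DefineShmemRegion(&pgss_desc);
}
```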
Otherwise, they try to allocate from DSA if DSR_FLAGS_DSA_OK is given,
or else from the slop space unless DSR_FLAGS_NO_SLOP is given.
Ok, falling back to DSA is a new idea. You could implement that in
either design.
If that
works, they then call the init_callback under a suitable lock and set
*localptr. If not, they fail silently. Functions that need to use the
local pointers do something like this:
if (unlikely(pgss == NULL))
    pgss = GetShmemRegion("pg_stat_statements");
...which throws a suitable error -- not just a generic one that the
region doesn't exist, but something that's sensitive to different
failure conditions: the region was never registered, the region was
registered after shared_preload_libraries time and not enough slop
space remains, or whatever. GetShmemRegion() could even retry the
initialization in certain cases, e.g. if DSA is OK and we previously
were called too early in startup to try DSA, we can try now, or if DSA
allocation failed due to OOM, we can try again.
Ok, we could add GetShmemRegion() in either design. Do we have any place
where we'd use that though, instead of the backend-private pointer global
variable? I can't think of any examples where we currently call
ShmemInitStruct() to get a pointer "on demand" like that.
In pg_stat_statements, this would replace these tests:
if (!pgss || !pgss_hash)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("pg_stat_statements must be loaded via "
                    "\"shared_preload_libraries\"")));
But I don't think pg_stat_statements could still allocate the region
after postmaster startup. GetShmemRegion() would just be a different way
of throwing that error.
I *think* this design gets rid of all the IsUnderPostmaster and
shared_preload_libraries_in_progress checks in individual subsystems,
and all the use of shmem startup hooks. You just ask for what you want
and if there's a way to get it, the system gives it to you, and if
there's not, it generates an error at the latest possible time, and
also tries to self-heal if that's reasonable. If you do load your
module in shared_preload_libraries, then by the time main shared
memory initialization completes in the postmaster, everyone's localptr
values (like pgss) will be initialized, but if it happens to be an
EXEC_BACKEND build, those same calls will also happen in every
postmaster child, and will automatically re-find the shared memory
areas and reinitialize all the pointers. If you load your module
later, your localptr values will hopefully be initialized by the end
of _PG_init(), but if that doesn't work out, then the
unlikely-protected calls to GetShmemRegion will produce suitable
errors at a suitable time. And I think it all works out nicely in a
standalone backend, too.
This is all kind of a brain dump and is not fully thought-through and
might be riddled with cognitive errors, but what I'm sort of trying to
convey is where I think the complexity in the current system comes
from (which is that we require every subsystem/extension author to
know how the sausage is made, and we don't enforce consistency between
requests and allocations) and what I don't really like about
descriptors as a solution (which is that they are harder to read than
imperative code and can interfere with cases where somebody wants to
do something slightly different than what the descriptor-designer had
in mind). I hope that some of it is helpful to you...
Thanks, this hasn't changed my opinions, but I really appreciate
pressure-testing the design. I don't want to rewrite this again in a
year, because we didn't get it quite right.
- Heikki