On Sun, Jun 14, 2026 at 02:23:54PM +0500, Andrey Borodin wrote:
> The same revision also closes an unrelated init race: when two processes
> create the backing file at once, the loser could map it before the winner
> sized it (SIG-something on first touch). The attach path now waits for the
> file to reach full size before mapping.
> 
> CI is green. Same two patches on top of your atomics commit. Let me know
> if this prototype goes radically wrong way.

You may find my reply surprising (or not), but using a client tool to
bypass the SQL protocol is super invasive in terms of the in-core
changes, and I'm -1 on that.

Test tooling should rely on simple facilities, and you are
re-implementing quite a few things here that come with a new class of
bugs and what look like design issues to me due to the invasiveness:
- Reimplementation of the registry protocol for file mapping.  That
may be useful for other things, and there may be use for a refactored
in-core API, but I don't see why we should use it here:
- Mirroring of internal structs for shmem manipulation.
- Core data updates with zero locking.

I am wondering if we are not overcomplicating things here..  How about
this idea instead of 0002 and 0003, for paths that cannot rely on some
SQL:
- Let's use a file-based markup, located in a injection_points/ in the
data folder.
- When attaching a wait point, write a marker, say wait_$POINT.
- On wakeup, write a wakeup_$POINT. 
- The wait routine does periodic checks of the wakeup_$POINT file,
using a stat().

It depends on how much we want to achieve, but forcing a postmaster to
wait at an early startup sequence would then be something like:
- Add the a wait markup.
- At startup, shared_preload_libraries scans the pg_injection_points/
repo, fills in its shmem state by calling InjectionPointAttach().
- postmaster hits the injection point, calls injection_wait()
- Test does what it wants while the postmaster waits, manipulates
states.
- Test script drops a wakeup file.
- Postmaster sees the file, continues its startup sequence.
This means one cannot set an injection point until
shared_preload_libraries is loaded, of course.  Cleanup logic feels
kind of nice: test cleans his stuff, or just remove
pg_injection_points/ at shutdown with an exit callback.  No platform
specific logic, only FS checks.

This means a bit higher latency, but it eliminates a full class of
bugs due to the fact that it is simpler.  This would need to live
alongside patch 0001 so as a _PG_init() can be loaded when the
injection_points lib is loaded; we're still going to need it to fill
the shmem info with the wait slot being occupied.
--
Michael

Attachment: signature.asc
Description: PGP signature

Reply via email to