On 8 March 2018 at 10:18, Andres Freund <and...@anarazel.de> wrote:
> On March 7, 2018 5:51:29 PM PST, Craig Ringer <cr...@2ndquadrant.com> wrote:
> >My favourite remains an organisation that kept "fixing" an issue by kill
> >-9'ing the postmaster and removing postmaster.pid to make it start up
> >again. Without killing all the leftover backends. Of course, the system
> >kept getting more unstable and broken, so they did it more and more.
> >They were working on scripting it when they gave up and asked for help.
> Maybe I'm missing something, but that ought to not work. The shmem segment
> that we keep around would be a conflict, no?
As I understand it, because we allow multiple Pg instances on a system, we
identify the small sysv shmem segment we use by the postmaster's pid. If
you remove the DirLockFile (postmaster.pid) you remove the interlock
against starting a new postmaster. It'll think it's a new independent
instance on the same host, make a new shmem segment and go merrily on its
way mangling data horribly.
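To make the failure mode concrete, here's a minimal sketch of a pid-file
interlock. This is not the PostgreSQL code: write_lockfile() and
lockfile_owner_alive() are hypothetical names made up for illustration, and
the real CreateLockFile() does considerably more (including the shmem check).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: record our own PID in the lock file at `path`.
 * Returns 0 on success, -1 on failure. */
int
write_lockfile(const char *path)
{
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%ld\n", (long) getpid());
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: returns 1 if the lock file names a process that
 * still exists, 0 if there is no lock file or the recorded process is
 * gone, and -1 if the file cannot be opened or parsed. */
int
lockfile_owner_alive(const char *path)
{
    FILE *fp;
    long  pid;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return errno == ENOENT ? 0 : -1;   /* no file: no interlock at all */

    if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Signal 0 only performs the existence and permission checks. */
    if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
        return 1;
    return 0;
}
```

Once the file is removed, lockfile_owner_alive() reports 0 even though the
process that wrote it is still running. A check like this also only looks at
the one recorded PID, so on its own it says nothing about orphaned children.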
See CreateLockFile(). Also commit 7e2a18a9161. In
particular src/backend/utils/init/miscinit.c +938:

    if (PGSharedMemoryIsInUse(id1, id2))
        ereport(FATAL,
                (errcode(ERRCODE_LOCK_FILE_EXISTS),
                 errmsg("pre-existing shared memory block "
                        "(key %lu, ID %lu) is still in use",
                        id1, id2),
                 errhint("If you're sure there are no old "
                         "server processes still running, remove "
                         "the shared memory block "
                         "or just delete the file \"%s\".",
                         filename)));

I still think that error is a bit optimistic, and should really say "make
very sure there are no 'postgres' processes associated with this data
directory, then ..."
It'd be nice if the OS offered us some support here. Something like opening
a lockfile in exclusive lock mode, with every child inheriting both the FD
and the lock, so the exclusive lock wouldn't get released until all FDs
associated with it are released. But AFAIK nothing like that is present, let
alone portable.
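For what it's worth, BSD-style flock() as documented on Linux and the BSDs
comes close to those semantics: the lock belongs to the open file
description, so descriptors inherited across fork() share it, and it is only
released once the last of them is closed (or one of them unlocks explicitly).
It is not in POSIX and behaviour on network filesystems varies, so the
portability concern stands. A rough sketch, with hypothetical helper names
and no claim that this is suitable for PostgreSQL as-is:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and try to take an exclusive flock().
 * Returns the descriptor, or -1 if the lock is held elsewhere or on error. */
int
try_exclusive_lock(const char *path)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical demo: returns 1 if the lock stays held while a forked child
 * keeps the inherited descriptor open, and becomes free after it exits. */
int
lock_outlives_parent_fd(const char *path)
{
    int   pipefd[2];
    int   fd, probe, held_while_child_alive, free_after_child_exit;
    pid_t child;
    char  c;

    fd = try_exclusive_lock(path);
    if (fd < 0)
        return -1;
    if (pipe(pipefd) != 0)
    {
        close(fd);
        return -1;
    }

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
    {
        /* Child: hold the inherited fd until the parent closes the pipe. */
        close(pipefd[1]);
        if (read(pipefd[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(pipefd[0]);
    close(fd);                  /* parent's copy gone; child's copy remains */

    probe = try_exclusive_lock(path);   /* fresh open(), separate description */
    held_while_child_alive = (probe < 0);
    if (probe >= 0)
        close(probe);

    close(pipefd[1]);           /* EOF on the pipe lets the child exit */
    waitpid(child, NULL, 0);

    probe = try_exclusive_lock(path);
    free_after_child_exit = (probe >= 0);
    if (probe >= 0)
        close(probe);

    return held_while_child_alive && free_after_child_exit;
}
```

The parent closing its own descriptor stands in for the postmaster dying
while a backend lives on: a would-be new instance can't get the lock until
the last inherited descriptor is gone.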
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services