Re: [HACKERS] Dynamic Shared Memory stuff

Robert Haas Wed, 18 Dec 2013 09:23:13 -0800

On Tue, Dec 10, 2013 at 6:26 PM, Tom Lane <[email protected]> wrote:
> Noah Misch <[email protected]> writes:
>> On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
>>> Let's not add more cases like that, if we can avoid it.
>
>> Only if we can avoid it for a modicum of effort and feature compromise.
>> You're asking for PostgreSQL to reshape its use of persistent resources so 
>> you
>> can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a
>> memory leak.  That use case, not PostgreSQL, has the defect here.
>
> The larger point is that such a shutdown process has never in the history
> of Postgres been successful at removing shared-memory (or semaphore)
> resources.  I do not feel a need to put a higher recoverability standard
> onto the DSM code than what we've had for fifteen years with SysV shmem.
>
> But, by the same token, it shouldn't be any *less* recoverable.  In this
> context that means that starting a new postmaster should recover the
> resources, at least 90% of the time (100% if we still have the old
> postmaster lockfile).  I think the idea of keeping enough info in the
> SysV segment to permit recovery of DSM resources is a good one.  Then,
> any case where the existing logic is able to free the old SysV segment
> is guaranteed to also work for DSM.


So I'm taking a look at this.  There doesn't appear to be anything
intrinsically intractable here, but it seems important to time the
order of operations so as to minimize the chances that things fail in
awkward places.  The point where we remove the old shared memory
segment from the previous postmaster invocation is here:

        /*
         * The segment appears to be from a dead Postgres process, or from a
         * previous cycle of life in this same process.  Zap it, if possible.
         * This probably shouldn't fail, but if it does, assume the segment
         * belongs to someone else after all, and continue quietly.
         */
        shmdt(memAddress);
        if (shmctl(shmid, IPC_RMID, NULL) < 0)
            continue;

My first thought was to remember the control segment ID from the
header just before the shmdt() there, and then later when the DSM
module is starting, do cleanup.  But that doesn't seem like a good
idea, because then if startup fails after we remove the old shared
memory segment and before we get to the DSM initialization code, we'll
have lost the information on what control segment needs to be cleaned
up.  A subsequent startup attempt won't see the old shm again, because
it's already gone.  I'm fairly sure that this would be a net reduction
in reliability vs. the status quo.

So now what I'm thinking is that we ought to actually perform the DSM
cleanup before detaching the old segment and trying to remove it.
That shouldn't fail, but if it does, or if we get killed before
completing it, the next run will hopefully find the same old shm and
finish the cleanup.  But that kind of flies in the face of the comment
above: if we perform DSM cleanup and then discover that the segment
wasn't ours after all, that means we just stomped all over somebody
else's stuff.  That's not too good. But trying to remove the segment
first and then perform the cleanup creates a window where, if we get
killed, the next restart will have lost information about how to
finish cleaning up.  So it seems that some kind of logic tweak is
needed here, but I'm not sure exactly what.  As I see it, the options
are:

1. Make failure to remove the shared memory segment we thought was
ours an error.  This will definitely show up any problems, but only
after we've burned down some other processes's dynamic shared memory
segments.  The most likely outcome is creating failure-to-start
problems that don't exist today.

2. Make it a warning.  We'll still burn down somebody else's DSMs, but
at least we'll still start ourselves.  Sadly, "WARNING: You have just
done a bad thing.  It's too late to fix it.  Sorry!" is not very
appealing.

3. Remove the old shared memory segment first, then perform the
cleanup immediately afterwards.  If we get killed before completing
the cleanup, we'll leak the un-cleaned-up stuff.  Decide that's OK and
move on.

4. Adopt some sort of belt-and-suspenders approach, keeping the state
file we have now and backstopping it with this mechanism, so that we
really only need this to work when $PGDATA has been blown away and
recreated.  This seems pretty inelegant, and I'm not sure who it'll
benefit other than those (few?) people who kill -9 the postmaster (or
it segfaults or otherwise dies without running the code to remove
shared memory segments) and then remove $PGDATA and then re-initdb and
then expect cleanup to find the old DSMs.

5. Give up on this approach.  We could keep what we have now, or make
the DSM control segment land at a well-known address as we do for the
main segment.

What do people prefer?

The other thing I'm a bit squidgy about is injecting more code that
runs before we get the new shared memory segment up and running.
During that time, we don't have effective mutual exclusion, so it's
possible that someone else could be trying to do cleanup at the same
time we are.  That should be OK as far as the DSM code itself is
concerned, but there may be other downsides to lengthening the period
of time that's required to get mutual exclusion in place that I should
be worrying about.

I thought about trying to handle this by destroying the old main
shared memory segment (memorizing the handle) and then creating the
new one and immediately inserting the old handle into it.  Then, the
DSM cleanup can happen much later, and if we die in the meanwhile, the
next startup will find our new main shared memory segment but it'll
still be pointing to the old control segment, so cleanup should still
work.  But there's still a window between the time when we destroy the
old main shared memory segment and the time when we create the new one
where we leak if we die unexpectedly.  In particular, when creating
the new segment fails, we're stuck.  I don't really see a way to
salvage this approach.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

Reply via email to