On Tue, Dec 10, 2013 at 6:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Noah Misch <n...@leadboat.com> writes: >> On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote: >>> Let's not add more cases like that, if we can avoid it. > >> Only if we can avoid it for a modicum of effort and feature compromise. >> You're asking for PostgreSQL to reshape its use of persistent resources so >> you >> can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a >> memory leak. That use case, not PostgreSQL, has the defect here. > > The larger point is that such a shutdown process has never in the history > of Postgres been successful at removing shared-memory (or semaphore) > resources. I do not feel a need to put a higher recoverability standard > onto the DSM code than what we've had for fifteen years with SysV shmem. > > But, by the same token, it shouldn't be any *less* recoverable. In this > context that means that starting a new postmaster should recover the > resources, at least 90% of the time (100% if we still have the old > postmaster lockfile). I think the idea of keeping enough info in the > SysV segment to permit recovery of DSM resources is a good one. Then, > any case where the existing logic is able to free the old SysV segment > is guaranteed to also work for DSM.
So I'm taking a look at this. There doesn't appear to be anything intrinsically intractable here, but it seems important to time the order of operations so as to minimize the chances that things fail in awkward places. The point where we remove the old shared memory segment from the previous postmaster invocation is here: /* * The segment appears to be from a dead Postgres process, or from a * previous cycle of life in this same process. Zap it, if possible. * This probably shouldn't fail, but if it does, assume the segment * belongs to someone else after all, and continue quietly. */ shmdt(memAddress); if (shmctl(shmid, IPC_RMID, NULL) < 0) continue; My first thought was to remember the control segment ID from the header just before the shmdt() there, and then later when the DSM module is starting, do cleanup. But that doesn't seem like a good idea, because then if startup fails after we remove the old shared memory segment and before we get to the DSM initialization code, we'll have lost the information on what control segment needs to be cleaned up. A subsequent startup attempt won't see the old shm again, because it's already gone. I'm fairly sure that this would be a net reduction in reliability vs. the status quo. So now what I'm thinking is that we ought to actually perform the DSM cleanup before detaching the old segment and trying to remove it. That shouldn't fail, but if it does, or if we get killed before completing it, the next run will hopefully find the same old shm and finish the cleanup. But that kind of flies in the face of the comment above: if we perform DSM cleanup and then discover that the segment wasn't ours after all, that means we just stomped all over somebody else's stuff. That's not too good. But trying to remove the segment first and then perform the cleanup creates a window where, if we get killed, the next restart will have lost information about how to finish cleaning up. So it seems that some kind of logic tweak is needed here, but I'm not sure exactly what. As I see it, the options are: 1. Make failure to remove the shared memory segment we thought was ours an error. This will definitely show up any problems, but only after we've burned down some other processes's dynamic shared memory segments. The most likely outcome is creating failure-to-start problems that don't exist today. 2. Make it a warning. We'll still burn down somebody else's DSMs, but at least we'll still start ourselves. Sadly, "WARNING: You have just done a bad thing. It's too late to fix it. Sorry!" is not very appealing. 3. Remove the old shared memory segment first, then perform the cleanup immediately afterwards. If we get killed before completing the cleanup, we'll leak the un-cleaned-up stuff. Decide that's OK and move on. 4. Adopt some sort of belt-and-suspenders approach, keeping the state file we have now and backstopping it with this mechanism, so that we really only need this to work when $PGDATA has been blown away and recreated. This seems pretty inelegant, and I'm not sure who it'll benefit other than those (few?) people who kill -9 the postmaster (or it segfaults or otherwise dies without running the code to remove shared memory segments) and then remove $PGDATA and then re-initdb and then expect cleanup to find the old DSMs. 5. Give up on this approach. We could keep what we have now, or make the DSM control segment land at a well-known address as we do for the main segment. What do people prefer? The other thing I'm a bit squidgy about is injecting more code that runs before we get the new shared memory segment up and running. During that time, we don't have effective mutual exclusion, so it's possible that someone else could be trying to do cleanup at the same time we are. That should be OK as far as the DSM code itself is concerned, but there may be other downsides to lengthening the period of time that's required to get mutual exclusion in place that I should be worrying about. I thought about trying to handle this by destroying the old main shared memory segment (memorizing the handle) and then creating the new one and immediately inserting the old handle into it. Then, the DSM cleanup can happen much later, and if we die in the meanwhile, the next startup will find our new main shared memory segment but it'll still be pointing to the old control segment, so cleanup should still work. But there's still a window between the time when we destroy the old main shared memory segment and the time when we create the new one where we leak if we die unexpectedly. In particular, when creating the new segment fails, we're stuck. I don't really see a way to salvage this approach. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers