On Wed, Feb 14, 2007 at 11:34:47PM +0100, Peter Kovacs wrote:
> The post I quoted also says:
>
> "That should not happen; it should always be possible to detect whether
> the file is stale, *if* your start script is written correctly."
>
> "That" in the quote refers to my description:
> "On system startup PostgreSQL 8.1.4 refuses to start due to the pid
> file is [being] left over from [a] previous "session" on Solaris 10
> x86."

Ah, I hadn't noticed that this was in fact a problem you had
experienced.

> Nothing did "actually break", but something happened which should not
> have happened if my "start script is [had been] written correctly".

No, something did break then.

> The issue here is:
> Does the SMF setup described in the referenced Sun document make sure
> that the postmaster process will start and become functional without
> human intervention even if there is a pid file left over from a
> previously crashed (abruptly terminated) postmaster process?

I don't know.  But reading Tom Lane's reply to you, I don't see what our
SMF manifest and start method are doing wrong.

Hmm, let's see, your log file said:

LOG:  00000: shutting down
LOCATION:  ShutdownXLOG, xlog.c:5031
<I infer the restart is here>
FATAL:  58P01: could not remove old lock file "postmaster.pid": No such file or directory
HINT:  The file seems accidentally left over, but it could not be removed. Please remove the file by hand and try again.

The hint and the fatal message come from the code that Tom Lane quoted.

This means that the exclusive-create open(2) of the lock file failed
with EEXIST or EACCES, that no other process was found with the pid
recorded in the lock file, and that when PostgreSQL then went to remove
the lockfile, the lockfile was gone!

What removed the lockfile, and why should PostgreSQL complain at that
point instead of just looping and trying again?

Nico
--
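
P.S. To make the sequence above concrete, here is a rough sketch of the "loop and try again" behaviour I have in mind. This is not PostgreSQL's actual code (that is C, in the code Tom Lane quoted). It is Python, it assumes a POSIX system, and the names acquire_lock_file and pid_is_alive are made up for illustration. The only point is that a lock file vanishing between the failed exclusive create and the unlink is treated as a reason to retry, not as a fatal error.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this pid appears to exist (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_lock_file(path, max_tries=100):
    """Create `path` exclusively, clearing a stale lock file if one is found.

    Returns True once this process owns the lock file, False if a live
    process owns it or the retry budget runs out.
    """
    for _ in range(max_tries):
        try:
            # Exclusive create: fails with EEXIST if the file is already there.
            fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            pass  # fall through to the stale-file check below
        else:
            with os.fdopen(fd, "w") as f:
                f.write(f"{os.getpid()}\n")
            return True

        try:
            with open(path) as f:
                old_pid = int(f.read().split()[0])
        except FileNotFoundError:
            continue  # file vanished after the failed create: just retry
        except (ValueError, IndexError):
            old_pid = -1  # empty or unreadable contents: treat as stale

        if old_pid != os.getpid() and pid_is_alive(old_pid):
            return False  # a live process really does hold the lock

        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # someone else removed it first: retry instead of failing
    return False
```

With something like this, a start method would call acquire_lock_file("postmaster.pid") and only give up when it returns False. Whether retrying is actually safe in PostgreSQL's case is the question I am asking, not something this sketch settles.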