On Mon, Aug 27, 2012 at 09:59:10PM -0400, Tom Lane wrote:
> Bruce Momjian <br...@momjian.us> writes:
> > On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
> >> I could get behind that, but I don't think the delay should be more than
> >> 100ms or so.
>
> > I took Alvaro's approach of a sleep.  The file test was already in a
> > loop that went 100 times.  Basically, if the lock file exists, this
> > postmaster isn't going to succeed, so I figured there is no reason to
> > rush in the testing.  I gave it 5 tries with one second between
> > attempts.  Either the file is being populated, or it is stale and empty.
>
> How did "100ms" translate to 5 seconds?

That was the "no need to rush, let's just be sure of what we report".

> > I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
> > to exit out of the postmaster should be fine.
>
> pg_ctl is not the only consideration here.  In particular, there are a
> lot of initscripts out there (all of Red Hat's, for instance) that don't
> use pg_ctl and expect the postmaster to come up (or not) in a couple of
> seconds.
>
> I don't see a need for more than about one retry with 100ms delay.
> There is no evidence that the case we're worried about has ever occurred
> in the real world anyway, so slowing down error failures to make really
> really really sure there's not a competing postmaster doesn't seem like
> a good tradeoff.
>
> I'm not terribly impressed with that errhint, either.

I am concerned that at 100ms we can't be sure whether the file is still
being created, and if we can't be sure, I am not sure there is much point
in trying to clarify the odd error message we emit.

FYI, here is what the code does now with a zero-length pid file, with my
patch:

    $ postmaster
    [ wait 5 seconds ]
    FATAL:  lock file "postmaster.pid" is empty
    HINT:  Empty lock file probably left from operating system crash during database startup; file deletion suggested.
    $ pg_ctl start
    pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid"
    $ pg_ctl -w start
    pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid"

Seems pg_ctl would also need some cleanup if we change the error message
and/or timing.

I am thinking we should just change the error message in the postmaster
and pg_ctl to say the file is empty, and call it done (no hint message).
If we do want a hint, say that either the file is stale from a crash or
another postmaster is starting up, and let the user diagnose it.

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
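
For readers following the thread, the behavior under debate (re-check an
empty lock file a small number of times, with a short delay, before
reporting it as empty) can be sketched roughly as below. This is only an
illustration in Python, not the postmaster's C code and not the patch
discussed above. The function name classify_lock_file and its parameters
are hypothetical; the defaults mirror the "one retry with 100ms delay"
suggestion, while retries=5, delay=1.0 would approximate the five
one-second attempts.

```python
import os
import tempfile
import time


def classify_lock_file(path, retries=1, delay=0.1, sleep=time.sleep):
    """Return "missing", "populated", or "empty" for the lock file at path.

    Hypothetical helper for illustration. An empty file is re-read up to
    `retries` times, `delay` seconds apart, in case another process is
    still in the middle of writing it.
    """
    for attempt in range(retries + 1):
        try:
            with open(path, "rb") as f:
                data = f.read()
        except FileNotFoundError:
            return "missing"
        if data.strip():
            # Some content has arrived; validating it is out of scope here.
            return "populated"
        if attempt < retries:
            sleep(delay)
    # Still zero-length after all retries: stale, or a very slow writer.
    return "empty"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "postmaster.pid")
        open(lock, "w").close()          # zero-length file, as after a crash
        print(classify_lock_file(lock))  # checked twice, 100ms apart
```

The sketch cannot tell a stale empty file from one that is about to be
filled in by a slow competing process, which is the uncertainty raised
above; a longer delay only narrows that window at the cost of a slower
failure.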