>>> On Fri, Aug 31, 2007 at 2:18 PM, in message <[EMAIL PROTECTED]>,
Tom Lane <[EMAIL PROTECTED]> wrote:
> "Kevin Grittner" <[EMAIL PROTECTED]> writes:
>> It appears that when pg_ctl gets a stop request for a given directory, it
>> looks for a pid file in that directory and signals that pid to stop. It
>> doesn't appear to check that the pid is for a PostgreSQL postmaster running
>> out of the given directory. I think it should, although on a quick scan of
>> the code, I didn't see a convenient way to do that.
> [ shrug... ] AFAICS there is no way to know that.
I sure couldn't see a way, but I was hoping that was just a matter of my own
>> I have some evidence that when we attempted to stop a PostgreSQL instance
>> which (it turned out) had died without cleaning up the pid file, it actually
>> stopped another instance which was using a different data directory but
>> had wrapped around to the same pid.
> The real question there is how come the postmaster died without removing
> the pidfile. It's not that easy to crash the postmaster ...
Well, that's not due to a bug in PostgreSQL. We're using a buggy LDAP
implementation (not my call) which can crash things. The machine totally
locked up after logging distress messages from that daemon, and they cycled
power to get out of it.
The PostgreSQL issue here was a secondary problem in trying to get the
server back to normal. So really, what I was suggesting was something to
improve the robustness of PostgreSQL in the face of severe challenges posed
by other issues. I realize it's a very low volume issue; if it's not easy
to fix, probably not worth it.
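
To make the limitation concrete, here is a minimal Python sketch of the only
portable check I can think of. It is not pg_ctl's code; the function names
are hypothetical, and it assumes a Unix-like system and that the first line
of the pid file holds the pid. It can tell whether some process currently
has that pid, but not whether that process is the postmaster for the data
directory in question, which is exactly the gap described above.

```python
import os


def read_pid(pidfile_text):
    """Return the pid from the first line of a pid file's contents.

    Assumption: the pid is the first line of the file; any later
    lines are ignored.
    """
    first_line = pidfile_text.splitlines()[0]
    return int(first_line.strip())


def pid_is_alive(pid):
    """Return True if some process currently has this pid.

    Unix only: signal 0 delivers nothing and only performs the
    existence and permission checks. This says nothing about
    which program owns the pid, so a recycled pid looks "alive".
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no process has this pid
    except PermissionError:
        return True   # a process exists but belongs to another user
    return True
```

A stale pid file whose pid has since been reused by an unrelated process
passes pid_is_alive() just as a live postmaster would, so a check like this
would not have prevented what happened to us.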
Now to bug the people on the list of authorized contacts for Novell to open
a support case on the LDAP problems, and see how many of the 40 core dumps
I have from their daemon they want to see.