Re: [HACKERS] Questions about pid file creation code

Zdenek Kotala Tue, 03 Apr 2007 09:44:03 -0700

Tom Lane wrote:

Zdenek Kotala <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
Just to distinguish postmasters from standalone backends in the error
messages.  I think that's still useful.
I'm not sure what you mean. It is used only in the CreatePidFile function, and if the directory is locked by some process, I don't see any useful reason to know whether it is a postmaster or a standalone backend.
You don't?  Consider the decisions the user needs to take upon seeing
the message --- should he kill that other process or not, and if so how?
Knowing whether it's a postmaster seems pretty important to me.

If somebody wants to kill a process, he must know what he is doing. How many postgres users know the difference between postmaster and postgres in the error message?

And another problem: if another application (e.g. pg_migrator) wants to lock this directory to prevent data corruption, how should it do that? How much sense does this message make in that case?


I suggest removing this behavior and modifying the message.

Yes, there are. But it does not make sense to me. If I want to open a file and another process removes it, why would I try to create it again when the other process is going to do it?
That could be the track of another postmaster just now shutting down.
There's no reason to fail to start in such a scenario.  The looping
logic is necessary anyway (to guard against races involving two
postmasters trying to start at the same time), so we might as well let
it handle this case too.
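For readers following along, the retry loop described above can be sketched roughly as follows. This is only an illustration under assumptions: create_lock_file_sketch and max_tries are hypothetical names, not the actual PostgreSQL function, and the sketch leaves out the shared-memory-in-use check mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not PostgreSQL's real lock file code.
 * Returns 0 if we now own the lock file, -1 if a live process
 * appears to own it or an unexpected error occurred.
 */
int
create_lock_file_sketch(const char *path, int max_tries)
{
    assert(path != NULL);

    for (int ntries = 0; ntries < max_tries; ntries++)
    {
        char    buf[32];
        ssize_t n;
        long    other;

        /* O_EXCL makes creation atomic: only one of two racing starters wins. */
        int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

        if (fd >= 0)
        {
            int len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
            int ok = (write(fd, buf, (size_t) len) == len);

            close(fd);
            if (!ok)
            {
                unlink(path);
                return -1;
            }
            return 0;
        }
        if (errno != EEXIST)
            return -1;

        /* The file exists: read the PID recorded by its creator. */
        fd = open(path, O_RDONLY);
        if (fd < 0)
        {
            if (errno == ENOENT)
                continue;       /* removed meanwhile, e.g. owner shutting down */
            return -1;
        }
        n = read(fd, buf, sizeof(buf) - 1);
        close(fd);
        if (n < 0)
            return -1;
        buf[n] = '\0';
        other = strtol(buf, NULL, 10);

        /* Only probe positive PIDs that fit in pid_t; our own PID counts as stale. */
        if (other > 0 && (long) (pid_t) other == other &&
            other != (long) getpid() &&
            (kill((pid_t) other, 0) == 0 || errno == EPERM))
            return -1;          /* a live process seems to hold the lock */

        /* Looks stale: remove it and go around the loop again. */
        if (unlink(path) < 0 && errno != ENOENT)
            return -1;
    }
    return -1;                  /* gave up after max_tries attempts */
}
```

The same loop covers both cases from the quoted text: two starters racing on the exclusive create, and a file that disappears between the failed create and the read.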

OK. I now understand (I hope) what this loop tries to handle. However, if one server goes down and another comes up, there is only a really small time window between the first open attempt and the second one. I guess in this case we can tell the starting postmaster to stop. For me that is better than making one hundred loops, depending on CPU speed, and rechecking again. I think that in this case postgres duplicates the role of the startup scripts.

There is also another issue which can occur. Suppose you have two nodes with access to one shared filesystem. One node is for backup, and somebody runs postgres on the second node. In this case postgres removes the file and creates its own, and two postgres instances on one dbcluster is not a good idea. A good cluster solution protects against this situation, but it can happen if somebody runs it manually.

I'm sorry, I meant why is there cleanup of a pid file that stays there after another postmaster crashes. Many applications only check: OK, there is some pid file -> exit. The rest is up to the start script or some other monitoring facility.
The start script does not typically have the intelligence to get this
right, particularly not the is-shmem-still-in-use part.  If you check
the archives you will find many of us on record telling people who think
they should remove the pidfile in their start script that they're crazy.

That is true, but the question is which way is better: keep all the logic in postmaster, or improve pg_ctl to share more information and leave the responsibility to start scripts or a monitoring tool, which has more information about the system as a whole.

It's not actually trying to validate the syntax of the lock file, only
to make certain it doesn't trigger any unexpected behavior in kill().

I'm not sure if we are talking about the same place.


Yes, we are.  Read the kill(2) man page and note the special behaviors
for pid = 0 or -1.  The test is just trying to be darn certain we don't
invoke those behaviors.
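As an illustration of the point about kill(2), a guard of this kind might look like the sketch below. It is an assumption-laden example, not the code under discussion: pid_is_live_sketch is a hypothetical name, and strtol() is used here in place of the atoi() call mentioned in this thread so that non-numeric input can be detected.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical sketch: returns 1 if the text holds a PID that is safe
 * to probe and the process appears to exist, 0 otherwise.
 */
int
pid_is_live_sketch(const char *text)
{
    char *end;
    long  val;

    assert(text != NULL);

    errno = 0;
    val = strtol(text, &end, 10);
    if (end == text || errno == ERANGE)
        return 0;               /* not a number at all, or out of range */

    /*
     * Never hand 0 or a negative value to kill(): pid 0 means "my own
     * process group", -1 means "every process I may signal", and other
     * negative values address a whole process group.
     */
    if (val <= 0)
        return 0;
    if ((long) (pid_t) val != val)
        return 0;               /* does not fit in pid_t */

    /* Signal 0 performs only the existence and permission checks. */
    if (kill((pid_t) val, 0) == 0)
        return 1;
    return errno == EPERM;      /* exists, but owned by another user */
}
```

With this guard, a lock file containing "0" or "-1" is simply treated as not naming a live process, and kill() is never called for it.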


No, we aren't :-). I mean the code a few lines up, after atoi().

        with regards Zdenek


