Hi,

On Wed, Jun 17, 2009 at 12:22 AM, Czichy, Thoralf (NSN - FI/Helsinki)
<thoralf.czi...@nsn.com> wrote:
> [STONITH is not always best strategy if failures can be declared as
> user-space software problem only, limit STONITH to HW/OS failures]
>
> The isolation of the failing Postgres instance does not require a STONITH
> - mainly as there's also other software running on the same node that we'd
> not want to automatically switchover (e.g. because it takes longer to do or
> the functionality is more critical or less critical). Also we generally trust
> the HW, OS kernel and cluster middleware to behave correctly. These functions
> also follow the principle of fail-fast-and-safe. This trust might be an
> assumption that not everybody agrees with, though. So, if the failure originated
> from HW/OS/Clusterware it clearly is a STONITH situation, but if it's a
> user-space problem - the default assumption is that isolation can be implemented on
> OS-level and that's a guarantee that the clusterware gives (using a separate
> Quorum mechanism to avoid split-brain situations).

HW-level STONITH seems to be too much for your case. How about making
your HA middleware shut down the dying postgres before doing the
switchover, using (for example) "pg_ctl -mi stop"? In that case, the
other software can keep running on the original node after the
switchover.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
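
For illustration only, here is a minimal Python sketch of the stop-before-switchover idea suggested above. It is not taken from any real HA middleware: the function names (build_stop_command, isolate_then_switch_over), the promote_standby callback, the data directory path and the timeout are all hypothetical. The only real interface used is "pg_ctl stop" with -D, -m immediate (the long form of -mi), -w and -t. Any non-zero exit status is treated conservatively as "isolation not confirmed", so the caller can escalate instead of switching over.

```python
import subprocess
from typing import Callable, List


def build_stop_command(data_dir: str, timeout_s: int = 30) -> List[str]:
    """Build a pg_ctl invocation for an immediate-mode shutdown.

    data_dir and timeout_s are illustrative parameters, not values
    from the original thread.
    """
    return [
        "pg_ctl", "stop",
        "-D", data_dir,      # data directory of the failing instance
        "-m", "immediate",   # long form of "-mi"
        "-w",                # wait for the shutdown to complete
        "-t", str(timeout_s),
    ]


def run_command(cmd: List[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(cmd, check=False).returncode


def isolate_then_switch_over(
    data_dir: str,
    promote_standby: Callable[[], None],
    run: Callable[[List[str]], int] = run_command,
) -> bool:
    """Stop the failing instance first; switch over only if that succeeded.

    promote_standby is a hypothetical hook supplied by the HA middleware.
    """
    if run(build_stop_command(data_dir)) != 0:
        # Isolation not confirmed: do not switch over here; the caller
        # may escalate (for example to node-level fencing).
        return False
    promote_standby()
    return True
```

A caller would pass its own switchover routine, for example isolate_then_switch_over("/path/to/pgdata", my_promote), where both the path and my_promote are placeholders. Other software on the node is left untouched, because only the postgres instance is stopped.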