Re: [HACKERS] postmaster recovery and automatic restart suppression

Fujii Masao Wed, 17 Jun 2009 00:37:51 -0700

Hi,

On Wed, Jun 17, 2009 at 12:22 AM, Czichy, Thoralf (NSN -
FI/Helsinki)<thoralf.czi...@nsn.com> wrote:
> [STONITH is not always best strategy if failures can be declared as
> user-space software problem only, limit STONITH to HW/OS failures]
>
> The isolation of the failing Postgres instance does not require a
> STONITH
> - mainly as there's also other software running on the same node that
> we'd
> not want to automatically switchover (e.g. because it takes longer to do
> or
> the functionality is more critical or less critical). Also we generally
> trust
> the HW, OS kernel and cluster middleware to behave correctly . These
> functions
> also follow the principle of fail-fast-and-safe. This trust might be an
> assumption that not everybody agrees with, though. So, if the failure
> originated
> from HW/OS/Clusterware it clearly is a STONITH situation, but if it's a
> user-space problem - the default assumption is that isolation can be
> implemented on
> OS-level and that's a guarantee that the clusterware gives (using a
> separate
> Quorum mechanism to avoid split-brain situations).


HW-level STONITH seems to be too much for your case. How about making
your HA-middleware shut the dying postgres down before doing switchover
by using (for example) "pg_ctl -mi stop"? In this case, other
softwares can still
keep on running on the original node after switchover.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postmaster recovery and automatic restart suppression

Reply via email to