Re: [ClusterLabs] Howto stonith in the case of any interface failure?

Ken Gaillot Thu, 10 Oct 2019 12:08:39 -0700

On Wed, 2019-10-09 at 20:10 +0200, Kadlecsik József wrote:
> On Wed, 9 Oct 2019, Ken Gaillot wrote:
> 
> > > One of the nodes has got a failure ("watchdog: BUG: soft lockup -
> > > CPU#7 stuck for 23s"), with the result that the node could process
> > > traffic on the backend interface but not on the frontend one. Thus
> > > the services became unavailable, but the cluster thought the node
> > > was all right and did not stonith it.
> > > 
> > > How could we protect the cluster against such failures?
> > 
> > See the ocf:heartbeat:ethmonitor agent (to monitor the interface
> > itself) and/or the ocf:pacemaker:ping agent (to monitor reachability
> > of some IP such as a gateway).
> 
> This looks really promising, thank you! Does the cluster regard it as
> a failure when an ocf:heartbeat:ethmonitor agent clone on a node does
> not run? :-)


If you configure it in the usual way, as a clone that runs on all
nodes, then a start failure on any node will be recorded in the cluster
status. To get other resources to move off such a node, you would
colocate them with the ethmonitor clone.
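
As a rough sketch of that setup, assuming the cluster is managed with
pcs: the helper below only prints the two commands (a cloned ethmonitor
resource plus a colocation constraint) so they can be reviewed before
anything is applied. The function name, the interface "eth0", the
resource name "my-service", the "ethmon-" prefix and the 10s monitor
interval are all placeholders for illustration, not values from this
thread.

```shell
#!/bin/sh
# Hypothetical helper: print the pcs commands for a cloned ethmonitor
# resource and a colocation constraint. Nothing is executed here.
ethmonitor_cmds() {
    iface="$1"              # interface to monitor (placeholder: eth0)
    service="$2"            # existing resource to tie to the monitor
    mon="ethmon-${iface}"   # assumed naming scheme for the monitor

    # Clone the monitor so one instance runs on every node.
    echo "pcs resource create ${mon} ocf:heartbeat:ethmonitor interface=${iface} op monitor interval=10s clone"

    # Keep the service only on nodes where the clone instance is running.
    echo "pcs constraint colocation add ${service} with ${mon}-clone INFINITY"
}

# Dry run: show the commands for review.
ethmonitor_cmds eth0 my-service
```

Once the printed commands look right for your cluster, they could be
applied by piping the output to sh on a cluster node. The INFINITY
score makes the colocation mandatory; a finite score would make it a
preference instead.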

> 
> Best regards,
> Jozsef
> --
> E-mail : kadlecsik.joz...@wigner.mta.hu
> PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
> Address: Wigner Research Centre for Physics
>          H-1525 Budapest 114, POB. 49, Hungary
-- 
Ken Gaillot <kgail...@redhat.com>
