[ClusterLabs] host in standby causes havoc

2023-06-15 Thread Kadlecsik József
Hello, We had a strange issue here: in a 7-node cluster, one node was put into standby mode to test a new iSCSI setting on it. While the machine was being configured it was rebooted, and after the reboot iSCSI didn't come up. That caused a malformed communication (atlas5 is the node in standby) with the

Re: [ClusterLabs] node utilization attributes are lost during upgrade

2020-08-18 Thread Kadlecsik József
Hi, On Mon, 17 Aug 2020, Ken Gaillot wrote: > On Mon, 2020-08-17 at 12:12 +0200, Kadlecsik József wrote: > > > > At upgrading a corosync/pacemaker/libvirt/KVM cluster from Debian > > stretch to buster, all the node utilization attributes were erased > > from the

[ClusterLabs] node utilization attributes are lost during upgrade

2020-08-17 Thread Kadlecsik József
Hello, While upgrading a corosync/pacemaker/libvirt/KVM cluster from Debian stretch to buster, all the node utilization attributes were erased from the configuration. However, the same attributes were kept on the VirtualDomain resources. As a result, all resources with utilization attributes

Re: [ClusterLabs] Howto stonith in the case of any interface failure?

2019-10-10 Thread Kadlecsik József
On Wed, 9 Oct 2019, Digimer wrote: > > One of the nodes has got a failure ("watchdog: BUG: soft lockup - > > CPU#7 stuck for 23s"), which resulted that the node could process > > traffic on the backend interface but not on the frontend one. Thus the > > services became unavailable but the cluste

Re: [ClusterLabs] Howto stonith in the case of any interface failure?

2019-10-09 Thread Kadlecsik József
On Wed, 9 Oct 2019, Ken Gaillot wrote: > > One of the nodes has got a failure ("watchdog: BUG: soft lockup - > > CPU#7 stuck for 23s"), which resulted that the node could process > > traffic on the backend interface but not on the frontend one. Thus the > > services became unavailable but the cl

Re: [ClusterLabs] Howto stonith in the case of any interface failure?

2019-10-09 Thread Kadlecsik József
Hi, On Wed, 9 Oct 2019, Jan Pokorný wrote: > On 09/10/19 09:58 +0200, Kadlecsik József wrote: > > The nodes in our cluster have got backend and frontend interfaces: the > > former ones are for the storage and cluster (corosync) traffic and the > > latter ones are for the p

[ClusterLabs] Howto stonith in the case of any interface failure?

2019-10-09 Thread Kadlecsik József
Hello, The nodes in our cluster have got backend and frontend interfaces: the former ones are for the storage and cluster (corosync) traffic and the latter ones are for the public services of KVM guests only. One of the nodes has got a failure ("watchdog: BUG: soft lockup - CPU#7 stuck for 23s

Re: [ClusterLabs] Antw: Re: Antw: Re: Constant stop/start of resource in spite of interval=0

2019-05-21 Thread Kadlecsik József
Hi, On Tue, 21 May 2019, Ulrich Windl wrote: > So maybe the original defective RA would be valuable for debugging the > issue. I guess the RA was invalid in some way that wasn't detected or > handled properly... With the attached skeleton RA and the setting primitive testid-testid0 ocf:local:

Re: [ClusterLabs] Antw: Re: Constant stop/start of resource in spite of interval=0

2019-05-20 Thread Kadlecsik József
Hi, On Mon, 20 May 2019, Ken Gaillot wrote: > On Mon, 2019-05-20 at 15:29 +0200, Ulrich Windl wrote: > > What worries me is "Rejecting name for unique". > > Trace messages are often not user-friendly. The rejecting/accepting is > nothing to be concerned about; it just refers to which parameters

Re: [ClusterLabs] Constant stop/start of resource in spite of interval=0

2019-05-18 Thread Kadlecsik József
On Sat, 18 May 2019, Kadlecsik József wrote: > On Sat, 18 May 2019, Andrei Borzenkov wrote: > > > 18.05.2019 18:34, Kadlecsik József wrote: > > > > We have a resource agent which creates IP tunnels. In spite of the > > > configuration setting >

Re: [ClusterLabs] Constant stop/start of resource in spite of interval=0

2019-05-18 Thread Kadlecsik József
On Sat, 18 May 2019, Andrei Borzenkov wrote: > 18.05.2019 18:34, Kadlecsik József wrote: > > We have a resource agent which creates IP tunnels. In spite of the > > configuration setting > > > > primitive tunnel-eduroam ocf:local:tunnel \ > > params

[ClusterLabs] Constant stop/start of resource in spite of interval=0

2019-05-18 Thread Kadlecsik József
Hello, We have a resource agent which creates IP tunnels. In spite of the configuration setting

    primitive tunnel-eduroam ocf:local:tunnel \
        params op start timeout=120s interval=0 \
        op stop timeout=300s interval=0 \
        op monitor timeout=30s interval=30s depth=0

Re: [ClusterLabs] Antw: Rebooting a standby node triggers lots of transitions

2018-09-07 Thread Kadlecsik József
On Wed, 5 Sep 2018, Kadlecsik József wrote: > On Wed, 5 Sep 2018, Ken Gaillot wrote: > > > > > For testing purposes one of our nodes was put in standby mode and > > > > then rebooted several times. When the standby node started up, it > > > >

[ClusterLabs] Rebooting a standby node triggers lots of transitions

2018-09-05 Thread Kadlecsik József
Hi, For testing purposes one of our nodes was put in standby mode and then rebooted several times. When the standby node started up, it joined the cluster as a new member, which resulted in transitions between the online nodes. However, when the standby node was rebooted in mid-transition, it