Re: [ClusterLabs] controlling cluster behavior on startup
On Tue, Jan 30, 2024 at 2:21 PM Walker, Chris wrote:
> >>> However, now it seems to wait that amount of time before it elects a
> >>> DC, even when quorum is acquired earlier. In my log snippet below,
> >>> with dc-deadtime 300s,
> >>
> >> The dc-deadtime is not waiting for quorum, but for another DC to show
> >> up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> I believe all the nodes showed up by 14:17:04, but it still waited until
> 14:19:26 to elect a DC:
>
> Jan 29 14:14:25 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher11 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
>
> This is a cluster with 2 nodes, gopher11 and gopher12.
>
> This is our experience with dc-deadtime too: even if both nodes in the
> cluster show up, dc-deadtime must elapse before the cluster starts. This
> was discussed on this list a while back
> (https://www.mail-archive.com/users@clusterlabs.org/msg03897.html) and an
> RFE came out of it (https://bugs.clusterlabs.org/show_bug.cgi?id=5310).
>
> I've worked around this by having an ExecStartPre directive for Corosync
> that does essentially:
>
>     while ! systemctl -H ${peer} is-active corosync; do sleep 5; done
>
> With this in place, the nodes wait for each other before starting Corosync
> and Pacemaker. We can then use the default 20s dc-deadtime so that the DC
> election happens quickly once both nodes are up.

Actually, wait-for-all, which comes by default with 2-node, should lead to
quorum being delayed until both nodes have shown up.
And if we make the cluster not ignore quorum, it shouldn't start fencing
before it sees the peer - right? Running a 2-node cluster that ignores
quorum, or runs without wait-for-all, is a delicate thing anyway, I would
say, and shouldn't be done in the generic case. Not saying it is an issue
here - there just isn't enough info about the cluster to say. So you
shouldn't need this raised dc-deadtime and thus wouldn't experience large
startup delays.

Regards,
Klaus

> Thanks,
> Chris
>
> From: Users on behalf of Faaland, Olaf P. via Users
> Date: Monday, January 29, 2024 at 7:46 PM
> To: Ken Gaillot, Cluster Labs - All topics related to open-source
> clustering welcomed
> Cc: Faaland, Olaf P.
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
>
> >> However, now it seems to wait that amount of time before it elects a
> >> DC, even when quorum is acquired earlier. In my log snippet below,
> >> with dc-deadtime 300s,
> >
> > The dc-deadtime is not waiting for quorum, but for another DC to show
> > up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> I believe all the nodes showed up by 14:17:04, but it still waited until
> 14:19:26 to elect a DC:
>
> Jan 29 14:14:25 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher11 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
>
> This is a cluster with 2 nodes, gopher11 and gopher12.
>
> Am I misreading that?
>
> thanks,
> Olaf
>
> From: Ken Gaillot
> Sent: Monday, January 29, 2024 3:49 PM
> To: Faaland, Olaf P.; Cluster Labs - All topics related to open-source
> clustering welcomed
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
>
> On Mon, 2024-01-29 at 22:48, Faaland, Olaf P. wrote:
> > Thank you, Ken.
> >
> > I changed my configuration management system to put an initial
> > cib.xml into /var/lib/pacemaker/cib/, which sets all the property
> > values I was setting via pcs commands, including dc-deadtime. I
> > removed those "pcs property set" commands from the ones that are run
> > at startup time.
> >
> > That worked in the sense that after Pacemaker start, the node waits
> > my newly specified dc-deadtime of 300s before giving up on the
> > partner node and fencing it, if the partner never appears as a
> > member.
> >
> > However, now it seems to wait that amount of time before it elects a
> > DC, even when quorum is acquired earlier. In my log snippet below,
> > with dc-deadtime 300s,
>
> The dc-deadtime is not waiting for
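For context on the wait-for-all behavior mentioned above: in a two-node Corosync cluster it is normally enabled implicitly by the two_node votequorum option. A minimal quorum section of corosync.conf might look like the following sketch (illustrative, not taken from the poster's configuration):

```
quorum {
    provider: corosync_votequorum
    # two_node: 1 lets the cluster retain quorum when only one of the two
    # nodes is up, and implicitly enables wait_for_all, so quorum is not
    # granted at startup until both nodes have been seen at least once.
    two_node: 1
    # wait_for_all: 1   # the implicit default when two_node is set
}
```

With wait_for_all in effect, a freshly booted node cannot gain quorum alone and start fencing its peer, which is why the thread suggests the raised dc-deadtime should be unnecessary.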
Re: [ClusterLabs] controlling cluster behavior on startup
On Tue, 2024-01-30 at 13:20, Walker, Chris wrote:
> >>> However, now it seems to wait that amount of time before it elects a
> >>> DC, even when quorum is acquired earlier. In my log snippet below,
> >>> with dc-deadtime 300s,
> >>
> >> The dc-deadtime is not waiting for quorum, but for another DC to show
> >> up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> I believe all the nodes showed up by 14:17:04, but it still waited
> until 14:19:26 to elect a DC:
>
> Jan 29 14:14:25 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher11 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
>
> This is a cluster with 2 nodes, gopher11 and gopher12.
>
> This is our experience with dc-deadtime too: even if both nodes in
> the cluster show up, dc-deadtime must elapse before the cluster
> starts. This was discussed on this list a while back
> (https://www.mail-archive.com/users@clusterlabs.org/msg03897.html) and
> an RFE came out of it
> (https://bugs.clusterlabs.org/show_bug.cgi?id=5310).

Ah, I misremembered - I thought we had done that :(

> I've worked around this by having an ExecStartPre directive for
> Corosync that does essentially:
>
>     while ! systemctl -H ${peer} is-active corosync; do sleep 5; done
>
> With this in place, the nodes wait for each other before starting
> Corosync and Pacemaker. We can then use the default 20s dc-deadtime
> so that the DC election happens quickly once both nodes are up.

That makes sense

> Thanks,
> Chris
>
> From: Users on behalf of Faaland, Olaf P. via Users
> Date: Monday, January 29, 2024 at 7:46 PM
> To: Ken Gaillot, Cluster Labs - All topics related to open-source
> clustering welcomed
> Cc: Faaland, Olaf P.
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
>
> >> However, now it seems to wait that amount of time before it elects a
> >> DC, even when quorum is acquired earlier. In my log snippet below,
> >> with dc-deadtime 300s,
> >
> > The dc-deadtime is not waiting for quorum, but for another DC to show
> > up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> I believe all the nodes showed up by 14:17:04, but it still waited
> until 14:19:26 to elect a DC:
>
> Jan 29 14:14:25 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher11 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
>
> This is a cluster with 2 nodes, gopher11 and gopher12.
>
> Am I misreading that?
>
> thanks,
> Olaf
>
> From: Ken Gaillot
> Sent: Monday, January 29, 2024 3:49 PM
> To: Faaland, Olaf P.; Cluster Labs - All topics related to open-source
> clustering welcomed
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
>
> On Mon, 2024-01-29 at 22:48, Faaland, Olaf P. wrote:
> > Thank you, Ken.
> >
> > I changed my configuration management system to put an initial
> > cib.xml into /var/lib/pacemaker/cib/, which sets all the property
> > values I was setting via pcs commands, including dc-deadtime. I
> > removed those "pcs property set" commands from the ones that are run
> > at startup time.
> >
> > That worked in the sense that after Pacemaker start, the node waits
> > my newly specified dc-deadtime of 300s before giving up on the
> > partner node and fencing it, if the partner never appears as a
> > member.
> >
> > However, now it seems to wait that amount of time before it elects a
> > DC, even when quorum is acquired earlier. In my log snippet below,
> > with dc-deadtime 300s,
>
> The dc-deadtime is not waiting for quorum, but for another DC to show
> up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> > 14:14:24 Pacemaker starts on gopher12
> > 14:17:04 quorum is acquired
> > 14:19:26 Election Trigger just popped (start time + dc-deadtime
> > seconds)
> > 14:19:26 gopher12 wins the election
> >
> > Is there other configuration that needs to be present in the cib at
> > startup time?
> >
> > thanks,
> > Olaf
> >
> > === log extract using new system of
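The timeline quoted above can be checked directly: the DC election fired at start time plus dc-deadtime, regardless of when quorum arrived. A quick sanity check of the arithmetic (GNU date assumed):

```shell
# Timeline from the log: Pacemaker started at 14:14:24, the Election
# Trigger popped at 14:19:26. The gap should be roughly dc-deadtime (300s).
start=$(date -d '2024-01-29 14:14:24' +%s)
popped=$(date -d '2024-01-29 14:19:26' +%s)
echo "$(( popped - start ))s elapsed"   # 302s, i.e. dc-deadtime plus ~2s of slack
```

The ~2-second difference is consistent with the one-second offset between pacemakerd starting (14:14:24) and the controld membership callback (14:14:25), plus timer granularity.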
Re: [ClusterLabs] controlling cluster behavior on startup
>>> However, now it seems to wait that amount of time before it elects a
>>> DC, even when quorum is acquired earlier. In my log snippet below,
>>> with dc-deadtime 300s,
>>
>> The dc-deadtime is not waiting for quorum, but for another DC to show
>> up. If all nodes show up, it can proceed, but otherwise it has to wait.
>
> I believe all the nodes showed up by 14:17:04, but it still waited until
> 14:19:26 to elect a DC:
>
> Jan 29 14:14:25 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher11 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
>
> This is a cluster with 2 nodes, gopher11 and gopher12.

This is our experience with dc-deadtime too: even if both nodes in the
cluster show up, dc-deadtime must elapse before the cluster starts. This
was discussed on this list a while back
(https://www.mail-archive.com/users@clusterlabs.org/msg03897.html) and an
RFE came out of it (https://bugs.clusterlabs.org/show_bug.cgi?id=5310).

I've worked around this by having an ExecStartPre directive for Corosync
that does essentially:

    while ! systemctl -H ${peer} is-active corosync; do sleep 5; done

With this in place, the nodes wait for each other before starting Corosync
and Pacemaker. We can then use the default 20s dc-deadtime so that the DC
election happens quickly once both nodes are up.

Thanks,
Chris

From: Users on behalf of Faaland, Olaf P. via Users
Date: Monday, January 29, 2024 at 7:46 PM
To: Ken Gaillot, Cluster Labs - All topics related to open-source
clustering welcomed
Cc: Faaland, Olaf P.
Subject: Re: [ClusterLabs] controlling cluster behavior on startup

>> However, now it seems to wait that amount of time before it elects a
>> DC, even when quorum is acquired earlier. In my log snippet below,
>> with dc-deadtime 300s,
>
> The dc-deadtime is not waiting for quorum, but for another DC to show
> up. If all nodes show up, it can proceed, but otherwise it has to wait.

I believe all the nodes showed up by 14:17:04, but it still waited until
14:19:26 to elect a DC:

Jan 29 14:14:25 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher12 is now member (was in unknown state)
Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (peer_update_callback) info: Cluster node gopher11 is now member (was in unknown state)
Jan 29 14:17:04 gopher12 pacemaker-controld [123697] (quorum_notification_cb) notice: Quorum acquired | membership=54 members=2
Jan 29 14:19:26 gopher12 pacemaker-controld [123697] (do_log) info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb

This is a cluster with 2 nodes, gopher11 and gopher12.

Am I misreading that?

thanks,
Olaf

From: Ken Gaillot
Sent: Monday, January 29, 2024 3:49 PM
To: Faaland, Olaf P.; Cluster Labs - All topics related to open-source
clustering welcomed
Subject: Re: [ClusterLabs] controlling cluster behavior on startup

On Mon, 2024-01-29 at 22:48, Faaland, Olaf P. wrote:
> Thank you, Ken.
>
> I changed my configuration management system to put an initial
> cib.xml into /var/lib/pacemaker/cib/, which sets all the property
> values I was setting via pcs commands, including dc-deadtime. I
> removed those "pcs property set" commands from the ones that are run
> at startup time.
>
> That worked in the sense that after Pacemaker start, the node waits
> my newly specified dc-deadtime of 300s before giving up on the
> partner node and fencing it, if the partner never appears as a
> member.
>
> However, now it seems to wait that amount of time before it elects a
> DC, even when quorum is acquired earlier. In my log snippet below,
> with dc-deadtime 300s,

The dc-deadtime is not waiting for quorum, but for another DC to show
up. If all nodes show up, it can proceed, but otherwise it has to wait.

> 14:14:24 Pacemaker starts on gopher12
> 14:17:04 quorum is acquired
> 14:19:26 Election Trigger just popped (start time + dc-deadtime
> seconds)
> 14:19:26 gopher12 wins the election
>
> Is there other configuration that needs to be present in the cib at
> startup time?
>
> thanks,
> Olaf
>
> === log extract using new system of installing partial cib.xml before
> startup
> Jan 29 14:14:24 gopher12 pacemakerd [123690] (main) notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7 features: agent-manpages ascii-docs compat-2.0 corosync-ge-2 default-concurrent-fencing generated-manpages monotonic nagios ncurses remote systemd
> Jan 29 14:14:25 gopher12 pacemaker-attrd [123695] >
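The ExecStartPre workaround described in this thread could be wired in as a systemd drop-in along these lines. This is a sketch: the drop-in path, the PEER variable, and the hostname value are all illustrative, and systemd expands $PEER from the Environment= line before the shell runs.

```ini
# /etc/systemd/system/corosync.service.d/wait-for-peer.conf (illustrative path)
[Service]
# PEER is the other node's hostname; set it appropriately on each node.
Environment=PEER=gopher11
# Block Corosync startup until the peer's corosync unit reports active,
# polling every 5 seconds, as described in the messages above.
# Note: systemctl -H contacts the peer over SSH, so passwordless root SSH
# between the nodes is assumed.
ExecStartPre=/bin/sh -c 'while ! systemctl -H $PEER is-active corosync; do sleep 5; done'
```

One caveat worth noting: without a TimeoutStartSec override, a peer that never comes up will leave this unit stuck in start for systemd's default start timeout.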