Thank you for the response Ken. Will watch for this to be reproduced again.
On Thu, May 16, 2019 at 4:10 AM Ken Gaillot <[email protected]> wrote: > Resurrecting a really old thread in case anyone had similar questions. > This arrived at a crazy busy time and got neglected. > > The "finalization timer" is a timeout in the DC election process. The > value is the join-finalization-timeout cluster property (formerly crmd- > finalization-timeout), which defaults to 30 minutes. > > The whole election process is undocumented and quite arcane. It would > be nice to document it but that's a bigger project than there is time > for at the moment. > > The controller (crmd) is implemented as a finite state machine, meaning > various inputs move it from one state to another according to fixed > rules. The finalization timer is started when the "finalize join" state > is reached, and stopped whenever that state is left. This state is > achieved once the (possibly newly elected) DC has received join > requests from all nodes, and is left once the DC has sync'd the CIB to > those nodes, ack'd their join requests, and received confirmations back > from them. > > Obviously it's too late to look at this particular incident, but if it > can be reliably reproduced, I can take a new look. > > On Sun, 2018-10-28 at 23:32 +0530, Prasad Nagaraj wrote: > > Hi : > > > > I came across a strange situation in my cluster few days back. I was > > trying to replace one of the nodes in my 3 node cluster and I removed > > and added that node back following due process as per > > > https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_removing_a_corosync_node.html > > . Updated corosync.conf on all nodes and restarted corosync and > > pacemaker on all the nodes. The cluster didnt elect a DC and also > > didnt report any activities for almost 28 mins as seen from below > > logs. Then I saw this message: > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_timer_popped: Finalization Timer (I_ELECTION) just popped > > (1800000ms) > > after which I could see further activities happening including DC > > election. > > > > I was not able to understand or identify any reasons for this > > behavior and also there is absolutely no documentation on this > > Finalization timer and what it means. Appreciate any help in terms of > > explaining what exactly this timer means and what could be reasons > > for this behavior. I have pasted a snippet of logs during the time > > here. I do have more logs and also logs from other nodes that I can > > share if required. > > > > Thanks in advance for the help! > > Prasad > > > > > > Oct 22 22:07:46 [76412] vm85c4465533 cib: info: > > cib_process_request: Completed cib_modify operation for section > > nodes: OK (rc=0, origin=vm46890219c5/crm_attribute/4, > > version=0.100.0) > > Oct 22 22:07:46 [76412] vm85c4465533 cib: info: > > cib_file_write_with_digest: Reading cluster configuration file > > /var/lib/pacemaker/cib/cib.QC3Kay (digest: > > /var/lib/pacemaker/cib/cib.c74mjr) > > Oct 22 22:07:47 [76412] vm85c4465533 cib: info: > > cib_file_backup: Archived previous version as > > /var/lib/pacemaker/cib/cib-96.raw > > Oct 22 22:07:47 [76412] vm85c4465533 cib: info: > > cib_file_write_with_digest: Wrote version 0.100.0 of the CIB to disk > > (digest: 3ab4bffdc9c372985cfe50ad3100131d) > > Oct 22 22:07:47 [76412] vm85c4465533 cib: info: > > cib_file_write_with_digest: Reading cluster configuration file > > /var/lib/pacemaker/cib/cib.7NqOEy (digest: > > /var/lib/pacemaker/cib/cib.gamHhs) > > Oct 22 22:07:48 [76412] vm85c4465533 cib: info: > > cib_perform_op: Diff: --- 0.100.0 2 > > Oct 22 22:07:48 [76412] vm85c4465533 cib: info: > > cib_perform_op: Diff: +++ 0.101.0 (null) > > Oct 22 22:07:48 [76412] vm85c4465533 cib: info: > > cib_perform_op: + /cib: @epoch=101 > > Oct 22 22:07:48 [76412] vm85c4465533 cib: info: > > cib_perform_op: ++ /cib/configuration/constraints: > > <rsc_location id="ms_mysql_member453" rsc="ms_mysql" score="0" > > node="vm46890219c5"/> > > Oct 22 22:07:48 [76412] vm85c4465533 cib: info: > > cib_process_request: Completed cib_apply_diff operation for section > > 'all': OK (rc=0, origin=local/cibadmin/2, version=0.101.0) > > Oct 22 22:07:48 [76412] vm85c4465533 cib: info: > > cib_file_backup: Archived previous version as > > /var/lib/pacemaker/cib/cib-97.raw > > Oct 22 22:07:48 [76412] vm85c4465533 cib: info: > > cib_file_write_with_digest: Wrote version 0.101.0 of the CIB to disk > > (digest: 7ffa91a8ca752581d4c1df13d287e467) > > Oct 22 22:07:48 [76412] vm85c4465533 cib: info: > > cib_file_write_with_digest: Reading cluster configuration file > > /var/lib/pacemaker/cib/cib.JkMV2C (digest: > > /var/lib/pacemaker/cib/cib.mLy23A) > > Oct 22 22:07:49 [76412] vm85c4465533 cib: info: > > cib_perform_op: Diff: --- 0.101.0 2 > > Oct 22 22:07:49 [76412] vm85c4465533 cib: info: > > cib_perform_op: Diff: +++ 0.102.0 (null) > > Oct 22 22:07:49 [76412] vm85c4465533 cib: info: > > cib_perform_op: + /cib: @epoch=102 > > Oct 22 22:07:49 [76412] vm85c4465533 cib: info: > > cib_perform_op: + > > /cib/configuration/resources/master[@id='ms_mysql']/meta_attributes[@ > > id='ms_mysql-meta_attributes']/nvpair[@id='ms_mysql-meta_attributes- > > maintenance']: @value=false > > Oct 22 22:07:49 [76412] vm85c4465533 cib: info: > > cib_process_request: Completed cib_modify operation for section > > resources: OK (rc=0, origin=local/crm_resource/6, version=0.102.0) > > Oct 22 22:07:49 [76412] vm85c4465533 cib: info: > > cib_file_backup: Archived previous version as > > /var/lib/pacemaker/cib/cib-98.raw > > Oct 22 22:07:49 [76412] vm85c4465533 cib: info: > > cib_file_write_with_digest: Wrote version 0.102.0 of the CIB to disk > > (digest: dd6263dc226d652721b09ec702e37742) > > Oct 22 22:07:49 [76412] vm85c4465533 cib: info: > > cib_file_write_with_digest: Reading cluster configuration file > > /var/lib/pacemaker/cib/cib.DOunDE (digest: > > /var/lib/pacemaker/cib/cib.MYBrjE) > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_timer_popped: Finalization Timer (I_ELECTION) just popped > > (1800000ms) > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > do_state_transition: State transition S_FINALIZE_JOIN -> S_ELECTION [ > > input=I_ELECTION cause=C_TIMER_POPPED origin=crm_timer_popped ] > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > update_dc: Unset DC. Was vm85c4465533 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > election_complete: Election election-0 complete > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > election_timeout_popped: Election failed: Declaring ourselves the > > winner > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: do_log: > > FSA: Input I_ELECTION_DC from election_timeout_popped() received in > > state S_ELECTION > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: notice: > > do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ > > input=I_ELECTION_DC cause=C_TIMER_POPPED > > origin=election_timeout_popped ] > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > do_dc_takeover: Taking over DC status for this partition > > Oct 22 22:35:14 [76412] vm85c4465533 cib: info: > > cib_process_request: Completed cib_master operation for section > > 'all': OK (rc=0, origin=local/crmd/86, version=0.102.0) > > Oct 22 22:35:14 [76412] vm85c4465533 cib: info: > > cib_process_request: Completed cib_modify operation for section cib: > > OK (rc=0, origin=local/crmd/87, version=0.102.0) > > Oct 22 22:35:14 [76412] vm85c4465533 cib: info: > > cib_process_request: Completed cib_modify operation for section > > crm_config: OK (rc=0, origin=local/crmd/89, version=0.102.0) > > Oct 22 22:35:14 [76412] vm85c4465533 cib: info: > > cib_process_request: Completed cib_modify operation for section > > crm_config: OK (rc=0, origin=local/crmd/91, version=0.102.0) > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: initialize_join: Node > > vm46890219c5[218109612] - join-2 phase 4 -> 0 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: initialize_join: Node > > vm85c4465533[83891884] - join-2 phase 4 -> 0 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: initialize_join: Node > > vm2ad44008dc[251664044] - join-2 phase 3 -> 0 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > join_make_offer: join-2: Sending offer to vm46890219c5 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: join_make_offer: Node > > vm46890219c5[218109612] - join-2 phase 0 -> 1 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > join_make_offer: join-2: Sending offer to vm85c4465533 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: join_make_offer: Node > > vm85c4465533[83891884] - join-2 phase 0 -> 1 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > join_make_offer: join-2: Sending offer to vm2ad44008dc > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: join_make_offer: Node > > vm2ad44008dc[251664044] - join-2 phase 0 -> 1 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > do_dc_join_offer_all: join-2: Waiting on 3 outstanding join > > acks > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: warning: do_log: > > FSA: Input I_ELECTION_DC from do_election_check() received in state > > S_INTEGRATION > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: initialize_join: Node > > vm46890219c5[218109612] - join-3 phase 1 -> 0 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: initialize_join: Node > > vm85c4465533[83891884] - join-3 phase 1 -> 0 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: initialize_join: Node > > vm2ad44008dc[251664044] - join-3 phase 1 -> 0 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > join_make_offer: join-3: Sending offer to vm46890219c5 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: join_make_offer: Node > > vm46890219c5[218109612] - join-3 phase 0 -> 1 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > join_make_offer: join-3: Sending offer to vm85c4465533 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: join_make_offer: Node > > vm85c4465533[83891884] - join-3 phase 0 -> 1 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > join_make_offer: join-3: Sending offer to vm2ad44008dc > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crm_update_peer_join: join_make_offer: Node > > vm2ad44008dc[251664044] - join-3 phase 0 -> 1 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > do_dc_join_offer_all: join-3: Waiting on 3 outstanding join > > acks > > Oct 22 22:35:14 [76412] vm85c4465533 cib: info: > > cib_process_request: Completed cib_modify operation for section > > crm_config: OK (rc=0, origin=local/crmd/93, version=0.102.0) > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > plugin_handle_membership: Membership 52: quorum retained > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crmd_cs_dispatch: Setting expected votes to 3 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > plugin_handle_membership: Membership 52: quorum retained > > Oct 22 22:35:14 [76412] vm85c4465533 cib: info: > > cib_process_request: Completed cib_modify operation for section > > crm_config: OK (rc=0, origin=local/crmd/96, version=0.102.0) > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > crmd_cs_dispatch: Setting expected votes to 3 > > Oct 22 22:35:14 [76417] vm85c4465533 crmd: info: > > update_dc: Set DC to vm85c4465533 (3.0.10) > -- > Ken Gaillot <[email protected]> > > _______________________________________________ > Manage your subscription: > https://lists.clusterlabs.org/mailman/listinfo/users > > ClusterLabs home: https://www.clusterlabs.org/ >
_______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
