Re: [ClusterLabs] both nodes OFFLINE
Hi. Thanks for the reply, and sorry for the late follow-up: the problem has been solved. As mentioned, the corosync versions were not the same; syncing the versions solved the problem. This was just an installation problem. Although we used Ansible to update the rpm, the run failed and we missed that it had happened.

> On 2017/05/23 7:12, Ken Gaillot wrote:
>
> On 05/13/2017 01:36 AM, 石井 俊直 wrote:
>> Hi.
>>
>> We sometimes have a problem in our two-node cluster on CentOS7. Let node-2 and node-3 be the names of the nodes. When the problem happens, both nodes are recognized as OFFLINE on node-3, while on node-2 only node-3 is recognized as OFFLINE.
>>
>> When that happens, the following log message is added repeatedly on node-2, and the log file (/var/log/cluster/corosync.log) grows to hundreds of megabytes in a short time. The log content on node-3 is different.
>>
>> The erroneous state is temporarily resolved if the OS of node-2 is restarted. On the other hand, restarting the OS of node-3 leaves it in the same state.
>>
>> I searched the ML archives and found a post (Mon Oct 1 01:27:39 CEST 2012) about the "Discarding update with feature set" problem. According to that message, our problem may be solved by removing /var/lib/pacemaker/crm/cib.* on node-2.
>>
>> What I want to know is whether removing the above files on just one of the nodes is safe. If there is another method to solve the problem, I'd like to hear it.
>>
>> Thanks.
>>
>> —— from corosync.log
>> cib: error: cib_perform_op: Discarding update with feature set '3.0.11' greater than our own '3.0.10'
>
> This implies that the pacemaker versions are different on the two nodes. Usually, when the pacemaker version changes, the feature set version also changes, which means that it introduces new features that won't work with older pacemaker versions.
>
> Running a cluster with mixed pacemaker versions in such a case is allowed, but only during a rolling upgrade. Once an older node leaves the cluster for any reason, it will not be allowed to rejoin until it is upgraded.
>
> Removing the cib files won't help, since node-2 apparently does not support node-3's pacemaker version.
>
> If that's not the situation you are in, please give more details, as this should not be possible otherwise.
>
>> cib: error: cib_process_request: Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=node-3/crmd/12708, version=0.83.30)
>> crmd: error: finalize_sync_callback: Sync from node-3 failed: Protocol not supported
>> crmd: info: register_fsa_error_adv: Resetting the current action list
>> crmd: warning: do_log: Input I_ELECTION_DC received in state S_FINALIZE_JOIN from finalize_sync_callback
>> crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback
>> crmd: info: crm_update_peer_join: initialize_join: Node node-2[1] - join-6329 phase 2 -> 0
>> crmd: info: crm_update_peer_join: initialize_join: Node node-3[2] - join-6329 phase 2 -> 0
>> crmd: info: update_dc: Unset DC. Was node-2
>> crmd: info: join_make_offer: join-6329: Sending offer to node-2
>> crmd: info: crm_update_peer_join: join_make_offer: Node node-2[1] - join-6329 phase 0 -> 1
>> crmd: info: join_make_offer: join-6329: Sending offer to node-3
>> crmd: info: crm_update_peer_join: join_make_offer: Node node-3[2] - join-6329 phase 0 -> 1
>> crmd: info: do_dc_join_offer_all: join-6329: Waiting on 2 outstanding join acks
>> crmd: info: update_dc: Set DC to node-2 (3.0.10)
>> crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-2[1] - join-6329 phase 1 -> 2
>> crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-3[2] - join-6329 phase 1 -> 2
>> crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state
>> crmd: info: crmd_join_phase_log: join-6329: node-2=integrated
>> crmd: info: crmd_join_phase_log: join-6329: node-3=integrated
>> crmd: notice: do_dc_join_finalize: Syncing the Cluster Information Base from node-3 to rest of cluster | join-6329
>> crmd: notice: do_dc_join_finalize: Requested version <cib crm_feature_set="3.0.11" validate-with="pacemaker-2.5" epoch="84" num_updates="1" admin_epoch="0" cib-last-written="Thu May 11 08:05:45 2017" update-origin="node-2" update-client="crm_resource" update-user="root" have-quorum="1"/>
>> cib: info: cib_process_request: Forwarding cib_sync operation for section 'all' to node-3 (origin=local/crmd/12710)
>> cib: info: cib_process_replace: Digest matched on replace from node-3: 85a19c7927c54ccb15794f2720e07ce1
>> cib: info: cib_process_replace: Replaced 0.83.30 with 0.84.1 from node-3
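Since the root cause here was package versions that silently drifted apart when an Ansible run failed, a quick cross-node comparison catches this class of problem early. A minimal sketch, not an official procedure: the version-comparison helper is testable on its own, while the live-cluster commands (which assume SSH access and the node names from this thread) are left commented out.

```shell
#!/bin/sh
# Print the newer of two dotted version strings (e.g. Pacemaker
# feature sets such as 3.0.10 vs 3.0.11) using GNU sort's version sort.
newer_version() {
    printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1
}

newer_version 3.0.10 3.0.11   # prints 3.0.11

# On a live cluster (hypothetical node names, requires SSH); each node
# should report identical package versions and the same feature set:
#   for node in node-2 node-3; do
#       echo "== $node =="
#       ssh "$node" 'rpm -q pacemaker corosync; pacemakerd --features'
#   done
```

If the reported versions differ, either finish the rolling upgrade or downgrade the stray node before expecting both nodes to rejoin.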
Re: [ClusterLabs] both nodes OFFLINE
On 05/13/2017 01:36 AM, 石井 俊直 wrote:
> Hi.
>
> We sometimes have a problem in our two-node cluster on CentOS7. Let node-2 and node-3 be the names of the nodes. When the problem happens, both nodes are recognized as OFFLINE on node-3, while on node-2 only node-3 is recognized as OFFLINE.
>
> When that happens, the following log message is added repeatedly on node-2, and the log file (/var/log/cluster/corosync.log) grows to hundreds of megabytes in a short time. The log content on node-3 is different.
>
> The erroneous state is temporarily resolved if the OS of node-2 is restarted. On the other hand, restarting the OS of node-3 leaves it in the same state.
>
> I searched the ML archives and found a post (Mon Oct 1 01:27:39 CEST 2012) about the "Discarding update with feature set" problem. According to that message, our problem may be solved by removing /var/lib/pacemaker/crm/cib.* on node-2.
>
> What I want to know is whether removing the above files on just one of the nodes is safe. If there is another method to solve the problem, I'd like to hear it.
>
> Thanks.
>
> —— from corosync.log
> cib: error: cib_perform_op: Discarding update with feature set '3.0.11' greater than our own '3.0.10'

This implies that the pacemaker versions are different on the two nodes. Usually, when the pacemaker version changes, the feature set version also changes, which means that it introduces new features that won't work with older pacemaker versions.

Running a cluster with mixed pacemaker versions in such a case is allowed, but only during a rolling upgrade. Once an older node leaves the cluster for any reason, it will not be allowed to rejoin until it is upgraded.

Removing the cib files won't help, since node-2 apparently does not support node-3's pacemaker version.

If that's not the situation you are in, please give more details, as this should not be possible otherwise.

> cib: error: cib_process_request: Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=node-3/crmd/12708, version=0.83.30)
> crmd: error: finalize_sync_callback: Sync from node-3 failed: Protocol not supported
> crmd: info: register_fsa_error_adv: Resetting the current action list
> crmd: warning: do_log: Input I_ELECTION_DC received in state S_FINALIZE_JOIN from finalize_sync_callback
> crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback
> crmd: info: crm_update_peer_join: initialize_join: Node node-2[1] - join-6329 phase 2 -> 0
> crmd: info: crm_update_peer_join: initialize_join: Node node-3[2] - join-6329 phase 2 -> 0
> crmd: info: update_dc: Unset DC. Was node-2
> crmd: info: join_make_offer: join-6329: Sending offer to node-2
> crmd: info: crm_update_peer_join: join_make_offer: Node node-2[1] - join-6329 phase 0 -> 1
> crmd: info: join_make_offer: join-6329: Sending offer to node-3
> crmd: info: crm_update_peer_join: join_make_offer: Node node-3[2] - join-6329 phase 0 -> 1
> crmd: info: do_dc_join_offer_all: join-6329: Waiting on 2 outstanding join acks
> crmd: info: update_dc: Set DC to node-2 (3.0.10)
> crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-2[1] - join-6329 phase 1 -> 2
> crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-3[2] - join-6329 phase 1 -> 2
> crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state
> crmd: info: crmd_join_phase_log: join-6329: node-2=integrated
> crmd: info: crmd_join_phase_log: join-6329: node-3=integrated
> crmd: notice: do_dc_join_finalize: Syncing the Cluster Information Base from node-3 to rest of cluster | join-6329
> crmd: notice: do_dc_join_finalize: Requested version <cib crm_feature_set="3.0.11" validate-with="pacemaker-2.5" epoch="84" num_updates="1" admin_epoch="0" cib-last-written="Thu May 11 08:05:45 2017" update-origin="node-2" update-client="crm_resource" update-user="root" have-quorum="1"/>
> cib: info: cib_process_request: Forwarding cib_sync operation for section 'all' to node-3 (origin=local/crmd/12710)
> cib: info: cib_process_replace: Digest matched on replace from node-3: 85a19c7927c54ccb15794f2720e07ce1
> cib: info: cib_process_replace: Replaced 0.83.30 with 0.84.1 from node-3
> cib: info: __xml_diff_object: Moved node_state@crmd (3 -> 2)

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
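The feature set Ken refers to is recorded as the crm_feature_set attribute on the CIB's root <cib> element, which is exactly what the "Requested version" log line shows. A small sketch of pulling it out of a CIB dump; here a literal line like the one in the log is used so the snippet is self-contained, while on a live node you would pipe the output of `cibadmin --query` instead (an assumption about locally available tooling):

```shell
#!/bin/sh
# Extract the crm_feature_set="X.Y.Z" value from a CIB XML header line.
cib_line='<cib crm_feature_set="3.0.11" validate-with="pacemaker-2.5" epoch="84" num_updates="1" admin_epoch="0" have-quorum="1"/>'
feature_set=$(printf '%s\n' "$cib_line" \
    | sed -n 's/.*crm_feature_set="\([^"]*\)".*/\1/p')
echo "$feature_set"   # prints 3.0.11

# Live-cluster variant (assumes cibadmin from pacemaker is installed):
#   cibadmin --query | sed -n 's/.*crm_feature_set="\([^"]*\)".*/\1/p' | head -n 1
```

Comparing this value on each node before rejoining them would have shown the 3.0.10 vs 3.0.11 mismatch directly.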
[ClusterLabs] both nodes OFFLINE
Hi.

We sometimes have a problem in our two-node cluster on CentOS7. Let node-2 and node-3 be the names of the nodes. When the problem happens, both nodes are recognized as OFFLINE on node-3, while on node-2 only node-3 is recognized as OFFLINE.

When that happens, the following log message is added repeatedly on node-2, and the log file (/var/log/cluster/corosync.log) grows to hundreds of megabytes in a short time. The log content on node-3 is different.

The erroneous state is temporarily resolved if the OS of node-2 is restarted. On the other hand, restarting the OS of node-3 leaves it in the same state.

I searched the ML archives and found a post (Mon Oct 1 01:27:39 CEST 2012) about the "Discarding update with feature set" problem. According to that message, our problem may be solved by removing /var/lib/pacemaker/crm/cib.* on node-2.

What I want to know is whether removing the above files on just one of the nodes is safe. If there is another method to solve the problem, I'd like to hear it.

Thanks.

—— from corosync.log
cib: error: cib_perform_op: Discarding update with feature set '3.0.11' greater than our own '3.0.10'
cib: error: cib_process_request: Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=node-3/crmd/12708, version=0.83.30)
crmd: error: finalize_sync_callback: Sync from node-3 failed: Protocol not supported
crmd: info: register_fsa_error_adv: Resetting the current action list
crmd: warning: do_log: Input I_ELECTION_DC received in state S_FINALIZE_JOIN from finalize_sync_callback
crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback
crmd: info: crm_update_peer_join: initialize_join: Node node-2[1] - join-6329 phase 2 -> 0
crmd: info: crm_update_peer_join: initialize_join: Node node-3[2] - join-6329 phase 2 -> 0
crmd: info: update_dc: Unset DC. Was node-2
crmd: info: join_make_offer: join-6329: Sending offer to node-2
crmd: info: crm_update_peer_join: join_make_offer: Node node-2[1] - join-6329 phase 0 -> 1
crmd: info: join_make_offer: join-6329: Sending offer to node-3
crmd: info: crm_update_peer_join: join_make_offer: Node node-3[2] - join-6329 phase 0 -> 1
crmd: info: do_dc_join_offer_all: join-6329: Waiting on 2 outstanding join acks
crmd: info: update_dc: Set DC to node-2 (3.0.10)
crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-2[1] - join-6329 phase 1 -> 2
crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node node-3[2] - join-6329 phase 1 -> 2
crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state
crmd: info: crmd_join_phase_log: join-6329: node-2=integrated
crmd: info: crmd_join_phase_log: join-6329: node-3=integrated
crmd: notice: do_dc_join_finalize: Syncing the Cluster Information Base from node-3 to rest of cluster | join-6329
crmd: notice: do_dc_join_finalize: Requested version <cib crm_feature_set="3.0.11" validate-with="pacemaker-2.5" epoch="84" num_updates="1" admin_epoch="0" cib-last-written="Thu May 11 08:05:45 2017" update-origin="node-2" update-client="crm_resource" update-user="root" have-quorum="1"/>
cib: info: cib_process_request: Forwarding cib_sync operation for section 'all' to node-3 (origin=local/crmd/12710)
cib: info: cib_process_replace: Digest matched on replace from node-3: 85a19c7927c54ccb15794f2720e07ce1
cib: info: cib_process_replace: Replaced 0.83.30 with 0.84.1 from node-3
cib: info: __xml_diff_object: Moved node_state@crmd (3 -> 2)
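Independent of the version mismatch, the runaway growth of /var/log/cluster/corosync.log described above can at least be capped while debugging. A hedged sketch of a logrotate rule; the file name and thresholds are arbitrary choices rather than ClusterLabs recommendations, and the rule is written to /tmp here so it can be inspected without touching /etc:

```shell
#!/bin/sh
# Hypothetical logrotate rule capping corosync.log at 100 MB with five
# compressed rotations; copytruncate rotates without restarting corosync.
cat > /tmp/corosync-logrotate <<'EOF'
/var/log/cluster/corosync.log {
    size 100M
    rotate 5
    compress
    missingok
    notifempty
    copytruncate
}
EOF
cat /tmp/corosync-logrotate
# To install for real: copy the file to /etc/logrotate.d/corosync
```

This only contains the symptom; the repeated sync failures themselves stop once both nodes run the same pacemaker/corosync versions.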