They both think they should be the DC. But the log fragments don't extend back far enough to say why.
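To make the loop easier to see, the crmd state transitions in the quoted logs can be tallied per node. The following is only a rough sketch, not a Pacemaker tool: it assumes unwrapped syslog lines (one entry per line) in the format shown in the quoted message, and `count_transitions` and `TRANSITION_RE` are hypothetical names made up for this illustration.

```python
import re
from collections import Counter

# Assumed line format (taken from the quoted logs), e.g.:
#   Jan 26 22:03:20 <host> crmd: [<pid>]: info: do_state_transition:
#   State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT ... ]
TRANSITION_RE = re.compile(
    r"(?P<host>\S+) crmd: \[\d+\]: info: do_state_transition: "
    r"State transition (?P<src>S_\w+) -> (?P<dst>S_\w+) "
    r"\[ input=(?P<input>I_\w+)"
)


def count_transitions(lines):
    """Count (host, from_state, to_state, input) tuples in crmd log lines."""
    counts = Counter()
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match:
            key = (match["host"], match["src"], match["dst"], match["input"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Two entries copied from the quoted vc-0 log, joined onto single lines.
    sample = [
        "Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: "
        "do_state_transition: State transition S_PENDING -> S_ELECTION "
        "[ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]",
        "Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: "
        "do_state_transition: State transition S_PENDING -> S_ELECTION "
        "[ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]",
    ]
    for key, n in count_transitions(sample).most_common():
        print(n, *key)
```

Run over the complete logs from both nodes, a high count for the same transition on each host would show how many times the election restarted, though it would still not say why it started.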
On Fri, Jan 27, 2012 at 10:21 PM, Shyam <shyam.kaus...@gmail.com> wrote:
> Folks,
>
> We are constantly running into a long election cycle where in a 2-node
> cluster when both of them are simultaneously rebooted, they take a long time
> running through election loop.
>
> On one node pacemaker loops like:
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking over DC status for this partition
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/O mode
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_slave_all for section 'all' (origin=local/crmd/222, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/W mode
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/223, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/224, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/226, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all: join-25: Waiting on 2 outstanding join acks
> Jan 26 22:03:20 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/228, version=1.1.1): ok (rc=0)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback: Checking for expired actions every 900000ms
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 50 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to vsa-0000009c-vc-1 (3.0.1)
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:20 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC vsa-0000009c-vc-1
> Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 51 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)
> Jan 26 22:03:21 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input I_JOIN_REQUEST from route_message() received in state S_ELECTION
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem: Starting sub-system "pengine"
> Jan 26 22:03:22 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem: Client pengine already running as pid 1234
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_takeover: Taking over DC status for this partition
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/O mode
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_slave_all for section 'all' (origin=local/crmd/231, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_readwrite: We are now in R/W mode
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/232, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/233, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/235, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_dc_join_offer_all: join-26: Waiting on 2 outstanding join acks
> Jan 26 22:03:26 vsa-0000009c-vc-1 cib: [1130]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/237, version=1.1.1): ok (rc=0)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: config_query_callback: Checking for expired actions every 900000ms
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 52 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Set DC to vsa-0000009c-vc-1 (3.0.1)
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:26 vsa-0000009c-vc-1 crmd: [1134]: info: update_dc: Unset DC vsa-0000009c-vc-1
> Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: info: do_election_count_vote: Election 53 (owner: 00000156-0156-0000-2b91-000000000000) pass: vote from vsa-0000009c-vc-0 (Age)
> Jan 26 22:03:27 vsa-0000009c-vc-1 crmd: [1134]: WARN: do_log: FSA: Input I_JOIN_REQUEST from route_message() received in state S_ELECTION
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: info: start_subsystem: Starting sub-system "pengine"
> Jan 26 22:03:28 vsa-0000009c-vc-1 crmd: [1134]: WARN: start_subsystem: Client pengine already running as pid 1234
>
> & other node with
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jan 26 22:03:20 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_JOIN_OFFER from route_message() received in state S_ELECTION
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role released
> Jan 26 22:03:21 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control: Transitioner is now inactive
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jan 26 22:03:26 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: WARN: do_log: FSA: Input I_JOIN_OFFER from route_message() received in state S_ELECTION
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_dc_release: DC role released
> Jan 26 22:03:27 vsa-0000009c-vc-0 crmd: [1314]: info: do_te_control: Transitioner is now inactive
>
> This takes several minutes & finally breaks.
>
> Any pointers on what can be causing this?
>
> Thanks.
>
> --Shyam
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org