On 23/05/2013, at 4:44 PM, Kazunori INOUE <[email protected]> wrote:
> Hi,
>
> I'm using pacemaker-1.1 (c3486a4a8d. the latest devel).
> After fencing caused by split-brain failed 11 times, S_POLICY_ENGINE state is
> kept even if I recover split-brain.

Odd, I get:

May 24 00:17:08 corosync-host-1 crmd[3056]: notice: tengine_stonith_callback: Stonith operation 12/69:23:0:9b069b96-3565-4219-85a5-8782bdb5d9d3: No route to host (-113)
May 24 00:17:08 corosync-host-1 crmd[3056]: notice: tengine_stonith_callback: Stonith operation 12 for corosync-host-6 failed (No route to host): aborting transition.
May 24 00:17:08 corosync-host-1 crmd[3056]: notice: run_graph: Transition 23 (Complete=1, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-warn-110.bz2): Stopped
May 24 00:17:08 corosync-host-1 crmd[3056]: notice: too_many_st_failures: Too many failures to fence corosync-host-6 (11), giving up
May 24 00:17:08 corosync-host-1 crmd[3056]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
May 24 00:17:08 corosync-host-1 crmd[3056]: notice: tengine_stonith_notify: Peer corosync-host-6 was not terminated (reboot) by corosync-host-1 for corosync-host-1: No route to host (ref=9dd3711e-c87d-4b2e-acd1-854391a6fa9d) by client crmd.3056

>
> 1. disconnect network connection
> [dev1 ~]$ crm_mon
> Last updated: Thu May 23 13:16:41 2013
> Last change: Thu May 23 13:15:30 2013 via cibadmin on dev1
> Stack: corosync
> Current DC: dev1 (3232261525) - partition WITHOUT quorum
> Version: 1.1.10-0.122.c3486a4.git.el6-c3486a4
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
>
>
> Node dev2 (3232261523): UNCLEAN (offline)
> Online: [ dev1 ]
>
> f1 (stonith:external/libvirt.NG): Started dev2
> f2 (stonith:external/libvirt.NG): Started dev1
>
> [dev2 ~]$ crm_mon
> Last updated: Thu May 23 13:16:41 2013
> Last change: Thu May 23 13:15:30 2013 via cibadmin on dev1
> Stack: corosync
> Current DC: dev2 (3232261523) - partition WITHOUT quorum
> Version: 1.1.10-0.122.c3486a4.git.el6-c3486a4
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
>
>
> Node dev1 (3232261525): UNCLEAN (offline)
> Online: [ dev2 ]
>
> f1 (stonith:external/libvirt.NG): Started dev2
> f2 (stonith:external/libvirt.NG): Started dev1
>
>
> 2. wait until fencing failed 11 times
> [dev1 ~]$ egrep "CRIT|too_many_st_failures" /var/log/ha-log
> May 23 13:16:46 dev1 stonith: [24981]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1
> (snip)
> May 23 13:17:24 dev1 stonith: [25105]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1
> May 23 13:17:28 dev1 stonith: [25118]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev2 failed with rc 1
> May 23 13:17:28 dev1 crmd[24868]: notice: too_many_st_failures: Too many failures to fence dev2 (11), giving up
>
> [dev2 ~]$ egrep "CRIT|too_many_st_failures" /var/log/ha-log
> May 23 13:16:46 dev2 stonith: [7177]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev1 failed with rc 1
> (snip)
> May 23 13:17:23 dev2 stonith: [7295]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev1 failed with rc 1
> May 23 13:17:28 dev2 stonith: [7309]: CRIT: external_reset_req: 'libvirt.NG reset' for host dev1 failed with rc 1
> May 23 13:17:28 dev2 crmd[7107]: notice: too_many_st_failures: Too many failures to fence dev1 (11), giving up
>
>
> 3. recover network disconnection
> [dev1 ~]$ crm_mon
> Last updated: Thu May 23 13:24:23 2013
> Last change: Thu May 23 13:15:30 2013 via cibadmin on dev1
> Stack: corosync
> Current DC: dev2 (3232261523) - partition with quorum
> Version: 1.1.10-0.122.c3486a4.git.el6-c3486a4
> 2 Nodes configured, unknown expected votes
> 2 Resources configured.
>
>
> Online: [ dev1 dev2 ]
>
> f1 (stonith:external/libvirt.NG): Started dev2
> f2 (stonith:external/libvirt.NG): Started dev1
>
>
> S_POLICY_ENGINE state continues being maintained although a member's join
> seems to have succeeded.
>
> [13:47:54 root@dev1 ~]$ crmadmin -S dev2
> Status of crmd@dev2: S_POLICY_ENGINE (ok)
>
>
> Best Regards,
> Kazunori INOUE
> <keeping-S_POLICY_ENGINE.tar.bz2>
> _______________________________________________
> Pacemaker mailing list: [email protected]
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
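
For anyone comparing logs between nodes while reproducing this, here is a rough Python sketch that pulls the "giving up" events out of log lines. It only assumes the too_many_st_failures message format quoted in this thread; the names GIVE_UP and find_give_ups are made up for illustration and are not part of Pacemaker. It does not diagnose the S_POLICY_ENGINE problem, it just makes the give-up point easy to spot.

```python
import re

# Pattern for the crmd give-up message, based only on the log lines quoted
# in this thread (assumption: other versions may word it differently).
GIVE_UP = re.compile(
    r"too_many_st_failures:\s+Too many failures to fence (\S+) \((\d+)\), giving up"
)


def find_give_ups(lines):
    """Return a list of (target_node, failure_count) for each give-up message."""
    results = []
    for line in lines:
        match = GIVE_UP.search(line)
        if match:
            results.append((match.group(1), int(match.group(2))))
    return results


# Sample input copied from the dev1 log excerpt above.
sample = [
    "May 23 13:17:28 dev1 stonith: [25118]: CRIT: external_reset_req: "
    "'libvirt.NG reset' for host dev2 failed with rc 1",
    "May 23 13:17:28 dev1 crmd[24868]: notice: too_many_st_failures: "
    "Too many failures to fence dev2 (11), giving up",
]

print(find_give_ups(sample))  # prints [('dev2', 11)]
```

To run it against a real log, pass an open file instead of the sample list, for example the /var/log/ha-log path used in the report.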
