Thank you for your help! I found the problem; it came from a bug in my STONITH agent, which caused it to become a zombie. I corrected this bug and the cluster now fails over as expected.
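For the archives, the fix boiled down to detaching the agent's background watcher properly so that it is never left as an unreaped child. A minimal sketch of the idea, assuming a post-reset watcher like the one described below; the node name, intervals and restart steps are placeholders, not my real agent:

```shell
#!/bin/sh
# Illustrative sketch only: NODE, the 5-second polling and the restart
# steps are placeholders taken from the description in this thread.
NODE=vindemiatrix

# Run the "wait until the peer answers again" watcher in its own session
# via setsid, with stdio detached. The agent can then exit 0 immediately
# and never leaves a zombie child behind for stonithd to trip over.
setsid sh -c "
  until ping -c 1 -W 2 $NODE >/dev/null 2>&1; do sleep 5; done
  # peer is back: restart OpenVPN, then Corosync and Pacemaker here
" >/dev/null 2>&1 </dev/null &

exit 0
```

The point of setsid here is that the watcher no longer belongs to the agent's session, so the agent's exit status reaches stonithd promptly and no defunct process lingers.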
Kind regards.

On 08/07/2012 00:12, Andreas Kurz wrote:
> On 07/05/2012 04:12 PM, David Guyot wrote:
>> Hello, everybody.
>>
>> As the title suggests, I'm configuring a 2-node cluster, but I've got a
>> strange issue here: when I put a node in standby mode with "crm node
>> standby", its resources are correctly moved to the second node and stay
>> there even after the first comes back online, which I assume is the
>> preferred behaviour (preferred by the designers of such systems) to
>> avoid running resources on a potentially unstable node. Nevertheless,
>> when I simulate failure of the node running the resources with
>> "/etc/init.d/corosync stop", the other node correctly fences the failed
>> node by electrically resetting it, but it does not then mount the
>> resources on itself; rather, it waits for the failed node to come back
>> online and then re-negotiates resource placement, which inevitably
>> leads to the failed node restarting the resources. I suppose this is a
>> consequence of the resource stickiness still recorded by the intact
>> node: because that node still assumes the resources are running on the
>> failed node, it assumes they prefer to stay there, even though the
>> node has failed.
>>
>> When the first node, Vindemiatrix, shuts down Corosync, the second,
>> Malastare, reports this:
>>
>> root@Malastare:/home/david# crm_mon --one-shot -VrA
>> ============
>> Last updated: Thu Jul 5 15:27:01 2012
>> Last change: Thu Jul 5 15:26:37 2012 via cibadmin on Malastare
>> Stack: openais
>> Current DC: Malastare - partition WITHOUT quorum
>> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> 2 Nodes configured, 2 expected votes
>> 17 Resources configured.
>> ============
>>
>> Node Vindemiatrix: UNCLEAN (offline)
> Pacemaker thinks fencing was not successful and will not recover
> resources until STONITH was successful ... or the node returns and it
> is possible to probe resource states
>
>> Online: [ Malastare ]
>>
>> Full list of resources:
>>
>> soapi-fencing-malastare (stonith:external/ovh): Started Vindemiatrix
>> soapi-fencing-vindemiatrix (stonith:external/ovh): Started Malastare
>> Master/Slave Set: ms_drbd_svn [drbd_svn]
>>     Masters: [ Vindemiatrix ]
>>     Slaves: [ Malastare ]
>> Master/Slave Set: ms_drbd_pgsql [drbd_pgsql]
>>     Masters: [ Vindemiatrix ]
>>     Slaves: [ Malastare ]
>> Master/Slave Set: ms_drbd_backupvi [drbd_backupvi]
>>     Masters: [ Vindemiatrix ]
>>     Slaves: [ Malastare ]
>> Master/Slave Set: ms_drbd_www [drbd_www]
>>     Masters: [ Vindemiatrix ]
>>     Slaves: [ Malastare ]
>> fs_www (ocf::heartbeat:Filesystem): Started Vindemiatrix
>> fs_pgsql (ocf::heartbeat:Filesystem): Started Vindemiatrix
>> fs_svn (ocf::heartbeat:Filesystem): Started Vindemiatrix
>> fs_backupvi (ocf::heartbeat:Filesystem): Started Vindemiatrix
>> VirtualIP (ocf::heartbeat:IPaddr2): Started Vindemiatrix
>> OVHvIP (ocf::pacemaker:OVHvIP): Started Vindemiatrix
>> ProFTPd (ocf::heartbeat:proftpd): Started Vindemiatrix
>>
>> Node Attributes:
>> * Node Malastare:
>>     + master-drbd_backupvi:0 : 10000
>>     + master-drbd_pgsql:0 : 10000
>>     + master-drbd_svn:0 : 10000
>>     + master-drbd_www:0 : 10000
>>
>> As you can see, the node failure is detected. This state leads to the
>> attached log file.
>>
>> Note that both ocf::pacemaker:OVHvIP and stonith:external/ovh are
>> custom resources which use my server provider's SOAP API to provide
>> the intended services. The STONITH agent does nothing but return exit
>> status 0 when the start, stop, on or off actions are requested,
>> returns the 2 node names when the hostlist or gethosts actions are
>> requested, and, when the reset action is requested, effectively resets
>> the faulting node using the provider API. As this API doesn't provide
>> a reliable means of knowing the exact moment of the reset, the STONITH
>> agent pings the faulting node every 5 seconds until the ping fails,
>> then forks a process which pings the faulting node every 5 seconds
>> until it answers. Then, because the external VPN has not yet been
>> installed by the provider, I'm forced to emulate it with OpenVPN
>> (which seems unable to re-establish a connection lost minutes ago,
>> leading to a split-brain situation), so the STONITH agent restarts
>> OpenVPN to re-establish the connection, then restarts Corosync and
>> Pacemaker.
>>
>> Aside from the VPN issue, whose performance and stability problems I'm
>> fully aware of, I thought that Pacemaker would start the resources on
>> the remaining node as soon as the STONITH agent returned exit status
>> 0, but it doesn't. Instead, it seems that the STONITH reset action
>> waits too long to report a successful reset; this delay exceeds some
>> internal timeout, which in turn leads Pacemaker to assume that the
>> STONITH agent failed. It therefore keeps trying to reset the node
>> eternally (which only makes the API return an error, because a reset
>> request less than 5 minutes after the previous one is forbidden) while
>> stopping actions without restarting the resources on the remaining
>> node. I searched the Internet for this parameter, but the only related
>> thing I found is this page,
>> http://lists.linux-ha.org/pipermail/linux-ha/2010-March/039761.html, a
>> Linux-HA mailing list archive, which mentions a stonith-timeout
>> property; I've combed the Pacemaker documentation without finding any
>> occurrence of it, and I got an error when I tried to query its value:
> man stonithd
>
>> root@Vindemiatrix:/home/david# crm_attribute --name stonith-timeout --query
>> scope=crm_config name=stonith-timeout value=(null)
>> Error performing operation: The object/attribute does not exist
> stonith-timeout defaults to 60s ... crm configure property
> stonith-timeout=XY ... to increase it cluster-wide ... or you can add
> an individual value as a resource attribute to your stonith resources.
>
> Regards,
> Andreas
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
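Andreas's stonith-timeout suggestion can be made concrete like this; the 300s value is illustrative only and should be chosen longer than the agent's worst-case reset-and-confirm time:

```shell
# Illustrative value: stonith-timeout must cover the agent's whole
# reset-and-confirm cycle, including its 5-second ping polling.
crm configure property stonith-timeout=300s

# Once set explicitly, the property shows up in a query:
crm_attribute --type crm_config --name stonith-timeout --query
```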