Many thanks. Fixed in: https://github.com/beekhof/pacemaker/commit/fab0978
Apparently there was no regression test covering this (things collocated with the group too), but there is now: https://github.com/beekhof/pacemaker/commit/d2be466
So you can be sure it won't break again.

On 02/08/2013, at 4:57 PM, Xzarth <xza...@gmail.com> wrote:

> On 08/02/2013 02:16 AM, Andrew Beekhof wrote:
>> On 01/08/2013, at 10:24 PM, Xzarth <xza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I updated from pacemaker 1.0.9 to 1.1.7
>> Distro? Seems strange to be upgrading to a release from 1.5 years ago.
>> We're up to 1.1.10 now
>>
> I have debian, i have one with stable (wheezy), and one with oldstable
> (squeeze), installed from backports. Behavior is same on both.
>>> After the update, cluster behaves differently than before. I have a
>>> resource with migration-threshold="1", once that resource fails
>>> everything used to migrate to another node (what i would expect).
>>> After the upgrade, once that resource fails, cluster stops any resources
>>> that depend on that resource and just hangs there. What changed, since i
>>> haven't touched the config?
>> Can you attach the result of cibadmin -Ql when the cluster is in this state?
>>
> here it is
>>>
>>> Here is the config:
>>>
>>> node $id="1bb92e1d" asttest1 \
>>>     attributes standby="off"
>>> node $id="5e583c54" asttest2 \
>>>     attributes standby="off"
>>> node asttest1
>>> node asttest2
>>> primitive asterisk lsb:asterisk-11.0.1 \
>>>     op start interval="0" timeout="15s" \
>>>     op stop interval="0" timeout="15s" \
>>>     op monitor interval="1s" timeout="15s" start-delay="10"
>>> primitive dahdi lsb:dahdi \
>>>     op start interval="0" timeout="15s" \
>>>     op stop interval="0" timeout="15s" \
>>>     op monitor interval="1s" timeout="15s"
>>> primitive drbd ocf:linbit:drbd \
>>>     params drbd_resource="r0" \
>>>     op monitor interval="29s" role="Master" \
>>>     op monitor interval="31s" role="Slave"
>>> primitive fonulator lsb:fonulator \
>>>     op start interval="0" timeout="20s" \
>>>     op stop interval="0" timeout="20s" \
>>>     op monitor interval="1s" timeout="20s" start-delay="30" \
>>>     meta migration-threshold="1" failure-timeout="60s"
>>> primitive fs_drbd ocf:heartbeat:Filesystem \
>>>     params device="/dev/drbd/by-res/r0" directory="/mnt/drbd" fstype="ext3" \
>>>     op start interval="0" timeout="60s" start-delay="1" \
>>>     op stop interval="0" timeout="60s" start-delay="1" \
>>>     op monitor interval="1s" timeout="40s" start-delay="30" \
>>>     meta is-managed="true" target-role="Started"
>>> primitive httpd lsb:apache2 \
>>>     op start interval="0" timeout="20s" \
>>>     op stop interval="0" timeout="20s" \
>>>     op monitor interval="1s" timeout="20s" start-delay="10"
>>> primitive iax2_mon lsb:iax2_mon \
>>>     op start interval="0" timeout="20s" \
>>>     op stop interval="0" timeout="20s" \
>>>     op monitor interval="60s" timeout="20s" start-delay="30" \
>>>     meta failure-timeout="60s"
>>> primitive ip_voip_route_default ocf:heartbeat:Route \
>>>     params destination="default" gateway="10.2.4.1" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_route_test1 ocf:heartbeat:Route \
>>>     params destination="X.X.X.X/32" gateway="X.X.X.X" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_route_test2 ocf:heartbeat:Route \
>>>     params destination="X.X.X.X/32" gateway="X.X.X.X.1" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth0 ocf:heartbeat:IPaddr2 \
>>>     params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="1" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth1 ocf:heartbeat:IPaddr2 \
>>>     params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="2" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth2 ocf:heartbeat:IPaddr2 \
>>>     params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="3" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth3 ocf:heartbeat:IPaddr2 \
>>>     params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="4" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth4 ocf:heartbeat:IPaddr2 \
>>>     params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="5" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth5 ocf:heartbeat:IPaddr2 \
>>>     params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="6" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth6 ocf:heartbeat:IPaddr2 \
>>>     params ip="X.X.X.X" cidr_netmask="24" nic="eth0" iflabel="7" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive ip_voip_eth8 ocf:heartbeat:IPaddr2 \
>>>     params ip="X.X.X.X" cidr_netmask="24" nic="eth8" iflabel="1" \
>>>     op monitor interval="1s" timeout="20s"
>>> primitive mysqld lsb:mysql \
>>>     op monitor interval="1s" timeout="15s" start-delay="10"
>>> primitive tftp lsb:tftp-srce \
>>>     op start interval="0" timeout="20s" \
>>>     op stop interval="0" timeout="20s" \
>>>     op monitor interval="60s" timeout="10s" start-delay="10"
>>> group ip_voip_addresses_p ip_voip_eth0 ip_voip_eth8 ip_voip_eth1 ip_voip_eth2 ip_voip_eth3 ip_voip_eth4 ip_voip_eth5 ip_voip_eth6 \
>>>     meta ordered="false" collocated="true" priority="8"
>>> group ip_voip_routes ip_voip_route_test1 ip_voip_route_test2 \
>>>     meta ordered="false" collocated="true" priority="9"
>>> group voip mysqld dahdi fonulator asterisk iax2_mon httpd tftp \
>>>     meta ordered="true" collocated="true" priority="10"
>>> ms ms_drbd drbd \
>>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
>>> clone cl_route ip_voip_route_default \
>>>     meta target-role="Started"
>>> colocation fs_colocation inf: fs_drbd ms_drbd:Master
>>> colocation ip_colocation inf: ip_voip_addresses_p fs_drbd
>>> colocation ip_route_colocation inf: ip_voip_routes ip_voip_addresses_p
>>> colocation voip_colocation inf: voip ip_voip_addresses_p
>>> order fs_order inf: ms_drbd:promote fs_drbd:start
>>> order ip_order inf: fs_drbd:start ip_voip_addresses_p:start
>>> order ip_route_order inf: ip_voip_addresses_p:start ip_voip_routes:start
>>> order voip_order inf: ip_voip_routes:start voip:start
>>> property $id="cib-bootstrap-options" \
>>>     dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>>>     cluster-infrastructure="openais" \
>>>     stonith-enabled="false" \
>>>     expected-quorum-votes="2" \
>>>     last-lrm-refresh="1375355273" \
>>>     no-quorum-policy="ignore" \
>>>     symmetric-cluster="true"
>>>
>>>
>>> And here is the state of the cluster after node fails:
>>>
>>> ============
>>> Last updated: Thu Aug 1 13:26:41 2013
>>> Last change: Thu Aug 1 13:07:53 2013
>>> Stack: openais
>>> Current DC: asttest1 - partition with quorum
>>> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>>> 4 Nodes configured, 2 expected votes
>>> 24 Resources configured.
>>> ============
>>>
>>> Online: [ asttest1 asttest2 ]
>>> OFFLINE: [ asttest1 asttest2 ]
>>>
>>> Resource Group: voip
>>>      mysqld (lsb:mysql): Started asttest1
>>>      dahdi (lsb:dahdi): Started asttest1
>>>      fonulator (lsb:fonulator): Stopped
>>>      asterisk (lsb:asterisk-11.0.1): Stopped
>>>      iax2_mon (lsb:iax2_mon): Stopped
>>>      httpd (lsb:apache2): Stopped
>>>      tftp (lsb:tftp-srce): Stopped
>>> Resource Group: ip_voip_routes
>>>      ip_voip_route_test1 (ocf::heartbeat:Route): Started asttest1
>>>      ip_voip_route_test2 (ocf::heartbeat:Route): Started asttest1
>>> Resource Group: ip_voip_addresses_p
>>>      ip_voip_eth0 (ocf::heartbeat:IPaddr2): Started asttest1
>>>      ip_voip_eth8 (ocf::heartbeat:IPaddr2): Started asttest1
>>>      ip_voip_eth1 (ocf::heartbeat:IPaddr2): Started asttest1
>>>      ip_voip_eth2 (ocf::heartbeat:IPaddr2): Started asttest1
>>>      ip_voip_eth3 (ocf::heartbeat:IPaddr2): Started asttest1
>>>      ip_voip_eth4 (ocf::heartbeat:IPaddr2): Started asttest1
>>>      ip_voip_eth5 (ocf::heartbeat:IPaddr2): Started asttest1
>>>      ip_voip_eth6 (ocf::heartbeat:IPaddr2): Started asttest1
>>> Clone Set: cl_route [ip_voip_route_default]
>>>      Started: [ asttest2 asttest1 ]
>>>      Stopped: [ ip_voip_route_default:2 ip_voip_route_default:3 ]
>>> fs_drbd (ocf::heartbeat:Filesystem): Started asttest1
>>> Master/Slave Set: ms_drbd [drbd]
>>>      Masters: [ asttest1 ]
>>>      Slaves: [ asttest2 ]
>>>
>>> Failed actions:
>>>      fonulator_monitor_1000 (node=asttest1, call=85, rc=7, status=complete): not running
>>>
>>>
> <cibadmin_Ql.txt>_______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
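
For anyone reading this thread later: the expectation in the quoted report (one failure with migration-threshold="1" should push the resource off the node) can be sketched roughly as below. This is only a toy model in Python, not Pacemaker's actual policy engine code. The names `ResourceState` and `node_allowed` are made up for illustration, and treating a non-positive threshold as "disabled" is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Hypothetical per-node view of one resource (illustration only)."""
    name: str
    migration_threshold: int  # assumption: <= 0 means "no threshold"
    fail_count: int = 0


def node_allowed(res: ResourceState) -> bool:
    """Return False once the fail count has reached the threshold."""
    if res.migration_threshold <= 0:
        return True
    return res.fail_count < res.migration_threshold


# fonulator is configured with migration-threshold="1" in the quoted config.
fonulator = ResourceState("fonulator", migration_threshold=1)
fonulator.fail_count += 1  # the failed monitor (rc=7, "not running")
print(node_allowed(fonulator))  # False: the current node is ruled out
```

With the colocation constraints in the quoted config, a resource that can no longer run on its current node would be expected to move along with the groups it is tied to, which is the behavior the report describes from before the upgrade.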