On Mon, Nov 21, 2011 at 10:48 AM, Andreas Kurz wrote:
> On 11/21/2011 12:06 AM, Charles Ulrich wrote:
>> Hello,
>>
>> First off, I'm brand new to Pacemaker and all of its tools. I'm trying
>> to come up to speed as quickly as I can, but understand that my
>> knowledge is probably lacking in some key areas. As Murphy would have
>> it, I've come across a problem that Google has not been able to help
>> me with.
>>
>> Here's the setup: two machines, eldon and elisa, with heartbeat and
>> drbd configured. eldon is running a resource group called "www", which
>> contains apache, an IP address, and /var/www mounted from a drbd
>> device. (There's a "mysql" resource group on elisa, but that appears
>> to be functioning normally for now.)
>>
>> Here's the problem: the www resource group on eldon keeps getting
>> restarted every 16 minutes (up for 15, down for 1). Based on the logs
>> on elisa, I believe this is happening whenever the
>> cluster-recheck-interval is hit, which defaults to 15 minutes. I
>> believe that Pacemaker thinks the configuration (or something) in the
>> resource group changed and initiates a restart at every recheck
>> interval. These are the log messages from elisa that led me down this
>> line of reasoning:
>>
>> Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters: Forcing restart of www on eldon, type changed: Filesystem -> <null>
>> Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters: Forcing restart of www on eldon, class changed: ocf -> <null>
>> Nov 19 13:44:02 elisa pengine: [1460]: notice: check_rsc_parameters: Forcing restart of www on eldon, provider changed: heartbeat -> <null>
>>
>> What might be causing this? I've included all of the relevant
>> information that I can think of below. If there's anything else I can
>> provide that would help, let me know. If it's an RTFM thing, I'd be
>> grateful if you could also point me towards the right FM to R.
>
> Yes, the 15 minutes are due to cluster-recheck-interval. I have only seen
> similar behavior after changing the provider of a resource that was
> already running; it then restarted on every monitor event. By the way,
> you may also want to enable monitoring for all your resources.
>
> The only solution I found was to restart Pacemaker so that it starts
> with a clean status section.
>
> I don't know how you ran into this problem: how did you create this www
> group, and did you do anything "unusual" to the fs_www resource? Did
> you rename resources?
>
> Regards,
> Andreas
>
> --
> Need help with Pacemaker?
> http://www.hastexo.com/now
>
>>
>> node eldon \
>>     attributes standby="off"
>> node elisa \
>>     attributes standby="off"
>> primitive apache lsb:apache2
>> primitive drbd_mysql ocf:linbit:drbd \
>>     params drbd_resource="mysql" \
>>     op monitor interval="15s" \
>>     op start interval="0" timeout="240" \
>>     op stop interval="0" timeout="100"
>> primitive drbd_www ocf:linbit:drbd \
>>     params drbd_resource="www" \
>>     op monitor interval="15s" \
>>     op start interval="0" timeout="240" \
>>     op stop interval="0" timeout="100"
>> primitive fs_mysql ocf:heartbeat:Filesystem \
>>     params device="/dev/drbd/by-res/mysql" directory="/var/lib/mysql" fstype="ext4" options="noatime,nodev,nosuid,noexec" \
>>     op start interval="0" timeout="60" \
>>     op stop interval="0" timeout="60"
>> primitive fs_www ocf:heartbeat:Filesystem \
>>     params device="/dev/drbd/by-res/www" directory="/var/www" fstype="ext4" options="noatime,nodev,nosuid" \
>>     op start interval="0" timeout="60" \
>>     op stop interval="0" timeout="60"
>> primitive ip_mysql ocf:heartbeat:IPaddr2 \
>>     params ip="10.0.2.10"
>> primitive ip_www ocf:heartbeat:IPaddr2 \
>>     params ip="207.179.127.50"
>> primitive mysqld lsb:mysql
>> group mysql fs_mysql ip_mysql mysqld \
>>     meta target-role="Started" is-managed="true"
>> group www fs_www ip_www apache \
>>     meta target-role="Started" is-managed="true"
>> ms ms_drbd_mysql drbd_mysql \
>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
>> ms ms_drbd_www drbd_www \
>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
>> location loc_mysql mysql 200: elisa
>> location loc_www www 200: eldon
>> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
>> colocation www_on_drbd inf: www ms_drbd_www:Master
>> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
>> order www_after_drbd inf: ms_drbd_www:promote www:start
>> property $id="cib-bootstrap-options" \
>>     dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>>     cluster-infrastructure="openais" \
>>     expected-quorum-votes="2" \
>>     no-quorum-policy="ignore" \
>>     stonith-enabled="false"
>>
>> crm(live)# status
>> ============
>> Last updated: Sat Nov 19 13:34:25 2011
>> Stack: openais
>> Current DC: elisa - partition with quorum
>> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> 2 Nodes configured, 2 expected votes
>> 4 Resources configured.
>> ============
>>
>> Online: [ eldon elisa ]
>>
>>  Resource Group: mysql
>>      fs_mysql   (ocf::heartbeat:Filesystem):   Started elisa
>>      ip_mysql   (ocf::heartbeat:IPaddr2):      Started elisa
>>      mysqld     (lsb:mysql):                   Started elisa
>>  Master/Slave Set: ms_drbd_mysql
>>      Masters: [ elisa ]
>>      Slaves: [ eldon ]
>>  Master/Slave Set: ms_drbd_www
>>      Masters: [ eldon ]
>>      Slaves: [ elisa ]
>>  Resource Group: www
>>      fs_www     (ocf::heartbeat:Filesystem):   Started eldon
>>      ip_www     (ocf::heartbeat:IPaddr2):      Started eldon
>>      apache     (lsb:apache2):                 Started eldon
>>
>> Failed actions:
>>     drbd_mysql_monitor_0 (node=elisa, call=2, rc=6, status=complete): not configured
>>     drbd_mysql_monitor_0 (node=eldon, call=2, rc=6, status=complete): not configured
>>     fs_mysql_start_0 (node=eldon, call=8, rc=5, status=complete): not installed
>>
>> I've also uploaded the syslogs of the restart event here (they're
>> rather large and I don't wish to spam the mailing list further than
>> necessary):
>>
>> eldon: http://pastebin.com/raw.php?i=p6Kmct9f
>> elisa: http://pastebin.com/raw.php?i=mwddDxKi
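Andreas's suggestion to enable monitoring for all resources could look roughly like the following for the members of the www group. This is a hedged sketch, not a tested fix. The monitor intervals and timeouts are illustrative assumptions, not values from the thread. Adding monitors is also not expected, on its own, to stop the 15-minute restarts. The same pattern would apply to fs_mysql, ip_mysql and mysqld.

```
# Assumed values: the monitor intervals and timeouts below are illustrative only.
primitive fs_www ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/www" directory="/var/www" fstype="ext4" options="noatime,nodev,nosuid" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="60" \
    op monitor interval="20s" timeout="40s"
primitive ip_www ocf:heartbeat:IPaddr2 \
    params ip="207.179.127.50" \
    op monitor interval="10s" timeout="20s"
primitive apache lsb:apache2 \
    op monitor interval="30s" timeout="30s"
```

These definitions could be merged into the existing ones with crm configure edit. For lsb:apache2, a recurring monitor is only meaningful if the init script's status action follows the LSB exit-code conventions.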

Could you use hb_report to create a report and file a bug for it, please?
http://bugs.clusterlabs.org

_______________________________________________
Pacemaker mailing list: http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
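The hb_report request above can be met by giving hb_report a time window around one of the restarts. The following is a minimal sketch. It assumes a window bracketing the 13:44 restart logged on elisa on Nov 19, 2011, and it uses a hypothetical destination name.

```
# Assumptions: the time window and the destination name are illustrative.
# hb_report collects logs and cluster state for the given window and
# packs them into an archive based on the destination name, which can
# then be attached to the bug report.
hb_report -f "2011/11/19 13:30" -t "2011/11/19 14:00" /tmp/www-restart-report
```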
