On Mon, Jan 31, 2011 at 10:12 PM, Andrew Beekhof <and...@beekhof.net> wrote:
> On Mon, Jan 31, 2011 at 4:51 PM, Anton Altaparmakov <ai...@cam.ac.uk> wrote:
>> Hi,
>>
>> After a monitor action failure the failcount is not being reset despite
>> everything I am aware of being configured, i.e. I have set (copied from
>> "crm configure show"):
>
> That's a 1.1 feature.

In 1.0, failures get ignored after the timeout but are not reset (so the
next failure will put you back over the limit).

>
>> property \
>>         cluster-recheck-interval="60s"
>> rsc_defaults $id="rsc-options" \
>>         failure-timeout="60s"
>>
>> Yet "crm_mon --failcounts" shows:
>>
>> * Node nessie:
>>    res_drbd:0: migration-threshold=1000000 fail-count=2 last-failure='Mon Jan 31 14:27:14 2011'
>>
>> However the logs say that:
>>
>> Jan 31 15:41:27 nessie pengine: [1070]: info: get_failcount: ms_drbd has failed 2 times on nessie
>> Jan 31 15:41:27 nessie pengine: [1070]: notice: get_failcount: Failcount for ms_drbd on nessie has expired (limit was 60s)
>>
>> So why does fail-count not go back to zero and disappear? Am I doing
>> something wrong? Is it broken? Am I missing some option?
>>
>> Note this is running on Ubuntu 10.04.1 LTS and the relevant packages are:
>>
>> pacemaker 1.0.8+hg15494-2ubuntu2
>> corosync 1.2.0-0ubuntu1
>> drbd8-utils 2:8.3.7-1ubuntu2.1
>>
>> And here is the full configuration (crm configure show):
>>
>> node hydra
>> node nessie
>> node qs1
>> primitive res_drbd ocf:linbit:drbd \
>>         params drbd_resource="dev-vmstore" \
>>         meta target-role="Started" \
>>         op monitor interval="9s" role="Master" on-fail="restart" \
>>         op monitor interval="10s" role="Slave" on-fail="restart"
>> primitive res_filesystem ocf:heartbeat:Filesystem \
>>         params fstype="xfs" device="/dev/drbd0" directory="/dev-vmstore" options="noatime,barrier,largeio,logbufs=8,logbsize=256k,swalloc" \
>>         meta target-role="Started" \
>>         op monitor on-fail="restart" interval="10s"
>> primitive res_ip ocf:heartbeat:IPaddr2 \
>>         params ip="172.28.208.19" cidr_netmask="24" broadcast="172.28.208.255" \
>>         meta target-role="Started" \
>>         op monitor on-fail="restart" interval="10s"
>> primitive res_nfs_server lsb:nfs-kernel-server \
>>         meta target-role="Started" \
>>         op monitor on-fail="restart" interval="10s"
>> group group_dev-vmstore res_filesystem res_nfs_server res_ip
>> ms ms_drbd res_drbd \
>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally_unique="false"
>> location loc_dev-vmstore_hydra group_dev-vmstore 0: hydra
>> location loc_dev-vmstore_nessie group_dev-vmstore 0: nessie
>> location loc_drbd_hydra ms_drbd 0: hydra
>> location loc_drbd_nessie ms_drbd 0: nessie
>> colocation col_dev-vmstore inf: group_dev-vmstore ms_drbd:Master
>> order order_dev-vmstore inf: ms_drbd:promote group_dev-vmstore:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="3" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="stop" \
>>         symmetric-cluster="false" \
>>         pe-error-series-max="100" \
>>         pe-warn-series-max="100" \
>>         pe-input-series-max="100" \
>>         cluster-delay="10s" \
>>         last-lrm-refresh="1296433757" \
>>         cluster-recheck-interval="60s"
>> rsc_defaults $id="rsc-options" \
>>         failure-timeout="60s"
>> op_defaults $id="op_defaults-options" \
>>         timeout="5s"
>>
>> Thanks a lot in advance for any help!
>>
>> Best regards,
>>
>>         Anton
>> --
>> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
>> Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
>> Linux NTFS maintainer, http://www.linux-ntfs.org/
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
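
To make the 1.0 versus 1.1 difference described in the reply a bit more concrete, here is a small toy model in Python. It is only a sketch of the behaviour as stated in this thread, not Pacemaker code: `FailState`, `record_failure`, `effective_failcount` and `simulate` are names invented for the illustration, and the timestamps are arbitrary seconds rather than real cluster times.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailState:
    """Toy record of one resource's failures on one node (hypothetical)."""
    fail_count: int = 0
    last_failure: Optional[float] = None  # seconds on an arbitrary clock


def record_failure(state: FailState, now: float) -> None:
    """Store one more failure and remember when it happened."""
    state.fail_count += 1
    state.last_failure = now


def effective_failcount(state: FailState, now: float,
                        failure_timeout: float,
                        reset_on_expiry: bool) -> int:
    """Return the count that would be acted on at time `now`.

    reset_on_expiry=False models the 1.0 behaviour from the thread:
    an expired count is ignored but stays stored.
    reset_on_expiry=True models the 1.1 behaviour: the stored count
    is cleared once it has expired.
    """
    expired = (state.last_failure is not None
               and now - state.last_failure >= failure_timeout)
    if not expired:
        return state.fail_count
    if reset_on_expiry:
        state.fail_count = 0
        state.last_failure = None
    return 0


def simulate(reset_on_expiry: bool) -> List[int]:
    """Two failures, a quiet period longer than the timeout, one more failure."""
    state = FailState()
    seen = []
    record_failure(state, now=0)
    record_failure(state, now=10)
    seen.append(effective_failcount(state, 20, 60, reset_on_expiry))   # before expiry
    seen.append(effective_failcount(state, 100, 60, reset_on_expiry))  # after expiry
    record_failure(state, now=110)
    seen.append(effective_failcount(state, 115, 60, reset_on_expiry))  # after a new failure
    return seen


if __name__ == "__main__":
    print("1.0-style (ignore only):", simulate(reset_on_expiry=False))
    print("1.1-style (reset):      ", simulate(reset_on_expiry=True))
```

In the ignore-only case the stored count survives the quiet period, so the next failure resumes from the old total instead of starting again at one. With a low migration-threshold that is what would put the resource straight back over the limit.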