On Oct 15, 2013, at 6:21 PM, Andrew Beekhof <and...@beekhof.net> wrote:
> On 10/10/2013, at 12:52 PM, Sean Lutner <s...@rentul.net> wrote:
>
>> On Oct 8, 2013, at 9:45 AM, Sean Lutner <s...@rentul.net> wrote:
>>
>>> On Oct 8, 2013, at 9:33 AM, Lars Marowsky-Bree <l...@suse.com> wrote:
>>>
>>>> On 2013-10-08T09:29:14, Sean Lutner <s...@rentul.net> wrote:
>>>>
>>>>> The clone was created using the interleave=true option, yes.
>
> You might want to trawl the raw xml to make sure pcs did the right thing.
>    cibadmin -Ql | grep interleave
>
> would tell you.

Thanks, that's very helpful. I'll have a look.

>>>> Ok, so pcs hides that (interesting to know).
>>>>
>>>>> Does this have an effect on what I'm trying to accomplish?
>>>>
>>>> Yes, if you hadn't set that, it might have been an explanation. My best
>>>> guess right now would be to upgrade first; the PE has gotten quite a few
>>>> fixes since 1.1.8 again.
>>>
>>> Are you indicating that the behavior I expect to see, which is the resource
>>> being marked as Started on the now passive node, is what pacemaker should
>>> be doing and this could be a bug?
>>>
>>> If it would help, I can provide a full cib configuration and logs while I
>>> execute the tests I've been running. I won't be able to do that until
>>> tonight (EST time) but can if it may help.
>>>
>>> Thanks
>>> Sean
>>
>> Sorry for following up on my own post but I have a follow-up question about
>> the failcount for a resource. Does a crm_resource --cleanup erase the
>> failcount on the resource it's run against?
>
> Older versions didn't but I don't exactly recall when we started doing that.

In practice that's what I'm observing, so it seems that with 1.1.8 it does.

>> I'm looking at making changes to the failure-timeout and
>> cluster-recheck-interval which when combined with my values of
>> resource-stickiness=100 and migration-threshold=1 should allow for the
>> services on the now failed node to be restarted and be marked as Started in
>> the cluster without causing an unnecessary failover.
>>
>> Does this make sense?
>
> yes

I currently have my failure-timeout and cluster-recheck-interval both set to
10m, but I'm not seeing the failcount clear. If I trigger a failover by
stopping the resource/service, the failover works as expected. But if I then
manually restart the services on a previously failed node, pacemaker never
marks the resources as Started again.

I think I may be hitting a bug you fixed back in May. The commit for the fix
is https://github.com/beekhof/pacemaker/commit/d87de1b and the issue is
discussed in this thread:
http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg15979.html

I think that fits what I'm seeing, because the default on-fail behavior for a
stop operation is block. I will be pulling a newer version of pacemaker from
git and building an RPM to test with.

>>>> Regards,
>>>>    Lars
>>>>
>>>> --
>>>> Architect Storage/HA
>>>> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
>>>> Imendörffer, HRB 21284 (AG Nürnberg)
>>>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
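
P.S. For anyone following along, here is a rough sketch of the interleave
check suggested above, wrapped so it can be tried without a live cluster.
Only "cibadmin -Ql | grep interleave" comes from this thread; the helper name
find_interleave and the sample CIB fragment are made up for illustration, and
a real CIB will have different ids.

```shell
# Hypothetical helper (not part of Pacemaker): reads CIB XML on stdin and
# prints every line that sets an attribute named "interleave".
# Exits non-zero when no such line is present.
find_interleave() {
    grep 'name="interleave"'
}

# On a cluster node with Pacemaker installed, the check from the thread
# would be fed into the helper like this:
#   cibadmin -Ql | find_interleave

# Made-up CIB fragment standing in for real cibadmin output.
sample_cib='<clone id="example-clone">
  <meta_attributes id="example-clone-meta">
    <nvpair id="example-clone-interleave" name="interleave" value="true"/>
  </meta_attributes>
</clone>'

printf '%s\n' "$sample_cib" | find_interleave
```

If the helper prints nothing and returns non-zero against the real CIB, pcs
did not store the option on the clone, which is the case worth ruling out
before looking at anything else.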
_______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org