Re: [Pacemaker] [Problem]Reboot by the error of the clone resource influences the resource of other nodes.

renayama19661014 Thu, 31 Mar 2011 01:12:02 -0700

Hi Vladislav,

Thank you for your comment.


On our side, this problem occurs with both 1.0.10 and the tip of 1.0.

It is possible, though, that the problem has existed since a considerably earlier version.

Let's wait for Andrew's comment.

Best Regards,
Hideo Yamauchi.

--- On Thu, 2011/3/31, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:

> Hi,
> 
> 31.03.2011 04:15, renayama19661...@ybb.ne.jp wrote:
> [...]
> > Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
> >         main_rsc        (ocf::pacemaker:Dummy) Started 
> >         prmDummy1:0     (ocf::pacemaker:Dummy) Started 
> >         prmPingd:0      (ocf::pacemaker:ping) Started 
> > Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
> >         prmDummy1:1     (ocf::pacemaker:Dummy) Started 
> >         main_rsc2       (ocf::pacemaker:Dummy) Started 
> >         prmPingd:1      (ocf::pacemaker:ping) Started 
> > Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
> >         prmDummy1:2     (ocf::pacemaker:Dummy) Started 
> >         prmPingd:2      (ocf::pacemaker:ping) Started 
> [...]
> > Node srv01 (45f985d7-e7c8-4834-b01b-16b99526672b): online
> > Node srv02 (ed7fdcbf-9c17-4f31-8a27-a831a6b39ed5): online
> >         prmDummy1:1     (ocf::pacemaker:Dummy) Started     ---------> :1(funny)
> >         prmPingd:0      (ocf::pacemaker:ping) Started      ---------> :0(funny)
> > Node srv03 (e2ffc1ed-3ebe-47e2-b51b-b0f04b454311): online
> >         main_rsc        (ocf::pacemaker:Dummy) Started 
> >         prmDummy1:2     (ocf::pacemaker:Dummy) Started     ---------> :2(funny)
> >         prmPingd:1      (ocf::pacemaker:ping) Started      ---------> :1(funny)
> >
> > We think that restarting pingd on the srv02 node is unnecessary.
> > Is there a way to resolve this problem?
> 
> I observe this problem too (with the latest 1.1 tip):
> pengine unnecessarily decides to swap anonymous clone instances between
> nodes when it rearranges cluster resources. This causes all dependent
> resources on those nodes to be stopped and started again.
> 
> In your case it swapped
> srv02:prmPingd:1,srv03:prmPingd:2 <-> srv02:prmPingd:0,srv03:prmPingd:1
> 
> In my case I often see something like this:
> 
> Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:0#011(Started v02-c -> v02-d)
> Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:1#011(Started v02-d -> v02-a)
> Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:2#011(Started v02-a -> v02-b)
> Jan 17 09:18:58 v02-a pengine: [29790]: notice: LogActions: Move resource libvirtd:3#011(Started v02-b -> v02-c)
> 
> I contacted Andrew about this directly some time ago (with an hb_report),
> but did not have the energy to raise this problem on the ML (which is what
> he actually asked me to do) :( .
> 
> I suspect this is 1.1-specific, but this is solely a feeling.
> 
> Maybe somebody familiar with Mercurial can bisect to find when this bug
> was introduced?
> 
> Best,
> Vladislav
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> 
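
The renumbering described above (a clone stays on a node but its instance number changes) can be spotted by comparing two status snapshots. The following is only a minimal sketch in Python, not a Pacemaker tool: the function name renumbered_instances and the dictionary layout are hypothetical, and the data is copied by hand from the two node listings quoted in this thread. Non-clone resources such as main_rsc are left out of the input.

```python
def renumbered_instances(before, after):
    """Return (node, clone, old_id, new_id) for every clone that stayed on
    the same node but changed its instance number between two snapshots.

    Hypothetical helper for illustration; inputs map a node name to a list
    of clone instance names of the form "<clone>:<number>".
    """
    changes = []
    for node in sorted(before.keys() & after.keys()):
        # Split "prmPingd:1" into ("prmPingd", "1") and index by clone name.
        old = dict(name.rsplit(":", 1) for name in before[node])
        new = dict(name.rsplit(":", 1) for name in after[node])
        for clone in sorted(old.keys() & new.keys()):
            if old[clone] != new[clone]:
                changes.append((node, clone, int(old[clone]), int(new[clone])))
    return changes


# Clone instances per node, taken from the listings quoted in this thread.
before = {
    "srv01": ["prmDummy1:0", "prmPingd:0"],
    "srv02": ["prmDummy1:1", "prmPingd:1"],
    "srv03": ["prmDummy1:2", "prmPingd:2"],
}
after = {
    "srv01": [],
    "srv02": ["prmDummy1:1", "prmPingd:0"],
    "srv03": ["prmDummy1:2", "prmPingd:1"],
}

for node, clone, old_id, new_id in renumbered_instances(before, after):
    print(f"{node}: {clone}:{old_id} -> {clone}:{new_id}")
```

With the listings from this thread, the sketch reports prmPingd on srv02 and srv03 as renumbered, while prmDummy1 keeps its instance numbers on both nodes.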
