Re: [ClusterLabs] Pacemaker startup retries

2018-09-05 Thread Cesar Hernandez
> > P.S. If the issue is just a matter of timing when you're starting both > nodes, you can start corosync on both nodes first, then start pacemaker > on both nodes. That way pacemaker on each node will immediately see the > other node's presence. > -- Well rebooting a server lasts 2 minutes a

Re: [ClusterLabs] Pacemaker startup retries

2018-09-05 Thread Cesar Hernandez
Hi > > Ah, this rings a bell. Despite having fenced the node, the cluster > still considers the node unseen. That was a regression in 1.1.14 that > was fixed in 1.1.15. :-( > Oh :( I'm using Pacemaker-1.1.14. Do you know if these reboot retries are just run 3 times? All the tests I've done

Re: [ClusterLabs] Pacemaker startup retries

2018-09-05 Thread Cesar Hernandez
Hi > > The first fencing is legitimate -- the node hasn't been seen at start- > up, and so needs to be fenced. The second fencing will be the one of > interest. Also, look for the result of the first fencing. The first fencing has finished with OK, as well as the other two fencing operations.

Re: [ClusterLabs] Pacemaker startup retries

2018-08-30 Thread Cesar Hernandez
Hi > > > Do you mean you have a custom fencing agent configured? If so, check > the return value of each attempt. Pacemaker should request fencing only > once as long as it succeeds (returns 0), but if the agent fails > (returns nonzero or times out), it will retry, even if the reboot > worked i
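
The exit-code rule quoted above (0 means the fencing succeeded; a nonzero exit or a timeout counts as a failure that may be retried) can be sketched from the caller's side. This is only a minimal illustration, not Pacemaker's actual code: the function name `fencing_attempt_succeeded`, the helper `stand_in_agent`, and the timeout values are hypothetical, and the "agent" here is a stand-in child process rather than a real fencing script.

```python
import subprocess
import sys


def fencing_attempt_succeeded(agent_cmd, timeout_s):
    """Return True only if the agent exits with status 0 before the timeout.

    A nonzero exit status or a timeout is treated as a failed attempt,
    which is the case where the caller would consider retrying.
    """
    try:
        result = subprocess.run(agent_cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The agent did not answer in time: counted as a failure.
        return False
    return result.returncode == 0


def stand_in_agent(exit_code):
    """Hypothetical stand-in for a fencing agent: exits with the given status."""
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


if __name__ == "__main__":
    print(fencing_attempt_succeeded(stand_in_agent(0), timeout_s=30))
    print(fencing_attempt_succeeded(stand_in_agent(1), timeout_s=30))
```

The point of the sketch is that the caller has nothing but the exit status and the elapsed time to judge an attempt by, so a custom agent has to report its result accurately.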

[ClusterLabs] Pacemaker startup retries

2018-08-30 Thread Cesar Hernandez
Hi I have a two-node corosync+pacemaker cluster. When I start only one node, it fences the other node. That is expected, since "startup-fencing" defaults to true. However, the other node is rebooted 3 times, and then the remaining node starts resources and doesn't fence the node an

Re: [ClusterLabs] Antw: Re: Problem with stonith and starting services

2017-07-16 Thread Cesar Hernandez
> On 17 Jul 2017, at 8:02, Ulrich Windl > wrote: > > Hi! > > Could this mean the stonith-timeout is significantly larger than the time > for a complete reboot? So the fenced node would be up again when the cluster > thinks the fencing has just completed. > > Regards, > Ulrich > P.S

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-14 Thread Cesar Hernandez
> > > So if this is really the reason it would probably be worth > finding out what is really happening. > Thanks. Yes, I think this is really the reason. I fixed it one week ago and it hasn't happened again ___ Users mailing list: Users@clusterlabs.o

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-12 Thread Cesar Hernandez
> On 6 Jul 2017, at 17:34, Ken Gaillot wrote: > > On 07/06/2017 10:27 AM, Cesar Hernandez wrote: >> >>> >>> It looks like a bug when the fenced node rejoins quickly enough that it >>> is a member again before its fencing confirmation has been

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-07 Thread Cesar Hernandez
>>> >> >> Could it be caused by node 2 being rebooted and alive again before the stonith >> script has finished? > > That *shouldn't* cause any problems, but I'm not sure what's happening > in this case. Maybe that is the cause... My other server installations had a slow stonith device and als

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-06 Thread Cesar Hernandez
> > If node2 is getting the notification of its own fencing, it wasn't > successfully fenced. Successful fencing would render it incapacitated > (powered down, or at least cut off from the network and any shared > resources). Maybe I don't understand you, or maybe you don't understand me... ;)

Re: [ClusterLabs] Antw: Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
> > I don't have answers, but questions: > Assuming node1 was DC when stopped: Will its CIB still record it as DC after > being stopped? > Obviously node1 cannot know about any changes node2 did. And node1 when > started will find that node2 is unexpectedly down, so it will fence it to be > su

Re: [ClusterLabs] Antw: Re: Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
> > AFAIK that's not proper fencing. SunOS once had a "fasthalt" command. In > Linux "halt -nf" might do a similar thing, or maybe trigger a reboot via > sysrq (echo b > /proc/sysrq-trigger). > > Fencing is everything but a clean shutdown. The specific problem is that > shutdown may be perfor

Re: [ClusterLabs] Antw: Re: Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
>> >> >> Thanks. But I think it is not a good idea to disable startup fencing: I have >> shared disks (drbd) and stonith is very important in this scenario > > AFAIK, DRBD is not considered to be a shared disk; it's a replicated disk at > best. > Of course I know it. Only 1 of the nodes can use

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
> > >>> >>> But you definitely shouldn't have a fencing-agent that claims to have fenced >>> a node if it is not sure - rather the other way round if in doubt. >> >> > > True! Which is why I mentioned it to be dangerous. > But your fencing-agent is even more dangerous ;-) > > Well.. my sta

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
> Not a good idea probably - and the reason for what you are experiencing ;-) > If you have problems starting the nodes within a certain time-window > disabling startup-fencing might be an option to consider although dangerous. > But you definitely shouldn't have a fencing-agent that claims to hav

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-05 Thread Cesar Hernandez
> Are you logging which ones went OK and which failed? > Does the script return a failure if both go wrong? The script always returns OK

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-04 Thread Cesar Hernandez
> Might be kind of a strange race as well ... but without knowing what the > script actually does ... > The script first tries to reboot the node using ssh, something like ssh $NODE reboot -f, then runs a remote reboot using the AWS API. Thanks
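
A script of the kind described here (ssh reboot first, then a remote reboot through the AWS API) could report its result honestly along these lines. This is a rough sketch under stated assumptions, not the poster's actual script: `fence_node`, the `methods` list, and the `confirm_down` check are all hypothetical names, and the reboot methods are passed in as callables so that no real ssh or AWS call is implied.

```python
def fence_node(node, methods, confirm_down):
    """Try each reboot method in order and return a process-style exit code.

    methods      -- list of (name, callable) pairs; each callable takes the node
                    name and raises an exception on failure (hypothetical
                    wrappers around e.g. an ssh reboot or a cloud API call)
    confirm_down -- callable taking the node name, returning True only when the
                    node is verified to be fenced (assumption: such a check exists)

    Returns 0 only when a method ran without error AND the node was confirmed
    down; returns 1 otherwise, so the caller never sees a false "OK".
    """
    for name, method in methods:
        try:
            method(node)
        except Exception as exc:
            # Log the failure and fall through to the next method.
            print(f"{name} failed for {node}: {exc}")
            continue
        if confirm_down(node):
            print(f"{name} fenced {node}")
            return 0
        print(f"{name} ran for {node} but fencing could not be confirmed")
    return 1
```

In a real agent the return value would become the script's exit status (for example via `sys.exit(fence_node(...))`), so that a failure of both methods is reported as nonzero instead of always returning OK.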

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-04 Thread Cesar Hernandez
> The first line is the consequence of the 2nd. > And the 1st says that node2 just has seen some fencing-resource > positively reporting to have fenced itself - which > is why crmd is exiting in a way that it is not respawned > by pacemakerd. Thanks. But my script has a logfile, I've checked it

Re: [ClusterLabs] Problem with stonith and starting services

2017-07-04 Thread Cesar Hernandez
> > Agreed, I don't think it's multicast vs unicast. > > I can't see from this what's going wrong. Possibly node1 is trying to > re-fence node2 when it comes back. Check that the fencing resources are > configured correctly, and check whether node1 sees the first fencing > succeed. Thanks. Che

[ClusterLabs] Problem with stonith and starting services

2017-07-03 Thread Cesar Hernandez
Hi I have installed a pacemaker cluster with two nodes. The same type of installation has been done many times before and the following error never appeared. The situation is the following: both nodes running cluster services; stop pacemaker&corosync on node 1; stop pacemaker&corosync on node 2