Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

Auer, Jens Tue, 20 Sep 2016 06:28:06 -0700

Hi,

>> I've decided to create two answers for the two problems. The cluster
>> still fails to relocate the resource after unloading the modules even
>> with resource-agents 3.9.7.
> From the point of view of the resource agent,
> you configured it to use a non-existing network.
> Which it considers to be a configuration error,
> which is treated by pacemaker as
> "don't try to restart anywhere
> but let someone else configure it properly, first".
> Still, I have yet to see what scenario you are trying to test here.
> To me, this still looks like "scenario evil admin".  If so, I'd not even
> try, at least not on the pacemaker configuration level.
It's not an evil-admin scenario; that would not make sense. I am trying to find
a way to force a failover condition, e.g. by simulating a defective network card
or a network outage, without running to the server room every time.
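The behaviour described in the quote above comes down to how Pacemaker classifies a resource agent's exit code. In the recovery table in "Pacemaker Explained", OCF_ERR_CONFIGURED (6) is a "fatal" error: the resource is not tried anywhere until someone fixes it. OCF_ERR_GENERIC (1) is a "soft" error that still allows a restart or relocation. The snippet below is a minimal Python sketch of that lookup. The table is my simplified reading of the documented recovery types, not Pacemaker source, and `recovery_for` is a hypothetical helper.

```python
from enum import IntEnum


class OcfRc(IntEnum):
    """Standard OCF resource-agent exit codes."""
    SUCCESS = 0
    ERR_GENERIC = 1
    ERR_ARGS = 2
    ERR_UNIMPLEMENTED = 3
    ERR_PERM = 4
    ERR_INSTALLED = 5
    ERR_CONFIGURED = 6
    NOT_RUNNING = 7


# Recovery type for a failed operation, per the documented table
# (simplified illustration, not Pacemaker code):
#   soft  -> restart in place or relocate
#   hard  -> move away from the node that reported the error
#   fatal -> do not run the resource anywhere until it is fixed
RECOVERY = {
    OcfRc.ERR_GENERIC: "soft",
    OcfRc.ERR_ARGS: "hard",
    OcfRc.ERR_UNIMPLEMENTED: "hard",
    OcfRc.ERR_PERM: "hard",
    OcfRc.ERR_INSTALLED: "hard",
    OcfRc.ERR_CONFIGURED: "fatal",
}


def recovery_for(rc: int) -> str:
    """Hypothetical helper: classify an exit code; "n/a" for non-failures.

    Raises ValueError for codes outside the OcfRc enum.
    """
    return RECOVERY.get(OcfRc(rc), "n/a")


print(recovery_for(6))  # fatal: the "configuration error" case from the quote
print(recovery_for(1))  # soft: a generic failure that may still relocate
```

In other words, an agent that reports a missing interface as a configuration error prevents exactly the relocation being tested here. A generic error would not.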


> CONFIDENTIALITY NOTICE:
> Oh please :-/
> This is a public mailing list.
Sorry, this is a standard disclaimer I usually remove. We are required to add it
to our e-mails, but I think this is fairly common at commercial companies.

>> Also the netmask and the ip address are wrong. I have configured the
>> device to 192.168.120.10 with netmask 192.168.120.10. How does IpAddr2
>> get the wrong configuration? I have no idea.
> A netmask of "192.168.120.10" is nonsense.
> That is the address, not a mask.
Oops, my fault when writing the e-mail. Obviously this is the address. The
configured netmask for the device is 255.255.255.0, but after IPaddr2 brings it
up again it is 255.255.255.255, which is not what I configured in the network
configuration.

> Also, according to some posts back,
> you have configured it in pacemaker with
> cidr_netmask=32, which is not particularly useful either.
Thanks for pointing this out. I copied the parameters from the manual/tutorial, 
but did not think about the values.
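To see why cidr_netmask=32 is "not particularly useful", it helps to convert the prefix length into a dotted mask. A /32 prefix is a host-only mask, 255.255.255.255, which is consistent with what showed up on the device. A /24 prefix corresponds to the 255.255.255.0 configured in the network configuration. Below is a small sketch using Python's standard `ipaddress` module. The helper name `dotted_netmask` is hypothetical. Whether /24 is the right value for the virtual IP depends on the actual network, so treat it as an assumption.

```python
import ipaddress


def dotted_netmask(address: str, cidr_netmask: int) -> str:
    """Hypothetical helper: dotted-quad netmask implied by a prefix length."""
    iface = ipaddress.IPv4Interface(f"{address}/{cidr_netmask}")
    return str(iface.netmask)


# cidr_netmask=32: host-only mask, matching the 255.255.255.255 seen on the device
print(dotted_netmask("192.168.120.10", 32))  # 255.255.255.255
# cidr_netmask=24: the mask configured in the regular network configuration
print(dotted_netmask("192.168.120.10", 24))  # 255.255.255.0
```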

> Again: the IPaddr2 resource agent is supposed to control the assignment
> of an IP address, hence the name.
> It is not supposed to create or destroy network interfaces,
> or configure bonding, or bridges, or anything like that.
> In fact, it is not even supposed to bring up or down the interfaces,
> even though for "convenience" it seems to do "ip link set up".
This is what made me wonder in the beginning. When I bring down the device,
this leads to a failure of the resource agent, which is exactly what I expected.
I did not expect it to bring the device up again, and definitely not while
ignoring the default network configuration.

> Monitoring connectivity, or dealing with removed interface drivers,
> or unplugged devices, or whatnot, has to be dealt with elsewhere.
I am using a ping daemon for that. 

> What you did is: down the bond, remove all slave assignments, even
> remove the driver, and expect the resource agent to "heal" things that
> it does not know about. It can not.
I am not expecting the RA to heal anything. How could it? And why would I
expect it? In fact, I expect the opposite, namely a consistent failure when the
device is down. Even that expectation may be wrong, because you can assign IP
addresses to devices that are down.

My initial expectation was that the resource cannot be started when the device
is down and is then relocated to another node. I think this is more or less the
core functionality of the cluster. I can see why it does not switch to another
node when there is a configuration error in the cluster, because it is fair to
assume that the configuration is identical (and equally wrong) on all nodes. But
what happens if the network device is broken? Would the server start, fail to
assign the IP address, and then prevent the whole cluster from working? And what
happens if the network card breaks while the cluster is running?

Best wishes,
  Jens

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
