Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-20 Thread Andrew Beekhof
On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot  wrote:

> Hi everybody,
>
> Currently, Pacemaker's on-fail property allows you to configure how the
> cluster reacts to operation failures. The default "restart" means try to
> restart on the same node, optionally moving to another node once
> migration-threshold is reached. Other possibilities are "ignore",
> "block", "stop", "fence", and "standby".
>
> Occasionally, we get requests to have something like migration-threshold
> for values besides restart. For example, try restarting the resource on
> the same node 3 times, then fence.
>
> I'd like to get your feedback on two alternative approaches we're
> considering.
>
> ###
>
> Our first proposed approach would add a new hard-fail-threshold
> operation property. If specified, the cluster would first try restarting
> the resource on the same node,


Well, just as now, it would be _allowed_ to start on the same node, but
this is not guaranteed.


> before doing the on-fail handling.
>
> For example, you could configure a promote operation with
> hard-fail-threshold=3 and on-fail=fence, to fence the node after 3
> failures.
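
As a sketch, the first proposal might be configured like this (hypothetical: "hard-fail-threshold" is only a proposal and does not exist in any released Pacemaker; the pcs syntax and resource name are illustrative only):

```shell
# Hypothetical sketch of the first proposal -- fence the node after
# three failed promote attempts; "hard-fail-threshold" does not exist yet.
pcs resource update my-db op promote interval=0 timeout=90s \
    on-fail=fence hard-fail-threshold=3
```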


> One point that's not settled is whether failures of *any* operation
> would count toward the 3 failures (which is how migration-threshold
> works now), or only failures of the specified operation.
>

I think if hard-fail-threshold is per-op, then only failures of that
operation should count.


>
> Currently, if a start fails (but is retried successfully), then a
> promote fails (but is retried successfully), then a monitor fails, the
> resource will move to another node if migration-threshold=3. We could
> keep that behavior with hard-fail-threshold, or only count monitor
> failures toward monitor's hard-fail-threshold. Each alternative has
> advantages and disadvantages.
>
> ###
>
> The second proposed approach would add a new on-restart-fail resource
> property.
>
> Same as now, on-fail set to anything but restart would be done
> immediately after the first failure. A new value, "ban", would
> immediately move the resource to another node. (on-fail=ban would behave
> like on-fail=restart with migration-threshold=1.)
>
> When on-fail=restart, and restarting on the same node doesn't work, the
> cluster would do the on-restart-fail handling. on-restart-fail would
> allow the same values as on-fail (minus "restart"), and would default to
> "ban".


I do wish you well tracking "is this a restart" across demote -> stop ->
start -> promote in 4 different transitions :-)


>
> So, if you want to fence immediately after any promote failure, you
> would still configure on-fail=fence; if you want to try restarting a few
> times first, you would configure on-fail=restart and on-restart-fail=fence.
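
Under the second proposal, that policy might look like the following (hypothetical: neither "on-restart-fail" nor the "ban" value exists in released Pacemaker; syntax and names are illustrative):

```shell
# Hypothetical sketch of the second proposal -- retry in place first,
# fence only once restarting on the same node has failed:
pcs resource update my-db op monitor interval=10s on-fail=restart \
    meta on-restart-fail=fence

# Hypothetical: move to another node immediately on the first failure:
pcs resource update my-db op monitor interval=10s on-fail=ban
```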
>
> This approach keeps the current threshold behavior -- failures of any
> operation count toward the threshold. We'd rename migration-threshold to
> something like hard-fail-threshold, since it would apply to more than
> just migration, but unlike the first approach, it would stay a resource
> property.
>
> ###
>
> Comparing the two approaches, the first is more flexible, but also more
> complex and potentially confusing.
>

More complex to implement or more complex to configure?


>
> With either approach, we would deprecate the start-failure-is-fatal
> cluster property. start-failure-is-fatal=true would be equivalent to
> hard-fail-threshold=1 with the first approach, and on-fail=ban with the
> second approach. This would be both simpler and more useful -- it allows
> the value to be set differently per resource.
> --
> Ken Gaillot 
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>


Re: [ClusterLabs] Force Unmount - SLES 11 SP4

2016-09-20 Thread Jorge Fábregas
On 09/20/2016 12:51 PM, Kristoffer Grönlund wrote:
> The force_unmount option is available in more recent version of SLES as
> well, but not in SLES 11 SP4. You could try installing the upstream
> version of the Filesystem agent and see if that works for you.

Thanks Kristoffer for confirming.

All the best,
Jorge



Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-20 Thread Ken Gaillot
On 09/20/2016 07:15 AM, Auer, Jens wrote:
> Hi,
> 
> I did some more tests after updating DRBD to the latest version. The behavior 
> does not change, but I found out that
> - everything works fine when I physically unplug the network cables instead 
> of ifdown'ing the device

BTW that's a more accurate simulation of a network failure.

> - I can see in the log files that the device gets promoted after stopping the 
> initial master node, but then gets immediately demoted. I don't understand 
> why this happens:
> Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]:  notice: Operation ACTIVE_start_0: ok 
> (node=MDA1PFP-PCS02, call=29, rc=0, cib-update=21, confirmed=true)
> Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
> (node=MDA1PFP-PCS02, call=28, rc=0, cib-update=0, confirmed=true)
> Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: peer( Primary -> Secondary ) 
> Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Adding inet address 
> 192.168.120.20/32 with broadcast address 192.168.120.255 to device bond0
> Sep 20 12:08:04 MDA1PFP-S02 avahi-daemon[1084]: Registering new address 
> record for 192.168.120.20 on bond0.IPv4.
> Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Bringing device 
> bond0 up
> Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: 
> /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p 
> /var/run/resource-agents/send_arp-192.168.120.20 bond0 192.168.120.20 auto 
> not_used not_used
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation mda-ip_start_0: ok 
> (node=MDA1PFP-PCS02, call=31, rc=0, cib-update=23, confirmed=true)
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
> (node=MDA1PFP-PCS02, call=32, rc=0, cib-update=0, confirmed=true)
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
> (node=MDA1PFP-PCS02, call=34, rc=0, cib-update=0, confirmed=true)
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: peer( Secondary -> 
> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown ) 
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: ack_receiver terminated
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Terminating 
> drbd_a_shared_f
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Connection closed
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( TearDown -> 
> Unconnected ) 
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver terminated
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Restarting receiver thread
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver (re)started
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( Unconnected -> 
> WFConnection ) 
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
> (node=MDA1PFP-PCS02, call=35, rc=0, cib-update=0, confirmed=true)
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
> (node=MDA1PFP-PCS02, call=36, rc=0, cib-update=0, confirmed=true)
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: 
> /sbin/drbdadm fence-peer shared_fs
> Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: invoked for shared_fs
> Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: INFO peer is not 
> reachable, my disk is UpToDate: placed constraint 
> 'drbd-fence-by-handler-shared_fs-drbd1_sync'
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: 
> /sbin/drbdadm fence-peer shared_fs exit code 5 (0x500)
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: fence-peer helper 
> returned 5 (peer is unreachable, assumed to be dead)
> Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: pdsk( DUnknown -> 
> Outdated ) 
> Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Secondary -> Primary ) 

From these logs, I don't see any request by Pacemaker for DRBD to be
promoted, so I'm wondering if DRBD decided to promote itself here.

> Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: new current UUID 
> 098EF9936C4F4D27:5157BB476E60F5AA:6BC19D97CF96E5D2:6BC09D97CF96E5D2
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:   error: pcmkRegisterNode: Triggered 
> assert at xml.c:594 : node->type == XML_ELEMENT_NODE
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_promote_0: 
> ok (node=MDA1PFP-PCS02, call=37, rc=0, cib-update=25, confirmed=true)
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
> (node=MDA1PFP-PCS02, call=38, rc=0, cib-update=0, confirmed=true)
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Our peer on the DC 
> (MDA1PFP-PCS01) is dead

Here, Pacemaker lost corosync connectivity to its peer. Isn't corosync
traffic on a separate interface? Or is this a different test than before?

> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: State transition S_NOT_DC -> 
> S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK 
> origin=peer_update_callback ]
> Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: State transition 

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Dmitri Maziuk

On 2016-09-20 09:53, Ken Gaillot wrote:


I do think ifdown is not quite the best failure simulation, since there
aren't that many real-world situation that merely take an interface
down. To simulate network loss (without pulling the cable), I think
maybe using the firewall to block all traffic to and from the interface
might be better.
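
A minimal sketch of that firewall-based simulation (the interface name bond0 is an assumption; these rules need root, so keep an out-of-band way back in, e.g. a console):

```shell
# Simulate network loss on bond0 (assumed name) without touching link state:
iptables -A INPUT  -i bond0 -j DROP
iptables -A OUTPUT -o bond0 -j DROP

# End the simulation by deleting the same rules:
iptables -D INPUT  -i bond0 -j DROP
iptables -D OUTPUT -o bond0 -j DROP
```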


Or unloading the driver module to simulate NIC hardware failure.

Depending on how closely you look at the interface, it may or may not
matter that pulling the cable (or the other side going down) results in
NO CARRIER, whereas firewalling it off does not.


Dima




Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Dejan Muhamedagic
On Tue, Sep 20, 2016 at 01:13:23PM +, Auer, Jens wrote:
> Hi,
> 
> >> I've decided to create two answers for the two problems. The cluster
> >> still fails to relocate the resource after unloading the modules even
> >> with resource-agents 3.9.7
> > From the point of view of the resource agent,
> > you configured it to use a non-existing network.
> > Which it considers to be a configuration error,
> > which is treated by pacemaker as
> > "don't try to restart anywhere
> > but let someone else configure it properly, first".
> > Still, I have yet to see what scenario you are trying to test here.
> > To me, this still looks like "scenario evil admin".  If so, I'd not even
> > try, at least not on the pacemaker configuration level.
> It's not evil admin as this would not make sense. I am trying to find a way 
> to force a failover condition e.g. by simulating a network card defect or 
> network outage without running to the server room every time. 

Better use iptables. Bringing the interface down is not the same
as network card going bad.

Thanks,

Dejan

> > CONFIDENTIALITY NOTICE:
> > Oh please :-/
> > This is a public mailing list.
> Sorry, this is a standard disclaimer I usually remove. We are forced to add 
> this to e-mails, but I think this is fairly common for commercial companies.
> 
> >> Also the netmask and the ip address are wrong. I have configured the
> >> device to 192.168.120.10 with netmask 192.168.120.10. How does IpAddr2
> >> get the wrong configuration? I have no idea.
> >A netmask of "192.168.120.10" is nonsense.
> >That is the address, not a mask.
> Oops, my fault when writing the e-mail. Obviously this is the address. The 
> configured netmask for the device is 255.255.255.0, but after IPaddr2 brings 
> it up again it is 255.255.255.255 which is not what I configured in the 
> network configuration.
> 
> > Also, according to some posts back,
> > you have configured it in pacemaker with
> > cidr_netmask=32, which is not particularly useful either.
> Thanks for pointing this out. I copied the parameters from the 
> manual/tutorial, but did not think about the values.
> 
> > Again: the IPaddr2 resource agent is supposed to control the assignment
> > of an IP address, hence the name.
> > It is not supposed to create or destroy network interfaces,
> > or configure bonding, or bridges, or anything like that.
> > In fact, it is not even supposed to bring up or down the interfaces,
> > even though for "convenience" it seems to do "ip link set up".
> This is what made me wonder in the beginning. When I bring down the device, 
> this leads to a failure of the resource agent which is exactly what I 
> expected. I did not expect it to bring the device up again, and definitely
> not to ignore the default network configuration.
> 
> > Monitoring connectivity, or dealing with removed interface drivers,
> > or unplugged devices, or whatnot, has to be dealt with elsewhere.
> I am using a ping daemon for that. 
> 
> > What you did is: down the bond, remove all slave assignments, even
> > remove the driver, and expect the resource agent to "heal" things that
> > it does not know about. It can not.
> I am not expecting the RA to heal anything. How could it? And why would I 
> expect it? In fact I am expecting the opposite that is a consistent failure 
> when the device is down. This may also be wrong, because you can assign IP
> addresses to downed devices.
> 
> My initial expectation was that the resource cannot be started when the 
> device is down and then is relocated. I think this is more or less the core
> functionality of the cluster. I can see a reason why it does not switch to 
> another node when there is a configuration error in the cluster because it is 
> fair to assume that the configuration is identical (wrong) on all nodes. But 
> what happens if the network device is broken? The server would start, fail to 
> assign the ip address and then prevent the whole cluster from working? What 
> happens if the network card breaks while the cluster is running? 
> 
> Best wishes,
>   Jens
> 



[ClusterLabs] Force Unmount - SLES 11 SP4

2016-09-20 Thread Jorge Fábregas
Hi,

I have an issue while shutting down one of our clusters.  The unmounting
of an OCFS2 filesystem (ocf:heartbeat:Filesystem) is triggering a node
fence (accordingly).  This is because the script for stopping the
application is not killing all processes using the filesystem.  Is there
a way to "force unmount" the filesystem using pacemaker as it is in SLES
11 SP4?

I searched for something related and found the "force_unmount" parameter
for ocf:heartbeat:Filesystem but it only works in RHEL (apparently it's
a newer OCF version).

It appears I'll have to deal with this out of pacemaker (perhaps thru an
init script using "fuser -k" that would run prior to openais at system
shutdown).
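
A minimal sketch of such a shutdown hook (the mount point and placement are assumptions; note that `fuser -km` SIGKILLs every process using the filesystem, so use with care):

```shell
#!/bin/sh
# Hypothetical pre-openais shutdown hook: kill anything still using the
# OCFS2 mount so the cluster's unmount does not hang and trigger a fence.
MOUNTPOINT="/srv/ocfs2"        # assumption: adjust to the real mount point
if mountpoint -q "$MOUNTPOINT"; then
    fuser -km "$MOUNTPOINT"    # SIGKILL all processes using the filesystem
    sleep 2                    # give the processes a moment to exit
fi
```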

If anyone here using SUSE has a better idea please let me know.

Thanks,
Jorge



Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout

2016-09-20 Thread Ken Gaillot
On 09/20/2016 06:42 AM, Digimer wrote:
> On 20/09/16 06:59 AM, Stefan Bauer wrote:
>> Hi,
>>
>> I run a two-node cluster and want to be safe in split-brain scenarios. For
>> this I set up external/ipmi to stonith the other node.
> 
> Please use 'fence_ipmilan'. I believe that the older external/ipmi are
> deprecated (someone correct me if I am wrong on this).

It's just an alternative. The "external/" agents come with the
cluster-glue package, which isn't provided by some distributions (such
as RHEL and its derivatives), so it's "deprecated" on those only.

>> Some possible issues came to my mind, and I would like to find the best
>> practice solution:
>>
>> - I have a primitive for each node to stonith. Many documents and guides
>> recommend never letting a fence device run on the host it should fence. I
>> would set up clone resources to avoid dealing with location constraints
>> that would also influence scoring. Does that make sense?
> 
> Since v1.1.10 of pacemaker, you don't have to worry about this.
> Pacemaker is smart enough to know where to run a fence call from in
> order to terminate a target.

Right, fence devices can run anywhere now, and in fact they don't even
have to be "running" for pacemaker to use them -- as long as they are
configured and not intentionally disabled, pacemaker will use them.

There is still a slight advantage to not running a fence device on a
node it can fence. "Running" a fence device in pacemaker really means
running the recurring monitor for it. Since the node that runs the
monitor has "verified" access to the device, pacemaker will prefer to
use it to execute that device. However, pacemaker will not use a node to
fence itself, except as a last resort if no other node is available. So,
running a fence device on a node it can fence means that the preference
is lost.

That's a very minor detail, not worth worrying about. It's more a matter
of personal preference.

In this particular case, a more relevant concern is that you need
different configurations for the different targets (the IPMI address is
different).

One approach is to define two different fence devices, each with one
IPMI address. In that case, it makes sense to use the location
constraints to ensure the device prefers the node that's not its target.

Another approach (if the fence agent supports it) is to use
pcmk_host_map to provide a different "port" (IPMI address) depending on
which host is being fenced. In this case, you need only one fence device
to be able to fence both hosts. You don't need a clone. (Remember, the
node "running" the device merely refers to its monitor, so the cluster
can still use the fence device, even if that node crashes.)
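
As a sketch, a single fence device covering both hosts via pcmk_host_map might look like this (device name, node names, credentials, and IPMI addresses are all assumptions):

```shell
# One fence device for both nodes; the IPMI address ("port") is chosen per
# target via pcmk_host_map. All names and addresses are illustrative only.
pcs stonith create fence-ipmi fence_ipmilan \
    pcmk_host_map="node1:10.0.0.1;node2:10.0.0.2" \
    login=admin passwd=secret lanplus=1
```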

>> - Monitoring operation on the stonith primitive is dangerous. I read
>> that if monitor operations fail for the stonith device, stonith action
>> is triggered. I think its not clever to give the cluster the option to
>> fence a node just because it has an issue to monitor a fence device.
>> That should not be a reason to shutdown a node. What is your opinion on
>> this? Can i just set the primitive monitor operation to disabled?
> 
> Monitoring is how you will detect that, for example, the IPMI cable
> failed or was unplugged. I do not believe the node will get fenced on
> fence agent monitor failing... At least not by default.

I am not aware of any situation in which a failing fence monitor
triggers a fence. Monitoring is good -- it verifies that the fence
device is still working.

One concern particular to on-board IPMI devices is that they typically
share the same power supply as their host. So if the machine loses
power, the cluster can't contact the IPMI to fence it -- which means it
will be unable to recover any resources from the lost node. (It can't
assume the node lost power -- it's possible just network connectivity
between the two nodes was lost.)

The only way around that is to have a second fence device (such as an
intelligent power switch). If the cluster can't reach the IPMI, it will
try the second device.
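
Such a fallback can be expressed as fencing levels, e.g. (device and node names are assumptions):

```shell
# Try the IPMI device first; if it fails (e.g. the host lost power), fall
# back to an intelligent power switch. Names are illustrative only.
pcs stonith level add 1 node1 fence-ipmi-node1
pcs stonith level add 2 node1 fence-pdu-node1
```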



Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi,

>> I've decided to create two answers for the two problems. The cluster
>> still fails to relocate the resource after unloading the modules even
>> with resource-agents 3.9.7
> From the point of view of the resource agent,
> you configured it to use a non-existing network.
> Which it considers to be a configuration error,
> which is treated by pacemaker as
> "don't try to restart anywhere
> but let someone else configure it properly, first".
> Still, I have yet to see what scenario you are trying to test here.
> To me, this still looks like "scenario evil admin".  If so, I'd not even
> try, at least not on the pacemaker configuration level.
It's not evil admin as this would not make sense. I am trying to find a way to 
force a failover condition e.g. by simulating a network card defect or network 
outage without running to the server room every time. 

> CONFIDENTIALITY NOTICE:
> Oh please :-/
> This is a public mailing list.
Sorry, this is a standard disclaimer I usually remove. We are forced to add 
this to e-mails, but I think this is fairly common for commercial companies.

>> Also the netmask and the ip address are wrong. I have configured the
>> device to 192.168.120.10 with netmask 192.168.120.10. How does IpAddr2
>> get the wrong configuration? I have no idea.
>A netmask of "192.168.120.10" is nonsense.
>That is the address, not a mask.
Oops, my fault when writing the e-mail. Obviously this is the address. The 
configured netmask for the device is 255.255.255.0, but after IPaddr2 brings it 
up again it is 255.255.255.255, which is not what I configured in the network 
configuration. 

> Also, according to some posts back,
> you have configured it in pacemaker with
> cidr_netmask=32, which is not particularly useful either.
Thanks for pointing this out. I copied the parameters from the manual/tutorial, 
but did not think about the values.

> Again: the IPaddr2 resource agent is supposed to control the assignment
> of an IP address, hence the name.
> It is not supposed to create or destroy network interfaces,
> or configure bonding, or bridges, or anything like that.
> In fact, it is not even supposed to bring up or down the interfaces,
> even though for "convenience" it seems to do "ip link set up".
This is what made me wonder in the beginning. When I bring down the device, 
this leads to a failure of the resource agent which is exactly what I expected. 
I did not expect it to bring the device up again, and definitely not to ignore
the default network configuration.

> Monitoring connectivity, or dealing with removed interface drivers,
> or unplugged devices, or whatnot, has to be dealt with elsewhere.
I am using a ping daemon for that. 

> What you did is: down the bond, remove all slave assignments, even
> remove the driver, and expect the resource agent to "heal" things that
> it does not know about. It can not.
I am not expecting the RA to heal anything. How could it? And why would I 
expect it? In fact I am expecting the opposite that is a consistent failure 
when the device is down. This may also be wrong, because you can assign IP
addresses to downed devices.

My initial expectation was that the resource cannot be started when the device 
is down and then is relocated. I think this is more or less the core functionality 
of the cluster. I can see a reason why it does not switch to another node when 
there is a configuration error in the cluster because it is fair to assume that 
the configuration is identical (wrong) on all nodes. But what happens if the 
network device is broken? The server would start, fail to assign the ip address 
and then prevent the whole cluster from working? What happens if the network 
card breaks while the cluster is running? 

Best wishes,
  Jens



Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Lars Ellenberg
On Tue, Sep 20, 2016 at 11:44:58AM +, Auer, Jens wrote:
> Hi,
> 
> I've decided to create two answers for the two problems. The cluster
> still fails to relocate the resource after unloading the modules even
> with resource-agents 3.9.7

From the point of view of the resource agent,
you configured it to use a non-existing network.
Which it considers to be a configuration error,
which is treated by pacemaker as
"don't try to restart anywhere
but let someone else configure it properly, first".

I think the OCF_ERR_CONFIGURED is good, though, otherwise 
configuration errors might go unnoticed for quite some time.
A network interface is not supposed to "vanish".

You may disagree with that choice,
in which case you could edit the resource agent to treat it not as
configuration error, but as "required component not installed"
(OCF_ERR_CONFIGURED vs OCF_ERR_INSTALLED), and pacemaker will
"try to find some other node with required components available",
before giving up completely.
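
A minimal sketch of that kind of edit (this is not the actual IPaddr2 code; the function name and the /sys-based existence check are illustrative only, while the OCF exit-code values are the standard ones):

```shell
# Sketch only: report a missing interface as "not installed" rather than
# "misconfigured", so Pacemaker tries other nodes instead of giving up.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5     # pacemaker: try to find another node
OCF_ERR_CONFIGURED=6    # pacemaker: don't try to restart anywhere

check_interface() {
    # Illustrative check; the real agent uses findif/ip link instead.
    if [ -e "/sys/class/net/$1" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERR_INSTALLED   # was: return $OCF_ERR_CONFIGURED
}
```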

Still, I have yet to see what scenario you are trying to test here.
To me, this still looks like "scenario evil admin".  If so, I'd not even
try, at least not on the pacemaker configuration level.

> CONFIDENTIALITY NOTICE:

Oh please :-/
This is a public mailing list.

> There seems to be some difference because the device is not RUNNING;

> Also the netmask and the ip address are wrong. I have configured the
> device to 192.168.120.10 with netmask 192.168.120.10. How does IpAddr2
> get the wrong configuration? I have no idea.

A netmask of "192.168.120.10" is nonsense.
That is the address, not a mask.

Also, according to some posts back,
you have configured it in pacemaker with
cidr_netmask=32, which is not particularly useful either.

You should use the netmask of whatever subnet is supposedly actually
reachable via that address and interface. Typical masks are e.g.
/24, /20, /16 resp. 255.255.255.0, 255.255.240.0, 255.255.0.0

Apparently the RA is "nice" enough (or maybe buggy enough)
to let that slip, and guess the netmask from the routing tables,
or fall back to whatever builtin defaults there are on the various
layers of tools involved.

Again: the IPaddr2 resource agent is supposed to control the assignment
of an IP address, hence the name.

It is not supposed to create or destroy network interfaces,
or configure bonding, or bridges, or anything like that.

In fact, it is not even supposed to bring up or down the interfaces,
even though for "convenience" it seems to do "ip link set up".

That is not a bug, but limited scope.

If you wanted to test the reaction of the cluster to a vanishing
IP address, the correct test would be an
  "ip addr del 192.168.120.10 dev bond0"

And the expectation is that it will notice, and just re-add the address.
That is the scope of the IPaddr2 resource agent.

Monitoring connectivity, or dealing with removed interface drivers,
or unplugged devices, or whatnot, has to be dealt with elsewhere.

What you did is: down the bond, remove all slave assignments, even
remove the driver, and expect the resource agent to "heal" things that
it does not know about. It can not.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT



Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-20 Thread Auer, Jens
Hi,

I did some more tests after updating DRBD to the latest version. The behavior 
does not change, but I found out that
- everything works fine when I physically unplug the network cables instead of 
ifdown'ing the device
- I can see in the log files that the device gets promoted after stopping the 
initial master node, but then gets immediately demoted. I don't understand why 
this happens:
Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]:  notice: Operation ACTIVE_start_0: ok 
(node=MDA1PFP-PCS02, call=29, rc=0, cib-update=21, confirmed=true)
Sep 20 12:08:03 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
(node=MDA1PFP-PCS02, call=28, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: peer( Primary -> Secondary ) 
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Adding inet address 
192.168.120.20/32 with broadcast address 192.168.120.255 to device bond0
Sep 20 12:08:04 MDA1PFP-S02 avahi-daemon[1084]: Registering new address record 
for 192.168.120.20 on bond0.IPv4.
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: Bringing device bond0 
up
Sep 20 12:08:04 MDA1PFP-S02 IPaddr2(mda-ip)[3528]: INFO: 
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/resource-agents/send_arp-192.168.120.20 bond0 192.168.120.20 auto 
not_used not_used
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation mda-ip_start_0: ok 
(node=MDA1PFP-PCS02, call=31, rc=0, cib-update=23, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
(node=MDA1PFP-PCS02, call=32, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
(node=MDA1PFP-PCS02, call=34, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: peer( Secondary -> Unknown 
) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown ) 
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: ack_receiver terminated
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Terminating drbd_a_shared_f
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Connection closed
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( TearDown -> 
Unconnected ) 
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver terminated
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: Restarting receiver thread
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: receiver (re)started
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: conn( Unconnected -> 
WFConnection ) 
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
(node=MDA1PFP-PCS02, call=35, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
(node=MDA1PFP-PCS02, call=36, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: 
/sbin/drbdadm fence-peer shared_fs
Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: invoked for shared_fs
Sep 20 12:08:04 MDA1PFP-S02 crm-fence-peer.sh[3779]: INFO peer is not 
reachable, my disk is UpToDate: placed constraint 
'drbd-fence-by-handler-shared_fs-drbd1_sync'
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: helper command: 
/sbin/drbdadm fence-peer shared_fs exit code 5 (0x500)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: fence-peer helper returned 
5 (peer is unreachable, assumed to be dead)
Sep 20 12:08:04 MDA1PFP-S02 kernel: drbd shared_fs: pdsk( DUnknown -> Outdated 
) 
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: role( Secondary -> Primary ) 
Sep 20 12:08:04 MDA1PFP-S02 kernel: block drbd1: new current UUID 
098EF9936C4F4D27:5157BB476E60F5AA:6BC19D97CF96E5D2:6BC09D97CF96E5D2
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:   error: pcmkRegisterNode: Triggered 
assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_promote_0: ok 
(node=MDA1PFP-PCS02, call=37, rc=0, cib-update=25, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
(node=MDA1PFP-PCS02, call=38, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Our peer on the DC 
(MDA1PFP-PCS01) is dead
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: State transition S_NOT_DC -> 
S_ELECTION [ input=I_ELECTION cause=C_CRMD_STATUS_CALLBACK 
origin=peer_update_callback ]
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: State transition S_ELECTION -> 
S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED 
origin=election_timeout_popped ]
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: crm_update_peer_proc: Node 
MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: Removing all MDA1PFP-PCS01 
attributes for attrd_peer_change_cb
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: Lost attribute writer 
MDA1PFP-PCS01
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: Removing MDA1PFP-PCS01/1 from 
the membership list
Sep 20 

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi,

one thing to add is that everything works as expected when I physically unplug 
the network cables to force a failover. 

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.




Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi,

I've updated to resource-agents 3.9.7, which is the latest stable version, but I 
am still seeing the same issues:
MDA1PFP-S01 11:31:40 2495 130 ~ # yum list resource-agents
Loaded plugins: langpacks, product-id, search-disabled-repos, 
subscription-manager
Installed Packages
resource-agents.x86_64    3.9.7-4.el7    @/resource-agents-3.9.7-4.el7.x86_64

ifdown still shows the same behavior. Initially, I can see two ip addresses 
assigned to device bond0. After doing "ifdown bond0" on the command line, 
Pacemaker restarts the resource "successfully" but does not assign the default 
ip address to the device:
25: bond0:  mtu 1500 qdisc noqueue 
state DOWN qlen 3
link/ether 46:0a:be:70:36:11 brd ff:ff:ff:ff:ff:ff
inet 192.168.120.20/32 scope global bond0
   valid_lft forever preferred_lft forever

The log says that IPaddr2 assigns 192.168.120.20 to bond0, but nothing else:
Sep 20 11:34:25 MDA1PFP-S01 kernel: bond0: Removing slave eno49
Sep 20 11:34:25 MDA1PFP-S01 kernel: bond0: Releasing active interface eno49
Sep 20 11:34:25 MDA1PFP-S01 kernel: bond0: the permanent HWaddr of eno49 - 
5c:b9:01:9c:e7:fc - is still in use by bond0 - set the HWaddr of eno49 to a 
different address to avoid conflicts
Sep 20 11:34:25 MDA1PFP-S01 kernel: bond0: making interface eno50 the new 
active one
Sep 20 11:34:25 MDA1PFP-S01 kernel: ixgbe :04:00.0: removed PHC on eno49
Sep 20 11:34:25 MDA1PFP-S01 NetworkManager[881]:   (bond0): bond slave 
eno49 was released
Sep 20 11:34:25 MDA1PFP-S01 NetworkManager[881]:   (eno49): released from 
master bond0
Sep 20 11:34:26 MDA1PFP-S01 kernel: bond0: Removing slave eno50
Sep 20 11:34:26 MDA1PFP-S01 kernel: bond0: Releasing active interface eno50
Sep 20 11:34:26 MDA1PFP-S01 kernel: ixgbe :04:00.1: removed PHC on eno50
Sep 20 11:34:26 MDA1PFP-S01 NetworkManager[881]:   (bond0): bond slave 
eno50 was released
Sep 20 11:34:26 MDA1PFP-S01 NetworkManager[881]:   (eno50): released from 
master bond0
Sep 20 11:34:26 MDA1PFP-S01 NetworkManager[881]:   (eno50): link 
disconnected
Sep 20 11:34:26 MDA1PFP-S01 NetworkManager[881]:   (bond0): link 
disconnected
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Withdrawing address record for 
192.168.120.10 on bond0.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Leaving mDNS multicast group on 
interface bond0.IPv4 with address 192.168.120.10.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Joining mDNS multicast group on 
interface bond0.IPv4 with address 192.168.120.20.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Withdrawing address record for 
192.168.120.20 on bond0.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Leaving mDNS multicast group on 
interface bond0.IPv4 with address 192.168.120.20.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Interface bond0.IPv4 no longer 
relevant for mDNS.
Sep 20 11:34:26 MDA1PFP-S01 avahi-daemon[912]: Withdrawing address record for 
fe80::5eb9:1ff:fe9c:e7fc on bond0.
Sep 20 11:34:29 MDA1PFP-S01 corosync[30167]: [TOTEM ] Retransmit List: 7e
Sep 20 11:34:29 MDA1PFP-S01 corosync[30167]: [TOTEM ] Retransmit List: 7e
Sep 20 11:34:29 MDA1PFP-S01 corosync[30167]: [TOTEM ] Marking ringid 1 
interface 192.168.120.10 FAULTY
Sep 20 11:34:29 MDA1PFP-S01 corosync[30167]: [TOTEM ] Retransmit List: 7e
Sep 20 11:34:29 MDA1PFP-S01 IPaddr2(mda-ip)[32025]: INFO: IP status = no, 
IP_CIP=
Sep 20 11:34:29 MDA1PFP-S01 crmd[30188]:  notice: Operation mda-ip_stop_0: ok 
(node=MDA1PFP-PCS01, call=9, rc=0, cib-update=17, confirmed=true)
Sep 20 11:34:29 MDA1PFP-S01 IPaddr2(mda-ip)[32072]: INFO: Adding inet address 
192.168.120.20/32 to device bond0
Sep 20 11:34:29 MDA1PFP-S01 IPaddr2(mda-ip)[32072]: INFO: Bringing device bond0 
up
Sep 20 11:34:29 MDA1PFP-S01 kernel: IPv6: ADDRCONF(NETDEV_UP): bond0: link is 
not ready
Sep 20 11:34:29 MDA1PFP-S01 avahi-daemon[912]: Joining mDNS multicast group on 
interface bond0.IPv4 with address 192.168.120.20.
Sep 20 11:34:29 MDA1PFP-S01 avahi-daemon[912]: New relevant interface 
bond0.IPv4 for mDNS.
Sep 20 11:34:29 MDA1PFP-S01 avahi-daemon[912]: Registering new address record 
for 192.168.120.20 on bond0.IPv4.
Sep 20 11:34:29 MDA1PFP-S01 IPaddr2(mda-ip)[32072]: INFO: 
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/resource-agents/send_arp-192.168.120.20 bond0 192.168.120.20 auto 
not_used not_used
Sep 20 11:34:29 MDA1PFP-S01 crmd[30188]:  notice: Operation mda-ip_start_0: ok 
(node=MDA1PFP-PCS01, call=10, rc=0, cib-update=18, confirmed=true)

The VIP is reachable locally, but not from other hosts:
MDA1PFP-S01 11:36:12 2526 0 ~ # ping 192.168.120.20
PING 192.168.120.20 (192.168.120.20) 56(84) bytes of data.
64 bytes from 192.168.120.20: icmp_seq=1 ttl=64 time=0.027 ms
64 bytes from 192.168.120.20: icmp_seq=2 ttl=64 time=0.016 ms
64 bytes from 192.168.120.20: icmp_seq=3 

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi,

I've decided to create two answers for the two problems. The cluster still 
fails to relocate the resource after unloading the modules, even with 
resource-agents 3.9.7:
MDA1PFP-S01 11:42:50 2533 0 ~ # yum list resource-agents
Loaded plugins: langpacks, product-id, search-disabled-repos, 
subscription-manager
Installed Packages
resource-agents.x86_64    3.9.7-4.el7    @/resource-agents-3.9.7-4.el7.x86_64

Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]: warning: Action 9 (mda-ip_start_0) on 
MDA1PFP-PCS01 failed (target: 0 vs. rc: 6): Error
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]: warning: Action 9 (mda-ip_start_0) on 
MDA1PFP-PCS01 failed (target: 0 vs. rc: 6): Error
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]:  notice: Transition 5 (Complete=3, 
Pending=0, Fired=0, Skipped=0, Incomplete=1, 
Source=/var/lib/pacemaker/pengine/pe-input-552.bz2): Complete
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:  notice: On loss of CCM Quorum: 
Ignore
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start 
for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from 
re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start 
for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from 
re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:  notice: Stop mda-ip 
(MDA1PFP-PCS01)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:  notice: Calculated Transition 6: 
/var/lib/pacemaker/pengine/pe-input-553.bz2
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]:  notice: Initiating action 2: stop 
mda-ip_stop_0 on MDA1PFP-PCS01 (local)
Sep 20 11:42:52 MDA1PFP-S01 IPaddr2(mda-ip)[15336]: INFO: IP status = no, 
IP_CIP=
Sep 20 11:42:52 MDA1PFP-S01 lrmd[13905]:  notice: mda-ip_stop_0:15336:stderr [ 
Device "bond0" does not exist. ]
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]:  notice: Operation mda-ip_stop_0: ok 
(node=MDA1PFP-PCS01, call=18, rc=0, cib-update=48, confirmed=true)
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 93
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 93 96 98
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 93 98 9a 
9c
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Marking ringid 1 
interface 192.168.120.10 FAULTY
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 98 9c 9f 
a1
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: Transition 6 (Complete=2, 
Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-553.bz2): Complete
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: State transition 
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL 
origin=notify_crmd ]
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]:  notice: On loss of CCM Quorum: 
Ignore
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start 
for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from 
re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]: warning: Forcing mda-ip away from 
MDA1PFP-PCS01 after 100 failures (max=100)
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]:  notice: Calculated Transition 7: 
/var/lib/pacemaker/pengine/pe-input-554.bz2
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: Transition 7 (Complete=0, 
Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-554.bz2): Complete
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: State transition 
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL 
origin=notify_crmd ]
Sep 20 11:43:02 MDA1PFP-S01 crmd[13908]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]:  notice: On loss of CCM Quorum: 
Ignore
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start 
for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from 
re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]: warning: Forcing mda-ip away from 
MDA1PFP-PCS01 after 100 failures (max=100)
Sep 20 11:43:02 MDA1PFP-S01 pengine[13907]:  notice: Calculated Transition 8: 
/var/lib/pacemaker/pengine/pe-input-555.bz2
Sep 20 

Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout

2016-09-20 Thread Digimer
On 20/09/16 06:59 AM, Stefan Bauer wrote:
> Hi,
> 
> I run a 2-node cluster and want to be safe in split-brain scenarios. For
> this I set up external/ipmi to STONITH the other node.

Please use 'fence_ipmilan'. I believe the older external/ipmi agent is
deprecated (someone correct me if I am wrong on this).

> Some possible issues jumped to my mind, and I would like to find the best
> practice solution:
> 
> - I have a primitive to STONITH each node. Many documents and guides
> recommend never letting a fence device run on the host it should fence.
> I would set up clone resources to avoid dealing with locations that would
> also influence scoring. Does that make sense?

Since v1.1.10 of pacemaker, you don't have to worry about this.
Pacemaker is smart enough to know where to run a fence call from in
order to terminate a target.
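
For reference, a minimal pair of per-node IPMI fence primitives in crm shell
syntax might look like the following. This is an untested sketch: the IPMI
addresses and credentials are placeholders, and parameter names vary between
fence-agent versions, so check the fence_ipmilan metadata before use.

```
# Hypothetical example: one fence device per peer; combine with location
# constraints (as discussed in this thread) if you want to keep each
# device off its own target on older Pacemaker versions.
primitive st_ipmi_srv20 stonith:fence_ipmilan \
    params pcmk_host_list="srv20" ipaddr="192.0.2.20" \
           login="admin" passwd="secret" lanplus="true" \
    op monitor interval="60s" timeout="20s"
primitive st_ipmi_srv21 stonith:fence_ipmilan \
    params pcmk_host_list="srv21" ipaddr="192.0.2.21" \
           login="admin" passwd="secret" lanplus="true" \
    op monitor interval="60s" timeout="20s"
```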

> - The monitor operation on the stonith primitive is dangerous. I read
> that if monitor operations fail for the stonith device, the stonith
> action is triggered. I think it's not clever to give the cluster the
> option to fence a node just because it has an issue monitoring a fence
> device. That should not be a reason to shut down a node. What is your
> opinion on this? Can I just disable the primitive's monitor operation?

Monitoring is how you will detect that, for example, the IPMI cable
failed or was unplugged. I do not believe the node will get fenced on
fence agent monitor failing... At least not by default.


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout

2016-09-20 Thread Stefan Bauer
Hi,



I run a 2-node cluster and want to be safe in split-brain scenarios. For this 
I set up external/ipmi to STONITH the other node.

Some possible issues jumped to my mind, and I would like to find the best 
practice solution:

- I have a primitive to STONITH each node. Many documents and guides 
recommend never letting a fence device run on the host it should fence. I 
would set up clone resources to avoid dealing with locations that would also 
influence scoring. Does that make sense?

- The monitor operation on the stonith primitive is dangerous. I read that if 
monitor operations fail for the stonith device, the stonith action is 
triggered. I think it's not clever to give the cluster the option to fence a 
node just because it has an issue monitoring a fence device. That should not 
be a reason to shut down a node. What is your opinion on this? Can I just 
disable the primitive's monitor operation?

Thank you.



Stefan



Re: [ClusterLabs] corosync-quorum tool, output name key on Name column if set?

2016-09-20 Thread Christine Caulfield
On 20/09/16 10:46, Thomas Lamprecht wrote:
> Hi,
> 
> when I'm using corosync-quorumtool [-l] and have my ring0_addr set to an
> IP address which does not resolve to a hostname, I get the nodes' IP
> addresses in the 'Name' column.
> 
> As I'm using the nodelist.node.X.name key to set the name of a node, it
> seems a bit confusing to me that this one is not preferred or at least
> also output. It's quite a minor issue, if not nitpicking, but I
> associate my nodes with their names.
> 
> I'd be ready to assemble a patch and one possibility would be adapting
> the output to something
> like:
> 
>> # corosync-quorumtool
>>
>> Quorum information
>> --
>> Date: Tue Sep 20 11:12:14 2016
>> Quorum provider:  corosync_votequorum
>> Nodes:3
>> Node ID:  1
>> Ring ID:  1/1784
>> Quorate:  Yes
>>
>> Votequorum information
>> --
>> Expected votes:   3
>> Highest expected: 3
>> Total votes:  3
>> Quorum:   2
>> Flags:Quorate
>>
>> Membership information
>> --
>> Nodeid  Votes Name ring0_addr
>>  1  1 uno  10.10.20.1 (local)
>>  2  1 due  10.10.20.2
>>  3  1 tre  10.10.20.3
>>
> 
> And respective:
> 
>>
>> # corosync-quorumtool -l
>>
>> Membership information
>> --
>> Nodeid  Votes Name ring0_addr
>>  1  1 uno  10.10.20.1 (local)
>>  2  1 due  10.10.20.2
>>  3  1 tre  10.10.20.3
> 
> additional ring1_addr could be also outputted if set.
> 
> This would be just a general idea, if there are suggestions I'll gladly
> hear them.
> 
> As such a change may not be ideal during a stable release (corosync users
> could parse the corosync-quorumtool output; there are better places to
> get the info, but there may still be users doing this), another
> possibility would be adding an
> option flag to corosync similar to '-i' (show node IP addresses instead
> of the resolved
> name) which then shows the nodelist.node.X.name value instead of IP or
> resolved name.
> 
> A third option would be leaving the output as is but, if the '-i' option
> is not set, preferring nodelist.node.X.name over the resolved hostname
> and falling back to the IP if both are unavailable.
> I'd prefer this change the most: it leaves the output as it is, and it
> seems logical that the Name column outputs the name key if possible, imo.
> 
> Would such a patch be welcomed or is this just something I find a little
> strange?

Hi Thomas,

I'd be happy to receive such a patch. The main reason it's not done this
way is that it's not always obvious how to resolve a name from its IP
address. If corosync.conf has a nodelist then using that does seem like
the best option (and bear in mind that more than one ring is possible).
If corosync.conf is set up to use multicast then we have no choice but to
guess at what the name might be (as happens now).
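
For concreteness, a corosync.conf nodelist carrying the name keys under
discussion (node names and addresses taken from Thomas's example above)
would look roughly like this; treat it as an illustrative fragment, not a
complete configuration:

```
nodelist {
    node {
        nodeid: 1
        name: uno
        ring0_addr: 10.10.20.1
    }
    node {
        nodeid: 2
        name: due
        ring0_addr: 10.10.20.2
    }
    node {
        nodeid: 3
        name: tre
        ring0_addr: 10.10.20.3
    }
}
```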

Most of corosync-quorumtool was written when nodelist was not the
dominant way of configuring a cluster which is why it is the way it is
at the moment.

As to what should be the default and which options are most useful, I
would be interested to hear the views of the community as to what they
would like to see :)

Chrissie



Re: [ClusterLabs] group resources without order behavior / monitor timeout smaller than interval?

2016-09-20 Thread Dejan Muhamedagic
Hi,

On Wed, Sep 14, 2016 at 02:41:10PM -0500, Ken Gaillot wrote:
> On 09/14/2016 03:01 AM, Stefan Bauer wrote:
> > Hi,
> > 
> > I'm trying to understand some cluster internals and would be happy to
> > get some best practice recommendations:
> > 
> > monitor interval and timeout: shouldn't timeout value always be smaller
> > than interval to avoid another check even though the first is not over yet?
> 
> The cluster handles it intelligently. If the previous monitor is still
> in progress when the interval expires, it won't run another one.

The old lrmd (since replaced) would schedule the next monitor
operation only once the current monitor operation had finished, hence
the timeout value was essentially irrelevant. Is that still the
case with the new lrmd?

> It certainly makes sense that the timeout will generally be smaller than
> the interval, but there may be cases where a monitor on rare occasions
> takes a long time, and the user wants the high timeout for those
> occasions, but a shorter interval that will be used most of the time.

Just to add that there's a tendency to make monitor intervals
quite short, often without taking a good look at the nature of
the resource.
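
As a concrete illustration of the interval/timeout relationship discussed
above, a hypothetical crm-shell operation definition with a short interval
but a generous timeout for the occasional slow check (resource name and
values are placeholders):

```
# Hypothetical resource: monitor every 30s; allow up to 90s for the
# rare slow run before the operation is considered failed.
primitive r_example ocf:heartbeat:Dummy \
    op monitor interval="30s" timeout="90s"
```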

Thanks,

Dejan

> > Additionally i would like to use the group function to put all my VMS
> > (ocf:heartbeat:VirtualDomain) in one group and colocate the group with
> > the VIP and my LVM-volume. Unfortunately group function starts the
> > resources in the listed order. So if I stop one VM, the following VMs
> > are also stopped.
> > 
> > Right now I'm having the following configuration and want to make it
> > less redundant:
> 
> You can use one ordering constraint and one colocation constraint, each
> with two resource sets, one containing the IP and volume with
> sequential=true, and the other containing the VMs with sequential=false.
> See:
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-sets
> 
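
Applied to the configuration quoted below, Ken's suggestion might look
roughly like this in crm shell syntax. This is an untested sketch with the
VM list shortened; in crmsh, resources listed plainly in a constraint form
a sequential=true set, while a parenthesised group is sequential=false, so
verify the generated XML against your crmsh version before relying on it.

```
# Untested sketch: storage then VIP in order, then all VMs in parallel.
order ord_storage_ip_vms inf: r_lvm_vg-storage r_Failover_IP \
    ( r_vm_ado01 r_vm_bar01 r_vm_cmt01 )
colocation col_vms_storage_ip inf: \
    ( r_vm_ado01 r_vm_bar01 r_vm_cmt01 ) r_lvm_vg-storage r_Failover_IP
```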
> 
> > 
> > # never let the stonith_service run on the host to stonith
> > 
> > location l_st_srv20 st_ipmi_srv20 -inf: srv20
> > location l_st_srv21 st_ipmi_srv21 -inf: srv21
> > 
> > 
> > # do not run resources on quorum only node
> > location loc_r_lvm_vg-storage_quorum_only_node r_lvm_vg-storage -inf:
> > quorum_only_node
> > location loc_r_vm_ado01_quorum_only_node r_vm_ado01 -inf: quorum_only_node
> > location loc_r_vm_bar01_quorum_only_node r_vm_bar01 -inf: quorum_only_node
> > location loc_r_vm_cmt01_quorum_only_node r_vm_cmt01 -inf: quorum_only_node
> > location loc_r_vm_con01_quorum_only_node r_vm_con01 -inf: quorum_only_node
> > location loc_r_vm_con02_quorum_only_node r_vm_con02 -inf: quorum_only_node
> > location loc_r_vm_dsm01_quorum_only_node r_vm_dsm01 -inf: quorum_only_node
> > location loc_r_vm_jir01_quorum_only_node r_vm_jir01 -inf: quorum_only_node
> > location loc_r_vm_jir02_quorum_only_node r_vm_jir02 -inf: quorum_only_node
> > location loc_r_vm_prx02_quorum_only_node r_vm_prx02 -inf: quorum_only_node
> > location loc_r_vm_src01_quorum_only_node r_vm_src01 -inf: quorum_only_node
> > 
> > 
> > # colocate ip with lvm storage
> > colocation col_r_Failover_IP_r_lvm_vg-storage inf: r_Failover_IP
> > r_lvm_vg-storage
> > 
> > 
> > # colocate each VM with lvm storage
> > colocation col_r_vm_ado01_r_lvm_vg-storage inf: r_vm_ado01 r_lvm_vg-storage
> > colocation col_r_vm_bar01_r_lvm_vg-storage inf: r_vm_bar01 r_lvm_vg-storage
> > colocation col_r_vm_cmt01_r_lvm_vg-storage inf: r_vm_cmt01 r_lvm_vg-storage
> > colocation col_r_vm_con01_r_lvm_vg-storage inf: r_vm_jir01 r_lvm_vg-storage
> > colocation col_r_vm_con02_r_lvm_vg-storage inf: r_vm_con02 r_lvm_vg-storage
> > colocation col_r_vm_dsm01_r_lvm_vg-storage inf: r_vm_dsm01 r_lvm_vg-storage
> > colocation col_r_vm_jir01_r_lvm_vg-storage inf: r_vm_con01 r_lvm_vg-storage
> > colocation col_r_vm_jir02_r_lvm_vg-storage inf: r_vm_jir02 r_lvm_vg-storage
> > colocation col_r_vm_prx02_r_lvm_vg-storage inf: r_vm_prx02 r_lvm_vg-storage
> > colocation col_r_vm_src01_r_lvm_vg-storage inf: r_vm_src01 r_lvm_vg-storage
> > 
> > # start lvm storage before VIP
> > 
> > order ord_r_lvm_vg-storage_r_Failover_IP inf: r_lvm_vg-storage r_Failover_IP
> > 
> > 
> > # start lvm storage before each VM
> > order ord_r_lvm_vg-storage_r_vm_ado01 inf: r_lvm_vg-storage r_vm_ado01
> > order ord_r_lvm_vg-storage_r_vm_bar01 inf: r_lvm_vg-storage r_vm_bar01
> > order ord_r_lvm_vg-storage_r_vm_cmt01 inf: r_lvm_vg-storage r_vm_cmt01
> > order ord_r_lvm_vg-storage_r_vm_con01 inf: r_lvm_vg-storage r_vm_con01
> > order ord_r_lvm_vg-storage_r_vm_con02 inf: r_lvm_vg-storage r_vm_con02
> > order ord_r_lvm_vg-storage_r_vm_dsm01 inf: r_lvm_vg-storage r_vm_dsm01
> > order ord_r_lvm_vg-storage_r_vm_jir01 inf: r_lvm_vg-storage r_vm_jir01
> > order ord_r_lvm_vg-storage_r_vm_jir02 inf: r_lvm_vg-storage r_vm_jir02
> > order ord_r_lvm_vg-storage_r_vm_prx02 inf: r_lvm_vg-storage r_vm_prx02
> > order ord_r_lvm_vg-storage_r_vm_src01 inf: r_lvm_vg-storage r_vm_src01
> > 
> > 

[ClusterLabs] corosync-quorum tool, output name key on Name column if set?

2016-09-20 Thread Thomas Lamprecht

Hi,

when I'm using corosync-quorumtool [-l] and have my ring0_addr set to an 
IP address which does not resolve to a hostname, I get the nodes' IP 
addresses in the 'Name' column.

As I'm using the nodelist.node.X.name key to set the name of a node, it 
seems a bit confusing to me that this one is not preferred or at least 
also output. It's quite a minor issue, if not nitpicking, but I associate 
my nodes with their names.

I'd be ready to assemble a patch, and one possibility would be adapting 
the output to something like:


# corosync-quorumtool

Quorum information
--
Date: Tue Sep 20 11:12:14 2016
Quorum provider:  corosync_votequorum
Nodes:3
Node ID:  1
Ring ID:  1/1784
Quorate:  Yes

Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:  3
Quorum:   2
Flags:Quorate

Membership information
--
Nodeid  Votes Name ring0_addr
 1  1 uno  10.10.20.1 (local)
 2  1 due  10.10.20.2
 3  1 tre  10.10.20.3



And respectively:



# corosync-quorumtool -l

Membership information
--
Nodeid  Votes Name ring0_addr
 1  1 uno  10.10.20.1 (local)
 2  1 due  10.10.20.2
 3  1 tre  10.10.20.3


additional ring1_addr could also be output if set.

This is just a general idea; if there are suggestions I'll gladly hear them.

As such a change may not be ideal during a stable release (corosync users 
could parse the corosync-quorumtool output; there are better places to get 
the info, but there may still be users doing this), another possibility 
would be adding an option flag similar to '-i' (show node IP addresses 
instead of the resolved name) which then shows the nodelist.node.X.name 
value instead of the IP or resolved name.

A third option would be leaving the output as is but, if the '-i' option is 
not set, preferring nodelist.node.X.name over the resolved hostname and 
falling back to the IP if both are unavailable.
I'd prefer this change the most: it leaves the output as it is, and it 
seems logical that the Name column outputs the name key if possible, imo.

Would such a patch be welcomed, or is this just something I find a little 
strange?


Thanks,
Thomas


