[ClusterLabs] Force Unmount - SLES 11 SP4

2016-09-20 Thread Jorge Fábregas
Hi, I have an issue while shutting down one of our clusters. The unmounting of an OCFS2 filesystem (ocf:heartbeat:Filesystem) is triggering a node fence (accordingly). This is because the script for stopping the application is not killing all processes using the filesystem. Is there a way to

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Dmitri Maziuk
On 2016-09-20 09:53, Ken Gaillot wrote: I do think ifdown is not quite the best failure simulation, since there aren't that many real-world situations that merely take an interface down. To simulate network loss (without pulling the cable), I think maybe using the firewall to block all traffic

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Dejan Muhamedagic
On Tue, Sep 20, 2016 at 01:13:23PM +, Auer, Jens wrote: > Hi, > > >> I've decided to create two answers for the two problems. The cluster > >> still fails to relocate the resource after unloading the modules even > >> with resource-agents 3.9.7 > > From the point of view of the resource

Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-20 Thread Ken Gaillot
On 09/20/2016 07:15 AM, Auer, Jens wrote: > Hi, > > I did some more tests after updating DRBD to the latest version. The behavior > does not change, but I found out that > - everything works fine when I physically unplug the network cables instead > of ifdown'ing the device BTW that's a more

[ClusterLabs] corosync-quorum tool, output name key on Name column if set?

2016-09-20 Thread Thomas Lamprecht
Hi, when I'm using corosync-quorumtool [-l] and have my ring0_addr set to an IP address, which does not resolve to a hostname, I get the nodes' IP addresses for the 'Name' column. As I'm using the nodelist.node.X.name key to set the name of a node, it seems a bit confusing to me that not this

Re: [ClusterLabs] group resources without order behavior / monitor timeout smaller than interval?

2016-09-20 Thread Dejan Muhamedagic
Hi, On Wed, Sep 14, 2016 at 02:41:10PM -0500, Ken Gaillot wrote: > On 09/14/2016 03:01 AM, Stefan Bauer wrote: > > Hi, > > > > I'm trying to understand some cluster internals and would be happy to > > get some best practice recommendations: > > > > monitor interval and timeout: shouldn't

Re: [ClusterLabs] corosync-quorum tool, output name key on Name column if set?

2016-09-20 Thread Christine Caulfield
On 20/09/16 10:46, Thomas Lamprecht wrote: > Hi, > > when I'm using corosync-quorumtool [-l] and have my ring0_addr set to an > IP address, > which does not resolve to a hostname, I get the nodes' IP addresses for > the 'Name' column. > > As I'm using the nodelist.node.X.name key to set the name

[ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout

2016-09-20 Thread Stefan Bauer
Hi, I run a 2-node cluster and want to be safe in split-brain scenarios. For this, I set up external/ipmi to stonith the other node. Some possible issues jumped to my mind and I would like to find the best practice solution: - I have a primitive for each node to stonith. Many documents and

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi, I've updated to resource-agents 3.9.7, which is the latest stable version, but I am still seeing the same issues. MDA1PFP-S01 11:31:40 2495 130 ~ # yum list resource-agents Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager Installed Packages

Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout

2016-09-20 Thread Digimer
On 20/09/16 06:59 AM, Stefan Bauer wrote: > Hi, > > I run a 2-node cluster and want to be safe in split-brain scenarios. For > this, I set up external/ipmi to stonith the other node. Please use 'fence_ipmilan'. I believe that the older external/ipmi are deprecated (someone correct me if I am wrong

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi, I've decided to create two answers for the two problems. The cluster still fails to relocate the resource after unloading the modules even with resource-agents 3.9.7 MDA1PFP-S01 11:42:50 2533 0 ~ # yum list resource-agents Loaded plugins: langpacks, product-id, search-disabled-repos,

Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-20 Thread Auer, Jens
Hi, I did some more tests after updating DRBD to the latest version. The behavior does not change, but I found out that - everything works fine when I physically unplug the network cables instead of ifdown'ing the device - I can see in the log files that the device gets promoted after stopping

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi, one thing to add is that everything works as expected when I physically unplug the network cables to force a failover. Best wishes, Jens -- Jens Auer | CGI | Software-Engineer CGI (Germany) GmbH & Co. KG Rheinstraße 95 | 64295 Darmstadt | Germany T: +49 6151 36860 154 jens.a...@cgi.com

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Lars Ellenberg
On Tue, Sep 20, 2016 at 11:44:58AM +, Auer, Jens wrote: > Hi, > > I've decided to create two answers for the two problems. The cluster > still fails to relocate the resource after unloading the modules even > with resource-agents 3.9.7 From the point of view of the resource agent, you

Re: [ClusterLabs] RFC: allowing soft recovery attempts before ignore/block/etc.

2016-09-20 Thread Andrew Beekhof
On Wed, Sep 21, 2016 at 6:25 AM, Ken Gaillot wrote: > Hi everybody, > > Currently, Pacemaker's on-fail property allows you to configure how the > cluster reacts to operation failures. The default "restart" means try to > restart on the same node, optionally moving to another

Re: [ClusterLabs] Force Unmount - SLES 11 SP4

2016-09-20 Thread Jorge Fábregas
On 09/20/2016 12:51 PM, Kristoffer Grönlund wrote: > The force_unmount option is available in more recent versions of SLES as > well, but not in SLES 11 SP4. You could try installing the upstream > version of the Filesystem agent and see if that works for you. Thanks Kristoffer for confirming.

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi, >> I've decided to create two answers for the two problems. The cluster >> still fails to relocate the resource after unloading the modules even >> with resource-agents 3.9.7 > From the point of view of the resource agent, > you configured it to use a non-existing network. > Which it

Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout

2016-09-20 Thread Ken Gaillot
On 09/20/2016 06:42 AM, Digimer wrote: > On 20/09/16 06:59 AM, Stefan Bauer wrote: >> Hi, >> >> I run a 2-node cluster and want to be safe in split-brain scenarios. For >> this, I set up external/ipmi to stonith the other node. > > Please use 'fence_ipmilan'. I believe that the older external/ipmi