Re: [ClusterLabs] Pacemaker failover failure

2015-07-14 Thread Digimer
As said before, fencing.

On 01/07/15 06:54 AM, alex austin wrote:
> so did another test:
> 
> two nodes: node1 and node2
> 
> Case: node1 is the active node
> node2 is passive
> 
> If I run "killall -9 pacemakerd corosync" on node1, the services do not
> fail over to node2; but if I then start corosync and pacemaker on node1
> again, they do fail over to node2.
> 
> Where am I going wrong?
> 
> Alex
> 
> On Wed, Jul 1, 2015 at 12:42 PM, alex austin wrote:
> 
> Hi all,
> 
> I have configured a virtual ip and redis in master-slave with
> corosync pacemaker. If redis fails, then the failover is successful,
> and redis gets promoted on the other node. However if pacemaker
> itself fails on the active node, the failover is not performed. Is
> there anything I missed in the configuration?
> 
> Here's my configuration (I have hashed the IP address out):
> 
> node host1.com
> 
> node host2.com
> 
> primitive ClusterIP IPaddr2 \
> 
> params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \
> 
> op monitor interval=1s timeout=20s \
> 
> op start interval=0 timeout=20s \
> 
> op stop interval=0 timeout=20s \
> 
> meta is-managed=true target-role=Started resource-stickiness=500
> 
> primitive redis redis \
> 
> meta target-role=Master is-managed=true \
> 
> op monitor interval=1s role=Master timeout=5s on-fail=restart
> 
> ms redis_clone redis \
> 
> meta notify=true is-managed=true ordered=false interleave=false
> globally-unique=false target-role=Master migration-threshold=1
> 
> colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master
> 
> colocation ip-on-redis inf: ClusterIP redis_clone:Master
> 
> property cib-bootstrap-options: \
> 
> dc-version=1.1.11-97629de \
> 
> cluster-infrastructure="classic openais (with plugin)" \
> 
> expected-quorum-votes=2 \
> 
> stonith-enabled=false
> 
> property redis_replication: \
> 
> redis_REPL_INFO=host.com
> 
> 
> thank you in advance
> 
> 
> Kind regards,
> 
> 
> Alex 
> 
> 
> 
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] Pacemaker failover failure

2015-07-14 Thread Andrei Borzenkov
On Wed, 15 Jul 2015 00:31:45 +0100, alex austin wrote:

> Unfortunately I have nothing yet ...
> 
> There's something I don't quite understand though. What's the role of
> stonith if the other machine crashes unexpectedly and totally unclean? Is
> it to reboot the machine and recreate the cluster, thus making the drbd
> volume available again? Or is it something else?
> 

The role of stonith is to ensure a known "offline" state for an unclean
node, so that the other nodes can safely start the resources that were
previously active on it. Without stonith, the remaining nodes cannot
decide whether the unclean node is no longer functional or whether it is
merely the communication channel between the nodes that has failed.
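
Once a fence device is configured, the mechanism can also be exercised by
hand from a surviving node (a sketch; the node name is illustrative):

  # ask the cluster's fencing subsystem to power-cycle node2
  stonith_admin --reboot node2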

> The way I see it, even if stonith is functional, the other node's drbd
> filesystem will not be accessible until the crashed node is back up; is
> this correct?
> 

No, that is something in your configuration. Unfortunately I do not have
experience specifically with DRBD, so I cannot comment on the details. But
in general, once the other node knows that the crashed node is definitely
down, it should allow full read-write access to its local DRBD copy.

I'm not sure what "the other node's drbd filesystem" means here, though.
DRBD is replicated by definition and does not belong to one node or the
other. It has two replicas, and each replica is physically owned by one
node; the nodes coordinate and replicate updates of the local copy to the
copy on the partner.
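
The usual way to tie DRBD into this (a sketch, not from this thread;
assumes DRBD 8.x with the stock Pacemaker handler scripts installed) is to
let DRBD itself constrain a stale peer through its fencing hooks:

  resource r0 {
    disk {
      fencing resource-only;
    }
    handlers {
      # add/remove a Pacemaker location constraint around resync
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }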


Re: [ClusterLabs] Pacemaker failover failure

2015-07-02 Thread alex austin
Thank you!

However, what is proper fencing in this situation?

Kind Regards,

Alex


Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread Ken Gaillot
On 07/01/2015 09:39 AM, alex austin wrote:
> This is what crm_mon shows
> 
> 
> Last updated: Wed Jul  1 10:35:40 2015
> 
> Last change: Wed Jul  1 09:52:46 2015
> 
> Stack: classic openais (with plugin)
> 
> Current DC: host2 - partition with quorum
> 
> Version: 1.1.11-97629de
> 
> 2 Nodes configured, 2 expected votes
> 
> 4 Resources configured
> 
> 
> 
> Online: [ host1 host2 ]
> 
> 
> ClusterIP (ocf::heartbeat:IPaddr2): Started host2
> 
>  Master/Slave Set: redis_clone [redis]
> 
>  Masters: [ host2 ]
> 
>  Slaves: [ host1 ]
> 
> pcmk-fencing(stonith:fence_pcmk):   Started host2
> 
> On Wed, Jul 1, 2015 at 3:37 PM, alex austin wrote:
> 
>> I am running version 1.4.7 of corosync

If you can't upgrade to corosync 2 (which has many improvements), you'll
need to set the no-quorum-policy=ignore cluster option.

Proper fencing is necessary to avoid a split-brain situation, which can
corrupt your data.
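
With the crm shell used elsewhere in this thread, setting that option would
look like this (a sketch; run on either node):

  # keep running resources even without quorum (two-node case)
  crm configure property no-quorum-policy=ignore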


Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread alex austin
I am running version 1.4.7 of corosync

Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread alex austin
This is what crm_mon shows


Last updated: Wed Jul  1 10:35:40 2015

Last change: Wed Jul  1 09:52:46 2015

Stack: classic openais (with plugin)

Current DC: host2 - partition with quorum

Version: 1.1.11-97629de

2 Nodes configured, 2 expected votes

4 Resources configured



Online: [ host1 host2 ]


ClusterIP (ocf::heartbeat:IPaddr2): Started host2

 Master/Slave Set: redis_clone [redis]

 Masters: [ host2 ]

 Slaves: [ host1 ]

pcmk-fencing(stonith:fence_pcmk):   Started host2


Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread Ken Gaillot
On 07/01/2015 08:57 AM, alex austin wrote:
> I have now configured stonith-enabled=true. What device should I use for
> fencing given the fact that it's a virtual machine but I don't have access
> to its configuration. Would fence_pcmk do? If so, what parameters should I
> configure for it to work properly?

No, fence_pcmk is not for using in pacemaker, but for using in RHEL6's
CMAN to redirect its fencing requests to pacemaker.

For a virtual machine, ideally you'd use fence_virtd running on the
physical host, but I'm guessing from your comment that you can't do
that. Does whoever provides your VM also provide an API for controlling
it (starting/stopping/rebooting)?
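
If fence_virtd on the host were an option, the guest-side piece would look
something like this (a sketch; assumes the fence_xvm agent is installed and
that the VM domain names match the cluster node names):

  primitive vm-fencing stonith:fence_xvm \
          params pcmk_host_list="dcwbpvmuas004.edc.nam.gm.com dcwbpvmuas005.edc.nam.gm.com" \
          op monitor interval=60s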

Regarding your original problem, it sounds like the surviving node
doesn't have quorum. What version of corosync are you using? If you're
using corosync 2, you need "two_node: 1" in corosync.conf, in addition
to configuring fencing in pacemaker.
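
For reference, with corosync 2 that setting goes in the quorum section of
/etc/corosync/corosync.conf (a minimal sketch):

  quorum {
          provider: corosync_votequorum
          two_node: 1
  }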


Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread alex austin
I have now configured stonith-enabled=true. What device should I use for
fencing given the fact that it's a virtual machine but I don't have access
to its configuration. Would fence_pcmk do? If so, what parameters should I
configure for it to work properly?

This is my new config:


node dcwbpvmuas004.edc.nam.gm.com \

attributes standby=off

node dcwbpvmuas005.edc.nam.gm.com \

attributes standby=off

primitive ClusterIP IPaddr2 \

params ip=198.208.86.242 cidr_netmask=23 \

op monitor interval=1s timeout=20s \

op start interval=0 timeout=20s \

op stop interval=0 timeout=20s \

meta is-managed=true target-role=Started resource-stickiness=500

primitive pcmk-fencing stonith:fence_pcmk \

params pcmk_host_list="dcwbpvmuas004.edc.nam.gm.com
dcwbpvmuas005.edc.nam.gm.com" \

op monitor interval=10s \

meta target-role=Started

primitive redis redis \

meta target-role=Master is-managed=true \

op monitor interval=1s role=Master timeout=5s on-fail=restart

ms redis_clone redis \

meta notify=true is-managed=true ordered=false interleave=false
globally-unique=false target-role=Master migration-threshold=1

colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master

colocation ip-on-redis inf: ClusterIP redis_clone:Master

colocation pcmk-fencing-on-redis inf: pcmk-fencing redis_clone:Master

property cib-bootstrap-options: \

dc-version=1.1.11-97629de \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=true

property redis_replication: \

redis_REPL_INFO=dcwbpvmuas005.edc.nam.gm.com



Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread alex austin
so did another test:

two nodes: node1 and node2

Case: node1 is the active node
node2 is passive

If I run "killall -9 pacemakerd corosync" on node1, the services do not fail
over to node2; but if I then start corosync and pacemaker on node1 again,
they do fail over to node2.

Where am I going wrong?

Alex



Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread Nekrasov, Alexander
stonith-enabled=false

This might be the issue. The way peer-node death is resolved, the surviving
node must call STONITH on the peer; if STONITH is disabled, the cluster might
not be able to resolve the event.
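
Enabling it is a single property (a sketch; note that a working fence device
must also be configured for fencing to actually succeed):

  crm configure property stonith-enabled=true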

Alex

From: alex austin [mailto:alexixa...@gmail.com]
Sent: Wednesday, July 01, 2015 9:51 AM
To: Users@clusterlabs.org
Subject: Re: [ClusterLabs] Pacemaker failover failure

So I noticed that if I kill redis on one node, it starts on the other, no 
problem, but if I actually kill pacemaker itself on one node, the other doesn't 
"sense" it so it doesn't fail over.



On Wed, Jul 1, 2015 at 12:42 PM, alex austin 
mailto:alexixa...@gmail.com>> wrote:
Hi all,

I have configured a virtual ip and redis in master-slave with corosync 
pacemaker. If redis fails, then the failover is successful, and redis gets 
promoted on the other node. However if pacemaker itself fails on the active 
node, the failover is not performed. Is there anything I missed in the 
configuration?

Here's my configuration (i have hashed the ip address out):


node host1.com<http://host1.com>

node host2.com<http://host2.com>

primitive ClusterIP IPaddr2 \

params ip=xxx.xxx.xxx.xxx cidr_netmask=23 \

op monitor interval=1s timeout=20s \

op start interval=0 timeout=20s \

op stop interval=0 timeout=20s \

meta is-managed=true target-role=Started resource-stickiness=500

primitive redis redis \

meta target-role=Master is-managed=true \

op monitor interval=1s role=Master timeout=5s on-fail=restart

ms redis_clone redis \

meta notify=true is-managed=true ordered=false interleave=false 
globally-unique=false target-role=Master migration-threshold=1

colocation ClusterIP-on-redis inf: ClusterIP redis_clone:Master

colocation ip-on-redis inf: ClusterIP redis_clone:Master

property cib-bootstrap-options: \

dc-version=1.1.11-97629de \

cluster-infrastructure="classic openais (with plugin)" \

expected-quorum-votes=2 \

stonith-enabled=false

property redis_replication: \

redis_REPL_INFO=host.com<http://host.com>



thank you in advance



Kind regards,



Alex

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker failover failure

2015-07-01 Thread alex austin
So I noticed that if I kill redis on one node, it starts on the other, no
problem, but if I actually kill pacemaker itself on one node, the other
doesn't "sense" it so it doesn't fail over.



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org