Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-18 Thread Ken Gaillot
On 01/18/2017 03:49 AM, Ferenc Wágner wrote:
> Ken Gaillot  writes:
> 
>> * When you move the VM, the cluster detects that it is not running on
>> the node you told it to keep it running on. Because there is no
>> "Stopped" monitor, the cluster doesn't immediately realize that a new
>> rogue instance is running on another node. So, the cluster thinks the VM
>> crashed on the original node, and recovers it by starting it again.
> 
> Ken, do you mean that if a periodic "stopped" monitor is configured, it
> is forced to run immediately (out of schedule) when the regular periodic
> monitor unexpectedly returns with stopped status?  That is, before the
> cluster takes the recovery action?  Conceptually, that would be similar
> to the probe run on node startup.  If not, then maybe it would be a
> useful resource option to have (I mean running cluster-wide probes on an
> unexpected monitor failure, before recovery).  An optional safety check.

No, there is nothing like that currently. The regular and "Stopped"
monitors run independently. Because they must have different intervals,
that does mean that the two sides of the issue may be detected at
different times.
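
For illustration, the two monitors could be defined together when the
resource is created (a sketch only; the resource name, config path and
timeouts are assumptions based on this thread, not a tested configuration):

pcs resource create vm-vdicdb01 ocf:heartbeat:VirtualDomain \
    config=/etc/libvirt/qemu/vdicdb01.xml \
    op monitor interval=10s timeout=30s \
    op monitor interval=11s role=Stopped timeout=30s

The 10s monitor runs on the node that should be hosting the VM; the 11s
role=Stopped monitor runs on every other node, each on its own schedule.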

It is an interesting idea to have an option to reprobe on operation
failure. I think it may be overkill; the only failure situation it would
be good for is one like this, where a resource was moved out of cluster
control. The vast majority of failure scenarios wouldn't be helped. If
that sort of thing happens a lot in your cluster, you really need to
figure out how to stop doing that. :)

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-18 Thread Ferenc Wágner
Ken Gaillot  writes:

> * When you move the VM, the cluster detects that it is not running on
> the node you told it to keep it running on. Because there is no
> "Stopped" monitor, the cluster doesn't immediately realize that a new
> rogue instance is running on another node. So, the cluster thinks the VM
> crashed on the original node, and recovers it by starting it again.

Ken, do you mean that if a periodic "stopped" monitor is configured, it
is forced to run immediately (out of schedule) when the regular periodic
monitor unexpectedly returns with stopped status?  That is, before the
cluster takes the recovery action?  Conceptually, that would be similar
to the probe run on node startup.  If not, then maybe it would be a
useful resource option to have (I mean running cluster-wide probes on an
unexpected monitor failure, before recovery).  An optional safety check.
-- 
Regards,
Feri

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-17 Thread Ken Gaillot
On 01/17/2017 10:05 AM, Oscar Segarra wrote:
> Hi, 
> 
> * It is also possible to configure a monitor to ensure that the resource
> is not running on nodes where it's not supposed to be (a monitor with
> role="Stopped"). You don't have one of these (which is fine, and common).
> 
> Can you provide more information/documentation about role="Stopped"?

Since you're using pcs, you can either configure monitors when you
create the resource with pcs resource create, or you can add/remove
monitors later with pcs resource op add/remove.

For example:

pcs resource op add my-resource-name monitor interval=10s role="Stopped"

With a normal monitor op (role="Started" or omitted), the cluster will
run the resource agent's monitor command on any node that's supposed to
be running the resource. With the above example, it will additionally
run a monitor on all other nodes, so that if it finds the resource
running somewhere it's not supposed to be, it can stop it.

Note that each monitor op must have a unique interval. So if your
existing monitor runs every 10s, you need to pick a different interval for
the new monitor.
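
For example, assuming the resource from this thread is named vm-vdicdb01 and
already has a regular 10s monitor, the additional monitor could use 11s (any
interval other than 10s works):

pcs resource op add vm-vdicdb01 monitor interval=11s role=Stopped timeout=30s

# verify that both monitor operations are now configured
pcs resource show vm-vdicdb01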

> And, please, can you explain how the VirtualDomain resource agent manages
> the scenario I've presented?
> 
> /What happens if I stop pacemaker and corosync services in all nodes and
> I start them again... will I have all guests running twice?/
> 
> Thanks a lot

If you stop cluster services, by default the cluster will first stop all
resources. You can set maintenance mode, or unmanage one or more
resources, to prevent the stops.
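
For example (vm-vdicdb01 is just the resource name used in this thread):

# put the whole cluster into maintenance mode; resources are left as they are
pcs property set maintenance-mode=true

# ...or stop managing only the one resource, leaving it running
pcs resource unmanage vm-vdicdb01

# undo afterwards
pcs property set maintenance-mode=false
pcs resource manage vm-vdicdb01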

When cluster services first start on a node, the cluster "probes" the
status of all resources on that node, by running a one-time monitor. So
it will detect anything running at that time, and start or stop services
as needed to meet the configured requirements.
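
If you ever need to repeat those probes by hand, for instance after touching
a VM outside the cluster, something like this should work (again using the
resource name from this thread):

# re-detect the current state of resources across the cluster
crm_resource --reprobe

# or clear one resource's operation history, after which its state is
# re-detected as well
pcs resource cleanup vm-vdicdb01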

> 2017-01-17 16:38 GMT+01:00 Ken Gaillot  >:
> 
> On 01/17/2017 08:52 AM, Ulrich Windl wrote:
> >  Oscar Segarra wrote on 17.01.2017 at 10:15 in message:
> >> Hi,
> >>
> >> Yes, I will try to explain myself better.
> >>
> >> *Initially*
> >> On node1 (vdicnode01-priv)
> >>> virsh list
> >> ==
> >> vdicdb01 started
> >>
> >> On node2 (vdicnode02-priv)
> >>> virsh list
> >> ==
> >> vdicdb02 started
> >>
> >> --> Now, I execute the migrate command (outside the cluster <-- not using
> >> pcs resource move)
> >> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv
> >> tcp://vdicnode02-priv
> >
> > One of the rules of successful clustering is: If resources are managed
> > by the cluster, they are managed by the cluster only! ;-)
> >
> > I guess one node is trying to restart the VM once it vanished, and the
> > other node might try to shut down the VM while it's being migrated.
> > Or any other undesired combination...
> 
> 
> As Ulrich says here, you can't use virsh to manage VMs once they are
> managed by the cluster. Instead, configure your cluster to support live
> migration:
> 
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-migrating-resources
> 
> 
> 
> and then use pcs resource move (which is just location constraints under
> the hood) to move VMs.
> 
> What's happening in your example is:
> 
> * Your VM cluster resource has a monitor operation ensuring that it is
> running properly on the desired node.
> 
> * It is also possible to configure a monitor to ensure that the resource
> is not running on nodes where it's not supposed to be (a monitor with
> role="Stopped"). You don't have one of these (which is fine, and
> common).
> 
> * When you move the VM, the cluster detects that it is not running on
> the node you told it to keep it running on. Because there is no
> "Stopped" monitor, the cluster doesn't immediately realize that a new
> rogue instance is running on another node. So, the cluster thinks the VM
> crashed on the original node, and recovers it by starting it again.
> 
> If your goal is to take a VM out of cluster management without stopping
> it, you can "unmanage" the resource.
> 
> 
> >> *Finally*
> >> On node1 (vdicnode01-priv)
> >>> virsh list
> >> ==
> >> *vdicdb01 started*
> >>
> >> On node2 (vdicnode02-priv)
> >>> virsh list
> >> ==
> >> vdicdb02 started
> >> vdicdb01 started
> >>
> >> If I query cluster pcs status, cluster thinks resource vm-vdicdb01 is
> >> only started on node vdicnode01-priv.

Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-17 Thread Oscar Segarra
Hi,

* It is also possible to configure a monitor to ensure that the resource
is not running on nodes where it's not supposed to be (a monitor with
role="Stopped"). You don't have one of these (which is fine, and common).

Can you provide more information/documentation about role="Stopped"?

And, please, can you explain how the VirtualDomain resource agent manages the
scenario I've presented?

*What happens if I stop pacemaker and corosync services in all nodes and I
start them again... will I have all guests running twice?*

Thanks a lot

2017-01-17 16:38 GMT+01:00 Ken Gaillot :

> On 01/17/2017 08:52 AM, Ulrich Windl wrote:
> >  Oscar Segarra wrote on 17.01.2017 at 10:15 in message:
> >> Hi,
> >>
> >> Yes, I will try to explain myself better.
> >>
> >> *Initially*
> >> On node1 (vdicnode01-priv)
> >>> virsh list
> >> ==
> >> vdicdb01 started
> >>
> >> On node2 (vdicnode02-priv)
> >>> virsh list
> >> ==
> >> vdicdb02 started
> >>
> >> --> Now, I execute the migrate command (outside the cluster <-- not using
> >> pcs resource move)
> >> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv
> >> tcp://vdicnode02-priv
> >
> > One of the rules of successful clustering is: If resources are managed by
> > the cluster, they are managed by the cluster only! ;-)
> >
> > I guess one node is trying to restart the VM once it vanished, and the
> > other node might try to shut down the VM while it's being migrated.
> > Or any other undesired combination...
>
>
> As Ulrich says here, you can't use virsh to manage VMs once they are
> managed by the cluster. Instead, configure your cluster to support live
> migration:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-migrating-resources
>
> and then use pcs resource move (which is just location constraints under
> the hood) to move VMs.
>
> What's happening in your example is:
>
> * Your VM cluster resource has a monitor operation ensuring that it is
> running properly on the desired node.
>
> * It is also possible to configure a monitor to ensure that the resource
> is not running on nodes where it's not supposed to be (a monitor with
> role="Stopped"). You don't have one of these (which is fine, and common).
>
> * When you move the VM, the cluster detects that it is not running on
> the node you told it to keep it running on. Because there is no
> "Stopped" monitor, the cluster doesn't immediately realize that a new
> rogue instance is running on another node. So, the cluster thinks the VM
> crashed on the original node, and recovers it by starting it again.
>
> If your goal is to take a VM out of cluster management without stopping
> it, you can "unmanage" the resource.
>
>
> >> *Finally*
> >> On node1 (vdicnode01-priv)
> >>> virsh list
> >> ==
> >> *vdicdb01 started*
> >>
> >> On node2 (vdicnode02-priv)
> >>> virsh list
> >> ==
> >> vdicdb02 started
> >> vdicdb01 started
> >>
> >> If I query cluster pcs status, cluster thinks resource vm-vdicdb01 is only
> >> started on node vdicnode01-priv.
> >>
> >> Thanks a lot.
> >>
> >>
> >>
> >> 2017-01-17 10:03 GMT+01:00 emmanuel segura :
> >>
> >>> sorry,
> >>>
> >>> But do you mean, when you say, you migrated the vm outside of the
> >>> cluster? one server outside of your cluster?
> >>>
> >>> 2017-01-17 9:27 GMT+01:00 Oscar Segarra :
>  Hi,
> 
>  I have configured a two node cluster where we run 4 kvm guests.
> 
>  The hosts are:
>  vdicnode01
>  vdicnode02
> 
>  And I have created a dedicated network card for cluster management. I have
>  created required entries in /etc/hosts:
>  vdicnode01-priv
>  vdicnode02-priv
> 
>  The four guests have collocation rules in order to make them distribute
>  proportionally between my two nodes.
> 
>  The problem I have is that if I migrate a guest outside the cluster, I mean
>  using virsh migrate --live, the cluster, instead of moving the guest back to
>  its original node (following collocation sets), starts the guest again and
>  suddenly I have the same guest running on both nodes, causing xfs corruption
>  in the guest.
> 
>  Is there any configuration applicable to avoid this unwanted behavior?
> 
>  Thanks a lot
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users


Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-17 Thread Ken Gaillot
On 01/17/2017 08:52 AM, Ulrich Windl wrote:
> >>> Oscar Segarra wrote on 17.01.2017 at 10:15 in message:
>> Hi,
>>
>> Yes, I will try to explain myself better.
>>
>> *Initially*
>> On node1 (vdicnode01-priv)
>>> virsh list
>> ==
>> vdicdb01 started
>>
>> On node2 (vdicnode02-priv)
>>> virsh list
>> ==
>> vdicdb02 started
>>
>> --> Now, I execute the migrate command (outside the cluster <-- not using
>> pcs resource move)
>> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv
>> tcp://vdicnode02-priv
> 
> One of the rules of successful clustering is: If resources are managed by the
> cluster, they are managed by the cluster only! ;-)
> 
> I guess one node is trying to restart the VM once it vanished, and the other 
> node might try to shut down the VM while it's being migrated.
> Or any other undesired combination...


As Ulrich says here, you can't use virsh to manage VMs once they are
managed by the cluster. Instead, configure your cluster to support live
migration:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-migrating-resources

and then use pcs resource move (which is just location constraints under
the hood) to move VMs.
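
As a sketch, using the resource and node names from this thread (the
allow-migrate meta attribute is what lets Pacemaker live-migrate instead of
doing a full stop/start):

# enable live migration for the VM resource
pcs resource update vm-vdicdb01 meta allow-migrate=true

# move it; this creates a location constraint under the hood
pcs resource move vm-vdicdb01 vdicnode02-priv

# remove that constraint once the move is done, or the VM stays pinned there
pcs resource clear vm-vdicdb01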

What's happening in your example is:

* Your VM cluster resource has a monitor operation ensuring that it is
running properly on the desired node.

* It is also possible to configure a monitor to ensure that the resource
is not running on nodes where it's not supposed to be (a monitor with
role="Stopped"). You don't have one of these (which is fine, and common).

* When you move the VM, the cluster detects that it is not running on
the node you told it to keep it running on. Because there is no
"Stopped" monitor, the cluster doesn't immediately realize that a new
rogue instance is running on another node. So, the cluster thinks the VM
crashed on the original node, and recovers it by starting it again.

If your goal is to take a VM out of cluster management without stopping
it, you can "unmanage" the resource.
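
For example:

# hand the VM over to manual control without shutting it down
pcs resource unmanage vm-vdicdb01

# ...do the manual work, then give it back to the cluster
pcs resource manage vm-vdicdb01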


>> *Finally*
>> On node1 (vdicnode01-priv)
>>> virsh list
>> ==
>> *vdicdb01 started*
>>
>> On node2 (vdicnode02-priv)
>>> virsh list
>> ==
>> vdicdb02 started
>> vdicdb01 started
>>
>> If I query cluster pcs status, cluster thinks resource vm-vdicdb01 is only
>> started on node vdicnode01-priv.
>>
>> Thanks a lot.
>>
>>
>>
>> 2017-01-17 10:03 GMT+01:00 emmanuel segura :
>>
>>> sorry,
>>>
>>> But do you mean, when you say, you migrated the vm outside of the
>>> cluster? one server outside of your cluster?
>>>
>>> 2017-01-17 9:27 GMT+01:00 Oscar Segarra :
 Hi,

 I have configured a two node cluster where we run 4 kvm guests.

 The hosts are:
 vdicnode01
 vdicnode02

 And I have created a dedicated network card for cluster management. I have
 created required entries in /etc/hosts:
 vdicnode01-priv
 vdicnode02-priv

 The four guests have collocation rules in order to make them distribute
 proportionally between my two nodes.

 The problem I have is that if I migrate a guest outside the cluster, I mean
 using virsh migrate --live, the cluster, instead of moving the guest back to
 its original node (following collocation sets), starts the guest again and
 suddenly I have the same guest running on both nodes, causing xfs corruption
 in the guest.

 Is there any configuration applicable to avoid this unwanted behavior?

 Thanks a lot

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-17 Thread Souvignier, Daniel
Hi,

if you have a cluster between VMs, please note that it may be problematic to
use multicast (which is the default after setting up the cluster with pcs
cluster setup); use unicast instead. That was what I ran into initially.
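
For example (a sketch only; the cluster name is made up, the node names are
the ones used in this thread), unicast transport can be requested explicitly
when the cluster is created:

pcs cluster setup --name vdiccluster vdicnode01-priv vdicnode02-priv --transport udpu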

Regards,
Daniel



--
Daniel Souvignier

IT Center
Gruppe: Linux-basierte Anwendungen
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel.: +49 241 80-29267
souvign...@itc.rwth-aachen.de
www.itc.rwth-aachen.de


-----Original Message-----
From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de]
Sent: Tuesday, 17 January 2017 15:53
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: Re: VirtualDomain started in two hosts

>>> Oscar Segarra <oscar.sega...@gmail.com> wrote on 17.01.2017 at 10:15 in message
<cajq8tag8vhx5j1xqpqmrq-9omfnxkhqs54mbzz491_6df9a...@mail.gmail.com>:
> Hi,
> 
> Yes, I will try to explain myself better.
> 
> *Initially*
> On node1 (vdicnode01-priv)
>>virsh list
> ==
> vdicdb01 started
> 
> On node2 (vdicnode02-priv)
>>virsh list
> ==
> vdicdb02 started
> 
> --> Now, I execute the migrate command (outside the cluster <-- not using
> pcs resource move)
> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv 
> tcp://vdicnode02-priv

One of the rules of successful clustering is: If resources are managed by the
cluster, they are managed by the cluster only! ;-)

I guess one node is trying to restart the VM once it vanished, and the other
node might try to shut down the VM while it's being migrated.
Or any other undesired combination...

> 
> *Finally*
> On node1 (vdicnode01-priv)
>>virsh list
> ==
> *vdicdb01 started*
> 
> On node2 (vdicnode02-priv)
>>virsh list
> ==
> vdicdb02 started
> vdicdb01 started
> 
> If I query cluster pcs status, cluster thinks resource vm-vdicdb01 is 
> only started on node vdicnode01-priv.
> 
> Thanks a lot.
> 
> 
> 
> 2017-01-17 10:03 GMT+01:00 emmanuel segura <emi2f...@gmail.com>:
> 
>> sorry,
>>
>> But do you mean, when you say, you migrated the vm outside of the 
>> cluster? one server outside of your cluster?
>>
>> 2017-01-17 9:27 GMT+01:00 Oscar Segarra <oscar.sega...@gmail.com>:
>> > Hi,
>> >
>> > I have configured a two node cluster where we run 4 kvm guests.
>> >
>> > The hosts are:
>> > vdicnode01
>> > vdicnode02
>> >
>> > And I have created a dedicated network card for cluster management. I have
>> > created required entries in /etc/hosts:
>> > vdicnode01-priv
>> > vdicnode02-priv
>> >
>> > The four guests have collocation rules in order to make them 
>> > distribute proportionally between my two nodes.
>> >
>> > The problem I have is that if I migrate a guest outside the cluster, I mean
>> > using virsh migrate --live, the cluster, instead of moving the guest back to
>> > its original node (following collocation sets), starts the guest again and
>> > suddenly I have the same guest running on both nodes, causing xfs corruption
>> > in the guest.
>> >
>> > Is there any configuration applicable to avoid this unwanted behavior?
>> >
>> > Thanks a lot
>> >
>>
>>
>>
>> --
>>   .~.
>>   /V\
>>  //  \\
>> /(   )\
>> ^`~'^
>>
>>







___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: Re: VirtualDomain started in two hosts

2017-01-17 Thread Ulrich Windl
>>> Oscar Segarra wrote on 17.01.2017 at 10:15 in message:
> Hi,
> 
> Yes, I will try to explain myself better.
> 
> *Initially*
> On node1 (vdicnode01-priv)
>>virsh list
> ==
> vdicdb01 started
> 
> On node2 (vdicnode02-priv)
>>virsh list
> ==
> vdicdb02 started
> 
> --> Now, I execute the migrate command (outside the cluster <-- not using
> pcs resource move)
> virsh migrate --live vdicdb01 qemu:/// qemu+ssh://vdicnode02-priv
> tcp://vdicnode02-priv

> One of the rules of successful clustering is: If resources are managed by the
cluster, they are managed by the cluster only! ;-)

I guess one node is trying to restart the VM once it vanished, and the other 
node might try to shut down the VM while it's being migrated.
Or any other undesired combination...

> 
> *Finally*
> On node1 (vdicnode01-priv)
>>virsh list
> ==
> *vdicdb01 started*
> 
> On node2 (vdicnode02-priv)
>>virsh list
> ==
> vdicdb02 started
> vdicdb01 started
> 
> If I query cluster pcs status, cluster thinks resource vm-vdicdb01 is only
> started on node vdicnode01-priv.
> 
> Thanks a lot.
> 
> 
> 
> 2017-01-17 10:03 GMT+01:00 emmanuel segura :
> 
>> sorry,
>>
>> But do you mean, when you say, you migrated the vm outside of the
>> cluster? one server outside of your cluster?
>>
>> 2017-01-17 9:27 GMT+01:00 Oscar Segarra :
>> > Hi,
>> >
>> > I have configured a two node cluster where we run 4 kvm guests.
>> >
>> > The hosts are:
>> > vdicnode01
>> > vdicnode02
>> >
>> > And I have created a dedicated network card for cluster management. I have
>> > created required entries in /etc/hosts:
>> > vdicnode01-priv
>> > vdicnode02-priv
>> >
>> > The four guests have collocation rules in order to make them distribute
>> > proportionally between my two nodes.
>> >
>> > The problem I have is that if I migrate a guest outside the cluster, I mean
>> > using virsh migrate --live, the cluster, instead of moving the guest back to
>> > its original node (following collocation sets), starts the guest again and
>> > suddenly I have the same guest running on both nodes, causing xfs corruption
>> > in the guest.
>> >
>> > Is there any configuration applicable to avoid this unwanted behavior?
>> >
>> > Thanks a lot
>> >
>>
>>
>>
>> --
>>   .~.
>>   /V\
>>  //  \\
>> /(   )\
>> ^`~'^
>>





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org