Re: [Linux-HA] Xen RA and rebooting

2013-09-18 Thread Ferenc Wagner
Tom Parker tpar...@cbnco.com writes:

 On 09/17/2013 04:18 AM, Lars Marowsky-Bree wrote:

 On 2013-09-16T16:36:38, Tom Parker tpar...@cbnco.com wrote:

 It definitely leads to data corruption, and I think it has to do with
 the way that locking is not working properly on my LVM partitions.

 Well, not really an LVM issue. The RA thinks the guest is gone, the
 cluster reacts and schedules it to be started (perhaps elsewhere); and
 then the hypervisor starts it locally again *too*.

 I mean the locking of the LVs.  I should not be able to mount the same
 LV in two places.  I know I can lock each LV exclusively to a node, but
 I am not sure how to tell the RA to do that for me.

CLVM can provide exclusive activation, but that would make live
migration impossible.
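
For reference, exclusive activation with the stock LVM resource agent
would look roughly like this (crm shell syntax; the resource and VG
names are invented for illustration):

    primitive p_vg_xen ocf:heartbeat:LVM \
        params volgrpname="vg_xen" exclusive="true" \
        op monitor interval="30s" timeout="30s"

With exclusive="true" the whole VG is active on only one node at a
time - which is exactly what rules out live migration.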
-- 
Feri.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Lars Marowsky-Bree
On 2013-09-16T16:36:38, Tom Parker tpar...@cbnco.com wrote:

  Can you kindly file a bug report here so it doesn't get lost
  https://github.com/ClusterLabs/resource-agents/issues ?
 Submitted (Issue #308)

Thanks.

 It definitely leads to data corruption, and I think it has to do with
 the way that locking is not working properly on my LVM partitions.

Well, not really an LVM issue. The RA thinks the guest is gone, the
cluster reacts and schedules it to be started (perhaps elsewhere); and
then the hypervisor starts it locally again *too*.

I think changing those libvirt settings to "destroy" could work - the
cluster will then restart the guest appropriately, not the hypervisor.


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Ferenc Wagner
Lars Marowsky-Bree l...@suse.com writes:

 The RA thinks the guest is gone, the cluster reacts and schedules it
 to be started (perhaps elsewhere); and then the hypervisor starts it
 locally again *too*.

 I think changing those libvirt settings to "destroy" could work - the
 cluster will then restart the guest appropriately, not the hypervisor.

Maybe the RA is just too picky about the reported VM state.  This is one
of the reasons* I'm using my own RA for managing libvirt virtual
domains: mine does not care about the fine points.  If the domain is
active in any state, it counts as running as far as the RA is concerned,
so a domain reset is never a cluster event.
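
A minimal sketch of that kind of monitor logic (shell; virsh and the
domain variable are used for illustration, not taken from Feri's actual
agent):

    # Hypothetical check: any active domain state counts as running.
    state=$(virsh domstate "$DOMAIN" 2>/dev/null)
    case "$state" in
        running|blocked|paused|"in shutdown"|pmsuspended)
            exit 0 ;;  # OCF_SUCCESS
        *)
            exit 7 ;;  # OCF_NOT_RUNNING
    esac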

On the other hand, doesn't the recover action after a monitor failure
consist of a stop action on the original host before the new start, just
to make sure?  Or maybe I'm confusing things...

Regards,
Feri.

* Another is that mine gets the VM definition as a parameter, not via
  some shared filesystem.


Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Lars Marowsky-Bree
On 2013-09-17T11:38:34, Ferenc Wagner wf...@niif.hu wrote:

 On the other hand, doesn't the recover action after a monitor failure
 consist of a stop action on the original host before the new start, just
 to make sure?  Or maybe I'm confusing things...

Yes, it would - but it seems there's a brief period during reboot where
the guest is shown as gone/cleanly stopped, and then the stop action
will just see the same thing.

Actually that strikes me as a problem with Xen/libvirt's reporting.

Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Ferenc Wagner
Lars Marowsky-Bree l...@suse.com writes:

 On 2013-09-17T11:38:34, Ferenc Wagner wf...@niif.hu wrote:

 On the other hand, doesn't the recover action after a monitor failure
 consist of a stop action on the original host before the new start, just
 to make sure?  Or maybe I'm confusing things...

 Yes, it would - but it seems there's a brief period during reboot where
 the guest is shown as gone/cleanly stopped, and then the stop action
 will just see the same thing.

 Actually that strikes me as a problem with Xen/libvirt's reporting.

Absolutely.  KVM under libvirt does not exhibit such behaviour on our
systems, and I find this most natural and correct.
-- 
Regards,
Feri.


Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Tom Parker

On 09/17/2013 01:13 AM, Vladislav Bogdanov wrote:
 14.09.2013 07:28, Tom Parker wrote:
 Hello All

 Does anyone know of a good way to prevent Pacemaker from declaring a VM
 dead if it's rebooted from inside the VM?  It seems to be detecting the
 VM as stopped for the brief moment between shutting down and starting
 up.  Often this causes the cluster to have two copies of the same VM if
 the locks are not set properly (which I have found to be unreliable):
 one that is managed and one that is abandoned.

 If anyone has any suggestions or parameters that I should be tweaking,
 that would be appreciated.
 I use the following in libvirt VM definitions to prevent this:
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>destroy</on_reboot>
   <on_crash>destroy</on_crash>

 Vladislav
Does this not show up as a lot of failed operations?  I guess they will
clean themselves up after the failure expires.




Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Vladislav Bogdanov
17.09.2013 20:51, Tom Parker wrote:
 
 On 09/17/2013 01:13 AM, Vladislav Bogdanov wrote:
 14.09.2013 07:28, Tom Parker wrote:
 Hello All

 Does anyone know of a good way to prevent Pacemaker from declaring a VM
 dead if it's rebooted from inside the VM?  It seems to be detecting the
 VM as stopped for the brief moment between shutting down and starting
 up.  Often this causes the cluster to have two copies of the same VM if
 the locks are not set properly (which I have found to be unreliable):
 one that is managed and one that is abandoned.

 If anyone has any suggestions or parameters that I should be tweaking,
 that would be appreciated.
 I use the following in libvirt VM definitions to prevent this:
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>destroy</on_reboot>
   <on_crash>destroy</on_crash>

 Vladislav
 Does this not show up as a lot of failed operations?  I guess they will
 clean themselves up after the failure expires.

Exactly. And this is much better than data corruption.
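
For what it's worth, those failures can also be made to expire on their
own via the failure-timeout resource meta attribute; a crm shell sketch
(resource name, config path and values are invented):

    primitive vm_myguest ocf:heartbeat:Xen \
        params xmfile="/etc/xen/myguest.cfg" \
        op monitor interval="30s" timeout="30s" \
        meta failure-timeout="10min"

Once failure-timeout has passed without further failures, the failcount
is expired at the next cluster recheck.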



Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Tom Parker

On 09/17/2013 04:18 AM, Lars Marowsky-Bree wrote:
 On 2013-09-16T16:36:38, Tom Parker tpar...@cbnco.com wrote:

 Can you kindly file a bug report here so it doesn't get lost
 https://github.com/ClusterLabs/resource-agents/issues ?
 Submitted (Issue #308)
 Thanks.

 It definitely leads to data corruption, and I think it has to do with
 the way that locking is not working properly on my LVM partitions.
 Well, not really an LVM issue. The RA thinks the guest is gone, the
 cluster reacts and schedules it to be started (perhaps elsewhere); and
 then the hypervisor starts it locally again *too*.
I mean the locking of the LVs.  I should not be able to mount the same
LV in two places.  I know I can lock each LV exclusively to a node, but
I am not sure how to tell the RA to do that for me.  At the moment I am
activating a VG with the LVM RA, and that VG is shared across all my
physical machines.  If I do exclusive activation, I think that locks the
whole VG to a particular node instead of the individual LVs.

 I think changing those libvirt settings to "destroy" could work - the
 cluster will then restart the guest appropriately, not the hypervisor.


 Regards,
 Lars




Re: [Linux-HA] Xen RA and rebooting

2013-09-16 Thread Tom Parker

On 09/14/2013 07:18 AM, Lars Marowsky-Bree wrote:
 On 2013-09-14T00:28:30, Tom Parker tpar...@cbnco.com wrote:

 Does anyone know of a good way to prevent Pacemaker from declaring a VM
 dead if it's rebooted from inside the VM?  It seems to be detecting the
 VM as stopped for the brief moment between shutting down and starting
 up.
 Hrm. Good question. Because to the monitor, it really looks as if the VM
 is temporarily gone, and it doesn't know ... Perhaps we need to keep
 looking for it for a few seconds.

 Can you kindly file a bug report here so it doesn't get lost
 https://github.com/ClusterLabs/resource-agents/issues ?
Submitted (Issue #308)
 Often this causes the cluster to have two copies of the same VM if the
 locks are not set properly (which I have found to be unreliable): one
 that is managed and one that is abandoned.
 *This* however is really, really worrisome and sounds like data
 corruption. How is this happening?
It definitely leads to data corruption, and I think it has to do with
the way that locking is not working properly on my LVM partitions.  It
seems to mostly happen on clusters where I am using LVM slices on an MSA
as shared storage (they don't seem to lock at the LV level) and the
placement-strategy is utilization.  If Xen reboots and the cluster
declares the VM dead, it seems to try to start it on another node that
has more resources instead of the node where it was running.  It doesn't
happen consistently enough for me to detect a pattern, and it never
seems to happen on my QA system, where I can actually cause corruption
without anyone getting mad.  If I can isolate how it happens I will file
a bug.


 The work-around right now is to put the VM resource into maintenance
 mode for the reboot, or to reboot it via stop/start of the cluster
 manager.


 Regards,
 Lars




Re: [Linux-HA] Xen RA and rebooting

2013-09-16 Thread Vladislav Bogdanov
14.09.2013 07:28, Tom Parker wrote:
 Hello All
 
 Does anyone know of a good way to prevent Pacemaker from declaring a VM
 dead if it's rebooted from inside the VM?  It seems to be detecting the
 VM as stopped for the brief moment between shutting down and starting
 up.  Often this causes the cluster to have two copies of the same VM if
 the locks are not set properly (which I have found to be unreliable):
 one that is managed and one that is abandoned.
 
 If anyone has any suggestions or parameters that I should be tweaking,
 that would be appreciated.

I use the following in libvirt VM definitions to prevent this:
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
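
For context, these lifecycle elements sit at the top level of the
libvirt domain XML, alongside <name> and the device definitions; a
minimal sketch (guest name invented):

    <domain type='xen'>
      <name>myguest</name>
      <!-- memory, vcpu, os, devices, ... -->
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>destroy</on_reboot>
      <on_crash>destroy</on_crash>
    </domain>

With these set, an in-guest reboot tears the domain down instead of the
hypervisor restarting it behind the cluster's back, leaving the cluster
as the only party that starts guests.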

Vladislav



Re: [Linux-HA] Xen RA and rebooting

2013-09-14 Thread Lars Marowsky-Bree
On 2013-09-14T00:28:30, Tom Parker tpar...@cbnco.com wrote:

 Does anyone know of a good way to prevent Pacemaker from declaring a VM
 dead if it's rebooted from inside the VM?  It seems to be detecting the
 VM as stopped for the brief moment between shutting down and starting
 up.

Hrm. Good question. Because to the monitor, it really looks as if the VM
is temporarily gone, and it doesn't know ... Perhaps we need to keep
looking for it for a few seconds.
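
A rough sketch of what that grace period could look like in the monitor
action (shell; virsh is used purely for illustration, and the domain
variable and retry count are placeholders):

    # Poll a few times before reporting the domain as stopped, so an
    # in-guest reboot doesn't get mistaken for a failure.
    for try in 1 2 3 4 5; do
        if virsh domstate "$DOMAIN" 2>/dev/null | grep -q '^running'; then
            exit 0   # OCF_SUCCESS
        fi
        sleep 1
    done
    exit 7           # OCF_NOT_RUNNING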

Can you kindly file a bug report here so it doesn't get lost
https://github.com/ClusterLabs/resource-agents/issues ?

 Often this causes the cluster to have two copies of the same VM if the
 locks are not set properly (which I have found to be unreliable): one
 that is managed and one that is abandoned.

*This* however is really, really worrisome and sounds like data
corruption. How is this happening?


The work-around right now is to put the VM resource into maintenance
mode for the reboot, or to reboot it via stop/start of the cluster
manager.
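
If you want to script that work-around, unmanaging the resource around
the reboot does the trick in the crm shell (resource name is a
placeholder):

    # take the VM resource out of cluster control before the reboot
    crm resource unmanage vm_myguest
    # ... reboot inside the guest and wait for it to come back ...
    # hand control back to the cluster
    crm resource manage vm_myguest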


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



[Linux-HA] Xen RA and rebooting

2013-09-13 Thread Tom Parker
Hello All

Does anyone know of a good way to prevent Pacemaker from declaring a VM
dead if it's rebooted from inside the VM?  It seems to be detecting the
VM as stopped for the brief moment between shutting down and starting
up.  Often this causes the cluster to have two copies of the same VM if
the locks are not set properly (which I have found to be unreliable):
one that is managed and one that is abandoned.

If anyone has any suggestions or parameters that I should be tweaking,
that would be appreciated.

Tom