Hi Ilya, How are you doing Ivan?,

Glad to hear that there's work being done on the HA piece for KVM -
unfortunately until this hits 'production ready' we have to struggle along
with what we've got.

I'm well aware what happens when NFS is disconnected (as this behaviour was
the trigger for us having to dig so deeply into the KVM HA behaviour).

I've attached the patch I've developed to help protect against the worst
excesses of the kvmheartbeat.sh (without totally disabling the reboots).
The essence of the change is that the script now checks for KVM (qemu)
processes using the mountpoint being checked, and only allows a host reboot
to be conducted when VMs are impacted.


Cheers,
Rohan

On Tue, Jul 12, 2016 at 1:15 PM, ilya <ilya.mailing.li...@gmail.com> wrote:

> Rohan
>
> As of now:
> Disconnect the primary NFS from your KVM and see what happens.
>
> In the future release:
>
> Also, HA piece is being rewritten now. The specs are posted by John
> Burwell (and me to a smaller extent) if you search cloudstack mailing
> lists via markmail.org for "KVM HA" you can see the thread with many
> details.
>
> In summary, we will be changing the behavior to something more precise -
> similar to how VmWare does it.
>
> Example: host A, B and C are part of 1 cluster that use a common
> clustered storage
>
> host A hangs and halts the VMs ability to write to disk (or crash the vms)
>
> CloudStack MS will retreive the list of volumes used by VMs for host A
> ask the neighbor host B to check for when the last write has been
> performed.
>
> If all VMs with their disks have no disk activity for predefined
> interval (several intervals), cloudstack MS will use IMPI interface to
> shoot the node in the head.
>
> This is a very high level overview - there is alot more to this with
> many safeguards and tun-able parameters.
>
> Regards
> ilya
>
>
> On 7/11/16 5:33 PM, Rohan T wrote:
> > Hi All,
> >
> > Having been smashed by the unexpected behaviour of the KVM Heartbeat / HA
> > process, we've been working through the logic of the process, and  I now
> > believe the intent of the process is sumarised by:
> >
> >
> > =================
> > The heartbeat process consists of 3 parts:
> >
> > 1. a shell script that's distributed to each of the hypervisors during
> the
> > CloudStack installation process:
> > /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh
> > 2. Two java classes, built into CloudStack
> > com.cloud.hypervisor.kvm.resource.KVMHAMonitor
> > com.cloud.hypervisor.kvm.resource.KVMHAChecker
> >
> > Behaviour
> >
> > Each of the classes periodically calls the kvmheartbeat.sh script with
> > different arguments, the script is used to confirm the existence of NFS
> > mounts,  remount any that are missing, clean up (i.e. kill) VMs in
> > indeterminate state, read and write heartbeats to NFS volumes and force
> the
> > host hypervisor to reboot (as part of a "shoot the node in the head"
> > approach to restoring sanity to the cluster).
> >
> > The KVMHAMonitor script writes a timestamp to each of the NFS volumes
> > (pools), each minute,  if this process times out  (4 times), then calls
> the
> > script once more to force a spontaneous reboot of the host (via: echo b >
> > /proc/sysrq_trigger).
> >
> > The KVMHAChecker is responsible for triggering the script to read the
> > heartbeat value and compare with the current timestamp. Where ALL NFS
> > volumes are determined to be "DEAD" (i.e timestamp is older than 60
> > seconds),
> >
> > ================
> >
> > Is my understanding correct?
> >
> > The problem is, when testing this logic in my test lab (currently 4.4.4,
> > but there's been no significant updates committed to these files since),
> > I've been unable to see any evidence of the KVMHAChecker actually
> > executing!  I see plenty of evidence of heartbeat writes (and of
> hypervisor
> > reboots triggered when this process timesout).
> >
> >
> > Thanks,
> > Rohan
> >
>
--- kvmheartbeat.sh.bak	2016-07-14 14:56:31.151499359 +1000
+++ kvmheartbeat.sh	2016-07-15 10:27:40.658151974 +1000
@@ -71,6 +71,15 @@
    exit 1
 fi
 
+/usr/bin/logger -t heartbeat -p user.debug "kvmheartbeat.sh called with $*"
+
+#Before we start, how many parallel executions are there, (Expecting these to be hung/unkillable/zombie). Exit if it's excessive
+numScripts=`ps axfw | grep -- "${0} ${*}" | grep -v grep |  awk -v RS='$\3' -v FS='\n' '{ print NF-1; }'`
+if [ ${numScripts} -gt 12 ]
+then
+        /usr/bin/logger -t heartbeat -p user.debug "kvmheartbeat.sh too many stuck executions, exiting."
+        exit 0
+fi
 
 #delete VMs on this mountpoint
 deleteVMs() {
@@ -155,6 +164,15 @@
   exit 0
 elif [ "$cflag" == "1" ]
 then
+   if [ -n "$MountPoint" ] ; then
+        vmPids=$(ps aux| grep qemu | grep "$MountPoint" | awk '{print $2}' 2> /dev/null)
+   else
+	vmPids=$(ps aux| grep qemu | grep -v grep | awk '{print $2}' 2> /dev/null)
+   fi
+   if [ -z "$vmPids" ] ; then
+	/usr/bin/logger -t heartbeat -p user.info "kvmheartbeat.sh reboot requested, but no VM found running on affected storage. Returning."
+	exit 0
+   fi
   /usr/bin/logger -t heartbeat "kvmheartbeat.sh rebooted system because it was unable to write the heartbeat to the storage."
   sync &
   sleep 5

Reply via email to