Hi Greg,

Thank you for the proposal. BTW, I replied to our discussion in [1].

Masakari mainly focuses on black-box monitoring of VMs, but that does not
mean Masakari does not do white-box monitoring. There will be configuration
options that let operators choose whether to use it and how to configure it.
For Masakari, this is one of the ways to extend its instance monitoring
capabilities.

I would really appreciate it if you could write a spec for this in [2]. It
will help the Masakari community and the openstack-ha community understand
the requirements and support them in future development.

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-May/117003.html
[2] https://github.com/openstack/masakari-specs

---
Regards,
Sampath

On Thu, May 18, 2017 at 6:15 AM, Waines, Greg <[email protected]> wrote:
> ( I have been having a discussion with Adam Spiers on
> [openstack-dev][vitrage][nova] on this topic ... thought I would switch over
> to [masakari] )
>
> I am interested in contributing an implementation of Intrusive Instance
> Monitoring, initially specifically VM Heartbeat / Health-check Monitoring
> thru the QEMU Guest Agent (https://wiki.libvirt.org/page/Qemu_guest_agent).
>
> I’d like to know whether Masakari project leaders would consider a blueprint
> on “VM Heartbeat / Health-check Monitoring”.
>
> See below for some more details,
>
> Greg.
>
> -------------------------------------
>
> VM Heartbeating / Health-check Monitoring would introduce intrusive /
> white-box type monitoring of VMs / Instances to Masakari.
>
> Briefly, “VM Heartbeat / Health-check Monitoring”
>
> · is optionally enabled thru a Nova flavor extra-spec,
>
> · is a service that runs on an OpenStack Compute Node,
>
> · sends periodic Heartbeat / Health-check Challenge Requests to a VM
>   over a virtio-serial device set up between the Compute Node and the VM
>   thru QEMU ( https://wiki.libvirt.org/page/Qemu_guest_agent ),
>
> · on loss of heartbeat or a failed health-check status, reports a fault
>   event against the VM to Masakari and any other registered reporting
>   backends like Mistral or Vitrage.
>
> I realize this is somewhat in the gray-zone of what a cloud should be
> monitoring or not, but I believe it provides an alternative for Applications
> deployed in VMs that do not have an external monitoring/management entity
> like a VNF Manager in the MANO architecture.
>
> And even for VMs with VNF Managers, it provides a highly reliable alternate
> monitoring path that does not rely on Tenant Networking.
>
> VM HB/HC Monitoring would leverage
> https://wiki.libvirt.org/page/Qemu_guest_agent
> which would require the agent to be installed in the images for talking back
> to the compute host.
>
> ( there are other examples of similar approaches in openstack ... the
> murano-agent for installation, the swift-agent for object store management )
>
> Although here, in the case of VM HB/HC Monitoring via the QEMU Guest Agent,
> the messaging path is internal thru a QEMU virtual serial device, i.e. a
> very simple interface with very few dependencies ... it’s up and available
> very early in VM lifecycle and virtually always up.
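
To make the challenge / loss-of-heartbeat flow above concrete, here is a
minimal sketch of the miss-counting logic only. Everything in it is an
assumption for illustration: the class name, the `send_challenge` and
`report_fault` callables, the event fields, and the threshold of three
missed heartbeats are hypothetical and do not come from the proposal or
from any existing Masakari or QEMU Guest Agent API. The transport over the
virtio-serial channel is deliberately left out and stood in for by a
callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HeartbeatMonitor:
    """Tracks heartbeat results for one VM (all names are hypothetical)."""

    instance_id: str
    send_challenge: Callable[[], bool]    # returns True if the guest ACKed in time
    report_fault: Callable[[dict], None]  # forwards an event to a reporting backend
    max_missed: int = 3                   # assumed threshold, not from the proposal
    missed: int = 0
    faulted: bool = False

    def poll_once(self) -> None:
        """Send one challenge, update the miss counter, report at most once."""
        try:
            ok = self.send_challenge()
        except OSError:
            # Treat a broken channel the same as a missing ACK.
            ok = False

        if ok:
            # Any ACK clears the counter and re-arms fault reporting.
            self.missed = 0
            self.faulted = False
            return

        self.missed += 1
        if self.missed >= self.max_missed and not self.faulted:
            self.faulted = True
            self.report_fault({
                "instance_id": self.instance_id,
                "event": "heartbeat_lost",
                "missed": self.missed,
            })


# Usage: simulate one ACK, four missed heartbeats, then recovery.
events = []
replies = iter([True, False, False, False, False, True])
monitor = HeartbeatMonitor("vm-1", lambda: next(replies), events.append)
for _ in range(6):
    monitor.poll_once()
```

In this sketch the fault is reported once per outage rather than on every
missed heartbeat, so a backend would not be flooded while a VM stays hung;
whether that is the right behaviour is a design question for the spec.
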
>
> Wrt failure modes / use-cases
>
> · a VM’s response to a Heartbeat Challenge Request can be as simple
>   as just ACK-ing; this alone allows for detection of:
>
>   o a failed or hung QEMU/KVM instance, or
>   o a failed or hung VM’s OS, or
>   o a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
>   o a failure of the VM to route basic IO via linux sockets.
>
> · I have had feedback that this is similar to the virtual hardware
>   watchdog of QEMU/KVM
>   ( https://libvirt.org/formatdomain.html#elementsWatchdog )
>
> · However, the VM Heartbeat / Health-check Monitoring
>
>   o provides a higher-level (i.e. application-level) heartbeating
>     § i.e. if the Heartbeat requests are being answered by the Application
>       running within the VM
>   o provides more than just heartbeating, as the Application can use it to
>     trigger a variety of audits,
>   o provides a mechanism for the Application within the VM to report a
>     Health Status / Info back to the Host / Cloud,
>   o provides notification of the Heartbeat / Health-check status to
>     higher-level cloud entities thru Masakari, Mistral and/or Vitrage
>     § e.g. VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm) - Aodh - ... - VNF-Manager
>                                                - (StateChange) - Nova - ... - VNF Manager
>
> NOTE: perhaps the reporting to Vitrage would be a separate blueprint within
> Masakari.
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
