On Tue, Mar 18, 2014 at 10:54:08PM +0800, Qiming Teng wrote:
> Hi, Folks,
>
> I have been trying to implement an HACluster resource type in Heat. I
> haven't created a blueprint for this because I am not sure everything
> will work as expected.
>
> The basic idea is to extend the OS::Heat::ResourceGroup resource type
> with the inner resource type fixed to OS::Nova::Server. Properties for
> this HACluster resource may include:
>
> - init_size: initial number of Server instances;
> - min_size: minimal number of Server instances;
> - sig_handler: a reference to a sub-class of SignalResponder;
> - zones: a list of strings representing the availability zones, which
>   could be the names of the racks where the Server can be booted;
> - recovery_action: a list of supported failure recovery actions, such
>   as 'restart', 'remote-restart', 'migrate';
> - fencing_options: a dict specifying how to shut down the Server
>   cleanly so that data consistency in storage and network is
>   preserved;
> - resource_ref: a dict defining the Server instances to be created.
>
> Attributes of the HACluster may include:
> - refs: a list of resource IDs for the currently active Servers;
> - ips: a list of IP addresses for convenience.
>
> Note that the 'remote-restart' action above is today referred to as
> 'evacuate'.
>
> The most difficult issue here is to come up with a reliable VM failure
> detection mechanism. The service_group feature in Nova is only concerned
> with the OpenStack services themselves, not the VMs. Considering that
> user-provided images can be used in our customer's cloud environment,
> we cannot assume there are agents in the VMs sending heartbeat signals.
>
> I have checked the 'instances' table in the Nova database; it seems that
> the 'updated_at' column is only updated when a VM state change is
> reported. If 'heartbeat' messages came in from many VMs very
> frequently, there could be a DB query performance/scalability issue,
> right?
>
> So, how can I detect VM failures reliably, so that I can notify Heat
> to take the appropriate recovery action?

Hi,

Monitoring depends on what the VM is doing. For instance, a VM that hosts
a web server will not be monitored the same way as an SQL server.
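Because user-provided images rule out in-guest agents, one option is to probe each VM from the outside according to its workload. The sketch below is a minimal illustration under stated assumptions: each VM's role maps to a TCP port reachable by an external monitor, and the role names, `ROLE_PORTS`, `tcp_probe`, `check_vm` and `failed_vms` are all hypothetical helpers, not part of Heat, Nova or Ceilometer.

```python
import socket

# Hypothetical role -> TCP port mapping. 80 (HTTP) and 3306 (MySQL) are the
# conventional defaults; a real deployment would configure its own ports.
ROLE_PORTS = {
    "web": 80,
    "sql": 3306,
}


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_vm(host: str, role: str, ports: dict = ROLE_PORTS,
             timeout: float = 2.0) -> bool:
    """Probe a VM using the port for its role; an unknown role raises KeyError."""
    return tcp_probe(host, ports[role], timeout)


def failed_vms(vms, ports: dict = ROLE_PORTS, timeout: float = 2.0) -> list:
    """Given (host, role) pairs, return the hosts whose probe failed."""
    return [host for host, role in vms
            if not check_vm(host, role, ports, timeout)]
```

A successful connect only shows that the service accepts connections; a stricter check for a web or SQL server would issue an HTTP request or a trivial query instead. Whichever check is used, its result would still have to be fed back to Heat to trigger a recovery action.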
You might also want to take a look here:
http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Ceilometer::Alarm

>
> Regards,
>   - Qiming
>
> Research Scientist
> IBM Research - China
> tengqim at cn dot ibm dot com
>
>
> _______________________________________________
> OpenStack-dev mailing list
> [email protected]
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
---end quoted text---

-- 
Best Regards,
Amit.
