Excerpts from Richard Su's message of 2014-01-27 17:59:34 -0800:
> Hi,
>
> I have been looking into how to add process/service monitoring to
> TripleO. Here I want to be able to detect when an OpenStack-dependent
> component that is deployed on an instance has failed. And when a failure
> has occurred I want to be notified and eventually see it in Tuskar.
>
> Ceilometer doesn't handle this particular use case today. So I have been
> doing some research and there are many options out there that provide
> process checks: nagios, sensu, zabbix, and monit. I am a bit wary of
> pulling one of these options into TripleO. There are some increased
> operational and maintenance costs when pulling in each of them. And
> physical device monitoring is currently in the works for Ceilometer,
> lessening the need for some of the other abilities that another
> monitoring tool would provide.
>
> For the particular use case of monitoring processes/services, at a high
> level, I am considering writing a simple daemon to perform the check.
> Checks and failures are written out as messages to the notification bus.
> Interested parties like Tuskar or Ceilometer can subscribe to these
> messages.
>
> In general does this sound like a reasonable approach?

Writing a new daemon: no. But using notifications in OpenStack: yes! I
suggest finding the simplest existing monitoring tool possible and
teaching it to send OpenStack notifications.

>
> There is also the question of how to configure or figure out which
> processes we are interested in monitoring. I need to do more research
> here but I'm considering either looking at the elements listed by
> diskimage-builder or looking at the orc post-configure.d scripts to
> find services that are restarted.
>

There are basically two things you need to look for: things listening,
and things connected to rabbitmq/qpid. So one crazy way to find things
to monitor is to look at netstat or ss and just monitor the processes
doing one of those two things.

I believe assimilation monitoring's nanoprobe daemon already has the
listening part done:

http://techthoughts.typepad.com/managing_computers/2012/10/zero-configuration-discovery-and-server-monitoring-in-the-assimilation-monitoring-project.html

Also you may want to add two orc scripts in post-configure.d, so the
monitor is stopped before services are restarted and started again
afterwards:

    00-disruption-coming-stop-process-monitor
    99-all-clear-start-process-monitor

Anyway, as Robert says, just keep it modular so that orgs that already
have a rich set of tools for this will be able to replace it.
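
To make the netstat/ss idea a bit more concrete, here is a rough sketch in
Python that covers only the "things listening" half. It assumes the column
layout printed by `ss -ltnp` from iproute2 (State, Recv-Q, Send-Q, local
address, peer address, then a `users:(("name",pid=N,fd=M))` column); the
function names are made up for illustration and are not part of any
existing tool. Finding processes connected to rabbitmq/qpid and sending
the notifications are left out.

```python
import re
import subprocess

# Matches each ("name",pid=N,... entry inside the users:(...) column of `ss -p`.
_PROC_RE = re.compile(r'\("([^"]+)",pid=(\d+)')


def parse_ss_listeners(output):
    """Return a sorted list of (process_name, pid, port) from `ss -ltnp` text."""
    found = set()
    for line in output.splitlines():
        fields = line.split()
        # Assumed columns: State Recv-Q Send-Q Local Peer Process.
        # The header line and sockets without process info are skipped.
        if len(fields) < 6 or fields[0] != "LISTEN":
            continue
        # The port is whatever follows the last ':' (works for [::]:80 too).
        port = fields[3].rsplit(":", 1)[-1]
        if not port.isdigit():
            continue
        for name, pid in _PROC_RE.findall(" ".join(fields[5:])):
            found.add((name, int(pid), int(port)))
    return sorted(found)


def discover_listeners():
    """Run ss and return the listening processes (process info usually needs root)."""
    result = subprocess.run(
        ["ss", "-ltnp"], capture_output=True, text=True, check=True
    )
    return parse_ss_listeners(result.stdout)
```

The parsing is kept separate from the call to `ss` so it can be checked
against captured output, and so a site with its own discovery tooling can
swap that part out.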
