Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On Wed, May 13, 2015 at 3:27 PM, David Kranz dkr...@redhat.com wrote:

On 05/13/2015 09:06 AM, Simon Pasquier wrote:

Hello, Like many others commented before, I don't quite understand how unique the CloudPulse use cases are. For operators, I got the feeling that existing solutions fit well:

- Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ, databases and more) and diagnostic purposes. Adding OpenStack service checks is fairly easy if you already have the toolchain.

Is it really so easy? RabbitMQ has an aliveness test that is easy to hook into. I don't know exactly what it does, other than what the doc says, but I should not have to. If I want my standard monitoring system to call into a cloud and ask "is nova healthy?", "is glance healthy?", etc., are there such calls?

Regarding the RabbitMQ aliveness test, it has its own limits (more on that later, I've got an interesting RabbitMQ outage that I'm going to discuss in a new thread) and it doesn't replicate exactly what the clients (e.g. OpenStack services) are doing. Regarding the service checks, there are already plenty of scripts that exist for Nagios, Collectd and so on. Some of them are listed in the Wiki [1].

There are various sets of calls associated with nagios, zabbix, etc. but those seem like after-market parts for a car. Seems to me the services themselves would know best how to check if they are healthy, particularly as that could change version to version. Has there been discussion of adding a health-check (admin) api in each service? Lacking that, is there documentation from any OpenStack projects about how to check the health of nova? When I saw this thread start, that is what I thought it was going to be about.

Starting with Kilo, you could configure your OpenStack API services with the healthcheck middleware [2]. This has been inspired by what Swift's been doing for some time now [3]. IIUC the default healthcheck is minimalist and doesn't check that dependent services (like RabbitMQ, database) are healthy but the framework is extensible and more healthchecks can be added.

-David

BR, Simon

[1] https://wiki.openstack.org/wiki/Operations/Tools#Monitoring_and_Trending
[2] http://docs.openstack.org/developer/oslo.middleware/api.html#oslo_middleware.Healthcheck
[3] http://docs.openstack.org/kilo/config-reference/content/object-storage-healthcheck.html

- OpenStack projects like Rally or Tempest can generate synthetic loads and run end-to-end tests. Integrating them with a monitoring system isn't terribly difficult either.

As far as Monitoring-as-a-service is concerned, do you have plans to integrate/leverage Ceilometer?

BR, Simon

On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari) vpand...@cisco.com wrote:

Hello, I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides OpenStack health-checking services to operators, tenants, and applications. This project will begin as a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are:

Server: https://github.com/stackforge/cloudpulse
Client: https://github.com/stackforge/python-cloudpulseclient

Please join us via IRC on #openstack-cloudpulse on freenode. I am holding a doodle poll to select times for our first meeting the week after summit. This doodle poll will close May 24th and meeting times will be announced on the mailing list at that time. At our first IRC meeting, we will draft additional core team members, so if you're interested in joining a fresh new development effort, please attend our first meeting.
Please take a moment if your interested in CloudPulse to fill out the doodle poll here: https://doodle.com/kcpvzy8kfrxe6rvb The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod Pandarinathan. I expect more members to join during our initial meeting. A little bit about CloudPulse: Cloud operators need notification of OpenStack failures before a customer reports the failure. Cloud operators can then take timely corrective actions with minimal disruption to applications. Many cloud applications, including those I am interested in (NFV) have very stringent service level agreements. Loss of service can trigger contractual costs associated with the service. Application high availability requires an operational OpenStack Cloud, and the reality is that occascionally OpenStack clouds fail in some mysterious ways. This project intends to identify when those failures occur so corrective actions may be taken by operators, tenants, and the applications themselves. OpenStack is considered healthy when OpenStack API services respond appropriately. Further OpenStack is healthy when network traffic can be sent between the
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On 05/13/2015 09:06 AM, Simon Pasquier wrote: Hello, Like many others commented before, I don't quite understand how unique are the Cloudpulse use cases. For operators, I got the feeling that existing solutions fit well: - Traditional monitoring tools (Nagios, Zabbix, ) are necessary anyway for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ, databases and more) and diagnostic purposes. Adding OpenStack service checks is fairly easy if you already have the toolchain. Is it really so easy? Rabbitmq has an aliveness test that is easy to hook into. I don't know exactly what it does, other than what the doc says, but I should not have to. If I want my standard monitoring system to call into a cloud and ask is nova healthy?, is glance healthy?, etc. are their such calls? There are various sets of calls associated with nagios, zabbix, etc. but those seem like after-market parts for a car. Seems to me the services themselves would know best how to check if they are healthy, particularly as that could change version to version. Has their been discussion of adding a health-check (admin) api in each service? Lacking that, is there documentation from any OpenStack projects about how to check the health of nova? When I saw this thread start, that is what I thought it was going to be about. -David - OpenStack projects like Rally or Tempest can generate synthetic loads and run end-to-end tests. Integrating them with a monitoring system isn't terribly difficult either. As far as Monitoring-as-a-service is concerned, do you have plans to integrate/leverage Ceilometer? BR, Simon On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari) vpand...@cisco.com mailto:vpand...@cisco.com wrote: Hello, I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides Openstack health-checking services to both operators, tenants, and applications. 
This project will begin as a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are: Server: https://github.com/stackforge/cloudpulse Client: https://github.com/stackforge/python-cloudpulseclient Please join us via iRC on #openstack-cloudpulse on freenode. I am holding a doodle poll to select times for our first meeting the week after summit. This doodle poll will close May 24th and meeting times will be announced on the mailing list at that time. At our first IRC meeting, we will draft additional core team members, so if your interested in joining a fresh new development effort, please attend our first meeting. Please take a moment if your interested in CloudPulse to fill out the doodle poll here: https://doodle.com/kcpvzy8kfrxe6rvb The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven DakeandVinod Pandarinathan. I expect more members to join during our initial meeting. A little bit about CloudPulse: Cloud operators need notification of OpenStack failures before a customer reports the failure. Cloud operators can then take timely corrective actions with minimal disruption to applications. Many cloud applications, including those I am interested in (NFV) have very stringent service level agreements. Loss of service can trigger contractual costs associated with the service. Application high availability requires an operational OpenStack Cloud, and the reality is that occascionally OpenStack clouds fail in some mysterious ways. This project intends to identify when those failures occur so corrective actions may be taken by operators, tenants, and the applications themselves. OpenStack is considered healthy when OpenStack API services respond appropriately. Further OpenStack is healthy when network traffic can be sent between the tenant networks and can access the Internet. Finally OpenStack is healthy when all infrastructure cluster elements are in an operational state. 
For information about blueprints check out:
https://blueprints.launchpad.net/cloudpulse
https://blueprints.launchpad.net/python-cloudpulseclient

For more details, check out our Wiki: https://wiki.openstack.org/wiki/Cloudpulse

Please join the CloudPulse team in designing and implementing a world-class Carrier Grade system for checking the health of OpenStack clouds. We look forward to seeing you on IRC on #openstack-cloudpulse.

Regards, Vinod Pandarinathan

[1] https://github.com/openstack-dev/cookiecutter

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
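If a service did expose a health URL of the kind David asks about in this message, a standard monitoring system could wrap it in a small plugin. The following is only a minimal sketch under that assumption: `check_service`, `http_status`, and the status-to-severity mapping are hypothetical choices made for illustration, and the URL would have to be supplied by the operator. The only fixed convention used is the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises for error statuses; the code is still informative.
        return exc.code


def check_service(name: str, url: str,
                  fetch: Callable[[str], int] = http_status) -> Tuple[int, str]:
    """Translate one health probe into a (exit code, status line) pair.

    The mapping below is an assumption made for this sketch, not a rule
    defined by any OpenStack service.
    """
    try:
        status = fetch(url)
    except Exception as exc:  # connection refused, timeout, DNS failure, ...
        return CRITICAL, f"{name} CRITICAL - {exc}"
    if 200 <= status < 300:
        return OK, f"{name} OK - HTTP {status}"
    if status == 503:
        return CRITICAL, f"{name} CRITICAL - HTTP {status}"
    return WARNING, f"{name} WARNING - unexpected HTTP {status}"
```

A wrapper script would print the returned status line and exit with the returned code, which is all a Nagios-style scheduler needs. The `fetch` parameter exists so the mapping can be exercised without a running cloud.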
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
From: David Kranz dkr...@redhat.com
Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: Wednesday, May 13, 2015 at 6:27 AM
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

On 05/13/2015 09:06 AM, Simon Pasquier wrote:

Hello, Like many others commented before, I don't quite understand how unique the CloudPulse use cases are. For operators, I got the feeling that existing solutions fit well:

- Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ, databases and more) and diagnostic purposes. Adding OpenStack service checks is fairly easy if you already have the toolchain.

Is it really so easy? RabbitMQ has an aliveness test that is easy to hook into. I don't know exactly what it does, other than what the doc says, but I should not have to. If I want my standard monitoring system to call into a cloud and ask "is nova healthy?", "is glance healthy?", etc., are there such calls?

David, I think a healthchecking API per service is a fantastic idea. I would like to see the same thing in OpenStack services. The real way to check the health of nova, for example, is for Nova to do the job of checking its own health. It knows its internals best and can do the job. Maybe this project can introduce API calls and implementations into the major services to do such work.

Regards -steve

There are various sets of calls associated with nagios, zabbix, etc. but those seem like after-market parts for a car. Seems to me the services themselves would know best how to check if they are healthy, particularly as that could change version to version.
Has their been discussion of adding a health-check (admin) api in each service? Lacking that, is there documentation from any OpenStack projects about how to check the health of nova? When I saw this thread start, that is what I thought it was going to be about. -David - OpenStack projects like Rally or Tempest can generate synthetic loads and run end-to-end tests. Integrating them with a monitoring system isn't terribly difficult either. As far as Monitoring-as-a-service is concerned, do you have plans to integrate/leverage Ceilometer? BR, Simon On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari) vpand...@cisco.commailto:vpand...@cisco.com wrote: Hello, I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides Openstack health-checking services to both operators, tenants, and applications. This project will begin as a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are: Server: https://github.com/stackforge/cloudpulse Client: https://github.com/stackforge/python-cloudpulseclient Please join us via iRC on #openstack-cloudpulse on freenode. I am holding a doodle poll to select times for our first meeting the week after summit. This doodle poll will close May 24th and meeting times will be announced on the mailing list at that time. At our first IRC meeting, we will draft additional core team members, so if your interested in joining a fresh new development effort, please attend our first meeting. Please take a moment if your interested in CloudPulse to fill out the doodle poll here: https://doodle.com/kcpvzy8kfrxe6rvb The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod Pandarinathan. I expect more members to join during our initial meeting. A little bit about CloudPulse: Cloud operators need notification of OpenStack failures before a customer reports the failure. 
Cloud operators can then take timely corrective actions with minimal disruption to applications. Many cloud applications, including those I am interested in (NFV) have very stringent service level agreements. Loss of service can trigger contractual costs associated with the service. Application high availability requires an operational OpenStack Cloud, and the reality is that occascionally OpenStack clouds fail in some mysterious ways. This project intends to identify when those failures occur so corrective actions may be taken by operators, tenants, and the applications themselves. OpenStack is considered healthy when OpenStack API services respond appropriately. Further OpenStack is healthy when network traffic can be sent between the tenant networks and can access the Internet. Finally OpenStack is healthy when all infrastructure cluster elements are in an operational state. For information about blueprints check out: https://blueprints.launchpad.net/cloudpulse https
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On 5/12/15, 1:28 PM, Julien Danjou jul...@danjou.info wrote:

On Tue, May 12 2015, Steven Dake (stdake) wrote:

This is a great idea that would make a solid extension to the software. If I read the wiki page correctly, the real goal is for operators and tenants to be able to be notified via querying the REST API so they could write their own email/pager-duty app.

Then leveraging Ceilometer polling and alarming systems could make you avoid reinventing a large portion of the wheel.

Julien, Reading the wiki page, I don't expect there would be a need for an agent. But who knows, atm, all the software is is a wiki page ;) If there were a need for agents, the project would definitely use the ceilometer agents and extend there if needed via the normal development process.

Regards -steve

-- Julien Danjou // Free Software hacker // http://julien.danjou.info
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On 13 May 2015 at 10:30, Vinod Pandarinathan (vpandari) vpand...@cisco.com wrote:

- Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ, databases and more) and diagnostic purposes. Adding OpenStack service checks is fairly easy if you already have the toolchain.

The solution is for health-checking, which includes periodically running light/mid/heavy control and data plane tests and providing test data. The tool shall not have any dependency on one particular monitoring tool. If a monitoring tool is installed, then monitoring data shall be exposed to the applications in a consumable fashion. As I mentioned earlier, we are not replacing any monitoring solution available out there; we are leveraging those solutions and providing a clean interface so that the applications/tenants and operators know if the cloud is healthy.

To rephrase this:

- Zabbix and friends will monitor an operator's cloud and tell the operator bad things are happening. Or they can monitor an application's VMs and see if the app is happy, and tell the app or its owner.
- Ceilometer will front cloud monitoring solutions and offer those statistics to tenants of the cloud in ways that (ideally) make sense to the client. It lets tenants see stats they couldn't get for themselves.

This isn't quite what we're trying to address. We had one specific use case: a cloud application that needs to provide reasonably high availability uses the OpenStack APIs occasionally to try and correct problems (VM died, app overloaded, etc.) - a pretty normal cloud application. If you're interested in maintaining service, you need to know about single points of failure to work around them, and the cloud control plane failing is a single point of failure - the APIs stop working, and the app runs just fine until a second failure that causes them to be used, and if you haven't done something by that point you get a meltdown.
The idea of CloudPulse was to be able to say 'the cloud APIs are operating normally' to applications that are interested. If they're *not* normal then the application can take corrective action; for instance, spinning up extra capacity in another cloud and moving traffic over there.

As you can see, that's a cross-domain sort of monitoring similar to Ceilometer - the tenant finding out information about the infrastructure that they can't see directly. That said, it's a very concise summary ('working'), and we also had in mind that you ran the tests to freshen the results if the tests hadn't been run recently, rather than looping them continually. Also, the history of the results is not really relevant - my app cares about whether the control plane works *now*, not if it worked for 8 hours out of the last 24.

We're scratching an itch. Absolutely the point of mailing everyone about it was to see if anyone had better scratching tools, and if people would like to chat about it at the summit. What seems to have come out of it is that yes, there are tools out there that might be usable for the purpose, and we'd love to hear your opinions and what ideas you have about how we should do this. Apparently there are also a lot of people with slightly different itches to scratch, and I hope you all take the opportunity to get together at the summit too.

-- Ian.
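The "freshen the results only if they are stale, keep no history" behaviour Ian describes can be sketched roughly as below. This is an illustration only, not CloudPulse code: `FreshHealth`, `max_age`, `choose_action`, and the `probe` callable are hypothetical names, and the probe stands in for whatever control-plane test would actually be run.

```python
import time
from typing import Callable, Optional


class FreshHealth:
    """Hold one pass/fail result and re-run the probe only when it is stale.

    Hypothetical sketch: the names and the max_age policy are assumptions
    made for illustration, not part of any CloudPulse API.
    """

    def __init__(self, probe: Callable[[], bool], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe = probe
        self._max_age = max_age
        self._clock = clock
        self._result: Optional[bool] = None
        self._checked_at: Optional[float] = None

    def is_healthy(self) -> bool:
        now = self._clock()
        stale = (self._checked_at is None
                 or now - self._checked_at > self._max_age)
        if stale:
            try:
                self._result = bool(self._probe())
            except Exception:
                # A probe that raises is treated as "not working".
                self._result = False
            self._checked_at = now
        # Only the latest result is kept; no history is stored.
        return bool(self._result)


def choose_action(healthy: bool) -> str:
    """Map the one-word summary to a tenant-side reaction (labels are made up)."""
    return "carry-on" if healthy else "shift-traffic-to-other-cloud"
```

An application would call `is_healthy()` whenever it is about to depend on the control plane. Repeated calls inside the `max_age` window reuse the cached answer, so the tests are not looped continually.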
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Mooney, Sean K wrote:

Will cloudpulse be under the governance of the OpenStack Telemetry program, or will this be an independent StackForge repository? I think there would be great value in having cloud monitoring or monitoring as a service in the telemetry program.

Except we don't do programs anymore, because forcing new projects to inherit another team leadership was not the best way to foster innovation.

http://git.openstack.org/cgit/openstack/governance/commit/?id=fcc4046f7d866d0516f2810571aad0c0ce2cc361

-- Thierry Carrez (ttx)
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Will cloudpulse be under the governance of the OpenStack Telemetry program, or will this be an independent StackForge repository? I think there would be great value in having cloud monitoring or monitoring as a service in the telemetry program.

Regards Sean.

-----Original Message-----
From: Steven Dake (stdake) [mailto:std...@cisco.com]
Sent: Wednesday, May 13, 2015 3:39 PM
To: Julien Danjou
Cc: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

On 5/12/15, 1:28 PM, Julien Danjou jul...@danjou.info wrote:

On Tue, May 12 2015, Steven Dake (stdake) wrote:

This is a great idea that would make a solid extension to the software. If I read the wiki page correctly, the real goal is for operators and tenants to be able to be notified via querying the REST API so they could write their own email/pager-duty app.

Then leveraging Ceilometer polling and alarming systems could make you avoid reinventing a large portion of the wheel.

Julien, Reading the wiki page, I don't expect there would be a need for an agent. But who knows, atm, all the software is is a wiki page ;) If there were a need for agents, the project would definitely use the ceilometer agents and extend there if needed via the normal development process.

Regards -steve

-- Julien Danjou // Free Software hacker // http://julien.danjou.info
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On 05/13/2015 09:51 AM, Simon Pasquier wrote:

On Wed, May 13, 2015 at 3:27 PM, David Kranz dkr...@redhat.com wrote:

On 05/13/2015 09:06 AM, Simon Pasquier wrote:

Hello, Like many others commented before, I don't quite understand how unique the CloudPulse use cases are. For operators, I got the feeling that existing solutions fit well:

- Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ, databases and more) and diagnostic purposes. Adding OpenStack service checks is fairly easy if you already have the toolchain.

Is it really so easy? RabbitMQ has an aliveness test that is easy to hook into. I don't know exactly what it does, other than what the doc says, but I should not have to. If I want my standard monitoring system to call into a cloud and ask "is nova healthy?", "is glance healthy?", etc., are there such calls?

Regarding the RabbitMQ aliveness test, it has its own limits (more on that later, I've got an interesting RabbitMQ outage that I'm going to discuss in a new thread) and it doesn't replicate exactly what the clients (e.g. OpenStack services) are doing.

I'm sure it has limits but my point was that the developers of rabbitmq understood that it would be difficult for users to know exactly what should be poked at inside to check health, so they provide a call to do it.

Regarding the service checks, there are already plenty of scripts that exist for Nagios, Collectd and so on. Some of them are listed in the Wiki [1].

I understand and that is what I meant by after-market. If someone puts a new feature in service X that requires some monitoring to be healthy, then all those different scripts need to chase after it to keep up to date. Poking at service internals to check the health of a service is an abstraction violation. As someone on this thread said, tempest/rally can be used to check a certain kind of health but it is akin to black-box testing whereas health monitoring should be more akin to white-box testing.

There are various sets of calls associated with nagios, zabbix, etc. but those seem like after-market parts for a car. Seems to me the services themselves would know best how to check if they are healthy, particularly as that could change version to version. Has there been discussion of adding a health-check (admin) api in each service? Lacking that, is there documentation from any OpenStack projects about how to check the health of nova? When I saw this thread start, that is what I thought it was going to be about.

Starting with Kilo, you could configure your OpenStack API services with the healthcheck middleware [2]. This has been inspired by what Swift's been doing for some time now [3]. IIUC the default healthcheck is minimalist and doesn't check that dependent services (like RabbitMQ, database) are healthy but the framework is extensible and more healthchecks can be added.

I can see that but the real value would be in abstracting the details of what it means for a service to be healthy inside the implementation and exporting an api. If that were present, the question of whether calling it used middleware or not would be secondary. I'm not sure what the value-add of middleware would be in this case.

-David

-David

BR, Simon

[1] https://wiki.openstack.org/wiki/Operations/Tools#Monitoring_and_Trending
[2] http://docs.openstack.org/developer/oslo.middleware/api.html#oslo_middleware.Healthcheck
[3] http://docs.openstack.org/kilo/config-reference/content/object-storage-healthcheck.html

- OpenStack projects like Rally or Tempest can generate synthetic loads and run end-to-end tests. Integrating them with a monitoring system isn't terribly difficult either.

As far as Monitoring-as-a-service is concerned, do you have plans to integrate/leverage Ceilometer?
BR, Simon On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari) vpand...@cisco.com mailto:vpand...@cisco.com wrote: Hello, I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides Openstack health-checking services to both operators, tenants, and applications. This project will begin as a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are: Server: https://github.com/stackforge/cloudpulse Client: https://github.com/stackforge/python-cloudpulseclient Please join us via iRC on #openstack-cloudpulse on freenode. I am holding a doodle poll to select times for our first meeting the week after summit. This doodle poll will close May 24th and meeting times will be announced on the mailing list at that time. At our first IRC meeting,
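The two positions in this exchange (an extensible healthcheck in front of the API, and health logic owned by the service itself) can be combined in a small WSGI sketch. This is a simplified stand-in written for illustration, not the oslo.middleware or Swift implementation: the class name, the `/healthcheck` default path, and the check registry are all assumptions. The service registers its own check callables, so the knowledge of what "healthy" means stays inside the service.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A check returns (ok, detail). The service supplies these callables.
Check = Callable[[], Tuple[bool, str]]


class HealthcheckMiddleware:
    """Answer requests for `path` directly; pass everything else through.

    Illustrative sketch only; names and behaviour are assumptions.
    """

    def __init__(self, app, checks: Dict[str, Check],
                 path: str = "/healthcheck") -> None:
        self.app = app
        self.checks = checks
        self.path = path

    def __call__(self, environ, start_response) -> Iterable[bytes]:
        if environ.get("PATH_INFO") != self.path:
            return self.app(environ, start_response)

        failures: List[str] = []
        for name, check in self.checks.items():
            try:
                ok, detail = check()
            except Exception as exc:
                # A check that raises is reported as a failure.
                ok, detail = False, repr(exc)
            if not ok:
                failures.append(f"{name}: {detail}")

        if failures:
            status, body = "503 Service Unavailable", "\n".join(failures)
        else:
            status, body = "200 OK", "OK"
        payload = body.encode("utf-8")
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(payload)))])
        return [payload]
```

With an empty `checks` dict this behaves like the minimalist default Simon describes: it only proves the API process answers. Adding entries for dependencies such as the database or message queue is where service-specific knowledge would go.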
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On 5/12/15, 2:43 PM, Richard Raseley rich...@raseley.com wrote: On 05/12/2015 01:20 PM, Vinod Pandarinathan (vpandari) wrote: Hello, I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides Openstack health-checking services to both operators, tenants, and applications. This project will begin as a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are: Server: https://github.com/stackforge/cloudpulse Client: https://github.com/stackforge/python-cloudpulseclient Please join us via iRC on #openstack-cloudpulse on freenode. I am holding a doodle poll to select times for our first meeting the week after summit. This doodle poll will close May 24th and meeting times will be announced on the mailing list at that time. At our first IRC meeting, we will draft additional core team members, so if your interested in joining a fresh new development effort, please attend our first meeting. Please take a moment if your interested in CloudPulse to fill out the doodle poll here: https://doodle.com/kcpvzy8kfrxe6rvb The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven DakeandVinod Pandarinathan. I expect more members to join during our initial meeting. A little bit about CloudPulse: Cloud operators need notification of OpenStack failures before a customer reports the failure. Cloud operators can then take timely corrective actions with minimal disruption to applications. Many cloud applications, including those I am interested in (NFV) have very stringent service level agreements. Loss of service can trigger contractual costs associated with the service. Application high availability requires an operational OpenStack Cloud, and the reality is that occascionally OpenStack clouds fail in some mysterious ways. This project intends to identify when those failures occur so corrective actions may be taken by operators, tenants, and the applications themselves. 
OpenStack is considered healthy when OpenStack API services respond appropriately. Further, OpenStack is healthy when network traffic can be sent between the tenant networks and can access the Internet. Finally, OpenStack is healthy when all infrastructure cluster elements are in an operational state.

For information about blueprints check out:
https://blueprints.launchpad.net/cloudpulse
https://blueprints.launchpad.net/python-cloudpulseclient

For more details, check out our Wiki: https://wiki.openstack.org/wiki/Cloudpulse

Please join the CloudPulse team in designing and implementing a world-class Carrier Grade system for checking the health of OpenStack clouds. We look forward to seeing you on IRC on #openstack-cloudpulse.

Regards, Vinod Pandarinathan

[1] https://github.com/openstack-dev/cookiecutter

As others have expressed - I am a little skeptical about the need to 'reinvent the wheel' with regards to monitoring. Is there a well-defined set of business or user requirements which would be enabled by CloudPulse which are not enabled by existing solutions? I am just trying to better wrap my head around the problem...

The solution is for health-checking, which includes periodically running light/mid/heavy control and data plane tests and providing test data. The tool shall not have any dependency on one particular monitoring tool. If a monitoring tool is installed, then monitoring data shall be exposed to the applications in a consumable fashion. As I mentioned earlier, we are not replacing any monitoring solution available out there; we are leveraging those solutions and providing a clean interface so that the applications/tenants and operators know if the cloud is healthy.

Thanks Vinod.
Regards, Richard
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Great idea. I have seen this request coming from multiple folks, especially to detect key events through Zaqar and pass them on to the application, which can then take app-specific action.

Thanks Vinod.

From: Fox, Kevin M kevin@pnnl.gov
Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: Tuesday, May 12, 2015 at 12:51 PM
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

Hooking it into Zaqar would be awesome too. Once you can trigger Mistral workflows based on Zaqar messages, just imagine the possibilities...

Kevin

From: Steven Dake (stdake)
Sent: Tuesday, May 12, 2015 12:02:59 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

Kevin, This is a great idea that would make a solid extension to the software. If I read the wiki page correctly, the real goal is for operators and tenants to be able to be notified via querying the REST API so they could write their own email/pager-duty app.

Regards -steve

On 5/12/15, 11:16 AM, Fox, Kevin M kevin@pnnl.gov wrote:

Nagios/whatever As A Service would actually be very useful I think. Setting up a monitoring server is a fair amount of work. If Cloud Apps downloaded from an OpenStack Catalog had a Monitoring Heat resource built in, that would register the launched app with a multitenant aware Cloud Monitoring Service, the user would only have to launch an app, and then go into the Dashboard and associate some kind of alerting policy with the registered checks. Say, email this address when things break. That would be awesome.
:) Thanks, Kevin From: Jay Pipes [jaypi...@gmail.commailto:jaypi...@gmail.com] Sent: Tuesday, May 12, 2015 10:48 AM To: openstack-dev@lists.openstack.orgmailto:openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments For operators: * Nagios * Icinga * Zabbix installed on baremetal machines deployed with the OpenStack and other infrastructure services. For tenants: * Nagios * Icinga * Zabbix installed on their VMs. Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade? Best, -jay p.s. Sorry for top-posting. On 05/12/2015 01:20 PM, Vinod Pandarinathan (vpandari) wrote: Hello, I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides Openstack health-checking services to both operators, tenants, and applications. This project will begin as a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are: Server: https://github.com/stackforge/cloudpulse Client: https://github.com/stackforge/python-cloudpulseclient Please join us via iRC on #openstack-cloudpulse on freenode. I am holding a doodle poll to select times for our first meeting the week after summit. This doodle poll will close May 24th and meeting times will be announced on the mailing list at that time. At our first IRC meeting, we will draft additional core team members, so if your interested in joining a fresh new development effort, please attend our first meeting. Please take a moment if your interested in CloudPulse to fill out the doodle poll here: https://doodle.com/kcpvzy8kfrxe6rvb The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven DakeandVinod Pandarinathan. I expect more members to join during our initial meeting. 
A little bit about CloudPulse: Cloud operators need notification of OpenStack failures before a customer reports the failure. Cloud operators can then take timely corrective actions with minimal disruption to applications. Many cloud applications, including those I am interested in (NFV) have very stringent service level agreements. Loss of service can trigger contractual costs associated with the service. Application high availability requires an operational OpenStack Cloud, and the reality is that occascionally OpenStack clouds fail in some mysterious ways. This project intends to identify when those failures occur so corrective actions may be taken by operators, tenants, and the applications themselves. OpenStack is considered healthy when OpenStack API services respond appropriately. Further OpenStack is healthy when network traffic can be sent between the tenant networks and can access the Internet. Finally OpenStack is healthy when all infrastructure cluster elements
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Hi Simon,

Thanks for your feedback. Please see inline.

From: Simon Pasquier spasqu...@mirantis.com
Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: Wednesday, May 13, 2015 at 6:06 AM
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

> Hello,
>
> Like many others commented before, I don't quite understand how unique the CloudPulse use cases are. For operators, I got the feeling that existing solutions fit well:
>
> - Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ, databases and more) and diagnostic purposes. Adding OpenStack service checks is fairly easy if you already have the toolchain.

The solution is for health-checking, which includes periodically running light/mid/heavy control and data plane tests and providing the test data. The tool shall not depend on any one particular monitoring tool. If a monitoring tool is installed, then its monitoring data shall be exposed to the applications in a consumable fashion. As I mentioned earlier, we are not replacing any monitoring solution available out there; we are leveraging those solutions and providing a clean interface so that applications, tenants, and operators know whether the cloud is healthy.

> - OpenStack projects like Rally or Tempest can generate synthetic loads and run end-to-end tests. Integrating them with a monitoring system isn't terribly difficult either.

You put it well. Right now the ask is simple, and there is no solution that has integrated control/data plane tests and made them flexible and configurable from both the application and the tenant perspective. We will leverage any of these existing tests as part of our comprehensive tests, which can be run by the operator periodically at long intervals. At this point these tests cannot be run at short intervals, since they are heavyweight and often leave orphan resources behind that need manual cleanup.

> As far as Monitoring-as-a-service is concerned, do you have plans to integrate/leverage Ceilometer?

Yes, that will be exposed as an extension when some application/operator needs the data.

Thanks
Vinod.

> BR,
> Simon
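The light/mid/heavy split described in the reply above can be pictured as a per-weight interval table. The following is only a minimal sketch under stated assumptions, not CloudPulse code: the `HealthTest` type, the `INTERVALS` values, the `due_tests` helper, and the test names in the usage note are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthTest:
    """A named health test (hypothetical type, not part of CloudPulse)."""
    name: str
    weight: str  # one of "light", "mid", "heavy"


# Assumed intervals in seconds; a real deployment would make these configurable.
INTERVALS = {"light": 60, "mid": 600, "heavy": 86400}


def due_tests(tests, last_run, now):
    """Return the tests whose interval has elapsed since they last ran.

    last_run maps test name -> timestamp of the previous run; a test that
    has never run is always due.
    """
    due = []
    for test in tests:
        elapsed = now - last_run.get(test.name, float("-inf"))
        if elapsed >= INTERVALS[test.weight]:
            due.append(test)
    return due
```

With a light test `keystone_ping` and a heavy test `tempest_full` that both last ran at time 0, calling `due_tests` at time 120 would select only the light test, which matches the point that heavyweight suites are kept to long intervals.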
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
How is this different from, or the same as, Monasca?

Regards.

--
Deklan Dieterly
Hewlett-Packard Company
Sr. Systems Software Engineer
HP Cloud

On 5/12/15, 11:48 AM, Jay Pipes jaypi...@gmail.com wrote:

> For operators:
>
> * Nagios
> * Icinga
> * Zabbix
>
> installed on baremetal machines deployed with the OpenStack and other infrastructure services.
>
> For tenants:
>
> * Nagios
> * Icinga
> * Zabbix
>
> installed on their VMs.
>
> Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade?
>
> Best,
> -jay
>
> p.s. Sorry for top-posting.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Jay Pipes jaypi...@gmail.com writes:

> On 05/12/2015 02:16 PM, Fox, Kevin M wrote:
>> Nagios/whatever As A Service would actually be very useful, I think.
>
> Frankly, so do tenants. Tenants install software on their images using configuration management tools like those mentioned above... I don't see a reason to have Nagios-as-a-Service for tenants either.

For the same use cases as those for which a tenant would want to use the already established CloudWatch/RAX monitoring, on a private OpenStack cloud. They would just want to make a simple REST call to monitor their server:

    curl -X POST http://api/ -d 'monitor my server/port/etc/'

and not have to install, configure, and set up a Nagios (or whatever) monitoring server.

Chmouel
[openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Hello,

I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides OpenStack health-checking services to operators, tenants, and applications. This project will begin as a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are:

Server: https://github.com/stackforge/cloudpulse
Client: https://github.com/stackforge/python-cloudpulseclient

Please join us via IRC on #openstack-cloudpulse on freenode.

I am holding a doodle poll to select times for our first meeting the week after summit. This doodle poll will close May 24th and meeting times will be announced on the mailing list at that time. At our first IRC meeting, we will draft additional core team members, so if you're interested in joining a fresh new development effort, please attend our first meeting. If you're interested in CloudPulse, please take a moment to fill out the doodle poll here:

https://doodle.com/kcpvzy8kfrxe6rvb

The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod Pandarinathan. I expect more members to join during our initial meeting.

A little bit about CloudPulse:

Cloud operators need notification of OpenStack failures before a customer reports the failure. Cloud operators can then take timely corrective actions with minimal disruption to applications. Many cloud applications, including those I am interested in (NFV), have very stringent service level agreements. Loss of service can trigger contractual costs associated with the service. Application high availability requires an operational OpenStack cloud, and the reality is that occasionally OpenStack clouds fail in some mysterious ways. This project intends to identify when those failures occur so corrective actions may be taken by operators, tenants, and the applications themselves.

OpenStack is considered healthy when OpenStack API services respond appropriately. Further, OpenStack is healthy when network traffic can be sent between the tenant networks and can access the Internet. Finally, OpenStack is healthy when all infrastructure cluster elements are in an operational state.

For information about blueprints check out:

https://blueprints.launchpad.net/cloudpulse
https://blueprints.launchpad.net/python-cloudpulseclient

For more details, check out our Wiki:

https://wiki.openstack.org/wiki/Cloudpulse

Please join the CloudPulse team in designing and implementing a world-class Carrier Grade system for checking the health of OpenStack clouds. We look forward to seeing you on IRC on #openstack-cloudpulse.

Regards,
Vinod Pandarinathan

[1] https://github.com/openstack-dev/cookiecutter
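The announcement's three-part definition of "healthy" (API services respond, tenant network traffic flows, infrastructure elements are operational) can be read as a simple conjunction over named checks. The sketch below illustrates that reading only; the `cloud_health` function, the group labels, and the check names in the usage note are hypothetical and are not taken from the CloudPulse code base.

```python
def cloud_health(api_checks, network_checks, infra_checks):
    """Combine three groups of named boolean checks into one verdict.

    Each argument maps a check name to True (passed) or False (failed).
    Returns (healthy, failures), where failures lists "group:name" entries
    in the order the checks were given.
    """
    failures = []
    groups = (
        ("api", api_checks),          # API services respond appropriately
        ("network", network_checks),  # tenant-to-tenant and Internet traffic
        ("infra", infra_checks),      # infrastructure cluster elements
    )
    for group, checks in groups:
        for name, passed in checks.items():
            if not passed:
                failures.append(f"{group}:{name}")
    return (not failures, failures)
```

For example, passing `{"nova": True, "glance": False}` as the API group and all-true dictionaries for the other two groups would report the cloud as unhealthy with the single failure `api:glance`, which gives an operator or tenant something specific to act on.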
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Nagios/whatever As A Service would actually be very useful, I think. Setting up a monitoring server is a fair amount of work. If Cloud Apps downloaded from an OpenStack Catalog had a Monitoring Heat resource built in that would register the launched app with a multitenant-aware Cloud Monitoring Service, the user would only have to launch an app, and then go into the Dashboard and associate some kind of alerting policy with the registered checks. Say, email this address when things break. That would be awesome. :)

Thanks,
Kevin

From: Jay Pipes [jaypi...@gmail.com]
Sent: Tuesday, May 12, 2015 10:48 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

For operators:

* Nagios
* Icinga
* Zabbix

installed on baremetal machines deployed with the OpenStack and other infrastructure services.

For tenants:

* Nagios
* Icinga
* Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade?

Best,
-jay

p.s. Sorry for top-posting.
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
For operators:

* Nagios
* Icinga
* Zabbix

installed on baremetal machines deployed with the OpenStack and other infrastructure services.

For tenants:

* Nagios
* Icinga
* Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade?

Best,
-jay

p.s. Sorry for top-posting.
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On 05/12/15 20:48, Jay Pipes wrote:

> For operators:
>
> * Nagios
> * Icinga
> * Zabbix
>
> installed on baremetal machines deployed with the OpenStack and other infrastructure services.
>
> For tenants:
>
> * Nagios
> * Icinga
> * Zabbix
>
> installed on their VMs.
>
> Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade?

Because that is what we love to do here in the OpenStack Community?? (Sorry, I could not resist... :) )

But seriously though: do we have a set of tools that can do this in a simple, consolidated way?

> Best,
> -jay
>
> p.s. Sorry for top-posting.

--
Best Regards,
Maish Saidel-Keesing
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On 05/12/2015 02:16 PM, Fox, Kevin M wrote:

> Nagios/whatever As A Service would actually be very useful, I think.

I don't really understand why Nagios-as-a-Service would be useful to operators. I mean, operators install their monitoring system of choice via their configuration management tool of choice -- Ansible, SaltStack, Puppet, Chef, etc.

Frankly, so do tenants. Tenants install software on their images using configuration management tools like those mentioned above... I don't see a reason to have Nagios-as-a-Service for tenants either.

> Setting up a monitoring server is a fair amount of work.

Not really. It's typically a simple apt-get install nagios-nrpe-plugins on client VMs along with an apt-get install nagios-server on one or more monitoring system VMs. Again, have configuration management systems inject whatever check scripts you want, paired with the ones that already come with the nagios-nrpe-plugins package.

> If Cloud Apps downloaded from an OpenStack Catalog had a Monitoring Heat resource built in that would register the launched app with a multitenant-aware Cloud Monitoring Service, the user would only have to launch an app, and then go into the Dashboard and associate some kind of alerting policy with the registered checks. Say, email this address when things break. That would be awesome. :)

I guess I just don't see this being in the realm of OpenStack. Or at least, not more than something like a Murano application manifest, which is almost what you are describing above.

I don't see the need for this service, sorry. Not everything needs to be re-invented as a RESTful Python service endpoint...

Best,
-jay

> Thanks,
> Kevin
>
> From: Jay Pipes [jaypi...@gmail.com]
> Sent: Tuesday, May 12, 2015 10:48 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
>
> For operators:
>
> * Nagios
> * Icinga
> * Zabbix
>
> installed on baremetal machines deployed with the OpenStack and other infrastructure services.
>
> For tenants:
>
> * Nagios
> * Icinga
> * Zabbix
>
> installed on their VMs.
>
> Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade?
>
> Best,
> -jay
>
> p.s. Sorry for top-posting.
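The "check scripts" that this message suggests injecting through configuration management conventionally report their result through the Nagios plugin exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL). The following is a minimal sketch of the decision logic such a script might contain for an OpenStack API endpoint; the `evaluate` function, its thresholds, and the message wording are assumptions made for illustration, and the HTTP request itself is deliberately left out.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(status, elapsed, warn=1.0, crit=5.0):
    """Map an API probe result to a Nagios-style (exit_code, message) pair.

    status  -- HTTP status code, or None if no response was received
    elapsed -- response time in seconds
    warn, crit -- assumed latency thresholds in seconds (hypothetical defaults)
    """
    if status is None or status >= 500:
        return CRITICAL, f"CRITICAL - no usable response (status={status})"
    if elapsed >= crit:
        return CRITICAL, f"CRITICAL - status {status} in {elapsed:.2f}s"
    if elapsed >= warn:
        return WARNING, f"WARNING - status {status} in {elapsed:.2f}s"
    return OK, f"OK - status {status} in {elapsed:.2f}s"
```

A wrapper script would perform the actual request, print the message, and exit with the returned code so that Nagios, Icinga, or an NRPE agent can consume the result.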
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Very true. However, the way I see it, these are extensions/plugins to the CloudPulse framework, so when they are available, the data from these tools is exposed. The OpenStack health service provides an overall framework without assumptions about what is installed on the underlying cloud. The service is expected to run on existing cloud deployments that may or may not have any of this software (from the tenant as well). Core health checks for operators and tenants test the basic OpenStack services that are present in any OpenStack cloud.

Thanks for the feedback.

Thanks,
Vinod

On 5/12/15, 10:48 AM, Jay Pipes jaypi...@gmail.com wrote:

For operators: * Nagios * Icinga * Zabbix installed on baremetal machines deployed with the OpenStack and other infrastructure services.

For tenants: * Nagios * Icinga * Zabbix installed on their VMs.

Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade?

Best,
-jay

p.s. Sorry for top-posting.

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On 05/12/2015 02:24 PM, Vinod Pandarinathan (vpandari) wrote:

Very true. However, the way I see it, these are extensions/plugins to the CloudPulse framework, so when they are available, the data from these tools is exposed. The OpenStack health service provides an overall framework without assumptions about what is installed on the underlying cloud. The service is expected to run on existing cloud deployments that may or may not have any of this software (from the tenant as well).

You mean, like Monasca?

https://wiki.openstack.org/wiki/Monasca

Sounds to me like you will at the very least need an agent of some sort on the VMs to communicate to an external system. And, that is the monasca-agent:

https://github.com/stackforge/monasca-agent

ala Nagios NRPE agent: http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf
ala Zabbix agent: https://www.zabbix.com/documentation/2.0/manual/concepts/agent
ala Icinga agent: http://docs.icinga.org/latest/en/nrpe.html

So, cloudpulse would be yet another agent for sending healthcheck messages to an external system, in order for the framework not to make any assumptions on what is installed in the underlying cloud -- other than the assumption you'd need yet another agent installed.

Core health checks for operators and tenants test the basic OpenStack services that are present in any OpenStack cloud.

Operators != tenants. Try to make the two equal each other and you end up with Ceilometer and Triple-O -- with all the accompanying complexity therein.

Best,
-jay
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Hooking it into Zaqar would be awesome too. Once you can trigger Mistral workflows based on Zaqar messages, just imagine the possibilities...

Kevin

From: Steven Dake (stdake)
Sent: Tuesday, May 12, 2015 12:02:59 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

Kevin,

This is a great idea that would make a solid extension to the software. If I read the wiki page correctly, the real goal is for operators and tenants to be able to be notified via querying the REST API so they could write their own email/pager-duty app.

Regards
-steve
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Kevin,

This is a great idea that would make a solid extension to the software. If I read the wiki page correctly, the real goal is for operators and tenants to be able to be notified via querying the REST API so they could write their own email/pager-duty app.

Regards
-steve

On 5/12/15, 11:16 AM, Fox, Kevin M kevin@pnnl.gov wrote:

Nagios/whatever As A Service would actually be very useful I think. Setting up a monitoring server is a fair amount of work. If Cloud Apps downloaded from an OpenStack Catalog had a Monitoring Heat resource built in that would register the launched app with a multitenant-aware Cloud Monitoring Service, the user would only have to launch an app, and then go into the Dashboard and associate some kind of alerting policy with the registered checks. Say, email this address when things break. That would be awesome. :)

Thanks,
Kevin
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
As others have expressed, I am a little skeptical about the need to 'reinvent the wheel' with regard to monitoring. Is there a well-defined set of business or user requirements which would be enabled by CloudPulse and which are not enabled by existing solutions? I am just trying to better wrap my head around the problem...

Regards,
Richard
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On Tue, May 12 2015, Steven Dake (stdake) wrote:

This is a great idea that would make a solid extension to the software. If I read the wiki page correctly, the real goal is for operators and tenants to be able to be notified via querying the REST API so they could write their own email/pager-duty app.

Then leveraging the Ceilometer polling and alarming systems would let you avoid reinventing a large portion of the wheel.

--
Julien Danjou // Free Software hacker // http://julien.danjou.info
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
It totally depends on how much experience you think a tenant user has...

If we're talking about devops, they tend to have the skills to stand up a configuration management server, a monitoring server, and manage everything via config management.

If tenant users are research scientists, like some of ours, it's a fair amount of work to manage Nagios without config management, and config management is way more effort than most researchers want to put into learning. That's where an app catalog becomes important, and something like monitoring as a service starts to become interesting.

Thanks,
Kevin

From: Jay Pipes [jaypi...@gmail.com]
Sent: Tuesday, May 12, 2015 12:50 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

On 05/12/2015 02:16 PM, Fox, Kevin M wrote:

Nagios/whatever As A Service would actually be very useful I think.

I don't really understand why Nagios-as-a-Service would be useful to operators. I mean, operators install their monitoring system of choice via their configuration management tool of choice -- Ansible, SaltStack, Puppet, Chef, etc.

Frankly, so do tenants. Tenants install software on their images using configuration management tools like those mentioned above... I don't see a reason to have Nagios-as-a-Service for tenants either.

Setting up a monitoring server is a fair amount of work.

Not really. It's typically a simple apt-get install nagios-nrpe-plugins on client VMs along with an apt-get install nagios-server on one or more monitoring system VMs. Again, have configuration management systems inject whatever check scripts you want, paired with the ones that already come with the nagios-nrpe-plugins package.

If Cloud Apps downloaded from an OpenStack Catalog had a Monitoring Heat resource built in that would register the launched app with a multitenant-aware Cloud Monitoring Service, the user would only have to launch an app, and then go into the Dashboard and associate some kind of alerting policy with the registered checks. Say, email this address when things break. That would be awesome. :)

I guess I just don't see this being in the realm of OpenStack. Or at least, not more than something like a Murano application manifest, which is almost what you are describing above.

I don't see the need for this service, sorry. Not everything needs to be re-invented as a RESTful Python service endpoint...

Best,
-jay
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Here is how we do VM-level monitoring in the application catalog. There is a Nagios application which deploys a Nagios VM into the user's tenant. This Nagios application exposes an abstracted monitoring app interface for adding probes and checks. Another application, Ceilometer Alarm, also allows you to use the same monitoring interface to add a check for a VM.

Demo is here: https://www.youtube.com/watch?v=OvPpJd0EOFw

As usual, Heat is used under the hood for infrastructure-level management.

You can add other monitoring apps like Zabbix (https://github.com/openstack/murano-apps/tree/master/ZabbixAgent/package, https://github.com/openstack/murano-apps/tree/master/ZabbixServer/package).

Thanks
Gosha
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
There are several differences: 1. CloudPulse does not need any agent or special software installed on the underlying cloud; the service can be installed on a tenant VM that has access to the API network. 2. CloudPulse can be configured to run specific test groups periodically, exercising core OpenStack services. 3. Monasca is interesting; CloudPulse differs in that it provides pluggable extensions/API tests (which manipulate resources such as VMs, networks, etc.) and is flexible enough for the operator or the application/tenant to configure the time interval and the test group that has to run. 4. In addition, CloudPulse uses OpenStack infra components without the complexity of Kafka/ZooKeeper/Spark etc. Thanks Vinod. On 5/12/15, 12:59 PM, Jay Pipes jaypi...@gmail.com wrote: On 05/12/2015 02:24 PM, Vinod Pandarinathan (vpandari) wrote: Very true. However, the way I see it, these are extensions/plugins to the CloudPulse framework, so when these are available, the data from these tools is exposed. The OpenStack health service provides an overall framework without assumptions about what is installed on the underlying cloud. The service is expected to run on existing cloud deployments that may or may not have any of this software (from the tenant as well). You mean, like Monasca? https://wiki.openstack.org/wiki/Monasca Sounds to me like you will at the very least need an agent of some sort on the VMs to communicate to an external system. And, that is the monasca-agent: https://github.com/stackforge/monasca-agent ala Nagios NRPE agent: http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf ala Zabbix agent: https://www.zabbix.com/documentation/2.0/manual/concepts/agent ala Icinga agent: http://docs.icinga.org/latest/en/nrpe.html So, CloudPulse would be yet another agent for sending healthcheck messages to an external system, in order for the framework not to make any assumptions on what is installed in the underlying cloud -- other than the assumption you'd need yet another agent installed.
Core health checks for operators and tenants test basic OpenStack services which are present in any OpenStack cloud. Operators != tenants. Try to make the two equal each other and you end up with Ceilometer and Triple-O -- with all the accompanying complexity therein. Best, -jay Thanks for the feedback. Thanks Vinod. On 5/12/15, 10:48 AM, Jay Pipes jaypi...@gmail.com wrote: For operators: * Nagios * Icinga * Zabbix installed on baremetal machines deployed with the OpenStack and other infrastructure services. For tenants: * Nagios * Icinga * Zabbix installed on their VMs. Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade? Best, -jay p.s. Sorry for top-posting. On 05/12/2015 01:20 PM, Vinod Pandarinathan (vpandari) wrote: Hello, I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides OpenStack health-checking services to operators, tenants, and applications. This project will begin as a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are: Server: https://github.com/stackforge/cloudpulse Client: https://github.com/stackforge/python-cloudpulseclient Please join us via IRC on #openstack-cloudpulse on freenode. I am holding a doodle poll to select times for our first meeting the week after summit. This doodle poll will close May 24th and meeting times will be announced on the mailing list at that time. At our first IRC meeting, we will draft additional core team members, so if you're interested in joining a fresh new development effort, please attend our first meeting. Please take a moment if you're interested in CloudPulse to fill out the doodle poll here: https://doodle.com/kcpvzy8kfrxe6rvb The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod Pandarinathan. I expect more members to join during our initial meeting.
A little bit about CloudPulse: Cloud operators need notification of OpenStack failures before a customer reports the failure. Cloud operators can then take timely corrective actions with minimal disruption to applications. Many cloud applications, including those I am interested in (NFV), have very stringent service level agreements. Loss of service can trigger contractual costs associated with the service. Application high availability requires an operational OpenStack Cloud, and the reality is that occasionally OpenStack clouds fail in some mysterious ways. This project intends to identify when those failures occur so corrective actions may be taken by operators, tenants, and the applications themselves. OpenStack is considered healthy when OpenStack API services respond appropriately. Further, OpenStack is healthy when network traffic can be sent between the tenant networks and can access the Internet. Finally
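To make the health definition in the announcement concrete (OpenStack is healthy when its API services respond appropriately), here is a minimal Python sketch of an agentless probe that could run from a tenant VM with access to the API network. It is not CloudPulse code: the `ENDPOINTS` map, the `controller` hostname, and the function names `http_status` and `check_services` are hypothetical, the ports are only the usual Nova and Glance defaults, and treating any status below 500 as "responding appropriately" is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Hypothetical endpoints; in a real cloud these would come from the
# Keystone service catalog rather than being hard-coded.
ENDPOINTS: Dict[str, str] = {
    "nova": "http://controller:8774/",
    "glance": "http://controller:9292/",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET on url; raise OSError if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The service answered, just not with a 2xx; report its status code.
        return exc.code


def check_services(
    endpoints: Dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True if its API answered without a 5xx."""
    results: Dict[str, bool] = {}
    for name, url in endpoints.items():
        try:
            status = fetch(url)
        except OSError:
            # Connection refused, DNS failure, timeout: no response at all.
            results[name] = False
            continue
        # Assumption: any non-5xx answer counts as "responding appropriately".
        results[name] = status < 500
    return results
```

A check script for Nagios or a similar tool could call `check_services(ENDPOINTS)` and alert on any `False` value. The `fetch` parameter exists only so the probe can be exercised without a live cloud.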
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Cool, but using Nagios or the like to trigger app level actions is not what I'm primarily interested in. Mostly the reverse. It's for the app definition to provide the information necessary for a monitoring system to report to the user when something is very wrong and needs intervention. For example, the website is unresponsive because the backend database server in the demo goes offline, or the maximum number of servers has already been reached and the site responsiveness is bad due to excessive load. Murano doesn't do that currently, does it? Thanks, Kevin From: Georgy Okrokvertskhov [gokrokvertsk...@mirantis.com] Sent: Tuesday, May 12, 2015 2:04 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments Here is how we do VM level monitoring in the application catalog. There is an application, Nagios, which will deploy a Nagios VM to the user tenant. This Nagios application exposes an abstracted monitoring app interface to add probes and checks. Another application, Ceilometer Alarm, also allows you to use the same monitoring interface to add a check for a VM. Demo is here: https://www.youtube.com/watch?v=OvPpJd0EOFw As usual, Heat is used under the hood for infrastructure level management. You can add other monitoring apps like Zabbix (https://github.com/openstack/murano-apps/tree/master/ZabbixAgent/package, https://github.com/openstack/murano-apps/tree/master/ZabbixServer/package) Thanks Gosha On Tue, May 12, 2015 at 1:31 PM, Fox, Kevin M kevin@pnnl.gov wrote: It totally depends on how much experience you think a tenant user has... If we're talking about devops, they tend to have the skills to stand up a configuration management server, a monitoring server, and manage everything via config management.
If tenant users are research scientists, like some of ours, it's a fair amount of work to manage Nagios without config management, and config management is way more effort than most researchers want to put into learning. That's where an app catalog becomes important, and something like monitoring as a service starts to become interesting. Thanks, Kevin From: Jay Pipes [jaypi...@gmail.com] Sent: Tuesday, May 12, 2015 12:50 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments On 05/12/2015 02:16 PM, Fox, Kevin M wrote: Nagios/whatever As A Service would actually be very useful I think. I don't really understand why Nagios-as-a-Service would be useful to operators. I mean, operators install their monitoring system of choice via their configuration management tool of choice -- Ansible, SaltStack, Puppet, Chef, etc. Frankly, so do tenants. Tenants install software on their images using configuration management tools like mentioned above... I don't see a reason to have Nagios-as-a-Service for tenants either. Setting up a monitoring server is a fair amount of work. Not really. It's typically a simple apt-get install nagios-nrpe-plugins on client VMs along with an apt-get install nagios-server on one or more monitoring system VMs. Again, have configuration management systems inject whatever check scripts you want paired with the ones that already come with nagios-nrpe-plugins package. If Cloud Apps downloaded from an OpenStack Catalog had a Monitoring Heat resource built in, that would register the launched app with a multitenant aware Cloud Monitoring Service, the user would only have to launch an app, and then go into the Dashboard and associate some kind of alerting policy with the registered checks. Say, email this address when things break. That would be awesome.
:) I guess I just don't see this being in the realm of OpenStack. Or at least, not more than something like a Murano application manifest which is almost what you are describing above. I don't see the need for this service, sorry. Not everything needs to be re-invented as a RESTful Python service endpoint... Best, -jay Thanks, Kevin From: Jay Pipes [jaypi...@gmail.com] Sent: Tuesday, May 12, 2015 10:48 AM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments For operators: * Nagios * Icinga * Zabbix installed on baremetal machines deployed with the OpenStack and other infrastructure services. For tenants: * Nagios * Icinga * Zabbix installed on their VMs. Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade? Best, -jay p.s. Sorry for top
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
Murano itself does not provide any monitoring. The idea here is to expose any application capabilities to do this. In this demo we had a Java application deployed on a Tomcat VM and connected to PostgreSQL. The Java app workflow executed Nagios application methods to register itself in the Nagios monitoring system by adding the proper IP, port, and URL information for standard Nagios HTTP probes. Nagios itself has capabilities to send notifications to any other services like e-mail, IM or custom (Murano, Heat etc. via simple bash/curl scripts). So, if you want to have monitoring for your apps, then you will probably need to modify the Nagios application in Murano to expose registration and e-mail setup for end users. Then Nagios will send notifications to the user rather than to Murano. Another option is to add specific workflow actions in Murano or register a Mistral workflow to react to a Nagios monitoring event. The last, but not the least, option for an application will be a set of actions for some critical events. The application itself detects problems, like an error in DB transactions, and it sends a POST request to an action URL. The action will call a workflow which will use the monitoring application interface to trigger an event in the monitoring system. The idea here is that the application itself does not know beforehand which monitoring service is used, but it has a requirement to have a monitoring service available with a known interface implemented as Murano methods. I am not sure if I am good at explaining all this :-) Plenty of options are available, but they still require some amount of work. Thanks Gosha On Tue, May 12, 2015 at 5:07 PM, Fox, Kevin M kevin@pnnl.gov wrote: Cool, but using Nagios or the like to trigger app level actions is not what I'm primarily interested in. Mostly the reverse. It's for the app definition to provide the information necessary for a monitoring system to report to the user when something is very wrong and needs intervention.
For example, the website is unresponsive because the backend database server in the demo goes offline, or the maximum number of servers has already been reached and the site responsiveness is bad due to excessive load. Murano doesn't do that currently, does it? Thanks, Kevin -- *From:* Georgy Okrokvertskhov [gokrokvertsk...@mirantis.com] *Sent:* Tuesday, May 12, 2015 2:04 PM *To:* OpenStack Development Mailing List (not for usage questions) *Subject:* Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments Here is how we do VM level monitoring in the application catalog. There is an application, Nagios, which will deploy a Nagios VM to the user tenant. This Nagios application exposes an abstracted monitoring app interface to add probes and checks. Another application, Ceilometer Alarm, also allows you to use the same monitoring interface to add a check for a VM. Demo is here: https://www.youtube.com/watch?v=OvPpJd0EOFw As usual, Heat is used under the hood for infrastructure level management. You can add other monitoring apps like Zabbix ( https://github.com/openstack/murano-apps/tree/master/ZabbixAgent/package, https://github.com/openstack/murano-apps/tree/master/ZabbixServer/package) Thanks Gosha On Tue, May 12, 2015 at 1:31 PM, Fox, Kevin M kevin@pnnl.gov wrote: It totally depends on how much experience you think a tenant user has... If we're talking about devops, they tend to have the skills to stand up a configuration management server, a monitoring server, and manage everything via config management. If tenant users are research scientists, like some of ours, it's a fair amount of work to manage Nagios without config management, and config management is way more effort than most researchers want to put into learning.
That's where an app catalog becomes important, and something like monitoring as a service starts to become interesting. Thanks, Kevin From: Jay Pipes [jaypi...@gmail.com] Sent: Tuesday, May 12, 2015 12:50 PM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments On 05/12/2015 02:16 PM, Fox, Kevin M wrote: Nagios/whatever As A Service would actually be very useful I think. I don't really understand why Nagios-as-a-Service would be useful to operators. I mean, operators install their monitoring system of choice via their configuration management tool of choice -- Ansible, SaltStack, Puppet, Chef, etc. Frankly, so do tenants. Tenants install software on their images using configuration management tools like mentioned above... I don't see a reason to have Nagios-as-a-Service for tenants either. Setting up a monitoring server is a fair amount of work. Not really. It's typically a simple apt-get install nagios-nrpe-plugins on client VMs along with an apt-get install nagios-server on one or more
Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments
On 05/12/2015 05:05 PM, Vinod Pandarinathan (vpandari) wrote: There are several differences: 1. CloudPulse does not need any agent or special software installed on the underlying cloud; the service can be installed on a tenant VM that has access to the API network. 2. CloudPulse can be configured to run specific test groups periodically, exercising core OpenStack services. 3. Monasca is interesting; CloudPulse differs in that it provides pluggable extensions/API tests (which manipulate resources such as VMs, networks, etc.) and is flexible enough for the operator or the application/tenant to configure the time interval and the test group that has to run. 4. In addition, CloudPulse uses OpenStack infra components without the complexity of Kafka/ZooKeeper/Spark etc. CloudPulse doesn't yet exist, so I think saying it is different from these things before it has anything to be different about is a bit premature. Again, I'd highly advise those folks involved in this effort to take a look at the existing solutions in this space and perhaps find ways to collaborate and improve on what already exists. Best, -jay Thanks Vinod. On 5/12/15, 12:59 PM, Jay Pipes jaypi...@gmail.com wrote: On 05/12/2015 02:24 PM, Vinod Pandarinathan (vpandari) wrote: Very true. However, the way I see it, these are extensions/plugins to the CloudPulse framework, so when these are available, the data from these tools is exposed. The OpenStack health service provides an overall framework without assumptions about what is installed on the underlying cloud. The service is expected to run on existing cloud deployments that may or may not have any of this software (from the tenant as well). You mean, like Monasca? https://wiki.openstack.org/wiki/Monasca Sounds to me like you will at the very least need an agent of some sort on the VMs to communicate to an external system.
And, that is the monasca-agent: https://github.com/stackforge/monasca-agent ala Nagios NRPE agent: http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf ala Zabbix agent: https://www.zabbix.com/documentation/2.0/manual/concepts/agent ala Icinga agent: http://docs.icinga.org/latest/en/nrpe.html So, CloudPulse would be yet another agent for sending healthcheck messages to an external system, in order for the framework not to make any assumptions on what is installed in the underlying cloud -- other than the assumption you'd need yet another agent installed. Core health checks for operators and tenants test basic OpenStack services which are present in any OpenStack cloud. Operators != tenants. Try to make the two equal each other and you end up with Ceilometer and Triple-O -- with all the accompanying complexity therein. Best, -jay Thanks for the feedback. Thanks Vinod. On 5/12/15, 10:48 AM, Jay Pipes jaypi...@gmail.com wrote: For operators: * Nagios * Icinga * Zabbix installed on baremetal machines deployed with the OpenStack and other infrastructure services. For tenants: * Nagios * Icinga * Zabbix installed on their VMs. Why are we re-inventing excellent open-source implementations of monitoring systems that have been around for over a decade? Best, -jay p.s. Sorry for top-posting. On 05/12/2015 01:20 PM, Vinod Pandarinathan (vpandari) wrote: Hello, I'm pleased to announce the development of a new project called CloudPulse. CloudPulse provides OpenStack health-checking services to operators, tenants, and applications. This project will begin as a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are: Server: https://github.com/stackforge/cloudpulse Client: https://github.com/stackforge/python-cloudpulseclient Please join us via IRC on #openstack-cloudpulse on freenode. I am holding a doodle poll to select times for our first meeting the week after summit.
This doodle poll will close May 24th and meeting times will be announced on the mailing list at that time. At our first IRC meeting, we will draft additional core team members, so if you're interested in joining a fresh new development effort, please attend our first meeting. Please take a moment if you're interested in CloudPulse to fill out the doodle poll here: https://doodle.com/kcpvzy8kfrxe6rvb The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod Pandarinathan. I expect more members to join during our initial meeting. A little bit about CloudPulse: Cloud operators need notification of OpenStack failures before a customer reports the failure. Cloud operators can then take timely corrective actions with minimal disruption to applications. Many cloud applications, including those I am interested in (NFV), have very stringent service level agreements. Loss of service can trigger contractual costs associated with the service. Application high availability requires an operational OpenStack Cloud, and the reality is that occasionally OpenStack clouds fail in some mysterious ways. This