Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Simon Pasquier
On Wed, May 13, 2015 at 3:27 PM, David Kranz dkr...@redhat.com wrote:

  On 05/13/2015 09:06 AM, Simon Pasquier wrote:

   Hello,

 Like many others commented before, I don't quite understand how unique are
 the Cloudpulse use cases.

 For operators, I got the feeling that existing solutions fit well:
 - Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway
 for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ,
 databases and more) and diagnostic purposes. Adding OpenStack service
 checks is fairly easy if you already have the toolchain.

 Is it really so easy? RabbitMQ has an aliveness test that is easy to
 hook into. I don't know exactly what it does, other than what the doc says,
 but I should not have to. If I want my standard monitoring system to call
 into a cloud and ask "is nova healthy?", "is glance healthy?", etc., are
 there such calls?


Regarding the RabbitMQ aliveness test, it has its own limits (more on that
later, I've got an interesting RabbitMQ outage that I'm going to discuss
in a new thread) and it doesn't replicate exactly what the clients (e.g.
OpenStack services) are doing.

Regarding the service checks, there are already plenty of scripts that
exist for Nagios, Collectd and so on. Some of them are listed in the Wiki
[1].


 There are various sets of calls associated with nagios, zabbix, etc. but
 those seem like after-market parts for a car. Seems to me the services
 themselves would know best how to check if they are healthy, particularly
 as that could change version to version. Has there been discussion of
 adding a health-check (admin) API in each service? Lacking that, is there
 documentation from any OpenStack projects about how to check the health of
 nova? When I saw this thread start, that is what I thought it was going to
 be about.


Starting with Kilo, you could configure your OpenStack API services with
the healthcheck middleware [2]. This has been inspired by what Swift's been
doing for some time now [3]. IIUC the default healthcheck is minimalist and
doesn't check that dependent services (like RabbitMQ, database) are healthy
but the framework is extensible and more healthchecks can be added.
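
As a rough illustration of how a standard monitoring system might consume
such an endpoint, here is a minimal Python sketch of a probe. It assumes the
healthcheck middleware is mounted at a path such as /healthcheck and answers
HTTP 200 when healthy; the function name, the URL and the injectable opener
parameter are hypothetical and only there for illustration.

```python
import urllib.error
import urllib.request


def probe_healthcheck(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if the endpoint answers HTTP 200 within the timeout.

    `opener` is a hypothetical injection point so the probe can be exercised
    without a live service; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Refused connections, timeouts and HTTP error statuses
        # (raised by urlopen as HTTPError) all count as unhealthy.
        return False


if __name__ == "__main__":
    # Assumed endpoint; adjust host, port and path to your deployment.
    print(probe_healthcheck("http://127.0.0.1:8080/healthcheck", timeout=2.0))
```

A Nagios or Collectd plugin could wrap this and map True/False onto its own
OK/CRITICAL states.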



  -David


BR,
Simon

[1] https://wiki.openstack.org/wiki/Operations/Tools#Monitoring_and_Trending
[2] http://docs.openstack.org/developer/oslo.middleware/api.html#oslo_middleware.Healthcheck
[3] http://docs.openstack.org/kilo/config-reference/content/object-storage-healthcheck.html



- OpenStack projects like Rally or Tempest can generate synthetic
 loads and run end-to-end tests. Integrating them with a monitoring system
 isn't terribly difficult either.

 As far as Monitoring-as-a-service is concerned, do you have plans to
 integrate/leverage Ceilometer?

  BR,
  Simon

 On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari) 
 vpand...@cisco.com wrote:

   Hello,

I'm pleased to announce the development of a new project called
 CloudPulse.  CloudPulse provides OpenStack
  health-checking services to operators, tenants, and applications.
 This project will begin as
  a StackForge project based upon an empty cookiecutter[1] repo.  The
 repos to work in are:
  Server:   https://github.com/stackforge/cloudpulse
  Client: https://github.com/stackforge/python-cloudpulseclient

  Please join us via IRC on #openstack-cloudpulse on freenode.

  I am holding a doodle poll to select times for our first meeting the
 week after summit.  This doodle poll will close May 24th and meeting times
 will be announced on the mailing list at that time.  At our first IRC
 meeting,
  we will draft additional core team members, so if you're interested in
 joining a fresh new development effort, please attend our first meeting.
  Please take a moment if you're interested in CloudPulse to fill out the
 doodle poll here:

  https://doodle.com/kcpvzy8kfrxe6rvb

  The initial core team is composed of
  Ajay Kalambur,
  Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod
 Pandarinathan.
  I expect more members to join during our initial meeting.

   A little bit about CloudPulse:
   Cloud operators need notification of OpenStack failures before a
 customer reports the failure. Cloud operators can then take timely
 corrective actions with minimal disruption to applications.  Many cloud
 applications, including
  those I am interested in (NFV) have very stringent service level
 agreements.  Loss of service can trigger contractual
  costs associated with the service.  Application high availability
 requires an operational OpenStack Cloud, and the reality
  is that occasionally OpenStack clouds fail in some mysterious ways.
 This project intends to identify when those failures
  occur so corrective actions may be taken by operators, tenants, and the
 applications themselves.

  OpenStack is considered healthy when OpenStack API services respond
 appropriately.  Further OpenStack is
  healthy when network traffic can be sent between the 

Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread David Kranz

On 05/13/2015 09:06 AM, Simon Pasquier wrote:

Hello,

Like many others commented before, I don't quite understand how unique 
are the Cloudpulse use cases.


For operators, I got the feeling that existing solutions fit well:
- Traditional monitoring tools (Nagios, Zabbix, ...) are necessary
anyway for infrastructure monitoring (CPU, RAM, disks, operating
system, RabbitMQ, databases and more) and diagnostic purposes. Adding
OpenStack service checks is fairly easy if you already have the toolchain.

Is it really so easy? RabbitMQ has an aliveness test that is easy to
hook into. I don't know exactly what it does, other than what the doc
says, but I should not have to. If I want my standard monitoring system
to call into a cloud and ask "is nova healthy?", "is glance healthy?",
etc., are there such calls?


There are various sets of calls associated with nagios, zabbix, etc. but 
those seem like after-market parts for a car. Seems to me the services 
themselves would know best how to check if they are healthy, 
particularly as that could change version to version. Has there been
discussion of adding a health-check (admin) API in each service? Lacking
that, is there documentation from any OpenStack projects about how to 
check the health of nova? When I saw this thread start, that is what I 
thought it was going to be about.


 -David

- OpenStack projects like Rally or Tempest can generate synthetic 
loads and run end-to-end tests. Integrating them with a monitoring 
system isn't terribly difficult either.


As far as Monitoring-as-a-service is concerned, do you have plans to 
integrate/leverage Ceilometer?


BR,
Simon

On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari) 
vpand...@cisco.com wrote:


Hello,

  I'm pleased to announce the development of a new project called
CloudPulse.  CloudPulse provides OpenStack
health-checking services to operators, tenants, and
applications. This project will begin as
a StackForge project based upon an empty cookiecutter[1] repo. 
The repos to work in are:

Server: https://github.com/stackforge/cloudpulse
Client: https://github.com/stackforge/python-cloudpulseclient

Please join us via IRC on #openstack-cloudpulse on freenode.

I am holding a doodle poll to select times for our first meeting
the week after summit.  This doodle poll will close May 24th and
meeting times will be announced on the mailing list at that time.
At our first IRC meeting,
we will draft additional core team members, so if you're interested
in joining a fresh new development effort, please attend our first
meeting.
Please take a moment if you're interested in CloudPulse to fill out
the doodle poll here:

https://doodle.com/kcpvzy8kfrxe6rvb

The initial core team is composed of
Ajay Kalambur,
Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven
Dake and Vinod Pandarinathan.
I expect more members to join during our initial meeting.

 A little bit about CloudPulse:
 Cloud operators need notification of OpenStack failures before a
customer reports the failure. Cloud operators can then take timely
corrective actions with minimal disruption to applications. Many
cloud applications, including
those I am interested in (NFV) have very stringent service level
agreements.  Loss of service can trigger contractual
costs associated with the service.  Application high availability
requires an operational OpenStack Cloud, and the reality
is that occasionally OpenStack clouds fail in some mysterious
ways.  This project intends to identify when those failures
occur so corrective actions may be taken by operators, tenants,
and the applications themselves.

OpenStack is considered healthy when OpenStack API services
respond appropriately.  Further OpenStack is
healthy when network traffic can be sent between the tenant
networks and can access the Internet.  Finally OpenStack
is healthy when all infrastructure cluster elements are in an
operational state.

For information about blueprints check out:
https://blueprints.launchpad.net/cloudpulse
https://blueprints.launchpad.net/python-cloudpulseclient

For more details, check out our Wiki:
https://wiki.openstack.org/wiki/Cloudpulse

Please join the CloudPulse team in designing and implementing a
world-class Carrier Grade system for checking
the health of OpenStack clouds.  We look forward to seeing you on
IRC on #openstack-cloudpulse.

Regards,
Vinod Pandarinathan
[1] https://github.com/openstack-dev/cookiecutter


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Steven Dake (stdake)


From: David Kranz dkr...@redhat.com
Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: Wednesday, May 13, 2015 at 6:27 AM
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to 
HealthCheck OpenStack deployments

On 05/13/2015 09:06 AM, Simon Pasquier wrote:
Hello,

Like many others commented before, I don't quite understand how unique are the 
Cloudpulse use cases.

For operators, I got the feeling that existing solutions fit well:
- Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway for
infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ,
databases and more) and diagnostic purposes. Adding OpenStack service checks is
fairly easy if you already have the toolchain.

Is it really so easy? RabbitMQ has an aliveness test that is easy to hook
into. I don't know exactly what it does, other than what the doc says, but I
should not have to. If I want my standard monitoring system to call into a
cloud and ask "is nova healthy?", "is glance healthy?", etc., are there such
calls?

David,

I think a healthchecking API per service is a fantastic idea.  I would like to 
see the same thing in OpenStack services.  The real way to check health of nova 
for example, is for Nova to do the job of checking its own health.  It knows
its internals best and can do the job.  Maybe this project can introduce API 
calls and implementations into the major services to do such work.

Regards
-steve


There are various sets of calls associated with nagios, zabbix, etc. but those 
seem like after-market parts for a car. Seems to me the services themselves 
would know best how to check if they are healthy, particularly as that could 
change version to version. Has there been discussion of adding a health-check
(admin) API in each service? Lacking that, is there documentation from any
OpenStack projects about how to check the health of nova? When I saw this 
thread start, that is what I thought it was going to be about.

 -David

- OpenStack projects like Rally or Tempest can generate synthetic loads and run 
end-to-end tests. Integrating them with a monitoring system isn't terribly 
difficult either.

As far as Monitoring-as-a-service is concerned, do you have plans to 
integrate/leverage Ceilometer?

BR,
Simon

On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari) 
vpand...@cisco.com wrote:
Hello,

  I'm pleased to announce the development of a new project called CloudPulse.  
CloudPulse provides OpenStack
health-checking services to operators, tenants, and applications. This
project will begin as
a StackForge project based upon an empty cookiecutter[1] repo.  The repos to 
work in are:
Server:   https://github.com/stackforge/cloudpulse
Client: https://github.com/stackforge/python-cloudpulseclient

Please join us via IRC on #openstack-cloudpulse on freenode.

I am holding a doodle poll to select times for our first meeting the week after 
summit.  This doodle poll will close May 24th and meeting times will be 
announced on the mailing list at that time.  At our first IRC meeting,
we will draft additional core team members, so if you're interested in joining a
fresh new development effort, please attend our first meeting.
Please take a moment if you're interested in CloudPulse to fill out the doodle
poll here:

https://doodle.com/kcpvzy8kfrxe6rvb

The initial core team is composed of
Ajay Kalambur,
Behzad Dastur, Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod 
Pandarinathan.
I expect more members to join during our initial meeting.

 A little bit about CloudPulse:
 Cloud operators need notification of OpenStack failures before a customer 
reports the failure. Cloud operators can then take timely corrective actions 
with minimal disruption to applications.  Many cloud applications, including
those I am interested in (NFV) have very stringent service level agreements.  
Loss of service can trigger contractual
costs associated with the service.  Application high availability requires an 
operational OpenStack Cloud, and the reality
is that occasionally OpenStack clouds fail in some mysterious ways.  This
project intends to identify when those failures
occur so corrective actions may be taken by operators, tenants, and the 
applications themselves.

OpenStack is considered healthy when OpenStack API services respond 
appropriately.  Further OpenStack is
healthy when network traffic can be sent between the tenant networks and can 
access the Internet.  Finally OpenStack
is healthy when all infrastructure cluster elements are in an operational state.

For information about blueprints check out:
 https://blueprints.launchpad.net/cloudpulse
https

Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Steven Dake (stdake)


On 5/12/15, 1:28 PM, Julien Danjou jul...@danjou.info wrote:

On Tue, May 12 2015, Steven Dake (stdake) wrote:

 This is a great idea that would make a solid extension to the software.
 If I read the wiki page correctly, the real goal is for operators and
 tenants to be able to be notified via querying the ReST API so they could
 write their own email/pager-duty app.

Then leveraging Ceilometer polling and alarming systems could make you
avoid reinventing a large portion of the wheel.

Julien,

Reading the wiki page, I don't expect there would be a need for an agent.
But who knows, atm, all the software is is a wiki page ;)  If there were a
need for agents, the project would definitely use the ceilometer agents
and extend there if needed via the normal development process.

Regards
-steve


-- 
Julien Danjou
// Free Software hacker
// http://julien.danjou.info




Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Ian Wells
On 13 May 2015 at 10:30, Vinod Pandarinathan (vpandari) vpand...@cisco.com
wrote:

 - Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway
 for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ,
 databases and more) and diagnostic purposes. Adding OpenStack service
 checks is fairly easy if you already have the toolchain.

  The solution is for health-checking, which includes periodically running
 light/mid/heavy control and data plane tests and providing test data. The
 tool shall not have any dependency on one particular monitoring tool.
 If a monitoring tool is installed, then monitoring data shall be exposed to
 the applications in a consumable fashion.
 As I mentioned earlier, we are not replacing any monitoring solution
 available out there; we are leveraging those solutions and providing a
 clean interface so that the applications/tenants and operators know if
 the cloud is healthy.


To rephrase this:

- Zabbix and friends will monitor an operator's cloud and tell the operator
bad things are happening.  Or they can monitor an application's VMs and see
if the app is happy, and tell the app or its owner.
- Ceilometer will front cloud monitoring solutions and offer those
statistics to tenants of the cloud in ways that (ideally) make sense to the
client.  It lets tenants see stats they couldn't get for themselves.

This isn't quite what we're trying to address.  We had one specific use
case: a cloud application that needs to provide reasonably high
availability uses the OpenStack APIs occasionally to try and correct
problems (VM died, app overloaded, etc.) - a pretty normal cloud
application.  If you're interested in maintaining service, you need to know
about single points of failure to work around them, and the cloud control
plane failing is a single point of failure - the APIs stop working, and the
app runs just fine until a second failure that causes them to be used, and
if you haven't done something by that point you get a meltdown.  The idea
of CloudPulse was to be able to say 'the cloud APIs are operating normally'
to applications that are interested.  If they're *not* normal then the
application can take corrective action; for instance, spinning up extra
capacity in another cloud and moving traffic over there.

As you can see, that's a cross-domain sort of monitoring similar to
Ceilometer - the tenant finding out information about the infrastructure
that they can't see directly.  That said, it's a very concise summary
('working'), and we also had in mind that you ran the tests to freshen the
results if the tests hadn't been run recently, rather than looping them
continually.  Also, the history of the results is not really relevant - my
app cares about whether the control plane works *now*, not if it
worked for 8 hours out of the last 24.
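
The "freshen the results if the tests hadn't been run recently" idea can be
sketched roughly as below. This is only an illustration under stated
assumptions, not CloudPulse code: the class name, the max_age parameter and
the injectable clock are all hypothetical, and the check callable stands in
for whatever control-plane test would really be run.

```python
import time


class FreshenOnDemand:
    """Cache one health verdict and re-run the check only when it is stale."""

    def __init__(self, check, max_age, clock=time.monotonic):
        self._check = check      # zero-argument callable returning truthy when healthy
        self._max_age = max_age  # seconds a result stays fresh (assumed unit)
        self._clock = clock      # injectable so staleness can be tested
        self._result = None
        self._stamp = None

    def is_healthy(self):
        now = self._clock()
        stale = self._stamp is None or now - self._stamp > self._max_age
        if stale:
            # No background loop: the test runs only when someone asks
            # and the last answer is too old.
            self._result = bool(self._check())
            self._stamp = now
        return self._result


if __name__ == "__main__":
    # Placeholder check that always reports healthy.
    status = FreshenOnDemand(check=lambda: True, max_age=30)
    print("cloud APIs operating normally:", status.is_healthy())
```

Only the latest verdict is kept, which matches the point that the history of
results does not matter to the application.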

We're scratching an itch.  Absolutely the point of mailing everyone about
it was to see if anyone had better scratching tools, and if people would
like to chat about it at the summit.  What seems to have come out of it is
that yes, there are tools out there that might be usable for the purpose,
and we'd love to hear your opinions and what ideas you have about how we
should do this.  Apparently there are also a lot of people with slightly
different itches to scratch, and I hope you all take the opportunity to get
together at the summit too.
-- 
Ian.


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Thierry Carrez
Mooney, Sean K wrote:
 Will CloudPulse be under the governance of the OpenStack Telemetry program,
 or will this be an independent StackForge repository?
 
 I think there would be great value in having cloud monitoring or monitoring
 as a service in the Telemetry program.

Except we don't do programs anymore, because forcing new projects to
inherit another team leadership was not the best way to foster innovation.

http://git.openstack.org/cgit/openstack/governance/commit/?id=fcc4046f7d866d0516f2810571aad0c0ce2cc361

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Mooney, Sean K
Will CloudPulse be under the governance of the OpenStack Telemetry program,
or will this be an independent StackForge repository?

I think there would be great value in having cloud monitoring or monitoring as
a service in the Telemetry program.
Regards
Sean.

-Original Message-
From: Steven Dake (stdake) [mailto:std...@cisco.com] 
Sent: Wednesday, May 13, 2015 3:39 PM
To: Julien Danjou
Cc: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to 
HealthCheck OpenStack deployments



On 5/12/15, 1:28 PM, Julien Danjou jul...@danjou.info wrote:

On Tue, May 12 2015, Steven Dake (stdake) wrote:

 This is a great idea that would make a solid extension to the software.
 If I read the wiki page correctly, the real goal is for operators and  
tenants to be able to be notified via querying the ReST API so they 
could  write their own email/pager-duty app.

Then leveraging Ceilometer polling and alarming systems could make you 
avoid reinventing a large portion of the wheel.

Julien,

Reading the wiki page, I don't expect there would be a need for an agent.
But who knows, atm, all the software is is a wiki page ;)  If there were a need 
for agents, the project would definitely use the ceilometer agents and extend 
there if needed via the normal development process.

Regards
-steve


--
Julien Danjou
// Free Software hacker
// http://julien.danjou.info




Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread David Kranz

On 05/13/2015 09:51 AM, Simon Pasquier wrote:



On Wed, May 13, 2015 at 3:27 PM, David Kranz dkr...@redhat.com wrote:


On 05/13/2015 09:06 AM, Simon Pasquier wrote:

Hello,

Like many others commented before, I don't quite understand how
unique are the Cloudpulse use cases.

For operators, I got the feeling that existing solutions fit well:
- Traditional monitoring tools (Nagios, Zabbix, ...) are
necessary anyway for infrastructure monitoring (CPU, RAM, disks,
operating system, RabbitMQ, databases and more) and diagnostic
purposes. Adding OpenStack service checks is fairly easy if you
already have the toolchain.

Is it really so easy? RabbitMQ has an aliveness test that is
easy to hook into. I don't know exactly what it does, other than
what the doc says, but I should not have to. If I want my standard
monitoring system to call into a cloud and ask "is nova healthy?",
"is glance healthy?", etc., are there such calls?


Regarding the RabbitMQ aliveness test, it has its own limits (more on that
later, I've got an interesting RabbitMQ outage that I'm going to
discuss in a new thread) and it doesn't replicate exactly what the
clients (e.g. OpenStack services) are doing.

I'm sure it has limits, but my point was that the developers of RabbitMQ
understood that it would be difficult for users to know exactly what
should be poked at inside to check health, so they provide a call to do it.


Regarding the service checks, there are already plenty of scripts that
exist for Nagios, Collectd and so on. Some of them are listed in the
Wiki [1].

I understand, and that is what I meant by "after-market". If someone
puts a new feature in service X that requires some monitoring to be
healthy, then all those different scripts need to chase after it to keep
up to date. Poking at service internals to check the health of a service
is an abstraction violation. As someone on this thread said,
tempest/rally can be used to check a certain kind of health, but it is
akin to black-box testing, whereas health monitoring should be more akin
to white-box testing.



There are various sets of calls associated with nagios, zabbix,
etc. but those seem like after-market parts for a car. Seems to
me the services themselves would know best how to check if they
are healthy, particularly as that could change version to version.
Has there been discussion of adding a health-check (admin) API in
each service? Lacking that, is there documentation from any
OpenStack projects about how to check the health of nova? When I
saw this thread start, that is what I thought it was going to be
about.


Starting with Kilo, you could configure your OpenStack API services 
with the healthcheck middleware [2]. This has been inspired by what 
Swift's been doing for some time now [3]. IIUC the default healthcheck
is minimalist and doesn't check that dependent services (like
RabbitMQ, database) are healthy but the framework is extensible and
more healthchecks can be added.

I can see that, but the real value would be in abstracting the details of
what it means for a service to be healthy inside the implementation and
exporting an API. If that were present, the question of whether calling
it used middleware or not would be secondary. I'm not sure what the 
value-add of middleware would be in this case.
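
To make the idea of a service owning its health definition and exporting it
concrete, here is a minimal sketch of a WSGI application that runs a set of
service-supplied checks and reports the aggregate. It is an assumption-laden
illustration, not an existing OpenStack API: make_health_app, the check
names and the JSON layout are all hypothetical.

```python
import json


def make_health_app(checks):
    """Build a WSGI app from a dict of name -> zero-argument check callable.

    Each check returns a truthy value when that dependency is healthy;
    a check that raises is reported as unhealthy.
    """
    def app(environ, start_response):
        results = {}
        for name, check in checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                results[name] = False
        healthy = all(results.values())
        status = "200 OK" if healthy else "503 Service Unavailable"
        body = json.dumps({"healthy": healthy, "checks": results}).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]
    return app


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Placeholder checks; a real service would test its own database,
    # message bus and so on.
    app = make_health_app({"database": lambda: True, "messaging": lambda: True})
    make_server("127.0.0.1", 8080, app).serve_forever()
```

The caller only sees healthy or not, so the checks can change from version
to version without the monitoring scripts having to follow.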


 -David






 -David


BR,
Simon

[1] https://wiki.openstack.org/wiki/Operations/Tools#Monitoring_and_Trending
[2] http://docs.openstack.org/developer/oslo.middleware/api.html#oslo_middleware.Healthcheck
[3] http://docs.openstack.org/kilo/config-reference/content/object-storage-healthcheck.html




- OpenStack projects like Rally or Tempest can generate synthetic
loads and run end-to-end tests. Integrating them with a
monitoring system isn't terribly difficult either.

As far as Monitoring-as-a-service is concerned, do you have plans
to integrate/leverage Ceilometer?

BR,
Simon

On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari)
vpand...@cisco.com wrote:

Hello,

  I'm pleased to announce the development of a new project
called CloudPulse.  CloudPulse provides OpenStack
health-checking services to operators, tenants, and
applications. This project will begin as
a StackForge project based upon an empty cookiecutter[1]
repo. The repos to work in are:
Server: https://github.com/stackforge/cloudpulse
Client: https://github.com/stackforge/python-cloudpulseclient

Please join us via IRC on #openstack-cloudpulse on freenode.

I am holding a doodle poll to select times for our first
meeting the week after summit.  This doodle poll will close
May 24th and meeting times will be announced on the mailing
list at that time.  At our first IRC meeting,

Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Vinod Pandarinathan (vpandari)


On 5/12/15, 2:43 PM, Richard Raseley rich...@raseley.com wrote:


On 05/12/2015 01:20 PM, Vinod Pandarinathan (vpandari) wrote:
 Hello,

 I'm pleased to announce the development of a new project called
 CloudPulse.  CloudPulse provides OpenStack health-checking services
 to operators, tenants, and applications. This project will
 begin as a StackForge project based upon an empty cookiecutter[1]
 repo.  The repos to work in are: Server:
 https://github.com/stackforge/cloudpulse Client:
 https://github.com/stackforge/python-cloudpulseclient

 Please join us via IRC on #openstack-cloudpulse on freenode.

 I am holding a doodle poll to select times for our first meeting
 the week after summit.  This doodle poll will close May 24th and
 meeting times will be announced on the mailing list at that time.
 At our first IRC meeting, we will draft additional core team
 members, so if you're interested in joining a fresh new development
 effort, please attend our first meeting. Please take a moment if
 you're interested in CloudPulse to fill out the doodle poll here:

 https://doodle.com/kcpvzy8kfrxe6rvb

 The initial core team is composed of Ajay Kalambur, Behzad Dastur,
 Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod
 Pandarinathan. I expect more members to join during our initial
 meeting.

 A little bit about CloudPulse: Cloud operators need notification of
 OpenStack failures before a customer reports the failure. Cloud
 operators can then take timely corrective actions with minimal
 disruption to applications.  Many cloud applications, including
 those I am interested in (NFV) have very stringent service level
 agreements.  Loss of service can trigger contractual costs
 associated with the service.  Application high availability
 requires an operational OpenStack Cloud, and the reality is that
 occasionally OpenStack clouds fail in some mysterious ways. This
 project intends to identify when those failures occur so corrective
 actions may be taken by operators, tenants, and the applications
 themselves.

 OpenStack is considered healthy when OpenStack API services
 respond appropriately.  Further OpenStack is healthy when network
 traffic can be sent between the tenant networks and can access the
 Internet.  Finally OpenStack is healthy when all infrastructure
 cluster elements are in an operational state.

 For information about blueprints check out:
 https://blueprints.launchpad.net/cloudpulse
 https://blueprints.launchpad.net/python-cloudpulseclient

 For more details, check out our Wiki:
 https://wiki.openstack.org/wiki/Cloudpulse

 Please join the CloudPulse team in designing and implementing a
 world-class Carrier Grade system for checking the health of
 OpenStack clouds.  We look forward to seeing you on IRC on
 #openstack-cloudpulse.

 Regards, Vinod Pandarinathan [1]
 https://github.com/openstack-dev/cookiecutter

As others have expressed - I am a little skeptical about the need to
'reinvent the wheel' with regards to monitoring.

Is there a well-defined set of business or user requirements which
would be enabled by CloudPulse which are not enabled by existing
solutions? I am just trying to better wrap my head around the problem...

The solution is for health-checking, which includes periodically running
light/mid/heavy control and data plane tests and providing test data. The
tool shall not have any dependency on one particular monitoring tool.
If a monitoring tool is installed, then monitoring data shall be exposed to
the applications in a consumable fashion.
As I mentioned earlier, we are not replacing any monitoring solution
available out there; we are leveraging those solutions and providing a
clean interface so that the applications/tenants and operators know if
the cloud is healthy.


Thanks
Vinod.


Regards,

Richard



Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Vinod Pandarinathan (vpandari)
Great idea.  I have seen this request coming from multiple folks,
especially to detect key events through Zaqar and pass them on to the
application, which can then take app-specific action.

Thanks
Vinod.

From: Fox, Kevin M kevin@pnnl.gov
Reply-To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Date: Tuesday, May 12, 2015 at 12:51 PM
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to
HealthCheck OpenStack deployments

Hooking it into Zaqar would be awesome too. Once you can trigger Mistral 
workflows based on Zaqar messages, just imagine the possibilities...

Kevin


From: Steven Dake (stdake)
Sent: Tuesday, May 12, 2015 12:02:59 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to 
HealthCheck OpenStack deployments

Kevin,

This is a great idea that would make a solid extension to the software.
If I read the wiki page correctly, the real goal is for operators and
tenants to be able to be notified via querying the ReST API so they could
write their own email/pager-duty app.

Regards
-steve

On 5/12/15, 11:16 AM, Fox, Kevin M kevin@pnnl.gov wrote:

Nagios/whatever As A Service would actually be very useful, I think.

Setting up a monitoring server is a fair amount of work. If Cloud Apps
downloaded from an OpenStack Catalog had a Monitoring Heat resource built
in, that would register the launched app with a multitenant aware Cloud
Monitoring Service, the user would only have to launch an app, and then
go into the Dashboard and associate some kind of alerting policy with the
registered checks. Say, email this address when things break. That would
be awesome. :)

Thanks,
Kevin

From: Jay Pipes [jaypi...@gmail.com]
Sent: Tuesday, May 12, 2015 10:48 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to
HealthCheck OpenStack deployments

For operators:

* Nagios
* Icinga
* Zabbix

installed on baremetal machines deployed with the OpenStack and other
infrastructure services.

For tenants:

* Nagios
* Icinga
* Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of
monitoring systems that have been around for over a decade?

Best,
-jay

p.s. Sorry for top-posting.


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Vinod Pandarinathan (vpandari)
Hi Simon,

Thanks for your feedback. Please see inline.

From: Simon Pasquier spasqu...@mirantis.com
Reply-To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Date: Wednesday, May 13, 2015 at 6:06 AM
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to
HealthCheck OpenStack deployments

Hello,

Like many others commented before, I don't quite understand how unique the
CloudPulse use cases are.

For operators, I got the feeling that existing solutions fit well:
- Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway for
infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ,
databases and more) and diagnostic purposes. Adding OpenStack service checks is
fairly easy if you already have the toolchain.

The solution is for health-checking, which includes periodically running
light/mid/heavy control and data plane tests and providing the test data.
The tool shall not depend on any one particular monitoring tool.
If a monitoring tool is installed, its monitoring data shall be exposed to
the applications in a consumable fashion.
As I mentioned earlier, we are not replacing any monitoring solution
available out there; we are leveraging those solutions and providing a
clean interface so that applications, tenants, and operators know whether
the cloud is healthy.

- OpenStack projects like Rally or Tempest can generate synthetic loads and run 
end-to-end tests. Integrating them with a monitoring system isn't terribly 
difficult either.


You put it well. Right now the ask is simple, and there is no solution that
has integrated control and data plane tests and made them flexible and
configurable from both the application and the tenant perspective.
We will leverage any of these existing tests as part of our comprehensive
tests, which the operator can run periodically at long intervals.
At this point these tests cannot be run at short intervals because they are
heavyweight, and they often leave behind orphaned resources that need
manual cleanup.
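
To illustrate the idea of running light, mid, and heavy tests at different intervals, here is a minimal sketch. It is not taken from the CloudPulse code: the tier names follow the light/mid/heavy wording above, while the interval values and the `due_tiers` function are assumptions made up for the example.

```python
from typing import Dict, List

# Hypothetical periods in seconds: cheap checks run often, heavyweight
# end-to-end tests (Rally/Tempest style) run rarely.
INTERVALS = {"light": 60, "mid": 900, "heavy": 86400}


def due_tiers(now: float, last_run: Dict[str, float],
              intervals: Dict[str, int] = INTERVALS) -> List[str]:
    """Return the test tiers whose period has elapsed since their last run."""
    due = []
    for tier, period in intervals.items():
        last = last_run.get(tier)
        # A tier that has never run is always due.
        if last is None or now - last >= period:
            due.append(tier)
    return due
```

A scheduler loop would call `due_tiers` with the current time, run the returned tiers, and record their completion times, so the heavy tier never runs more often than its long interval allows.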

As far as Monitoring-as-a-service is concerned, do you have plans to 
integrate/leverage Ceilometer?

Yes, that will be exposed as an extension, when some application/operator needs 
the data.

Thanks
Vinod.


BR,
Simon


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Dieterly, Deklan
How is this different from or the same as Monasca?

Regards.
--
Deklan Dieterly
Hewlett-Packard Company
Sr. Systems Software Engineer
HP Cloud






On 5/12/15, 11:48 AM, Jay Pipes jaypi...@gmail.com wrote:

For operators:

* Nagios
* Icinga
* Zabbix

installed on baremetal machines deployed with the OpenStack and other
infrastructure services.

For tenants:

* Nagios
* Icinga
* Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of
monitoring systems that have been around for over a decade?

Best,
-jay

p.s. Sorry for top-posting.



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-13 Thread Chmouel Boudjnah
Jay Pipes jaypi...@gmail.com writes:

 On 05/12/2015 02:16 PM, Fox, Kevin M wrote:
 Nagios/whatever As A Service would actually be very useful I think.
 Frankly, so do tenants. Tenants install software on their images using
 configuration management tools like mentioned above... I don't see a reason to
 have Nagios-as-a-Service for tenants either.

For the same use cases in which a tenant would want to use the already
established CloudWatch/RAX monitoring, but on a private OpenStack. They would
just want to make a simple REST call to monitor their server:

curl -X POST http://api/ -d 'monitor my server/port/etc/'

rather than having to install, configure, and set up a Nagios or other
monitoring server.

Chmouel

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Vinod Pandarinathan (vpandari)
Hello,

I'm pleased to announce the development of a new project called CloudPulse.
CloudPulse provides OpenStack health-checking services to operators, tenants,
and applications. This project will begin as a StackForge project based upon
an empty cookiecutter[1] repo.  The repos to work in are:
Server: https://github.com/stackforge/cloudpulse
Client: https://github.com/stackforge/python-cloudpulseclient

Please join us via IRC on #openstack-cloudpulse on freenode.

I am holding a Doodle poll to select times for our first meeting the week
after summit.  The poll will close May 24th, and meeting times will be
announced on the mailing list at that time.  At our first IRC meeting, we
will draft additional core team members, so if you're interested in joining a
fresh new development effort, please attend our first meeting.
If you're interested in CloudPulse, please take a moment to fill out the
Doodle poll here:

https://doodle.com/kcpvzy8kfrxe6rvb

The initial core team is composed of Ajay Kalambur, Behzad Dastur, Ian Wells,
Pradeep chandrasekhar, Steven Dake and Vinod Pandarinathan.
I expect more members to join during our initial meeting.

A little bit about CloudPulse:
Cloud operators need notification of OpenStack failures before a customer
reports the failure, so that they can take timely corrective action with
minimal disruption to applications.  Many cloud applications, including those
I am interested in (NFV), have very stringent service level agreements.  Loss
of service can trigger contractual costs associated with the service.
Application high availability requires an operational OpenStack cloud, and
the reality is that OpenStack clouds occasionally fail in mysterious ways.
This project intends to identify when those failures occur so corrective
actions may be taken by operators, tenants, and the applications themselves.

OpenStack is considered healthy when OpenStack API services respond
appropriately.  Further, OpenStack is healthy when network traffic can be
sent between the tenant networks and can reach the Internet.  Finally,
OpenStack is healthy when all infrastructure cluster elements are in an
operational state.
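
The three-part definition of "healthy" above can be read as a logical AND over individual checks. The following is a minimal sketch of that reading, not the CloudPulse implementation: the `cloud_health` function and the check names in the usage note are hypothetical, and each check is assumed to be a callable that returns True when it passes.

```python
from typing import Callable, Dict

# Each check is a zero-argument callable returning True on success.
Check = Callable[[], bool]


def cloud_health(checks: Dict[str, Check]) -> Dict[str, object]:
    """Run every check; the cloud counts as healthy only if all of them pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is treated as a failed check.
            results[name] = False
    return {"healthy": all(results.values()), "checks": results}
```

For example, a caller might pass checks named `api_services_respond`, `tenant_network_traffic`, `internet_reachable`, and `infrastructure_operational`. Keeping the per-check results alongside the overall verdict lets an operator see which part of the definition failed.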

For information about blueprints, check out:
https://blueprints.launchpad.net/cloudpulse
https://blueprints.launchpad.net/python-cloudpulseclient

For more details, check out our Wiki:
https://wiki.openstack.org/wiki/Cloudpulse

Please join the CloudPulse team in designing and implementing a world-class
Carrier Grade system for checking the health of OpenStack clouds.  We look
forward to seeing you on IRC on #openstack-cloudpulse.

Regards,
Vinod Pandarinathan
[1] https://github.com/openstack-dev/cookiecutter

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Fox, Kevin M
Nagios/whatever As A Service would actually be very useful, I think.

Setting up a monitoring server is a fair amount of work. If Cloud Apps 
downloaded from an OpenStack Catalog had a Monitoring Heat resource built in, 
that would register the launched app with a multitenant aware Cloud Monitoring 
Service, the user would only have to launch an app, and then go into the 
Dashboard and associate some kind of alerting policy with the registered 
checks. Say, email this address when things break. That would be awesome. :)

Thanks,
Kevin

From: Jay Pipes [jaypi...@gmail.com]
Sent: Tuesday, May 12, 2015 10:48 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to 
HealthCheck OpenStack deployments

For operators:

* Nagios
* Icinga
* Zabbix

installed on baremetal machines deployed with the OpenStack and other
infrastructure services.

For tenants:

* Nagios
* Icinga
* Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of
monitoring systems that have been around for over a decade?

Best,
-jay

p.s. Sorry for top-posting.



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Jay Pipes

For operators:

* Nagios
* Icinga
* Zabbix

installed on baremetal machines deployed with the OpenStack and other 
infrastructure services.


For tenants:

* Nagios
* Icinga
* Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of 
monitoring systems that have been around for over a decade?


Best,
-jay

p.s. Sorry for top-posting.




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Maish Saidel-Keesing

On 05/12/15 20:48, Jay Pipes wrote:

For operators:

* Nagios
* Icinga
* Zabbix

installed on baremetal machines deployed with the OpenStack and other 
infrastructure services.


For tenants:

* Nagios
* Icinga
* Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of 
monitoring systems that have been around for over a decade?



Because that is what we love to do here in the OpenStack Community??
(Sorry I could not resist... :) )

But seriously though - do we have a set of tools that can do this - in a 
simple - consolidated way?

Best,
-jay

p.s. Sorry for top-posting.


--
Best Regards,
Maish Saidel-Keesing

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Jay Pipes

On 05/12/2015 02:16 PM, Fox, Kevin M wrote:

Nagios/whatever As A Service would actually be very useful I think.


I don't really understand why Nagios-as-a-Service would be useful to 
operators. I mean, operators install their monitoring system of choice 
via their configuration management tool of choice -- Ansible, SaltStack, 
Puppet, Chef, etc.


Frankly, so do tenants. Tenants install software on their images using
configuration management tools like those mentioned above... I don't see a
reason to have Nagios-as-a-Service for tenants either.



Setting up a monitoring server is a fair amount of work.


Not really. It's typically a simple apt-get install nagios-nrpe-plugins
on client VMs along with an apt-get install nagios-server on one or more
monitoring system VMs. Again, have configuration management systems
inject whatever check scripts you want, paired with the ones that already
come with the nagios-nrpe-plugins package.
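
A check script of the kind mentioned above could look roughly like the following. This is a minimal sketch, not a script shipped with any Nagios package: it relies only on the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the function names, the `warn_after` threshold, and the `check_api.py` file name are hypothetical.

```python
import sys
import time
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def probe(url, timeout=5.0):
    """Return (HTTP status or None, elapsed seconds) for one GET request."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as err:
        # The service answered, just not with a 2xx status.
        code = err.code
    except (OSError, ValueError):
        # Connection refused, DNS failure, timeout, malformed URL.
        code = None
    return code, time.monotonic() - start


def classify(code, elapsed, warn_after=2.0):
    """Map a probe result to a Nagios state and a one-line message."""
    if code is None:
        return CRITICAL, "CRITICAL - no response"
    if code >= 500:
        return CRITICAL, "CRITICAL - HTTP %d" % code
    if elapsed > warn_after:
        return WARNING, "WARNING - HTTP %d in %.2fs" % (code, elapsed)
    return OK, "OK - HTTP %d in %.2fs" % (code, elapsed)


def main(argv):
    """Print the status line and return the exit code for Nagios."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_api.py URL")
        return UNKNOWN
    state, message = classify(*probe(argv[1]))
    print(message)
    return state
```

Saved as, say, check_api.py with `sys.exit(main(sys.argv))` added at the bottom, it could be pointed at an OpenStack API endpoint and wired into a command definition like any other plugin. Any status below 500 is treated as "the service responded", which is a design choice of this sketch rather than a rule.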


If Cloud Apps downloaded from an OpenStack Catalog had a Monitoring Heat
resource built in, that would register the launched app with a
multitenant aware Cloud Monitoring Service, the user would only have
to launch an app, and then go into the Dashboard and associate some
kind of alerting policy with the registered checks. Say, email this
address when things break. That would be awesome. :)


I guess I just don't see this being in the realm of OpenStack. Or at 
least, not more than something like a Murano application manifest which 
is almost what you are describing above.


I don't see the need for this service, sorry. Not everything needs to be 
re-invented as a RESTful Python service endpoint...


Best,
-jay


Thanks, Kevin 


From: Jay Pipes [jaypi...@gmail.com]
Sent: Tuesday, May 12, 2015 10:48 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to
HealthCheck OpenStack deployments

For operators:

* Nagios
* Icinga
* Zabbix

installed on baremetal machines deployed with the OpenStack and
other infrastructure services.

For tenants:

* Nagios
* Icinga
* Zabbix

installed on their VMs.

Why are we re-inventing excellent open-source implementations of
monitoring systems that have been around for over a decade?

Best,
-jay

p.s. Sorry for top-posting.

On 05/12/2015 01:20 PM, Vinod Pandarinathan (vpandari) wrote:

Hello,

I'm pleased to announce the development of a new project called
CloudPulse.  CloudPulse provides OpenStack health-checking services
to operators, tenants, and applications. This project will
begin as a StackForge project based upon an empty cookiecutter[1]
repo.  The repos to work in are:
Server: https://github.com/stackforge/cloudpulse
Client: https://github.com/stackforge/python-cloudpulseclient

Please join us via IRC on #openstack-cloudpulse on freenode.

I am holding a doodle poll to select times for our first meeting
the week after summit.  This doodle poll will close May 24th and
meeting times will be announced on the mailing list at that time.
At our first IRC meeting, we will draft additional core team
members, so if you're interested in joining a fresh new development
effort, please attend our first meeting. Please take a moment if
you're interested in CloudPulse to fill out the doodle poll here:

https://doodle.com/kcpvzy8kfrxe6rvb

The initial core team is composed of Ajay Kalambur, Behzad Dastur,
Ian Wells, Pradeep Chandrasekhar, Steven Dake, and Vinod
Pandarinathan. I expect more members to join during our initial
meeting.

A little bit about CloudPulse: Cloud operators need notification of
OpenStack failures before a customer reports the failure. Cloud
operators can then take timely corrective actions with minimal
disruption to applications.  Many cloud applications, including
those I am interested in (NFV), have very stringent service level
agreements.  Loss of service can trigger contractual costs
associated with the service.  Application high availability
requires an operational OpenStack Cloud, and the reality is that
occasionally OpenStack clouds fail in mysterious ways. This
project intends to identify when those failures occur so corrective
actions may be taken by operators, tenants, and the applications
themselves.

OpenStack is considered healthy when OpenStack API services
respond appropriately.  Further, OpenStack is healthy when network
traffic can be sent between the tenant networks and can reach the
Internet.  Finally, OpenStack is healthy when all infrastructure
cluster elements are in an operational state.

For information about blueprints check out:
https://blueprints.launchpad.net/cloudpulse
https://blueprints.launchpad.net/python-cloudpulseclient

For more details, check out our Wiki:
https://wiki.openstack.org/wiki/Cloudpulse

Please join the CloudPulse team in designing and implementing a
world-class Carrier Grade system for checking the health of
OpenStack clouds.  We look forward to seeing you on IRC on
#openstack-cloudpulse
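
The announcement above defines a healthy cloud by three criteria: API services respond, tenant network traffic flows, and infrastructure elements are operational. A minimal Python sketch of how such criteria might be combined into one verdict follows. It is not CloudPulse code: CheckResult, run_checks, and cloud_is_healthy are hypothetical names, and each probe is a placeholder callable standing in for a real test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    category: str   # e.g. "api", "network", "infrastructure"
    healthy: bool
    detail: str = ""


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run every probe and record its outcome.

    A probe that raises is recorded as unhealthy rather than aborting
    the run, so one broken check cannot hide the others.
    """
    results = []
    for category, probe in checks.items():
        try:
            healthy = bool(probe())
            detail = "" if healthy else "probe reported failure"
        except Exception as exc:
            healthy, detail = False, f"probe raised {exc!r}"
        results.append(CheckResult(category, healthy, detail))
    return results


def cloud_is_healthy(results: List[CheckResult]) -> bool:
    """The cloud counts as healthy only if every category passed."""
    return all(result.healthy for result in results)
```

Keeping the per-category results alongside the overall verdict lets an operator see which criterion failed instead of only a single pass/fail flag.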

Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Vinod Pandarinathan (vpandari)
Very true. However, the way I see it, these are extensions/plugins to the
CloudPulse framework, so when they are available, the data from these
tools is exposed.

The OpenStack health service provides an overall framework without
assumptions about what is installed on the underlying cloud.
The service is expected to run on existing cloud deployments that may or
may not have any of this software (on the tenant side as well).

Core health checks for operators and tenants test basic OpenStack services
which are present in any OpenStack cloud.

Thanks for the feedback.


Thanks
Vinod.



Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Jay Pipes

On 05/12/2015 02:24 PM, Vinod Pandarinathan (vpandari) wrote:

Very true. However, the way I see it, these are extensions/plugins to the
CloudPulse framework, so when they are available, the data from these
tools is exposed.

The OpenStack health service provides an overall framework without
assumptions about what is installed on the underlying cloud.
The service is expected to run on existing cloud deployments that may or
may not have any of this software (on the tenant side as well).


You mean, like Monasca?

https://wiki.openstack.org/wiki/Monasca

Sounds to me like you will at the very least need an agent of some sort 
on the VMs to communicate to an external system. And, that is the 
monasca-agent:


https://github.com/stackforge/monasca-agent

ala Nagios NRPE agent:

http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf

ala Zabbix agent:

https://www.zabbix.com/documentation/2.0/manual/concepts/agent

ala Icinga agent:

http://docs.icinga.org/latest/en/nrpe.html

So, cloudpulse would be yet another agent for sending healthcheck 
messages to an external system, in order for the framework not to make 
any assumptions on what is installed in the underlying cloud -- other
than the assumption you'd need yet another agent installed.



Core health checks for operators and tenants test basic OpenStack services
which are present in any OpenStack cloud.


Operators != tenants. Try to make the two equal each other and you
end up with Ceilometer and Triple-O -- with all the accompanying 
complexity therein.


Best,
-jay


Thanks for the feedback.


Thanks
Vinod.


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Fox, Kevin M
Hooking it into Zaqar would be awesome too. Once you can trigger Mistral 
workflows based on Zaqar messages, just imagine the possibilities...

Kevin


From: Steven Dake (stdake)
Sent: Tuesday, May 12, 2015 12:02:59 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to 
HealthCheck OpenStack deployments

Kevin,

This is a great idea that would make a solid extension to the software.
If I read the wiki page correctly, the real goal is for operators and
tenants to get notifications by querying the REST API, so they could
write their own email/pager-duty app.

Regards
-steve

On 5/12/15, 11:16 AM, Fox, Kevin M kevin@pnnl.gov wrote:

Nagios/whatever as a Service would actually be very useful, I think.

Setting up a monitoring server is a fair amount of work. If Cloud Apps
downloaded from an OpenStack Catalog had a Monitoring Heat resource built
in, that would register the launched app with a multitenant aware Cloud
Monitoring Service, the user would only have to launch an app, and then
go into the Dashboard and associate some kind of alerting policy with the
registered checks. Say, email this address when things break. That would
be awesome. :)

Thanks,
Kevin


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Steven Dake (stdake)
Kevin,

This is a great idea that would make a solid extension to the software.
If I read the wiki page correctly, the real goal is for operators and
tenants to get notifications by querying the REST API, so they could
write their own email/pager-duty app.

Regards
-steve
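
As a rough illustration of the "query the REST API and write your own email/pager app" idea, the Python sketch below polls a status source and calls a notifier whenever a check changes state. Everything specific here is an assumption made for the example: the thread does not describe the CloudPulse REST API, so the endpoint passed to fetch_status, the JSON shape (a flat mapping of check name to "ok" or "fail"), and the function names are all hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List, Optional, Tuple


def fetch_status(url: str, timeout: float = 5.0) -> Dict[str, str]:
    """GET a hypothetical health endpoint returning {"check": "ok"|"fail"}."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def detect_changes(
    previous: Dict[str, str], current: Dict[str, str]
) -> List[Tuple[str, Optional[str], str]]:
    """Return (check, old_state, new_state) for every check that changed."""
    return [
        (name, previous.get(name), state)
        for name, state in current.items()
        if previous.get(name) != state
    ]


def watch(
    poll: Callable[[], Dict[str, str]],
    notify: Callable[[str, Optional[str], str], None],
    rounds: int,
    interval: float = 0.0,
) -> None:
    """Poll `rounds` times and notify on each state transition.

    A check seen for the first time in the "ok" state is not reported,
    so starting the watcher against a healthy cloud stays quiet.
    """
    previous: Dict[str, str] = {}
    for _ in range(rounds):
        current = poll()
        for name, old, new in detect_changes(previous, current):
            if old is not None or new != "ok":
                notify(name, old, new)
        previous = current
        time.sleep(interval)
```

In practice poll could be a lambda wrapping fetch_status with the deployment's URL, and notify could send an email or page someone; both are passed in so the transition logic stays independent of the transport.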


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Richard Raseley


As others have expressed, I am a little skeptical about the need to
'reinvent the wheel' with regard to monitoring.


Is there a well-defined set of business or user requirements that
CloudPulse would enable and existing solutions do not? I am just
trying to better wrap my head around the problem...


Regards,

Richard



Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Julien Danjou
On Tue, May 12 2015, Steven Dake (stdake) wrote:

 This is a great idea that would make a solid extension to the software.
 If I read the wiki page correctly, the real goal is for operators and
 tenants to get notifications by querying the REST API, so they could
 write their own email/pager-duty app.

Then leveraging Ceilometer's polling and alarming systems could save you
from reinventing a large portion of the wheel.

-- 
Julien Danjou
// Free Software hacker
// http://julien.danjou.info




Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Fox, Kevin M
It totally depends on how much experience you think a tenant user has...

If we're talking about devops, they tend to have the skills to stand up a 
configuration management server, a monitoring server, and manage everything via 
config management.

If tenant users are research scientists, like some of ours, it's a fair amount
of work to manage Nagios without config management, and config management is
way more effort than most researchers want to put into learning. That's where
an app catalog becomes important, and something like monitoring as a service
starts to become interesting.

Thanks,
Kevin


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Georgy Okrokvertskhov
Here is how we do VM-level monitoring in the application catalog. There
is a Nagios application that deploys a Nagios VM to the user's tenant, and
it exposes an abstracted monitoring app interface for adding probes and
checks. Another application, Ceilometer Alarm, also lets you use the same
monitoring interface to add a check for a VM.

Demo is here: https://www.youtube.com/watch?v=OvPpJd0EOFw

As usual, Heat is used under the hood for infrastructure-level management.
You can add other monitoring apps such as Zabbix (
https://github.com/openstack/murano-apps/tree/master/ZabbixAgent/package,
https://github.com/openstack/murano-apps/tree/master/ZabbixServer/package)

Thanks
Gosha

On Tue, May 12, 2015 at 1:31 PM, Fox, Kevin M kevin@pnnl.gov wrote:

 It totally depends on how much experience you think a tenant user has...

 If we're talking about devops, they tend to have the skills to stand up a
 configuration management server, a monitoring server, and manage everything
 via config management.

 If tenant users are research scientists, like some of ours, it's a fair
 amount of work to manage Nagios without config management, and config
 management is way more effort than most researchers want to put into
 learning. That's where an app catalog becomes important, and something like
 monitoring as a service starts to become interesting.

 Thanks,
 Kevin
 
 From: Jay Pipes [jaypi...@gmail.com]
 Sent: Tuesday, May 12, 2015 12:50 PM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [new][cloudpulse] Announcing a project to
 HealthCheck OpenStack deployments

 On 05/12/2015 02:16 PM, Fox, Kevin M wrote:
  Nagios/whatever As A Service would actually be very useful, I think.

 I don't really understand why Nagios-as-a-Service would be useful to
 operators. I mean, operators install their monitoring system of choice
 via their configuration management tool of choice -- Ansible, SaltStack,
 Puppet, Chef, etc.

 Frankly, so do tenants. Tenants install software on their images using
 configuration management tools like those mentioned above... I don't see a
 reason to have Nagios-as-a-Service for tenants either.

  Setting up a monitoring server is a fair amount of work.

 Not really. It's typically a simple apt-get install nagios-nrpe-plugins
 on client VMs along with an apt-get install nagios-server on one or more
 monitoring system VMs. Again, have configuration management systems
 inject whatever check scripts you want, paired with the ones that already
 come with the nagios-nrpe-plugins package.
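
As a rough illustration of the kind of check script a configuration
management tool might drop onto a VM, here is a minimal sketch that follows
the standard Nagios plugin convention: exit status 0 = OK, 1 = WARNING,
2 = CRITICAL, 3 = UNKNOWN, plus one line of status text. The disk-usage
check, the thresholds, and the function names are assumptions made for this
example and do not come from the thread.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to an (exit_code, status_line) pair.

    The warn/crit thresholds are illustrative defaults, not recommendations.
    """
    if used_percent >= crit:
        return CRITICAL, "DISK CRITICAL - %.1f%% used" % used_percent
    if used_percent >= warn:
        return WARNING, "DISK WARNING - %.1f%% used" % used_percent
    return OK, "DISK OK - %.1f%% used" % used_percent


def main(path="/"):
    """Check one filesystem, print the status line, and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
        code, line = evaluate_disk(100.0 * usage.used / usage.total)
    except OSError as exc:
        # Anything that prevents the measurement is reported as UNKNOWN.
        code, line = UNKNOWN, "DISK UNKNOWN - %s" % exc
    print(line)
    return code
```

A wrapper script would end with sys.exit(main()) so that the monitoring
agent sees the exit status.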

  If Cloud Apps downloaded from an OpenStack Catalog had a Monitoring Heat
  resource built in that would register the launched app with a
  multitenant-aware Cloud Monitoring Service, the user would only have
  to launch an app, and then go into the Dashboard and associate some
  kind of alerting policy with the registered checks. Say, email this
  address when things break. That would be awesome. :)

 I guess I just don't see this being in the realm of OpenStack. Or at
 least, not more than something like a Murano application manifest which
 is almost what you are describing above.

 I don't see the need for this service, sorry. Not everything needs to be
 re-invented as a RESTful Python service endpoint...

 Best,
 -jay


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Vinod Pandarinathan (vpandari)

There are several differences:

1. CloudPulse does not need any agent or special software installed on the
underlying cloud; the service can be installed on a tenant VM that has
access to the API network.

2. CloudPulse can be configured to run specific test groups periodically,
exercising core OpenStack services.

3. Monasca is interesting; CloudPulse differs in that it provides
pluggable extensions/API tests (which manipulate resources such as
VMs, networks, etc.) and is flexible enough for the operator or the
application/tenant to configure the time interval and the test group
to run.

4. In addition, CloudPulse uses OpenStack infra components without the
complexity of Kafka/ZooKeeper/Spark etc.
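
To make points 1 and 2 concrete, here is a minimal sketch of an agentless,
periodic test group that only needs network access to the API endpoints.
CloudPulse's real design is not shown in this thread, so every name here
(run_test_group, run_periodically, http_endpoint_check) is hypothetical,
and the checks are plain callables rather than any actual plugin interface.

```python
import time
import urllib.request


def http_endpoint_check(url, timeout=5.0):
    """Build a check that passes when `url` answers with a 2xx/3xx status."""
    def check():
        # urlopen raises on 4xx/5xx and on connection errors; the caller
        # (run_test_group) records those as "error".
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    return check


def run_test_group(checks):
    """Run each check once; map its name to 'ok', 'fail', or 'error'."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            results[name] = "error"
    return results


def run_periodically(checks, interval_seconds, rounds, sleep=time.sleep):
    """Run the group `rounds` times, sleeping between consecutive runs."""
    history = []
    for i in range(rounds):
        history.append(run_test_group(checks))
        if i < rounds - 1:
            sleep(interval_seconds)
    return history
```

A test group would then be a dict such as
{"nova": http_endpoint_check(nova_url)}, where nova_url is whatever API
endpoint the deployment exposes; the interval and the group contents are
the two knobs described in point 3.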

Thanks
Vinod.

On 5/12/15, 12:59 PM, Jay Pipes jaypi...@gmail.com wrote:

On 05/12/2015 02:24 PM, Vinod Pandarinathan (vpandari) wrote:
 Very true. However, the way I see it, these are extensions/plugins to the
 CloudPulse framework, so when they are available, the data from these
 tools is exposed.

 The OpenStack health service provides an overall framework without
 assumptions about what is installed on the underlying cloud.
 The service is expected to run on existing cloud deployments that may or
 may not have any of this software (from tenants as well).

You mean, like Monasca?

https://wiki.openstack.org/wiki/Monasca

Sounds to me like you will at the very least need an agent of some sort
on the VMs to communicate to an external system. And, that is the
monasca-agent:

https://github.com/stackforge/monasca-agent

ala Nagios NRPE agent:

http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf

ala Zabbix agent:

https://www.zabbix.com/documentation/2.0/manual/concepts/agent

ala Icinga agent:

http://docs.icinga.org/latest/en/nrpe.html

So, CloudPulse would be yet another agent for sending healthcheck
messages to an external system, in order for the framework not to make
any assumptions on what is installed in the underlying cloud -- other
than the assumption you'd need yet another agent installed.

 Core health checks for operators and tenants test basic OpenStack services
 which are present in any OpenStack cloud.

Operators != tenants. Try to make the two equal each other and you
end up with Ceilometer and Triple-O -- with all the accompanying
complexity therein.

Best,
-jay

 Thanks for the feedback.


 Thanks
 Vinod.

 On 5/12/15, 10:48 AM, Jay Pipes jaypi...@gmail.com wrote:

 For operators:

 * Nagios
 * Icinga
 * Zabbix

 installed on baremetal machines deployed with the OpenStack and other
 infrastructure services.

 For tenants:

 * Nagios
 * Icinga
 * Zabbix

 installed on their VMs.

 Why are we re-inventing excellent open-source implementations of
 monitoring systems that have been around for over a decade?

 Best,
 -jay

 p.s. Sorry for top-posting.

 On 05/12/2015 01:20 PM, Vinod Pandarinathan (vpandari) wrote:
 Hello,

 I'm pleased to announce the development of a new project called
 CloudPulse.  CloudPulse provides OpenStack health-checking services
 to operators, tenants, and applications.  This project will begin as
 a StackForge project based upon an empty cookiecutter[1] repo.  The
 repos to work in are:
 Server: https://github.com/stackforge/cloudpulse
 Client: https://github.com/stackforge/python-cloudpulseclient

 Please join us via IRC on #openstack-cloudpulse on freenode.

 I am holding a doodle poll to select times for our first meeting the
 week after summit.  This doodle poll will close May 24th and meeting
 times will be announced on the mailing list at that time.  At our first
 IRC meeting, we will draft additional core team members, so if you're
 interested in joining a fresh new development effort, please attend our
 first meeting.  If you're interested in CloudPulse, please take a moment
 to fill out the doodle poll here:

 https://doodle.com/kcpvzy8kfrxe6rvb

 The initial core team is composed of Ajay Kalambur, Behzad Dastur,
 Ian Wells, Pradeep chandrasekhar, Steven Dake and Vinod Pandarinathan.
 I expect more members to join during our initial meeting.

 A little bit about CloudPulse:
 Cloud operators need notification of OpenStack failures before a
 customer reports the failure. Cloud operators can then take timely
 corrective actions with minimal disruption to applications.  Many cloud
 applications, including those I am interested in (NFV), have very
 stringent service level agreements.  Loss of service can trigger
 contractual costs associated with the service.  Application high
 availability requires an operational OpenStack Cloud, and the reality
 is that occasionally OpenStack clouds fail in some mysterious ways.
 This project intends to identify when those failures occur so
 corrective actions may be taken by operators, tenants, and the
 applications themselves.

 OpenStack is considered healthy when OpenStack API services respond
 appropriately.  Further, OpenStack is healthy when network traffic can
 be sent between the tenant networks and can access the Internet.  Finally

Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Fox, Kevin M
Cool, but using Nagios or the like to trigger app-level actions is not what I'm
primarily interested in. Mostly the reverse. It's for the app definition to
provide the information necessary for a monitoring system to report to the user
when something is very wrong and needs intervention. For example, the website
is unresponsive because the backend database server in the demo goes offline,
or the maximum number of servers has already been reached and the site
responsiveness is bad due to excessive load. Murano doesn't do that currently,
does it?

Thanks,
Kevin



Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Georgy Okrokvertskhov
Murano itself does not provide any monitoring. The idea here is to expose
any application's capabilities to do this. In this demo we had a Java
application deployed on a Tomcat VM and connected to PostgreSQL. The Java
app's workflow executed Nagios application methods to register itself in the
Nagios monitoring system by adding the proper IP, port, and URL information
for standard Nagios HTTP probes. Nagios itself can send notifications to
other services such as e-mail, IM, or custom targets (Murano, Heat, etc.,
via simple bash/curl scripts).
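
For readers unfamiliar with what "registering" an HTTP probe amounts to on
the Nagios side, here is a minimal sketch that renders a host and a service
object definition from an IP, port, and URL path. It is not the Murano
Nagios app's actual code. It assumes the linux-server and generic-service
templates and a check_http command that accepts its options as $ARG1$, as
in the Nagios sample configuration; render_http_probe and the example
values are hypothetical.

```python
# Doubled braces are literal braces in str.format templates.
TEMPLATE = """\
define host {{
    use                  linux-server
    host_name            {host_name}
    address              {address}
}}

define service {{
    use                  generic-service
    host_name            {host_name}
    service_description  HTTP {url_path}
    check_command        check_http!-p {port} -u {url_path}
}}
"""


def render_http_probe(host_name, address, port, url_path):
    """Return Nagios object definitions for one HTTP probe.

    -p (port) and -u (URL path) are standard check_http plugin options.
    """
    return TEMPLATE.format(
        host_name=host_name,
        address=address,
        port=int(port),
        url_path=url_path,
    )
```

The rendered text would be written to a file under the Nagios object
configuration directory, followed by a reload of the Nagios service.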

So, if you want monitoring for your apps, you will probably need to
modify the Nagios application in Murano to expose registration and
e-mail setup to end users. Then Nagios will send notifications to the user
rather than to Murano.

Another option is to add specific workflow actions in Murano, or to register
a Mistral workflow, to react to Nagios monitoring events.

Last but not least, an application can define a set of actions for
critical events. The application itself detects a problem, such as an error
in DB transactions, and sends a POST request to an action URL. The action
calls a workflow which uses the monitoring application interface to trigger
an event in the monitoring system. The idea here is that the application
does not know beforehand which monitoring service is used, but it requires
a monitoring service to be available with a known interface implemented as
Murano methods. I am not sure if I am good at explaining all this :-)
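
A minimal sketch of the application side of that last option, assuming the
action URL and an optional Keystone token are handed to the application at
deploy time. The URL, the event payload fields, and build_event_request are
hypothetical; the actual Murano action API is not described in this thread.
The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_event_request(action_url, event, token=None):
    """Build (but do not send) a JSON POST reporting a critical event."""
    body = json.dumps(event).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # OpenStack APIs conventionally take the Keystone token in this header.
        headers["X-Auth-Token"] = token
    return urllib.request.Request(
        action_url, data=body, headers=headers, method="POST"
    )
```

The application would pass the result to urllib.request.urlopen() when it
detects the failure; everything after that (which workflow runs, which
monitoring system is notified) stays behind the action URL.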

Plenty of options available, but still they require some amount of work.

Thanks
Gosha


Re: [openstack-dev] [new][cloudpulse] Announcing a project to HealthCheck OpenStack deployments

2015-05-12 Thread Jay Pipes

On 05/12/2015 05:05 PM, Vinod Pandarinathan (vpandari) wrote:

There are several differences:

1. CloudPulse does not need any agent or special software installed on the
underlying cloud; the service can be installed on a tenant VM that has
access to the API network.

2. CloudPulse can be configured to run specific test groups periodically,
exercising core OpenStack services.

3. Monasca is interesting; CloudPulse differs in that it provides
pluggable extensions/API tests (which manipulate resources such as
VMs, networks, etc.) and is flexible enough for the operator or the
application/tenant to configure the time interval and the test group
to run.

4. In addition, CloudPulse uses OpenStack infra components without the
complexity of Kafka/ZooKeeper/Spark etc.


CloudPulse doesn't yet exist, so I think saying it is different from
these things before it has anything to be different about is a bit
premature.


Again, I'd highly advise those folks involved in this effort to take a 
look at the existing solutions in this space and perhaps find ways to 
collaborate and improve on what already exists.


Best,
-jay

