Re: [Openstack] availability/performance sensors/probes

2012-02-22 Thread Jasper Capel
I've uploaded the checks we use in production here at Spil Games to 
https://github.com/spilgames/swift. Besides check_swift (which is a functional 
test), everything's meant to gather statistics from the cluster. We're 
looking to replace that with a Graphite-based solution, to avoid having to parse 
access logs and to have more real-time metrics available. Nothing fancy, but it 
may be of use to someone.

Jasper



On Feb 21, 2012, at 11:54 PM, Tim Bell wrote:

 
 This does bring up a more generic problem of sharing the
 availability/performance code for all of the OpenStack components.
 
 At the design summit, this was proposed as one of the example use cases of
 the OpenStack community forge (I forget the exact name) but it was intended
 as a place for sharing code/procedures which were not intended to be part of
 the core but may be of interest to others.
 
 Was anything set up along these lines?
 
 A set of production-quality Nagios/Ganglia sensors would be very interesting
 if someone has these.
 
 Tim
 
 -----Original Message-----
 From: openstack-bounces+tim.bell=cern...@lists.launchpad.net
 [mailto:openstack-bounces+tim.bell=cern...@lists.launchpad.net] On Behalf
 Of Jasper Capel
 Sent: 21 February 2012 18:29
 To: John Dickinson
 Cc: openstack@lists.launchpad.net
 Subject: Re: [Openstack] swprobe: swift middleware for sending metrics to
 graphite using statsd
 
 Hi John,
 
 Apparently my google-fu is not up to snuff, as I wasn't aware of that
 project. Had I been, I probably would've just extended that one. :)
 
 Cheers,
 Jasper
 
 
 From: John Dickinson [m...@not.mn]
 Sent: Tuesday, February 21, 2012 5:44 PM
 To: Jasper Capel
 Cc: openstack@lists.launchpad.net
 Subject: Re: [Openstack] swprobe: swift middleware for sending metrics to
 graphite using statsd
 
 That's great. Have you by any chance seen
 https://github.com/pandemicsyn/swift-informant? It's something similar
 that we've been playing with at Rackspace.
 
 --John
 
 
 On Feb 21, 2012, at 10:36 AM, Jasper Capel wrote:
 
 Hi all,
 
 I'm announcing a piece of Swift middleware, swprobe [1], designed to
 gather run-time metrics and ship them off to Graphite [2] for near
 real-time monitoring. Currently it sends out bytes up- and downloaded per
 account, HTTP methods and response codes, and timings in milliseconds on
 each call.
 
 To be able to use this you need Graphite [2]. You also need statsd [3]
 running, preferably on the local machine, since potentially many small UDP
 packets are being sent out. Please also note that we have not yet tested
 this with production workloads.
 
 [1] - https://github.com/spilgames/swprobe
 [2] - http://graphite.wikidot.com/
 [3] - https://github.com/etsy/statsd
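
For readers who have not seen this pattern, here is a minimal sketch of what a WSGI middleware emitting statsd metrics over UDP might look like. It is not the swprobe code: the class name `StatsdProbe`, the `swift` prefix, and the metric names are hypothetical, and only the statsd wire format (`name:value|type`, with `c` for counters and `ms` for timers) and the default statsd port 8125 are taken as given.

```python
import socket
import time


def format_metric(name, value, kind):
    """Build one statsd datagram payload: <name>:<value>|<type>."""
    return "%s:%s|%s" % (name, value, kind)


class StatsdProbe:
    """Hypothetical WSGI middleware; an illustration, not swprobe itself."""

    def __init__(self, app, host="127.0.0.1", port=8125, prefix="swift"):
        self.app = app
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, name, value, kind):
        payload = format_metric("%s.%s" % (self.prefix, name), value, kind)
        try:
            # Fire-and-forget UDP: one small datagram per metric.
            self.sock.sendto(payload.encode("ascii"), self.addr)
        except OSError:
            pass  # a metrics failure should never break a request

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "UNKNOWN")
        started = time.time()
        seen = {}

        def probe_start_response(status, headers, exc_info=None):
            # Remember the numeric status code, e.g. "200" from "200 OK".
            seen["code"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, probe_start_response)
        finally:
            # Times only the application call, not streaming of the body.
            elapsed_ms = int((time.time() - started) * 1000)
            self._send("%s.%s" % (method, seen.get("code", "unknown")), 1, "c")
            self._send("%s.timing" % method, elapsed_ms, "ms")
```

Keeping statsd on the local machine, as suggested above, means each of these datagrams only crosses the loopback interface.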
 
 Best regards,
 
 --
 Jasper Capel
 Lead Infrastructure Engineer
 
 W http://www.spilgames.com | S jwcapel-spil
 
 
 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 
 


Re: [Openstack] availability/performance sensors/probes

2012-02-22 Thread Florian Hines
If John Dickinson can steal me a 30-minute block at the conference I'll 
probably be giving a talk about it, but we (Rackspace) started switching to 
Graphite back in December. We're basically just following the Etsy cookbook to 
graph all the things!

We're using https://github.com/pandemicsyn/swift-informant to fire events to 
statsd. It takes care of answering questions like:

How many Object GET 200s are we currently getting per second?
How many container ops are we doing per second?
What was the average request time of container HEADs between 4-5 PM last 
Tuesday (which always seems to lead to the question of why they are so much 
slower today… oh look, that node is having a weird hw issue)?
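
As a rough illustration of how per-second rates and average request times fall out of raw events, here is a small sketch of the aggregation step. It is not statsd or swift-informant code: the `Aggregator` class and the metric name `obj.GET.200` are hypothetical, and the 10-second flush interval is an assumption matching what I believe is statsd's default.

```python
from collections import defaultdict


class Aggregator:
    """Hypothetical, simplified stand-in for a statsd-style aggregator."""

    def __init__(self, flush_interval_s=10.0):
        self.flush_interval_s = flush_interval_s
        self.counters = defaultdict(float)
        self.timers = defaultdict(list)

    def handle(self, line):
        # Parse "<name>:<value>|<type>"; sampled metrics ("|c|@0.1") and
        # other types are ignored in this sketch.
        name, rest = line.split(":", 1)
        value, kind = rest.split("|", 1)
        if kind == "c":
            self.counters[name] += float(value)
        elif kind == "ms":
            self.timers[name].append(float(value))

    def flush(self):
        # Turn the events seen during one interval into rates and means.
        out = {}
        for name, total in self.counters.items():
            out[name + ".per_second"] = total / self.flush_interval_s
        for name, samples in self.timers.items():
            out[name + ".mean_ms"] = sum(samples) / len(samples)
        self.counters.clear()
        self.timers.clear()
        return out
```

Feeding it fifty `obj.GET.200:1|c` events within one interval yields a rate of five per second for that interval; comparing the same series across days is then a graphing question rather than a log-parsing one.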

Swift's also really good about dumping info to the error log. We convert the 
majority of those log lines to events that get fired to statsd using 
https://github.com/pandemicsyn/statsdlog.

That lets us track everything from container-replicator timeouts and auth 
service retries to OSErrors on the object servers (I think we're tracking 
about 25-30 log line patterns at the moment).
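
The pattern-to-event idea can be sketched in a few lines. This is not how statsdlog is implemented or configured: the regular expressions, the metric names, and the sample log lines below are all made up for illustration.

```python
import re

# Hypothetical patterns; a real deployment would tune these to its own logs.
PATTERNS = [
    (re.compile(r"container-replicator.*timeout", re.I),
     "swift.errors.container_replicator.timeout"),
    (re.compile(r"OSError"), "swift.errors.object_server.oserror"),
    (re.compile(r"auth.*retry", re.I), "swift.errors.auth.retry"),
]


def match_line(line):
    """Return the metric name for the first matching pattern, else None."""
    for pattern, metric in PATTERNS:
        if pattern.search(line):
            return metric
    return None


def to_statsd_events(lines):
    """Yield one statsd counter payload per log line that matches."""
    for line in lines:
        metric = match_line(line)
        if metric is not None:
            yield "%s:1|c" % metric
```

Each yielded string can be sent as a UDP datagram to statsd, so a burst of OSErrors on one node shows up as a spike on a graph instead of a line buried in a log file.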

The last piece is just a hacked version of the swift-recon CLI. It's what 
reports async-pendings, replication times, etc. to Graphite.
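
For values that are already numbers, such as recon-style gauges, writing straight to Graphite is simple, because carbon accepts plain-text lines of the form `metric.path value timestamp` (port 2003 by default). The sketch below shows only that last step; it does not use swift-recon, and the metric names, the `swift.node1` prefix, and the sample values are hypothetical.

```python
import socket
import time


def graphite_lines(metrics, prefix, timestamp=None):
    """Format a dict of gauges as Graphite plaintext-protocol lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    return ["%s.%s %s %d" % (prefix, name, value, ts)
            for name, value in sorted(metrics.items())]


def send_to_graphite(lines, host="127.0.0.1", port=2003):
    """Send the lines to a carbon listener over TCP, one metric per line."""
    payload = ("\n".join(lines) + "\n").encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

A periodic job could collect the numbers, call `graphite_lines`, and hand the result to `send_to_graphite`; keeping formatting separate from sending makes the formatting easy to check without a running carbon instance.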

Right now it gets tied together by a tiny hackish Flask app that generates some 
TV dashboards and will probably start doing the monitoring/alerting for the 
traffic prediction/confidence bands (we're experimenting with just doing it 
with an IRC bot).

--  
Florian Hines | @pandemicsyn
http://about.me/pandemicsyn



Re: [Openstack] availability/performance sensors/probes

2012-02-22 Thread Jay Pipes

On 02/22/2012 11:45 AM, Florian Hines wrote:

If John Dickinson can steal me a 30-minute block at the conference I'll
probably be giving a talk about it, but we (Rackspace) started switching
to Graphite back in December. We're basically just following the Etsy
cookbook to graph all the things!


Well, if John doesn't, I will :)

But I bet John will.

-jay



Re: [Openstack] availability/performance sensors/probes

2012-02-21 Thread Tim Bell

This does bring up a more generic problem of sharing the
availability/performance code for all of the OpenStack components.

At the design summit, this was proposed as one of the example use cases of
the OpenStack community forge (I forget the exact name) but it was intended
as a place for sharing code/procedures which were not intended to be part of
the core but may be of interest to others.

Was anything set up along these lines?

A set of production-quality Nagios/Ganglia sensors would be very interesting
if someone has these.

Tim
