Re: [Openstack] availability/performance sensors/probes
I've uploaded the checks we use in production here at Spil Games to https://github.com/spilgames/swift. Apart from check_swift (which is a functional test), everything is meant to gather statistics from the cluster. We're looking to replace that statistics gathering with a Graphite-based solution, to avoid having to parse access logs and to have more real-time metrics available. Nothing fancy, but it may be of use to someone.

Jasper

On Feb 21, 2012, at 11:54 PM, Tim Bell wrote:

This does bring up a more generic problem of sharing the availability/performance code for all of the OpenStack components. At the design summit, this was proposed as one of the example use cases of the OpenStack community forge (I forget the exact name), which was intended as a place for sharing code and procedures that were not meant to be part of the core but may be of interest to others. Was anything set up along these lines? A set of production-quality Nagios/Ganglia sensors would be very interesting if someone has these.

Tim

-----Original Message-----
From: openstack-bounces+tim.bell=cern...@lists.launchpad.net [mailto:openstack-bounces+tim.bell=cern...@lists.launchpad.net] On Behalf Of Jasper Capel
Sent: 21 February 2012 18:29
To: John Dickinson
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] swprobe: swift middleware for sending metrics to graphite using statsd

Hi John,

Apparently my google-fu is not up to snuff, as I wasn't aware of that project. Had I been, I probably would've just extended that one. :)

Cheers,
Jasper

From: John Dickinson [m...@not.mn]
Sent: Tuesday, February 21, 2012 5:44 PM
To: Jasper Capel
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] swprobe: swift middleware for sending metrics to graphite using statsd

That's great. Have you by any chance seen https://github.com/pandemicsyn/swift-informant? It's something similar that we've been playing with at Rackspace.

--John

On Feb 21, 2012, at 10:36 AM, Jasper Capel wrote:

Hi all,

I'm announcing a piece of Swift middleware, swprobe [1], designed to gather run-time metrics and ship them off to Graphite [2] for near real-time monitoring. Currently it sends out bytes uploaded and downloaded per account, HTTP methods and response codes, and timings in milliseconds on each call.

To be able to use this you need Graphite [2]. You also need statsd [3] running, preferably on the local machine, since potentially many small UDP packets are sent out. Please also note that we have not yet tested this with production workloads.

[1] - https://github.com/spilgames/swprobe
[2] - http://graphite.wikidot.com/
[3] - https://github.com/etsy/statsd

Best regards,
--
Jasper Capel
Lead Infrastructure Engineer
W http://www.spilgames.com | S jwcapel-spil

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp
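The swprobe announcement quoted above describes WSGI middleware that counts HTTP methods and response codes and times each call, shipping the numbers to statsd over UDP. A minimal sketch of that idea follows. It is not the actual swprobe or swift-informant code: the class name, the bucket naming scheme (`swift.<METHOD>.<code>`) and the `prefix` parameter are assumptions made for illustration, and per-account byte counting is left out. Only the statsd line format (`bucket:value|c` for counters, `bucket:value|ms` for timers) and the default statsd port 8125 are taken from statsd itself.

```python
import socket
import time


def format_counter(bucket, value=1):
    """Build a statsd counter line: <bucket>:<value>|c"""
    return "%s:%d|c" % (bucket, value)


def format_timer(bucket, millis):
    """Build a statsd timer line: <bucket>:<milliseconds>|ms"""
    return "%s:%d|ms" % (bucket, millis)


class StatsdSketchMiddleware(object):
    """Hypothetical WSGI middleware; names and buckets are illustrative only."""

    def __init__(self, app, host="127.0.0.1", port=8125, prefix="swift"):
        self.app = app
        self.addr = (host, port)
        self.prefix = prefix
        # UDP is fire-and-forget, so a slow or absent statsd never blocks a request.
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, line):
        try:
            self.sock.sendto(line.encode("utf-8"), self.addr)
        except socket.error:
            pass  # metrics are best effort; never fail the request over them

    def __call__(self, environ, start_response):
        start = time.time()
        method = environ.get("REQUEST_METHOD", "UNKNOWN")
        captured = {}

        def _start_response(status, headers, exc_info=None):
            # status looks like "200 OK"; keep only the numeric code
            captured["code"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        result = self.app(environ, _start_response)
        # Measures time until the app returns its iterable, not until the
        # body has been fully streamed to the client.
        elapsed_ms = int((time.time() - start) * 1000)
        code = captured.get("code", "unknown")
        self._send(format_counter("%s.%s.%s" % (self.prefix, method, code)))
        self._send(format_timer("%s.%s.timing" % (self.prefix, method), elapsed_ms))
        return result
```

Running statsd on the local machine, as the announcement recommends, keeps these small datagrams off the network; statsd then aggregates them and flushes the results to Graphite.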
Re: [Openstack] availability/performance sensors/probes
If John Dickinson can steal me a 30-minute block at the conference I'll probably be giving a talk about it, but we (Rackspace) started switching to Graphite back in December. We're basically just following the Etsy cookbook to graph all the things!

We're using https://github.com/pandemicsyn/swift-informant to fire events to statsd. It takes care of answering questions like: How many object GET 200s are we currently getting per second? How many container ops are we doing per second? What was the average request time of container HEADs between 4 and 5 PM last Tuesday (which always seems to lead to the question of why they are so much slower today… oh look, that node is having a weird hardware issue)?

Swift's also really good about dumping info to the error log. We convert the majority of those log lines to events that get fired to statsd using https://github.com/pandemicsyn/statsdlog. That lets us track everything from container-replicator timeouts and auth service retries to OSErrors on the object servers (I think we're tracking about 25-30 log line patterns at the moment).

The last piece is just a hacked version of the swift-recon CLI. It's what reports async-pendings, replication times, etc. to Graphite.

Right now it all gets tied together by a tiny hackish Flask app that generates some TV dashboards and will probably start doing the monitoring/alerting for the traffic prediction/confidence bands (I'm experimenting with just doing that with an IRC bot).

--
Florian Hines | @pandemicsyn
http://about.me/pandemicsyn

On Wednesday, February 22, 2012 at 2:50 AM, Jasper Capel wrote:

I've uploaded the checks we use in production here at Spil Games to https://github.com/spilgames/swift. Apart from check_swift (which is a functional test), everything is meant to gather statistics from the cluster. We're looking to replace that statistics gathering with a Graphite-based solution, to avoid having to parse access logs and to have more real-time metrics available. Nothing fancy, but it may be of use to someone.

Jasper
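Florian's description of turning error-log lines into statsd events can be sketched roughly as below. This is not statsdlog's actual code or configuration: the two patterns, the event names, the `swift.log` prefix and the sample log line are all made up for illustration. The only thing taken from statsd is the counter line format `bucket:value|c`; each resulting line would then be sent to statsd as a UDP datagram.

```python
import re

# Hypothetical patterns; a real deployment would track many more
# (the message mentions roughly 25-30).
PATTERNS = {
    "container-replicator.timeout": re.compile(r"container-replicator.*[Tt]imeout"),
    "object-server.oserror": re.compile(r"object-server.*OSError"),
}


def match_events(line, patterns=PATTERNS):
    """Return the sorted names of all patterns that match one log line."""
    return sorted(name for name, rx in patterns.items() if rx.search(line))


def to_statsd(events, prefix="swift.log"):
    """Turn matched event names into statsd counter lines (increment by 1)."""
    return ["%s.%s:1|c" % (prefix, name) for name in events]


if __name__ == "__main__":
    # Made-up log line, for illustration only.
    sample = "Feb 22 10:00:01 node1 container-replicator ERROR Timeout talking to 10.0.0.5"
    for stat in to_statsd(match_events(sample)):
        print(stat)
```

Because each match only increments a counter, statsd and Graphite do the per-second aggregation; the log watcher itself stays stateless.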
Re: [Openstack] availability/performance sensors/probes
On 02/22/2012 11:45 AM, Florian Hines wrote:

If John Dickinson can steal me a 30-minute block at the conference I'll probably be giving a talk about it, but we (Rackspace) started switching to Graphite back in December. We're basically just following the Etsy cookbook to graph all the things!

Well, if John doesn't, I will :) But I bet John will.

-jay
Re: [Openstack] availability/performance sensors/probes
This does bring up a more generic problem of sharing the availability/performance code for all of the OpenStack components. At the design summit, this was proposed as one of the example use cases of the OpenStack community forge (I forget the exact name), which was intended as a place for sharing code and procedures that were not meant to be part of the core but may be of interest to others. Was anything set up along these lines? A set of production-quality Nagios/Ganglia sensors would be very interesting if someone has these.

Tim