The metrics it displayed weren't ideal compared to those reported by e.g. Ganglia. I've updated it with better metrics and calculated values.

https://integration.wikimedia.org/monitoring/

... and then I decided we needed this for other labs projects (like tool-labs,
beta deployment-prep etc.), so made it into a generic tool:

https://tools.wmflabs.org/nagf/
https://tools.wmflabs.org/nagf/?project=deployment-prep
https://tools.wmflabs.org/nagf/?project=integration

Source code:
https://github.com/wikimedia/nagf

(and the old one, will probably be removed soon in favour of a redirect)
https://github.com/wikimedia/integration-docroot/tree/master/org/wikimedia/integration/monitoring

Yay for solo half-day sprints!

-- Krinkle

On 7 Oct 2014, at 13:43, Krinkle <[email protected]> wrote:

> Hey all,
>
> As a temporary solution lacking ganglia or equivalent, I've put up some
> graphs that allow us to monitor the Jenkins slaves in labs for trends in CPU,
> Memory and Disk space.
>
> https://integration.wikimedia.org/monitoring/
>
> Also, thanks to Yuvi, there's alerts set up via production Icinga:
>
> https://github.com/wikimedia/operations-puppet/blob/df0d3298/modules/contint/manifests/monitoring.pp
>
> https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=labmon1001&nostatusheader
>
> -- Krinkle
>
_______________________________________________
QA mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/qa
