Re: [QA] integration.wikimedia.org: Monitoring for contint slaves

The metrics it displayed weren't ideal compared to those reported by, e.g.,
Ganglia, so I've updated it with better metrics and calculated values.


https://integration.wikimedia.org/monitoring/

... and then I decided we needed this for other labs projects (like tool-labs,
beta deployment-prep, etc.), so I made it into a generic tool:

https://tools.wmflabs.org/nagf/
https://tools.wmflabs.org/nagf/?project=deployment-prep
https://tools.wmflabs.org/nagf/?project=integration

Source code:
https://github.com/wikimedia/nagf

(and the old one, which will probably be removed soon in favour of a redirect)
https://github.com/wikimedia/integration-docroot/tree/master/org/wikimedia/integration/monitoring

Yay for solo half-day sprints!

— Krinkle

On 7 Oct 2014, at 13:43, Krinkle <[email protected]> wrote:

> Hey all,
> 
> As a temporary solution lacking ganglia or equivalent, I've put up some 
> graphs that allow us to monitor the Jenkins slaves in labs for trends in CPU, 
> Memory and Disk space.
> 
> https://integration.wikimedia.org/monitoring/
> 
> Also, thanks to Yuvi, there are alerts set up via production Icinga:
> 
> https://github.com/wikimedia/operations-puppet/blob/df0d3298/modules/contint/manifests/monitoring.pp
> 
> https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=labmon1001&nostatusheader
> 
> — Krinkle
>

_______________________________________________
QA mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/qa
