https://bugzilla.wikimedia.org/show_bug.cgi?id=48668
Web browser: ---
Bug ID: 48668
Summary: Set up Icinga monitoring for grid
Product: Wikimedia Labs
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: tools
Assignee: [email protected]
Reporter: [email protected]
Classification: Unclassified
Mobile Platform: ---
In addition to the Ganglia statistics, the grid's status should be properly
monitored and alarms set up. Off the top of my head and without data to back it up:
- master alive and well (no threads in error state!),
- every execution daemon alive and well,
- count of jobs in error state doesn't exceed 5 % of all running jobs,
- count of pending jobs doesn't exceed 5 % of all running jobs.
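
The two job-count rules could be sketched as an Icinga check along the following lines. This is only an illustration, not an existing plugin: the function name `check_grid_jobs` and its parameters are hypothetical, and the counts are assumed to be collected elsewhere (for example from the grid engine's job listing), which is not shown here. The return codes follow the usual Nagios/Icinga plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Exit codes used by Nagios/Icinga plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_grid_jobs(running, in_error, pending, limit_percent=5):
    """Return (exit_code, message) for the two job-count rules.

    All names here are hypothetical; the counts must be gathered by the caller.
    """
    if min(running, in_error, pending) < 0:
        return UNKNOWN, "UNKNOWN: negative job count"

    problems = []
    # Integer comparison avoids division, so running == 0 needs no special case:
    # with no running jobs, any job in error or pending exceeds the limit.
    if in_error * 100 > limit_percent * running:
        problems.append(f"{in_error} jobs in error state")
    if pending * 100 > limit_percent * running:
        problems.append(f"{pending} jobs pending")

    if problems:
        detail = ", ".join(problems)
        return CRITICAL, f"CRITICAL: {detail} (running: {running})"
    return OK, f"OK: {running} running, {in_error} in error, {pending} pending"
```

A wrapper script would print the message and exit with the returned code. Treating every violation as CRITICAL is an assumption of this sketch; a separate, lower WARNING threshold may be more useful once there is data to back up the numbers.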