https://bugzilla.wikimedia.org/show_bug.cgi?id=48668

       Web browser: ---
            Bug ID: 48668
           Summary: Set up Icinga monitoring for grid
           Product: Wikimedia Labs
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: tools
          Assignee: [email protected]
          Reporter: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

Besides the Ganglia statistics, the grid's status should be properly monitored
and alarms set up.  Off the top of my head and without data to back it up:

- Master alive and well (no threads in error state!),
- every execution daemon alive and well,
- count of jobs in error state doesn't exceed 5 % of all jobs running,
- count of jobs pending doesn't exceed 5 % of all jobs running.
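
As a rough illustration of the last two items, the threshold test could be
sketched as a small check in Python.  This is only a sketch under stated
assumptions: the function name check_ratio and its parameters are hypothetical,
the job counts are assumed to be gathered elsewhere (nothing here queries the
grid), and only the standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING,
2 CRITICAL, 3 UNKNOWN) are taken as given.

```python
import sys

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_ratio(label, count, running, limit_percent=5):
    """Compare `count` jobs against `limit_percent` % of `running` jobs.

    Hypothetical helper: `label` is just a name for the message
    (e.g. "error" or "pending"); the counts must be supplied by the caller.
    Returns a (exit_code, message) tuple.
    """
    if count < 0 or running < 0:
        return UNKNOWN, "UNKNOWN: negative job count for %s" % label
    # Integer arithmetic avoids floating-point rounding at the boundary.
    if count * 100 > limit_percent * running:
        return CRITICAL, "CRITICAL: %d %s jobs vs. %d running (limit %d %%)" % (
            count, label, running, limit_percent)
    return OK, "OK: %d %s jobs vs. %d running (limit %d %%)" % (
        count, label, running, limit_percent)


if __name__ == "__main__":
    # Usage (assumed): script.py LABEL COUNT RUNNING
    try:
        label, count, running = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    except (IndexError, ValueError):
        print("UNKNOWN: usage: LABEL COUNT RUNNING")
        sys.exit(UNKNOWN)
    code, message = check_ratio(label, count, running)
    print(message)
    sys.exit(code)
```

With no running jobs, any job in error or pending state is reported as
CRITICAL here; whether that is the desired behaviour is an open question.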

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
