https://bugzilla.wikimedia.org/show_bug.cgi?id=50585

       Web browser: ---
            Bug ID: 50585
           Summary: Silence the qacct transfer jobs and monitor them with
                    Icinga instead
           Product: Wikimedia Labs
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: tools
          Assignee: [email protected]
          Reporter: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

During the NFS outage, the qacct transfer jobs pestered the roots' mailboxes
every five minutes.  Though such an outage of course will never ever happen
again :-), it sucked nonetheless.

The transfer job is a service, and if we monitored it as one, we would get
better behaviour as well: a nice green or red icon on a web dashboard, and only
one (or no?) mail when the status *changes*.

So we should set up Icinga monitoring for that:

a) The transfer job directs all stdout/stderr to a file and saves its exit code
in another, and Icinga periodically queries these files.

b) The transfer job passes its output and exit code directly to an Icinga
sentinel that passes it somewhere up the chain.
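
For illustration, option a) could look roughly like the sketch below. It is
only a sketch under assumptions: the function names and file paths are made up
here, and the only thing taken from Icinga is the usual plugin exit code
convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

```python
import subprocess
from pathlib import Path

# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def run_and_record(cmd, out_file, rc_file):
    """Run cmd, writing stdout+stderr to out_file and the exit code to rc_file.

    Hypothetical wrapper around the transfer job; nothing is printed, so
    cron has nothing to mail.
    """
    with open(out_file, "w") as out:
        rc = subprocess.call(cmd, stdout=out, stderr=subprocess.STDOUT)
    Path(rc_file).write_text(f"{rc}\n")
    return rc


def check(out_file, rc_file):
    """Turn the recorded files into (plugin exit code, one-line status)."""
    try:
        rc = int(Path(rc_file).read_text().strip())
    except (OSError, ValueError):
        # Missing or unreadable exit code file: we cannot tell what happened.
        return UNKNOWN, "UNKNOWN: no recorded exit code"
    if rc == 0:
        return OK, "OK: last transfer succeeded"
    try:
        lines = Path(out_file).read_text().splitlines()
    except OSError:
        lines = []
    last = lines[-1] if lines else "no output"
    return CRITICAL, f"CRITICAL: exit code {rc}: {last}"
```

The cron entry would call run_and_record() instead of the job itself, and a
small plugin script would print the message from check() and exit with its
code. A real check would probably also want to treat an exit code file that
has not been updated for a while as a problem.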

Whether a) or b) is preferable (or possible, for that matter), I haven't
figured out yet, but this bug will track the progress on that.
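
One way option b) might work is a passive check result: the job itself reports
its state using the PROCESS_SERVICE_CHECK_RESULT external command. The sketch
below only formats and writes that command line. The host name, service name
and command file path are placeholders, and since the command file lives on
the Icinga host, a job running elsewhere would need some transport (NSCA, for
example) in between, which is not shown.

```python
import time


def passive_result_line(host, service, job_exit_code, output, now=None):
    """Format a passive service check result as an external command line.

    host and service are whatever names the Icinga configuration uses;
    the values in the usage note are placeholders.
    """
    ts = int(time.time() if now is None else now)
    # Map the job's exit code onto plugin states: 0 = OK, anything else = CRITICAL.
    state = 0 if job_exit_code == 0 else 2
    # Report only the last line of output, so the command stays on one line.
    summary = (output.strip().splitlines() or ["no output"])[-1]
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{summary}\n"


def submit(line, command_file):
    """Write the command to the external command file (path is site-specific)."""
    with open(command_file, "w") as cmd:
        cmd.write(line)
```

For example, passive_result_line("some-host", "qacct transfer", 1, job_output)
would report the service as CRITICAL with the last line of the job's output as
the status text.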

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l