[Bug 36993] New: Labs cluster dies daily at roughly 6:30 UTC

bugzilla-daemon Mon, 21 May 2012 01:13:04 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=36993


       Web browser: ---
             Bug #: 36993
           Summary: Labs cluster dies daily at roughly 6:30 UTC
           Product: Wikimedia Labs
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: General
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected], [email protected]
    Classification: Unclassified
   Mobile Platform: ---


Every day, all instances hosted on WMFLabs become barely accessible starting
at roughly 6:30 UTC, for about an hour. The symptoms are:
* very high load reported in ganglia for most instances
* ssh clients timing out
* `ls -l` being 
This is known to be related to I/O, an area where GlusterFS seems to be
lacking.

Regardless of GlusterFS, Ubuntu runs its default daily cron jobs at 6:25 UTC,
which means that all instances start rotating or processing their logs at
exactly the same time.
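
To confirm the schedule on a given instance, something like the sketch below
could be run. It assumes the stock Ubuntu /etc/crontab layout (minute and hour
in the first two fields, and a line that mentions /etc/cron.daily). The
function name and the sample text are only illustrative, and the time is in
the instance's own time zone, which is assumed here to be UTC.

```python
# Stock Ubuntu /etc/crontab entries, used as a fallback sample when the
# real file cannot be read (for example on a non-Ubuntu machine).
SAMPLE_CRONTAB = """\
# m h dom mon dow user  command
17 *    * * *   root    cd / && run-parts --report /etc/cron.hourly
25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
"""


def daily_cron_time(crontab_text):
    """Return (hour, minute) of the entry that runs /etc/cron.daily, or None."""
    for line in crontab_text.splitlines():
        line = line.strip()
        # Skip blanks, comments and entries unrelated to the daily jobs.
        if not line or line.startswith("#") or "/etc/cron.daily" not in line:
            continue
        fields = line.split()
        minute, hour = fields[0], fields[1]
        if minute.isdigit() and hour.isdigit():
            return int(hour), int(minute)
    return None


if __name__ == "__main__":
    try:
        with open("/etc/crontab") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE_CRONTAB
    print(daily_cron_time(text))
```

If every instance reports the same time, the daily jobs do indeed start
simultaneously across the cluster.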


There must be a cron job on some of the instances that uses too much I/O. We
would need disk usage metrics in Ganglia to find out which one.
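
As a rough idea of what such a metric could look like, the sketch below
derives per-device read and write throughput from two samples of
/proc/diskstats and prints the gmetric commands that would publish them. It
is only an illustration: the metric names (disk_read_<dev>,
disk_write_<dev>) are made up, and it assumes Linux instances with the
gmetric tool from Ganglia installed. Nothing is actually sent.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        # fields: major minor name reads merged sectors_read ms_read
        #         writes merged sectors_written ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def io_rates(before, after, interval):
    """Return device -> (read_bytes_per_s, write_bytes_per_s)."""
    rates = {}
    for dev, (read_after, written_after) in after.items():
        if dev not in before:
            continue
        read_before, written_before = before[dev]
        rates[dev] = (
            (read_after - read_before) * SECTOR_BYTES / interval,
            (written_after - written_before) * SECTOR_BYTES / interval,
        )
    return rates


def gmetric_args(dev, read_rate, write_rate):
    """Build gmetric argument lists; the metric names are hypothetical."""
    return [
        ["gmetric", "--name=disk_read_" + dev, "--value=%.1f" % read_rate,
         "--type=float", "--units=bytes/sec"],
        ["gmetric", "--name=disk_write_" + dev, "--value=%.1f" % write_rate,
         "--type=float", "--units=bytes/sec"],
    ]


def read_diskstats():
    with open("/proc/diskstats") as handle:
        return parse_diskstats(handle.read())


if __name__ == "__main__":
    interval = 1.0
    try:
        first = read_diskstats()
        time.sleep(interval)
        second = read_diskstats()
    except OSError:
        first, second = {}, {}  # not a Linux host
    for dev, (r, w) in sorted(io_rates(first, second, interval).items()):
        for args in gmetric_args(dev, r, w):
            print(" ".join(args))
```

Run from cron every minute or so on each instance, numbers like these would
show which instances generate the I/O spike around 6:25.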

