[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso (WMF) changed:

What       | Removed | Added
Depends on |         | 67333

--
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Bug 51497 depends on bug 70141, which changed state.

Bug 70141 Summary: Determine first pass list of icinga-alerting data from graphite.wmflabs
https://bugzilla.wikimedia.org/show_bug.cgi?id=70141

What       | Removed  | Added
Status     | ASSIGNED | RESOLVED
Resolution | ---      | FIXED
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #8 from Antoine "hashar" Musso ---
Thank you Yuvi for the monitoring! Do we have a way to tweak the body of email notifications? I find them hard to read :-D
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso changed:

What       | Removed | Added
Depends on |         | 70862
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso changed:

What     | Removed                                              | Added
See Also | https://bugzilla.wikimedia.org/show_bug.cgi?id=70695 |
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Željko Filipin changed:

What     | Removed | Added
See Also |         | https://bugzilla.wikimedia.org/show_bug.cgi?id=70695
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #7 from Yuvi Panda ---
There are now alerts for the following things for betalabs:

- Low space on /var
- Low space on /
- Puppet staleness (warn at 1h, crit at 12h)
- Puppet failure events

Note that "Puppet failure events" is different from Puppet failing: failure events mean Puppet did run, but some events failed. There's no detection for Puppet itself failing completely.

You can see those at https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=labmon
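
As an illustration only, the staleness thresholds quoted in comment #7 (warn at 1h, crit at 12h) could be expressed as the small Python sketch below. The function name, the state strings, and the choice to treat the thresholds as inclusive are assumptions made for this example; the actual check deployed for betalabs is not shown in this thread.

```python
from datetime import timedelta

# Thresholds taken from comment #7: warn at 1h, crit at 12h.
WARN_AFTER = timedelta(hours=1)
CRIT_AFTER = timedelta(hours=12)


def puppet_staleness_state(age: timedelta) -> str:
    """Map the time since the last Puppet run to an Icinga-style state name.

    Hypothetical helper: treating each threshold as inclusive is an assumption.
    """
    if age >= CRIT_AFTER:
        return "CRITICAL"
    if age >= WARN_AFTER:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for minutes in (30, 90, 13 * 60):
        print(minutes, puppet_staleness_state(timedelta(minutes=minutes)))
```

Note that, as the comment says, a check of this kind only reports how long ago Puppet last ran; it does not by itself distinguish a run with failed events from a clean run.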
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Greg Grossmeier changed:

What     | Removed                        | Added
Assignee | wikibugs-l@lists.wikimedia.org | yuvipa...@gmail.com
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Greg Grossmeier changed:

What       | Removed | Added
Depends on |         | 70141
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Bug 51497 depends on bug 52357, which changed state.

Bug 52357 Summary: Set up graphite monitoring for the beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=52357

What       | Removed  | Added
Status     | RESOLVED | REOPENED
Resolution | FIXED    | ---
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Bug 51497 depends on bug 52357, which changed state.

Bug 52357 Summary: Set up graphite monitoring for the beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=52357

What       | Removed | Added
Status     | NEW     | RESOLVED
Resolution | ---     | FIXED
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Tim Landscheidt changed:

What | Removed | Added
CC   |         | yuvipa...@gmail.com

--- Comment #6 from Tim Landscheidt ---
I chatted yesterday with Yuvi a bit about monitoring and its challenges, and he reminded me that the main problem with applying the prod setup to Labs is that roots can fake Puppet facts by altering facter and thus control to some degree the exported resources (which in themselves are harmless, as their template is reviewed by ops in operations/puppet). So monitoring in Labs would require all monitoring resources to be audited with the assumption that all host data is hostile.

Still, I don't like to let go of a working configuration that is tested every day :-). So two things crossed my mind this morning:

a) For root at Tools, I had to sign a contract where WMF promises to sue my ass off if I should do something funny. If we could limit the collection of monitoring resources to hosts in Labs projects with roots that are legally bound in a similar way (Tools, Beta, projects by WMF employees, etc.), we could assume that no hostile data is injected. That would solve the problem for the Beta cluster (and Tools ...), but not for all hosts in Labs.

b) What is the worst thing that a bright hacker could achieve by being root on a Labs project, carefully faking facts and bringing Labs's Icinga or Ganglia under their control, if the latter are hosts in a Labs project themselves? Nothing. They would have started as root in a Labs project and ended as one as well. All the data in Icinga and Ganglia is public.
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

scott.l...@gmail.com changed:

What | Removed | Added
CC   |         | scott.l...@gmail.com

--- Comment #5 from scott.l...@gmail.com ---
If this is still an issue, can I work on it? If so, please provide any additional details I can use to get started.
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso changed:

What   | Removed | Added
Blocks | 49459   |

--- Comment #4 from Antoine "hashar" Musso ---
Does not block Bug 49459 - continuous integration monitoring (tracking)
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #3 from Antoine "hashar" Musso ---
The fatal/exception.. counts are now reported on the labs Ganglia instance on the deployment-fluoride.pmtpa.wmflabs node:
http://ganglia.wmflabs.org/latest/?r=hour&cs=&ce=&c=deployment-prep&h=deployment-fluoride&tab=m&vn=&mc=2&z=medium&metric_group=NOGROUPS_%7C_mediawiki
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso changed:

What      | Removed                | Added
CC        |                        | benap...@gmail.com, fai...@wikimedia.org, mhershber...@wikimedia.org, platoni...@gmail.com
Component | Continuous integration | deployment-prep (beta)
Product   | Wikimedia              | Wikimedia Labs
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #2 from Antoine "hashar" Musso ---
In puppet this is done by collecting resources, which is disabled on labs for security reasons.
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso changed:

What       | Removed | Added
Depends on |         | 52867
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Greg Grossmeier changed:

What       | Removed | Added
Depends on |         | 52357
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #1 from Antoine "hashar" Musso ---
A breakdown of the useful monitoring systems:

Icinga
======

The puppet manifests already define Icinga checks for a lot of services; that is done via the global define monitor_service. As an example, Varnish instances are blessed with:

    monitor_service { "varnish http ${title}":
        description   => "Varnish HTTP ${title}",
        check_command => "check_http_generic!varnishcheck!${port}"
    }

Which adds the monitoring on icinga.wikimedia.org. We could get ops involved in setting up the labs instance for beta and do the configuration hack that would prevent paging but drop emails|messages instead.

Ganglia
=======

All labs instances are automatically added in a Ganglia instance:
http://ganglia.wmflabs.org/latest/?r=hour&s=by+name&c=deployment-prep&tab=m

That seems to cover our needs.

Graphite
========

That would be very nice to have, especially the profiling bits. That project does not have any documentation besides the puppet manifests though. Probably lower priority compared to Icinga.
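
For readers unfamiliar with the check_command value in comment #1: in Nagios-style configuration, which Icinga inherits, the string is a command name followed by "!"-separated arguments that the command definition receives as $ARG1$, $ARG2$, and so on. The Python sketch below only illustrates that convention. The helper names and the port value 80 are hypothetical, and escaped separators are deliberately not handled.

```python
from typing import List, Tuple


def build_check_command(command: str, *args: object) -> str:
    """Join a command name and its arguments with '!' (Nagios/Icinga style)."""
    return "!".join([command, *(str(a) for a in args)])


def split_check_command(check_command: str) -> Tuple[str, List[str]]:
    """Split a check_command string into (command name, argument list).

    Simplification: any escaping of '!' inside arguments is ignored here.
    """
    name, *args = check_command.split("!")
    return name, args


if __name__ == "__main__":
    # Port 80 is a made-up stand-in for ${port} in the manifest.
    value = build_check_command("check_http_generic", "varnishcheck", 80)
    print(value)
    print(split_check_command(value))
```

Read this way, the Varnish example passes "varnishcheck" as the first argument and the port as the second to the check_http_generic command; what that command does with them is defined elsewhere in the puppet manifests and is not shown in this thread.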
[Bug 51497] Setup monitoring for Beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Sumana Harihareswara changed:

What    | Removed                       | Added
Summary | Setup monitoring for BetaLabs | Setup monitoring for Beta cluster