[Bug 51497] Setup monitoring for Beta cluster

2014-10-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso (WMF)  changed:

   What|Removed |Added

 Depends on||67333

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 51497] Setup monitoring for Beta cluster

2014-10-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497
Bug 51497 depends on bug 70141, which changed state.

Bug 70141 Summary: Determine first pass list of icinga-alerting data from 
graphite.wmflabs
https://bugzilla.wikimedia.org/show_bug.cgi?id=70141

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED



[Bug 51497] Setup monitoring for Beta cluster

2014-09-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #8 from Antoine "hashar" Musso  ---
Thank you Yuvi for the monitoring!  Do we have a way to tweak the body of email
notifications?  I find them hard to read :-D



[Bug 51497] Setup monitoring for Beta cluster

2014-09-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso  changed:

   What|Removed |Added

 Depends on||70862



[Bug 51497] Setup monitoring for Beta cluster

2014-09-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso  changed:

   What|Removed |Added

   See Also|https://bugzilla.wikimedia.org/show_bug.cgi?id=70695 |



[Bug 51497] Setup monitoring for Beta cluster

2014-09-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Željko Filipin  changed:

   What|Removed |Added

   See Also||https://bugzilla.wikimedia.org/show_bug.cgi?id=70695



[Bug 51497] Setup monitoring for Beta cluster

2014-09-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #7 from Yuvi Panda  ---
There are now alerts for the following things on betalabs:

- Low space on /var
- Low space on /
- Puppet staleness (warn at 1h, crit at 12h)
- Puppet failure events

Note that puppet failure events are different from puppet failing: failure
events mean puppet did run, but some events failed. There's no detection for
puppet itself failing completely.

You can see those at
https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=labmon
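
As a rough illustration of the thresholds listed above, here is a minimal
Python sketch that maps the age of the last Puppet run and the number of
failed events to Nagios/Icinga-style states. It is not the check that was
actually deployed. The function names, the inclusive boundaries, and the
choice of CRITICAL for failed events are assumptions made for illustration.
Only the 1h/12h thresholds come from the comment.

```python
from datetime import timedelta

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Thresholds quoted in the comment: warn at 1h, crit at 12h.
WARN_AFTER = timedelta(hours=1)
CRIT_AFTER = timedelta(hours=12)


def staleness_state(seconds_since_last_run: float) -> int:
    """Map the age of the last Puppet run to an alert state.

    Whether the boundaries are inclusive is an assumption.
    """
    age = timedelta(seconds=seconds_since_last_run)
    if age >= CRIT_AFTER:
        return CRITICAL
    if age >= WARN_AFTER:
        return WARNING
    return OK


def failure_events_state(failed_events: int) -> int:
    """Alert when a run completed but some of its events failed.

    This does not detect Puppet failing to run at all; that case only
    shows up indirectly, through staleness. CRITICAL is an assumed severity.
    """
    return CRITICAL if failed_events > 0 else OK
```

A run that is two hours old would report WARNING, and a fresh run with three
failed events would report CRITICAL under these assumptions.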



[Bug 51497] Setup monitoring for Beta cluster

2014-09-05 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Greg Grossmeier  changed:

   What|Removed |Added

   Assignee|wikibugs-l@lists.wikimedia.org |yuvipa...@gmail.com



[Bug 51497] Setup monitoring for Beta cluster

2014-09-04 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Greg Grossmeier  changed:

   What|Removed |Added

 Depends on||70141



[Bug 51497] Setup monitoring for Beta cluster

2014-07-22 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497
Bug 51497 depends on bug 52357, which changed state.

Bug 52357 Summary: Set up graphite monitoring for the beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=52357

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---



[Bug 51497] Setup monitoring for Beta cluster

2014-07-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497
Bug 51497 depends on bug 52357, which changed state.

Bug 52357 Summary: Set up graphite monitoring for the beta cluster
https://bugzilla.wikimedia.org/show_bug.cgi?id=52357

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED



[Bug 51497] Setup monitoring for Beta cluster

2014-07-09 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Tim Landscheidt  changed:

   What|Removed |Added

 CC||yuvipa...@gmail.com

--- Comment #6 from Tim Landscheidt  ---
I chatted yesterday with Yuvi a bit about monitoring and its challenges, and he
reminded me that the main problem with applying the prod setup to Labs is that
roots can fake Puppet facts by altering facter and thus control to some degree
the exported resources (which in themselves are harmless as their template is
reviewed by ops in operations/puppet).  So the monitoring in Labs would require
all monitoring resources to be audited with the assumption that all host data
is hostile.  Still, I don't like to let go of a working configuration that is
tested every day :-).

So two things that crossed my mind this morning:

a) For root at Tools, I had to sign a contract where WMF promises to sue my ass
off if I should do something funny.  If we could limit the collection of
monitoring resources to hosts in Labs projects with roots that are legally
bound in a similar way (Tools, Beta, projects by WMF employees, etc.), we could
assume that no hostile data is injected.  That would solve the problem for the
Beta cluster (and Tools ...), but not for all hosts in Labs.

b) What is the worst thing that a bright hacker could achieve by being root on
a Labs project, carefully faking facts and bringing Labs's Icinga or Ganglia
under their control if the latter are hosts in a Labs project themselves? 
Nothing.  He would have started as root in a Labs project and ended as one as
well.  All the data in Icinga and Ganglia is public.
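
To make option a) concrete, here is a minimal Python sketch of a collector
that accepts monitoring resources only from an allowlist of projects and
still rejects host-supplied names that could carry config syntax. Everything
in it is hypothetical: the allowlist contents, the name pattern, and the
function name are illustrative. It also assumes the project name comes from
a trusted source rather than from facter.

```python
import re

# Hypothetical allowlist: projects whose roots are legally bound (option a).
TRUSTED_PROJECTS = {"tools", "deployment-prep"}

# Conservative pattern for host-supplied names; anything else is rejected.
SAFE_NAME = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def accept_resource(project: str, hostname: str) -> bool:
    """Accept a resource only from a trusted project, and only if the
    host-supplied name matches the conservative pattern in full."""
    if project not in TRUSTED_PROJECTS:
        return False
    return SAFE_NAME.fullmatch(hostname) is not None
```

Keeping the name check even for trusted projects is a design choice: it
treats the allowlist as reducing risk rather than removing the need to
validate host data.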



[Bug 51497] Setup monitoring for Beta cluster

2014-07-01 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

scott.l...@gmail.com changed:

   What|Removed |Added

 CC||scott.l...@gmail.com

--- Comment #5 from scott.l...@gmail.com ---
If this is still an issue, can I work on it? If so, please provide any
additional details I need to get started.



[Bug 51497] Setup monitoring for Beta cluster

2014-04-02 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso  changed:

   What|Removed |Added

 Blocks|49459   |

--- Comment #4 from Antoine "hashar" Musso  ---
Does not block Bug 49459 - continuous integration monitoring (tracking)



[Bug 51497] Setup monitoring for Beta cluster

2013-11-05 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #3 from Antoine "hashar" Musso  ---
The fatal/exception.. counts are now reported on the labs Ganglia instance on
the deployment-fluoride.pmtpa.wmflabs node:

http://ganglia.wmflabs.org/latest/?r=hour&cs=&ce=&c=deployment-prep&h=deployment-fluoride&tab=m&vn=&mc=2&z=medium&metric_group=NOGROUPS_%7C_mediawiki



[Bug 51497] Setup monitoring for Beta cluster

2013-10-22 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso  changed:

   What|Removed |Added

 CC||benap...@gmail.com, fai...@wikimedia.org, mhershber...@wikimedia.org, platoni...@gmail.com
 Component|Continuous integration |deployment-prep (beta)
 Product|Wikimedia |Wikimedia Labs



[Bug 51497] Setup monitoring for Beta cluster

2013-08-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #2 from Antoine "hashar" Musso  ---
In puppet this is done by collecting resources, which is disabled on
labs for security reasons.



[Bug 51497] Setup monitoring for Beta cluster

2013-08-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Antoine "hashar" Musso  changed:

   What|Removed |Added

 Depends on||52867



[Bug 51497] Setup monitoring for Beta cluster

2013-07-31 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Greg Grossmeier  changed:

   What|Removed |Added

 Depends on||52357



[Bug 51497] Setup monitoring for Beta cluster

2013-07-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

--- Comment #1 from Antoine "hashar" Musso  ---
A breakdown of the useful monitoring systems:

Icinga
======

The puppet manifests already define Icinga checks for many services; this is
done via the global define monitor_service.  As an example, Varnish instances
are blessed with:

    monitor_service { "varnish http ${title}":
        description   => "Varnish HTTP ${title}",
        check_command => "check_http_generic!varnishcheck!${port}",
    }


Which adds the monitoring on icinga.wikimedia.org.
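
For readers unfamiliar with the check_command string: in Nagios/Icinga
object definitions the command name and its arguments are separated by "!",
and the arguments are exposed to the command definition as $ARG1$, $ARG2$,
and so on. The Python sketch below only illustrates that convention. The
function name is made up, the port value 80 is an example stand-in for
${port}, and escaped "!" characters are not handled.

```python
def split_check_command(check_command: str) -> tuple[str, dict[str, str]]:
    """Split 'name!arg1!arg2' into the command name and its $ARGn$ macros."""
    name, *args = check_command.split("!")
    macros = {f"$ARG{i}$": arg for i, arg in enumerate(args, start=1)}
    return name, macros


# 80 is an example value standing in for ${port}.
name, macros = split_check_command("check_http_generic!varnishcheck!80")
```

Here name is "check_http_generic", $ARG1$ is "varnishcheck" and $ARG2$ is
"80".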

We could get ops involved in setting up the labs instance for beta and do the
configuration hack that would prevent paging and send emails or messages instead.


Ganglia
=======

All labs instances are automatically added in a Ganglia instance:

http://ganglia.wmflabs.org/latest/?r=hour&s=by+name&c=deployment-prep&tab=m

That seems to cover our needs.

Graphite
========

That would be very nice to have, especially the profiling bits.  That project
does not have any documentation besides the puppet manifests though.  Probably
lower priority compared to Icinga.



[Bug 51497] Setup monitoring for Beta cluster

2013-07-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=51497

Sumana Harihareswara  changed:

   What|Removed |Added

Summary|Setup monitoring for BetaLabs |Setup monitoring for Beta cluster
