#31159: Monitor anti-censorship www services with prometheus
-------------------------------------------------+-------------------------
 Reporter:  phw                                  |          Owner:  hiro
     Type:  task                                 |         Status:  assigned
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:  #30152                               |         Points:  1
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------
Changes (by anarcat):

* owner:  tpa => hiro
* status:  new => assigned

Old description:

> In the anti-censorship team we currently monitor
> [https://trac.torproject.org/projects/tor/wiki/org/teams/AntiCensorshipTeam/InfrastructureMonitoring
> several services] with sysmon. We recently discovered that sysmon
> doesn't seem to follow HTTP 301 redirects. This means that if a web
> service dies but the 301 redirect still works (e.g., BridgeDB is dead but
> its apache reverse proxy still works), sysmon won't notice.
>
> Now that prometheus is running, we should fill this monitoring gap by
> testing the following web sites:
>
> * https://bridges.torproject.org
> * https://snowflake.torproject.org
> * https://gettor.torproject.org
>
> Our test should ensure that these sites serve the content we expect,
> e.g., make sure that bridges.tp.o contains the string "BridgeDB" in its
> HTML. Testing the HTTP status code does not suffice: if BridgeDB is down,
> the reverse proxy may still respond.
>
> I wonder if prometheus could also help us with #12802 by sending an email
> to bridges@tp.o and making sure that it responds with at least one
> bridge?

New description:

In the anti-censorship team we currently monitor
[https://trac.torproject.org/projects/tor/wiki/org/teams/AntiCensorshipTeam/InfrastructureMonitoring
several services] with sysmon. We recently discovered that sysmon
doesn't seem to follow HTTP 301 redirects. This means that if a web
service dies but the 301 redirect still works (e.g., BridgeDB is dead but
its apache reverse proxy still works), sysmon won't notice.

Now that prometheus is running, we should fill this monitoring gap by
testing the following web sites:

* https://bridges.torproject.org
* https://snowflake.torproject.org
* https://gettor.torproject.org

Our test should ensure that these sites serve the content we expect,
e.g., make sure that bridges.tp.o contains the string "BridgeDB" in its
HTML.
Testing the HTTP status code does not suffice: if BridgeDB is down, the
reverse proxy may still respond.

I wonder if prometheus could also help us with #12802 by sending an email
to bridges@tp.o and making sure that it responds with at least one
bridge?

Checklist:

1. [ ] monitor services in Nagios: BridgeDB, Snowflake, and GetTor
2. [ ] deploy Prometheus's "blackbox exporter" for default bridges,
   which are external services
3. [ ] delegate the blackbox exporter configuration to the
   anti-censorship team (and train them on it)
4. [ ] experiment with Prometheus's "alertmanager", which can send
   notifications if a monitoring target goes offline
5. [ ] grant the anti-censorship team access to Prometheus's grafana
   dashboard.

--
Comment:

awesome summary, thanks. i turned that into a checklist and assigned the
ticket to hiro who, I think, will handle followup on this.

hiro, let me know if you need help or if any of this is incorrect...

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/31159#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
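
For illustration only, the content check asked for in the description
(follow redirects, then look for an expected string such as "BridgeDB"
instead of trusting the status code) could be sketched in Python roughly
as below. The names body_matches and probe and the 10-second timeout are
assumptions made for this sketch; they come from neither the ticket nor
Prometheus. The ticket names an expected string only for
bridges.torproject.org, so the other two sites are left out.

```python
import urllib.error
import urllib.request


def body_matches(status: int, body: str, expected: str) -> bool:
    """Healthy only if the final response is 200 AND contains the expected text."""
    return status == 200 and expected in body


def probe(url: str, expected: str, timeout: float = 10.0) -> bool:
    """Fetch url and check its body for the expected string.

    urllib.request.urlopen follows HTTP 301/302 redirects by default, so the
    status and body inspected here belong to the final response, not to the
    redirect itself.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return body_matches(resp.status, body, expected)
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and HTTP error statuses count as down.
        return False
```

Calling probe("https://bridges.torproject.org", "BridgeDB") would then
return False both when the site is unreachable and when a reverse proxy
answers with a page that lacks the string. In Prometheus itself the
equivalent check would presumably live in the blackbox exporter's http
module configuration (for example its fail_if_body_not_matches_regexp
setting) and not in a standalone script.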
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs