Today, for those of us paying attention from the outside, the downtime became impossible to monitor. One problem is that the TTL on the DNS server and maintenance records themselves is set much too short!
Maybe you never thought you'd be inaccessible for an hour? Any chance the master records could be moved to an anycast DNS provider?

The SOA is pretty reasonable:

    wikimedia.org.  86400  IN  SOA  ns0.wikimedia.org. hostmaster.wikimedia.org. 2011052410 43200 7200 1209600 600

But the actual records all have a 1-hour TTL, so they disappeared from caches during the downtime. And ganglia, while its records look OK:

    ns0.wikimedia.org.      3600  IN  A      208.80.152.130
    ns1.wikimedia.org.      3600  IN  A      208.80.152.142
    ns2.wikimedia.org.      3600  IN  A      91.198.174.4
    secure.wikimedia.org.   3600  IN  A      208.80.152.134
    ganglia.wikimedia.org.  3600  IN  CNAME  spence.wikimedia.org.
    spence.wikimedia.org.   3600  IN  A      208.80.152.161

had completely timed out in both my local cache and Google's resolvers while I was looking at it, and no NS could be reached anywhere to refresh it. Here's the last time I saw it:

    ganglia.wikimedia.org.  473  IN  CNAME  spence.wikimedia.org.
    spence.wikimedia.org.   473  IN  A      208.80.152.161
    ;; Query time: 2 msec
    ;; SERVER: 10.0.1.1#53(10.0.1.1)
    ;; WHEN: Tue May 24 10:06:13 2011

    ganglia.wikimedia.org.  287  IN  CNAME  spence.wikimedia.org.
    spence.wikimedia.org.   287  IN  A      208.80.152.161
    ;; Query time: 68 msec
    ;; SERVER: 8.8.8.8#53(8.8.8.8)
    ;; WHEN: Tue May 24 10:09:19 2011

_______________________________________________
Wikitech-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
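The two ganglia snapshots can be turned into a concrete deadline. A resolver counts a cached record's TTL down in real time and normally drops the record at zero if no authoritative server answers a refresh. The SOA's 1209600 s (14-day) expire field only governs how long secondary nameservers keep serving a zone they can't transfer; it does not extend resolver caches, which follow each record's own TTL. That is why the reasonable SOA didn't help here. The minimal Python sketch below adds the remaining TTL to the query time. It assumes the records were originally fetched with the full 3600 s TTL and that the naive timestamps are in the poster's local clock. The function names are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta

# Observations copied from the dig output above:
# (resolver, TTL remaining in the answer, time of the query from ";; WHEN:").
# Timestamps are naive, i.e. in whatever timezone the poster's clock used.
OBSERVATIONS = [
    ("10.0.1.1 (local cache)", 473, datetime(2011, 5, 24, 10, 6, 13)),
    ("8.8.8.8 (Google Public DNS)", 287, datetime(2011, 5, 24, 10, 9, 19)),
]

# Assumption: the TTL published by the authoritative servers, per the 3600 s records above.
ORIGINAL_TTL = 3600


def cache_expiry(remaining_ttl: int, queried_at: datetime) -> datetime:
    """Moment the cached record hits TTL 0, assuming the resolver counts down
    in real time and cannot refresh it from an authoritative nameserver."""
    return queried_at + timedelta(seconds=remaining_ttl)


def fetched_at(remaining_ttl: int, original_ttl: int, queried_at: datetime) -> datetime:
    """Approximate moment the resolver last fetched the record, i.e. when
    its TTL was still at the full original value."""
    return queried_at - timedelta(seconds=original_ttl - remaining_ttl)


if __name__ == "__main__":
    for name, ttl, when in OBSERVATIONS:
        expires = cache_expiry(ttl, when)
        fetched = fetched_at(ttl, ORIGINAL_TTL, when)
        print(f"{name}: expires {expires:%H:%M:%S}, last fetched ~{fetched:%H:%M:%S}")
```

Running it prints the same times for both resolvers:

    10.0.1.1 (local cache): expires 10:14:06, last fetched ~09:14:06
    8.8.8.8 (Google Public DNS): expires 10:14:06, last fetched ~09:14:06

So both caches were living on a copy fetched around 09:14:06, and both lost ganglia at 10:14:06, exactly one TTL later.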
