On Thu, Mar 31, 2016 at 12:39 AM, Tim Starling <[email protected]> wrote:

> I think it's stretching the metaphor to call ops a "tight ship". We
> could switch off spare servers in codfw for a substantial power
> saving, in exchange for a ~10 minute penalty in failover time. But it
> would probably cost a week or two of engineer time to set up suitable
> automation for failover and periodic updates.
>

Just a small clarification: I don't think periodically turning servers
off and on would be a feasible option, because servers (and computers
in general) tend to have a pretty high failure rate when they are
power-cycled regularly. We see this with some servers failing every
time we do a mass reboot for a security issue. In terms of cost and
time spent (and probably also natural resource consumption, though I
did no calculation whatsoever), it would probably not be sustainable.
On the other hand, we could surely do better in terms of idle-server
power consumption.


> Or we could have avoided a hot spare colo altogether, with smarter
> disaster recovery plans, as I argued at the time.

Another small clarification: our codfw datacenter is _not_ just a hot
spare for disaster recovery. A lot of work has been done to make the
two facilities mostly active-active (and a lot more will be done in
the coming year).

Cheers,

Giuseppe
P.S. The server energy footprint of the WMF is negligible compared to
the big internet players; even a small-to-medium-size local ISP
probably has a larger footprint than us. This doesn't mean we should
not try to get better, but we should always put things in perspective.
-- 
Giuseppe Lavagetto
Senior Technical Operations Engineer, Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l