On Thu, Mar 31, 2016 at 12:39 AM, Tim Starling <[email protected]> wrote:
> I think it's stretching the metaphor to call ops a "tight ship". We
> could switch off spare servers in codfw for a substantial power
> saving, in exchange for a ~10 minute penalty in failover time. But it
> would probably cost a week or two of engineer time to set up suitable
> automation for failover and periodic updates.
>

Just a small clarification: I don't think periodically turning servers off and on would be a feasible option, because servers (and computers in general) tend to have a pretty high failure rate when they are regularly powered off and on. We see this every time we do a mass reboot because of a security issue: some server always fails to come back. In terms of costs and time spent (and probably also natural resource consumption, though I did no calculation whatsoever), it would probably not be sustainable.

On the other hand, we could surely do better in terms of idle-server power consumption.

> Or we could have avoided a hot spare colo altogether, with smarter
> disaster recovery plans, as I argued at the time.

Another small clarification: our codfw datacenter is _not_ just a hot spare for disaster recovery. A lot of work has been done to make the two facilities mostly active-active, and a lot more will be done in the coming year.

Cheers,

Giuseppe

P.S. The server energy footprint of the WMF is negligible compared to that of the big internet players; even a small-to-medium local ISP probably has a larger footprint than we do. This doesn't mean we should not try to get better, but we should always put things in perspective.

--
Giuseppe Lavagetto
Senior Technical Operations Engineer, Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
