On Fri, Nov 18, 2011 at 3:41 PM, Ben Hartshorne <[email protected]> wrote:

> Answering a few questions in one place.
>
> On Fri, Nov 18, 2011 at 10:29 AM, Brion Vibber <[email protected]> wrote:
> >
> > Hmm... what I'd expect is that if one ES save target database is in
> > read-only, the system should cycle through to the next available one that
> > is working -- the save should then succeed transparently.
> >
> > Do we not have that sort of write failover logic, or are *all* ES clusters
> > getting locked somehow?
>
> The last step of the maintenance was to switch the master for article
> writes from ms3 to es3. In order to make sure no data is lost during
> the transition, I marked the master read-only for the duration of the
> switch. Given that there is only one ES target database to which
> writes are sent (currently es3), there is nowhere to which to
> failover. (All slaves run read-only all the time.)
>

*nod* logical enough.

For the future I'd recommend planning a temporary 'holding zone' cluster
that would be used only during the changeover -- it would remain read-write
while the main ones are being copied. Then, after switching writes to the
new targets, the holding zone can go read-only while it gets copied over to
the new target, which should go relatively fast.

This would be just another part of the ES system rather than a separate
cache, so it should remain reasonably robust: if something goes awry with
the main copy to the new clusters, you can safely stop. The holding zone
will just sit with the old servers and can keep running like the other ES
clusters, unlike some sort of cache, which might lose data.

-- brion
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
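
The holding-zone changeover proposed above can be sketched roughly as follows. This is a toy in-memory model, not MediaWiki's ExternalStore code: the names Cluster, save_with_failover, copy_all, and changeover are all hypothetical, plain dicts stand in for the databases, and "ms3"/"es3" are used only as labels taken from the thread.

```python
class Cluster:
    """Hypothetical stand-in for one ES cluster (not a real MediaWiki class)."""

    def __init__(self, name, read_only=False):
        self.name = name
        self.read_only = read_only
        self.blobs = {}

    def save(self, key, data):
        if self.read_only:
            raise RuntimeError(f"{self.name} is read-only")
        self.blobs[key] = data


def save_with_failover(targets, key, data):
    """Write to the first target that is not read-only; return its name."""
    for cluster in targets:
        if not cluster.read_only:
            cluster.save(key, data)
            return cluster.name
    raise RuntimeError("no writable ES target")


def copy_all(src, dst):
    """Bulk-copy every blob from src to dst, ignoring the read-only flags."""
    dst.blobs.update(src.blobs)


def changeover(old, new, holding, incoming):
    """Move writes from old to new while holding absorbs saves during the copy.

    incoming is a list of (key, data) saves that arrive while the main copy
    runs; they are replayed sequentially here for simplicity.
    """
    targets = [old, holding]
    old.read_only = True            # freeze the old master
    for key, data in incoming:      # saves fail over to the holding zone
        save_with_failover(targets, key, data)
    copy_all(old, new)              # long-running copy of the main data
    targets = [new]                 # switch writes to the new target
    holding.read_only = True        # freeze the holding zone
    copy_all(holding, new)          # short final copy
    return targets


old = Cluster("ms3")
old.blobs["rev1"] = "a"
new = Cluster("es3")
holding = Cluster("holding")
targets = changeover(old, new, holding, [("rev2", "b")])
print(sorted(new.blobs))  # -> ['rev1', 'rev2']
```

In this model saves never fail outright: while the old master is frozen they land in the holding zone, and the only data copied under a write freeze is the small amount the holding zone collected. If the main copy has to be abandoned partway, the holding zone still holds its blobs as an ordinary cluster.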
