On Fri, Nov 18, 2011 at 3:41 PM, Ben Hartshorne
<[email protected]> wrote:

> Answering a few questions in one place.
>
> On Fri, Nov 18, 2011 at 10:29 AM, Brion Vibber <[email protected]>
> wrote:
> >
> > Hmm... what I'd expect is that if one ES save target database is in
> > read-only, the system should cycle through to the next available one that
> > is working -- the save should then succeed transparently.
> >
> > Do we not have that sort of write failover logic, or are *all* ES clusters
> > getting locked somehow?
>
> The last step of the maintenance was to switch the master for article
> writes from ms3 to es3.  In order to make sure no data is lost during
> the transition, I marked the master read-only for the duration of the
> switch.  Given that there is only one ES target database to which
> writes are sent (currently es3), there is nowhere to which to
> failover.  (All slaves run read-only all the time.)
>

*nod* logical enough. For the future I'd recommend planning a temporary
'holding zone' cluster that would be used only during the changeover -- it
would remain read-write while the main ones are being copied.

Then after switching writes to the new targets, the holding zone can go
read-only while it gets copied over to the new target, which should go
relatively fast.

This would be just another part of the ES system rather than a separate
cache, so should remain reasonably robust: if something goes awry with the
main copy to the new clusters, you can safely stop: the holding zone will
just sit with the old servers and can keep running like the other ES
clusters, unlike some sort of cache which might lose data.
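
To make the sequence concrete, here is a rough sketch in Python. It is not
the actual MediaWiki ES code: Cluster, save_blob and demo_changeover are
made-up names, the "holding" cluster is hypothetical, and in-memory dicts
stand in for the real databases. It only shows the ordering: freeze the old
master, let writes fall through to the holding zone, copy, switch writes,
then freeze and copy the holding zone.

```python
class Cluster:
    """Toy stand-in for one ES cluster (hypothetical, in-memory only)."""

    def __init__(self, name, read_only=False):
        self.name = name
        self.read_only = read_only
        self.blobs = {}

    def save(self, key, text):
        if self.read_only:
            raise RuntimeError(f"{self.name} is read-only")
        self.blobs[key] = text
        return f"{self.name}/{key}"


def save_blob(write_targets, key, text):
    """Try each write target in order, skipping any that are read-only."""
    for cluster in write_targets:
        if not cluster.read_only:
            return cluster.save(key, text)
    raise RuntimeError("no writable ES cluster available")


def demo_changeover():
    old = Cluster("ms3")
    new = Cluster("es3", read_only=True)          # not accepting writes yet
    holding = Cluster("holding", read_only=True)  # assumed extra cluster

    save_blob([old], "rev1", "text before maintenance")

    # 1. Freeze the old master and open the holding zone for writes.
    old.read_only = True
    holding.read_only = False
    addr = save_blob([old, holding], "rev2", "text saved mid-copy")

    # 2. Main copy: the old master is frozen, so nothing can be lost here.
    new.blobs.update(old.blobs)

    # 3. Switch writes to the new target, then freeze and copy the
    #    (small) holding zone.
    new.read_only = False
    holding.read_only = True
    new.blobs.update(holding.blobs)
    save_blob([new], "rev3", "text after the switch")

    return addr, sorted(new.blobs)
```

If step 2 goes wrong you can stop right there: the mid-copy save already
lives in the holding cluster, which stays readable like any other cluster.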


-- brion
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l