I'm a little confused as to which DB server we are talking about. I need access to enwiki-p.db.toolserver.org (hap-s1-user.esi.toolserver.org). Is that sql-s1-user or sql-s1-rr, or what?
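If it helps anyone answer, this is a minimal sketch of how I'm checking where the alias points (plain Python standard library; it assumes the alias is an ordinary DNS CNAME, which I believe it is):

    # Resolve the enwiki-p alias to see which physical DB host it points at.
    import socket

    canonical, aliases, addresses = socket.gethostbyname_ex(
        "enwiki-p.db.toolserver.org")
    print("canonical name: %s" % canonical)
    print("addresses: %s" % ", ".join(addresses))

The canonical name it prints back should be the actual server behind the alias, which is what I'm trying to pin down.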
Daniel

On Wed, Aug 8, 2012 at 7:46 AM, Russell Blau <[email protected]> wrote:
> (TL;DR? Skip down three paragraphs to the possible workaround.)
>
> Last month, I reported on the progress of SHA-1 updates from the WMF
> servers, and noted that s1 replag was likely to continue to be a problem
> for a number of weeks. As I said then, the WMF was using (at least) three
> processes to populate the SHA-1 field on three separate blocks of
> revision records. All these changes were then being replicated to the
> Toolserver's copies of the databases, and this flood of updates was
> causing the replag.
>
> The three blocks were being populated at different rates (for reasons
> that are beyond my knowledge). On July 23 at about 15:00 UTC, rosemary
> (sql-s1-rr) completed updating the first of the three blocks. The other
> blocks continued to be populated (and at some point the WMF started
> another process to help finish off the slowest block), but the rate of
> updates was somewhat lower, and rosemary actually caught up on its
> backlog and reached zero replag within about a day of this milestone.
>
> The situation on thyme (sql-s1-user) is less favorable, as we all know.
> The replag on that server got much higher to start with, and thyme
> didn't even reach the end of the first block until Sunday, August 5, at
> about 12:00 UTC. Unlike the situation with rosemary, the reduced load
> after this event has made no noticeable difference to the replag, which
> has continued to increase for the past three days at much the same rate
> as before. The next milestone will be completion of the second major
> block, which looks like it will occur either late on Friday, August 10,
> or early on Saturday, August 11 (UTC), barring any other major problems
> (like the WMF server outage on Monday, which caused replication at the
> TS end to stop for several hours). At that point, the load from SHA-1
> updates should be roughly 30% of what it was during July. One would
> think that would allow the replag to drop, but after the events of this
> week I can't be confident of that.
>
> There is a possible workaround. The TS could treat this like a server
> outage: copy the user databases from thyme to rosemary, then point
> sql-s1-user at rosemary, which currently has no replag. Rosemary would
> then have to handle twice the load, but thyme should start to recover
> very quickly with no user-generated queries hitting it. Once thyme has
> recovered, point sql-s1-rr at it.
>
> Downsides:
>
> 1. This would require several hours of downtime for sql-s1-user while
>    the user databases are copied; all tools that require access to user
>    databases would be offline entirely for this period.
> 2. It would have to wait until our volunteer TS admins have time to do
>    it.
> 3. The added load on rosemary could cause replag to grow there, although
>    I doubt it would come anywhere near the 14+ days of replag we are
>    dealing with now on thyme.
> 4. This could all be unnecessary, since thyme might recover on its own
>    once the SHA-1 update load is reduced, although I don't know of any
>    way to forecast that, and experience so far has not been encouraging.
>
> Question for those of you who operate and/or use tools that access s1
> (enwiki): would you be willing to accept several hours of service outage,
> and the other downsides, in exchange for getting rid of the 14-day
> replag?
>
> --
> Russell Blau
> [email protected]
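For anyone who wants to watch the lag for themselves while this plays out, here is a rough sketch. It assumes MySQLdb is available and that your credentials live in ~/.my.cnf as usual on the Toolserver, and it simply compares the newest recentchanges row against the clock, so it's only an approximation of true replication lag, not an official measurement:

    # Approximate s1 replag by comparing the newest recentchanges entry on
    # the Toolserver copy of enwiki with the current UTC time.
    import os
    from datetime import datetime

    import MySQLdb  # assumes the MySQL bindings installed on the Toolserver

    def replag_seconds(host="enwiki-p.db.toolserver.org", db="enwiki_p"):
        conn = MySQLdb.connect(host=host, db=db,
                               read_default_file=os.path.expanduser("~/.my.cnf"))
        try:
            cur = conn.cursor()
            # rc_timestamp uses MediaWiki's yyyymmddhhmmss timestamp format.
            cur.execute("SELECT MAX(rc_timestamp) FROM recentchanges")
            latest = datetime.strptime(cur.fetchone()[0], "%Y%m%d%H%M%S")
            return (datetime.utcnow() - latest).total_seconds()
        finally:
            conn.close()

    print("approximate replag: %d seconds" % replag_seconds())

On a healthy replica this should print something near zero; on thyme right now it would be on the order of two weeks.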
_______________________________________________
Toolserver-l mailing list ([email protected])
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list:
https://wiki.toolserver.org/view/Mailing_list_etiquette
