Asher Feldman wrote:
> I've temporarily commented out db36 from db.php on the cluster.
>
> This is a flaw in how the client-side use of maxlag interacts with our
> schema migration process - we run migrations on slaves one by one in an
> automated fashion, only moving to the next after replication lag catches
> up. Mediawiki takes care of not sending queries to the lagged slave that
> is under migration. Meanwhile, maxlag always reports the value of the most
> lagged slave. Not a new issue, but this particular alter table on enwiki
> is likely the most time intensive ever run at wmf. It's slightly
> ridiculous.
>
> For this one alter, I can stop the migration script and run each statement
> by hand, pulling and re-adding db's one by one along the way, but this
> isn't a sustainable process. Perhaps we can add a migration flag to
> mediawiki, which if enabled, changes the behavior of maxlag and
> wfWaitForSlaves() to ignore one highly lagged slave so long as others are
> available without lag.
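To make the proposed "migration flag" concrete, here is a minimal Python sketch of how a lag check could skip one highly lagged replica. It is an illustration under stated assumptions, not MediaWiki's actual maxlag or wfWaitForSlaves() code (which is PHP and not shown here). The function names effective_lag and reject_request, the migration_mode flag, the replica names other than db36, and the lag figures are all hypothetical.

```python
from typing import Mapping


def effective_lag(lags: Mapping[str, float], migration_mode: bool = False) -> float:
    """Return the replication lag (seconds) that a maxlag check compares against.

    lags maps a replica name to its current lag in seconds (hypothetical input).
    Default: the lag of the most-lagged replica, as described in the quote.
    migration_mode (hypothetical flag): skip the single most-lagged replica,
    but only when at least one other replica remains to be checked.
    """
    values = sorted(lags.values(), reverse=True)
    if not values:
        return 0.0
    if migration_mode and len(values) > 1:
        values = values[1:]  # ignore the one replica assumed to be under migration
    return values[0]


def reject_request(lags: Mapping[str, float], maxlag: float,
                   migration_mode: bool = False) -> bool:
    """True if a request sent with the given maxlag value would be refused."""
    return effective_lag(lags, migration_mode) > maxlag


if __name__ == "__main__":
    # Illustrative numbers only: db36 is mid-ALTER, the other replicas are caught up.
    lags = {"replica_a": 0.0, "replica_b": 1.0, "db36": 7200.0}
    print(reject_request(lags, maxlag=5))                       # True
    print(reject_request(lags, maxlag=5, migration_mode=True))  # False
```

Dropping only the single worst replica keeps the check meaningful: if a second replica also falls behind, its lag is what gets reported, and requests are still refused.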
Thank you! My scripts are running fine again. :-)

Some people have suggested that the current API behavior is intentional; that is, that having every server return the same error code is better than having some servers return an error code and others not. I think this logic is flawed because of the problems it creates (scripts unable to get past the error code), but it's definitely something that needs investigation in the future.

MZMcBride

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l