Imarlier added a comment. |
Just throwing a couple of notes in here:
Q45825741 and Q45825750 were both updated at very close to the same time (2017-12-16 14:54:00-15:02:00); both had a number of updates that happened within a very tight window (<1 minute). Both also have log entries indicating that they experienced failures getting a lock:
2017-12-16T14:55:09 mw1283 INFO Expectation (readQueryTime <= 5) by MediaWiki::restInPeace not met (actual: 15.000407934189): query-m: SELECT GET_LOCK('X', N) AS lockstatus [TRX#43742a] #0 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/TransactionProfiler.php(223): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated() #1 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(1049): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion() #2 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(948): Wikimedia\Rdbms\Database->doProfiledQuery() #3 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/DatabaseMysqlBase.php(1039): Wikimedia\Rdbms\Database->query() #4 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(3559): Wikimedia\Rdbms\DatabaseMysqlBase->lock()
2017-12-16T15:00:38 mw1235 INFO Expectation (readQueryTime <= 5) by MediaWiki::restInPeace not met (actual: 15.000345230103): query-m: SELECT GET_LOCK('X', N) AS lockstatus [TRX#aa6d60] #0 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/TransactionProfiler.php(223): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated() #1 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(1049): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion() #2 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(948): Wikimedia\Rdbms\Database->doProfiledQuery() #3 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/DatabaseMysqlBase.php(1039): Wikimedia\Rdbms\Database->query() #4 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(3559): Wikimedia\Rdbms\DatabaseMysqlBase->lock() #5 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/LinksUpdate.php(203): Wikimedia\Rdbms\Database->getScopedLockAndFlush() #6 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/LinksUpdate.php(170): LinksUpdate::acquirePageLock() #7 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/DeferredUpdates.php(259): LinksUpdate->doUpdate() #8 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/DeferredUpdates.php(210): DeferredUpdates::runUpdate() #9 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/DeferredUpdates.php(131): DeferredUpdates::execute() #10 /srv/mediawiki/php-1.31.0-wmf.12/includes/MediaWiki.php(897): DeferredUpdates::doUpdates() #11 /srv/mediawiki/php-1.31.0-wmf.12/includes/MediaWiki.php(719): MediaWiki->restInPeace() #12 (): Closure$MediaWiki::doPostOutputShutdown() #13 {main}Both of those records also have one or more log entries showing that db1100 was significantly lagged at this time:
2017-12-16T14:54:54 mw1283 WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/8181304072017-12-16T15:00:44 mw1314 WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/821972877This doesn't necessarily mean anything, but it does immediately bring to mind the possibility of a bad interaction between two simultaneous transactions -- a process that gets the update lock, then tries to read from a slave that's lagged, times out waiting on that, and ultimately bails out without releasing the lock?
This code is new to me, but that seems like an angle that's worth looking at.
Q45825730 and Q45825742 were both created around the same time. However, neither of them shows any indication of similar database timeouts or lock failures. Here's 42:
Time host level message 2017-12-16T14:55:07 mw1286 INFO AbuseFilter::filterAction: cache miss for 'Q45825742' (key wikidatawiki:abusefilter:check-stash:default:19082eae700e0352ac30604486024ca3:v1). 2017-12-16T14:54:03 wdqs1003 INFO Got 5 changes, from Q45825742@609888642@20171216145333|645260053 to Q17580698@609888669@20171216145400|645260055 2017-12-16T14:53:33 mw1316 INFO AbuseFilter::filterAction: cache miss for 'Q45825742' (key wikidatawiki:abusefilter:check-stash:default:75754391037fe534c51178ac2603dc26:v1). 2017-12-16T14:53:23 wdqs1005 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:23 wdqs2002 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:22 wdqs1004 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:22 wdqs2001 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:22 wdqs2003 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:20 wdqs1003 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:20 mw1280 INFO AbuseFilter::filterAction: cache miss for 'Q45825742' (key wikidatawiki:abusefilter:check-stash:default:09f6595b2b674089e12c1c01698946af:v1). 2017-12-16T14:53:08 mw1232 INFO AbuseFilter::filterAction: cache miss for 'Q45825742' (key wikidatawiki:abusefilter:check-stash:default:f6f3586b5afd827fb2bfcdf305d4f41b:v1). 2017-12-16T14:52:51 mw1290 INFO AbuseFilter::filterAction: cache miss for 'Q45825742' (key wikidatawiki:abusefilter:check-stash:default:43b4667869d07ce4d7e32e416f51b9b3:v1). 1–12 of 12And here's 30:
Time host level message 2017-12-16T15:03:41 wdqs2001 INFO Got 100 changes, from Q40739103@609888095@20171216144339|645259546 to Q45825730@609888298@20171216144625|645259730 2017-12-16T15:03:41 wdqs1005 INFO Got 100 changes, from Q40739103@609888095@20171216144339|645259546 to Q45825730@609888298@20171216144625|645259730 2017-12-16T14:46:25 mw1317 INFO AbuseFilter::filterAction: cache miss for 'Q45825730' (key wikidatawiki:abusefilter:check-stash:default:e059a2c2b6c96934a8967978dcecdaa0:v1). 1–3 of 3Also noticing that there were exceptions showing up in Logstash around this time:
This approx. 10 minute period of time accounts for nearly 30% of all exceptions reported to logstash in the last week.
No smoking guns, but I think this points us in the right direction. I'm going to look into it more this afternoon.
Cc: Legoktm, Imarlier, aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Vali.matei, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs