[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index due to LinksUpdate lock failures
jcrespo added a comment. I think there is a different core returned between killed (or other error)-although I think in this case there is an exception, timed out and acquired, but I may be wrong. Note that on that ticket, I mention waits for slave can start being killed from now on to avoid pileups.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Imarlier, jcrespoCc: Krinkle, jcrespo, Legoktm, Imarlier, aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Vali.matei, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index due to LinksUpdate lock failures
Smalyshev added a comment. Unfortunately, I have no access to see T180918.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Imarlier, SmalyshevCc: jcrespo, Legoktm, Imarlier, aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Vali.matei, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index due to LinksUpdate lock failures
Imarlier added a comment. Just noticed this in SAL: 15:00Synchronized wmf-config/db-eqiad.php: Depool db1100 (duration: 00m 58s) Potentially interesting, given that timeouts has started to be reported slightly before that. The commit that was being synced just refers to the host as being broken: https://phabricator.wikimedia.org/rOMWC6b3f4c9b5be8f231fb4c687e8cdcff2468796905 This does explain the big surge in exceptions around that time, but doesn't help to clear up what connection (if any) this issue had to the failed LinksUpdate jobs.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: ImarlierCc: Legoktm, Imarlier, aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Vali.matei, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index due to LinksUpdate lock failures
Imarlier added a comment. @aaron -- would love your thoughts on this, when you get a chance.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: ImarlierCc: Legoktm, Imarlier, aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Vali.matei, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index due to LinksUpdate lock failures
Imarlier added a comment. Alternate theory that I think is wrong: there's a chance that something wonky happened with db1070 around this time, which caused replication to stop and may also have caused issues getting locks. If that were the case I would expect to see far more write errors here (like on the original submit, and not just on the LinksUpdate job), but I can't rule it out right now.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: ImarlierCc: Legoktm, Imarlier, aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Vali.matei, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index due to LinksUpdate lock failures
Imarlier added a comment. First appearance of a timeout on db1100: 2017-12-16T14:37:46 mw1311 WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/808461384 First exception trying to get the lock: 2017-12-16T14:38:16 mw1290 ERROR [WjUvyQpAEDcAACARVmoAAACM] /w/api.php RuntimeException from line 205 of /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/LinksUpdate.php: Could not acquire lock 'LinksUpdate:atomicity:pageid:1524206'. Prior to this exception being recorded, there hadn't been any exceptions received for a period of nearly 2 minutes -- in particular, there hadn't been any database timeouts or slave replication lag issues during that time. The first indication of replication lag is recorded 27 seconds later: 2017-12-16T14:38:43 mw1308 ERROR [8e59bd7e779c680d140f8366] /rpc/RunJobs.php?wiki=dawiki=refreshLinks=30=300M Wikimedia\Rdbms\DBReplicationWaitError from line 373 of /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/lbfactory/LBFactory.php: Could not wait for replica DBs to catch up to db1070 Last set of timeouts waiting on replication to db1100: 2017-12-16T15:02:02 mw1239 WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/825818703 2017-12-16T15:02:06 mw1288 WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/826052699 2017-12-16T15:02:12 terbium WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/826518643 2017-12-16T15:02:25 mw1286 WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/827446248 2017-12-16T15:03:25 mw1286 WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/831205807 This is right around when everything recovered, and I don't think that it's coincidence. I'm not sure yet how this is all overlapping, but I'm pretty certain that it is.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: ImarlierCc: Legoktm, Imarlier, aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Vali.matei, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index due to LinksUpdate lock failures
Imarlier added a comment. Just throwing a couple of notes in here: Q45825741 and Q45825750 were both updated at very close to the same time (2017-12-16 14:54:00-15:02:00); both had a number of updates that happened within a very tight window (<1 minute). Both also have log entries indicating that they experienced failures getting a lock: 2017-12-16T14:55:09 mw1283 INFO Expectation (readQueryTime <= 5) by MediaWiki::restInPeace not met (actual: 15.000407934189): query-m: SELECT GET_LOCK('X', N) AS lockstatus [TRX#43742a] #0 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/TransactionProfiler.php(223): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated() #1 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(1049): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion() #2 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(948): Wikimedia\Rdbms\Database->doProfiledQuery() #3 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/DatabaseMysqlBase.php(1039): Wikimedia\Rdbms\Database->query() #4 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(3559): Wikimedia\Rdbms\DatabaseMysqlBase->lock() 2017-12-16T15:00:38 mw1235 INFO Expectation (readQueryTime <= 5) by MediaWiki::restInPeace not met (actual: 15.000345230103): query-m: SELECT GET_LOCK('X', N) AS lockstatus [TRX#aa6d60] #0 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/TransactionProfiler.php(223): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated() #1 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(1049): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion() #2 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(948): Wikimedia\Rdbms\Database->doProfiledQuery() #3 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/DatabaseMysqlBase.php(1039): Wikimedia\Rdbms\Database->query() #4 /srv/mediawiki/php-1.31.0-wmf.12/includes/libs/rdbms/database/Database.php(3559): Wikimedia\Rdbms\DatabaseMysqlBase->lock() #5 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/LinksUpdate.php(203): Wikimedia\Rdbms\Database->getScopedLockAndFlush() #6 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/LinksUpdate.php(170): LinksUpdate::acquirePageLock() #7 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/DeferredUpdates.php(259): LinksUpdate->doUpdate() #8 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/DeferredUpdates.php(210): DeferredUpdates::runUpdate() #9 /srv/mediawiki/php-1.31.0-wmf.12/includes/deferred/DeferredUpdates.php(131): DeferredUpdates::execute() #10 /srv/mediawiki/php-1.31.0-wmf.12/includes/MediaWiki.php(897): DeferredUpdates::doUpdates() #11 /srv/mediawiki/php-1.31.0-wmf.12/includes/MediaWiki.php(719): MediaWiki->restInPeace() #12 (): Closure$MediaWiki::doPostOutputShutdown() #13 {main} Both of those records also have one or more log entries showing that db1100 was significantly lagged at this time: 2017-12-16T14:54:54 mw1283 WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/818130407 2017-12-16T15:00:44 mw1314 WARNING Wikimedia\Rdbms\LoadBalancer::doWait: Timed out waiting on db1100 pos db1070-bin.001655/821972877 This doesn't necessarily mean anything, but it does immediately bring to mind the possibility of a bad interaction between two simultaneous transactions -- a process that gets the update lock, then tries to read from a slave that's lagged, times out waiting on that, and ultimately bails out without releasing the lock? This code is new to me, but that seems like an angle that's worth looking at. Q45825730 and Q45825742 were both created around the same time. However, neither of them shows any indication of similar database timeouts or lock failures. Here's 42: Time host level message 2017-12-16T14:55:07 mw1286 INFO AbuseFilter::filterAction: cache miss for 'Q45825742' (key wikidatawiki:abusefilter:check-stash:default:19082eae700e0352ac30604486024ca3:v1). 2017-12-16T14:54:03 wdqs1003 INFO Got 5 changes, from Q45825742@609888642@20171216145333|645260053 to Q17580698@609888669@20171216145400|645260055 2017-12-16T14:53:33 mw1316 INFO AbuseFilter::filterAction: cache miss for 'Q45825742' (key wikidatawiki:abusefilter:check-stash:default:75754391037fe534c51178ac2603dc26:v1). 2017-12-16T14:53:23 wdqs1005 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:23 wdqs2002 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:22 wdqs1004 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:22 wdqs2001 INFO Got 2 changes, from Q2492695@609888609@20171216145249|645260020 to Q45825742@609888612@20171216145251|645260022 2017-12-16T14:53:22 wdqs2003 INFO Got 2
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index
Legoktm added a comment. Looking at Q45825750, there were 7 edits made very quickly right after another, so it seems reasonable that a links update would be in progress, holding the lock, and then another links update would be unable to get the lock and fail. However, AFAIK failed jobs are supposed to retry after an hour, so I'm not sure why that didn't happen here. (Unless all the jobs retried around the same time and ran into the lock problem again..?)TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: LegoktmCc: Legoktm, Imarlier, aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Vali.matei, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index
Imarlier added a comment. Aaron is on vacation this week, likely not paying any attention to notifications. We'll try to take a look at it without him.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: ImarlierCc: Imarlier, aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, Vali.matei, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index
Smalyshev added a comment. Looks like we have 2 spikes of LinksUpdate errors: https://logstash.wikimedia.org/goto/a2050a1aca3e9d8c9bf7089ee2d2431d - which may be caused by some kind of database problems. We need to think how to make system more robust so that such problems do not lead to items disappearing.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: aaron, Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, Jdrewniak, Wikidata-bugs, aude, Mbch331, Jay8g, Krenair___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index
Smalyshev added a comment. Q45825750 has page ID of 46985262. Kibana says: Could not acquire lock 'LinksUpdate:atomicity:pageid:46985262'. Could be the reason for failed update.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, Jdrewniak, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index
dcausse added a comment. Q45825730 is me, I used this one just to test.TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: dcausseCc: Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, Jdrewniak, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index
Smalyshev added a comment. I can see data in https://www.wikidata.org/wiki/Q45825730?action=""> now but not in https://www.wikidata.org/wiki/Q45825750?action="">. Extreme case of delay or something going really wrong here?TASK DETAILhttps://phabricator.wikimedia.org/T183101EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, Jdrewniak, Wikidata-bugs, aude, Mbch331___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs