Ladsgroup added a subscriber: alaa_wmde. Ladsgroup added a comment.
While working on T226818: Diff when updating wbc_entity_usage (https://phabricator.wikimedia.org/T226818), I found something very, very weird. `AddUsagesForPageJob` takes all entity usages of the given page and, besides a few other things, adds all of the current entity usages to the `wbc_entity_usage` table with INSERT IGNORE. There is no diffing here. It simply adds all of the current usages through `$this->usageUpdater->addUsagesForPage()`, which calls `$this->usageTracker->addUsedEntities()`.

That raises the question of how (or whether) removed entity usages actually get removed from the table. I tested this locally and in production, and removed entity usages really are removed. So how does that happen? Looking at the other callers of the `UsageUpdater` class, `DataUpdateHookHandlers` calls `$this->usageUpdater->replaceUsagesForPage()`, which does a fairly thorough diff and brings `wbc_entity_usage` into the correct state. Then `AddUsagesForPageJob` comes along and does an `INSERT IGNORE` of exactly the same rows that already exist in the database. This might be why we see deadlocks: both the hook handler (which I think runs as a deferred update) and the job try to write the same rows at the same time.

Another important point is that the way the job writes to the database is quite inefficient. For example, one article in the Ukrainian Wikipedia that triggered this issue has these entity usages:

  (ukwiki)> select * from wbc_entity_usage where eu_page_id = 1143221;
  +-----------+--------------+-----------+------------+
  | eu_row_id | eu_entity_id | eu_aspect | eu_page_id |
  +-----------+--------------+-----------+------------+
  |  67076587 | Q1026        | C.P1813   |    1143221 |
  |  67076588 | Q1026        | C.P1448   |    1143221 |
  |  67076589 | Q1026        | C.P1705   |    1143221 |
  (many more rows)
  +-----------+--------------+-----------+------------+
  392 rows in set (0.00 sec)

Yet in most edits only one or two of these entity usages change.
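To make the difference between the two write paths concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is a toy model, not the Wikibase code: `replace_usages` and `add_usages` are hypothetical stand-ins for `replaceUsagesForPage()` and `addUsagesForPage()`, the schema is simplified, and the UNIQUE key on (eu_entity_id, eu_aspect, eu_page_id) is an assumption. SQLite's closest equivalent of MySQL's INSERT IGNORE is INSERT OR IGNORE. The three usages are the rows shown in the query output above.

```python
import sqlite3

# Hypothetical, simplified model of wbc_entity_usage. The UNIQUE key is an
# assumption; it is what makes INSERT OR IGNORE skip rows that already exist.
SCHEMA = """
CREATE TABLE wbc_entity_usage (
    eu_row_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    eu_entity_id TEXT NOT NULL,
    eu_aspect    TEXT NOT NULL,
    eu_page_id   INTEGER NOT NULL,
    UNIQUE (eu_entity_id, eu_aspect, eu_page_id)
)
"""

DELETE_SQL = ("DELETE FROM wbc_entity_usage "
              "WHERE eu_entity_id = ? AND eu_aspect = ? AND eu_page_id = ?")
INSERT_SQL = ("INSERT INTO wbc_entity_usage "
              "(eu_entity_id, eu_aspect, eu_page_id) VALUES (?, ?, ?)")
INSERT_IGNORE_SQL = ("INSERT OR IGNORE INTO wbc_entity_usage "
                     "(eu_entity_id, eu_aspect, eu_page_id) VALUES (?, ?, ?)")


def stored_usages(conn, page_id):
    """Return the stored usages of one page as a set of (entity_id, aspect)."""
    cur = conn.execute(
        "SELECT eu_entity_id, eu_aspect FROM wbc_entity_usage WHERE eu_page_id = ?",
        (page_id,))
    return set(cur.fetchall())


def replace_usages(conn, page_id, usages):
    """Hypothetical stand-in for replaceUsagesForPage(): diff against the
    stored rows, then delete/insert only the difference.
    Returns the number of rows actually changed."""
    before = conn.total_changes
    old = stored_usages(conn, page_id)
    conn.executemany(DELETE_SQL, [(e, a, page_id) for e, a in old - usages])
    conn.executemany(INSERT_SQL, [(e, a, page_id) for e, a in usages - old])
    return conn.total_changes - before


def add_usages(conn, page_id, usages):
    """Hypothetical stand-in for addUsagesForPage(): re-send every current
    usage with INSERT OR IGNORE, no diffing.
    Returns (rows sent, rows actually changed)."""
    before = conn.total_changes
    rows = [(e, a, page_id) for e, a in usages]
    conn.executemany(INSERT_IGNORE_SQL, rows)
    return len(rows), conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
page = 1143221
usages = {("Q1026", "C.P1813"), ("Q1026", "C.P1448"), ("Q1026", "C.P1705")}

print(replace_usages(conn, page, usages))  # hook handler, first run: 3
print(replace_usages(conn, page, usages))  # hook handler, nothing changed: 0
print(add_usages(conn, page, usages))      # job afterwards: (3, 0)
```

Counting changed rows with `Connection.total_changes` shows the point: once the diff-based update has run, re-sending every usage with INSERT OR IGNORE still executes one insert per row but changes nothing. With 392 usages, that is 392 ignored inserts per job run.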
Imagine I change only one entity usage of this page, from `Q1026 C.P1813` to `Q1026 C.P1814`. The hook handler removes `Q1026 C.P1813` and adds `Q1026 C.P1814`. Then the job comes in and happily INSERT IGNOREs all 392 rows again. As far as `wbc_entity_usage` is concerned, that part of the job could be dropped entirely and the table would end up in exactly the same state. These redundant writes used to happen 300 times a second; after T205045#5290536 (https://phabricator.wikimedia.org/T205045#5290536) they went down to 30 times a second.

I don't think we should drop the job completely: it does important work on entity usage subscriptions that the hook handler doesn't seem to do. (The job also prunes unused entity subscriptions, but that is unrelated to this issue.) We should just drop the part that re-inserts the usages.

CC @Lydia_Pintscher, @hoo, @Addshore, @alaa_wmde, @Lucas_Werkmeister_WMDE: please double-check all of this and tell me I'm not crazy. I did check everything twice, though.

TASK DETAIL https://phabricator.wikimedia.org/T205045
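The one-usage edit from the example (Q1026 C.P1813 changed to Q1026 C.P1814) can be checked with plain set arithmetic. This sketch treats one page's rows as a set of (entity, aspect) pairs and models INSERT IGNORE as a set union; the 391 filler usages are made up purely to reach the 392 rows of the ukwiki page, and none of this is the actual Wikibase code.

```python
# Hypothetical page state: 391 made-up filler usages plus Q1026 C.P1813,
# giving 392 usages in total. Each usage is an (entity, aspect) pair.
old = {(f"Q{n}", "S") for n in range(1, 392)} | {("Q1026", "C.P1813")}
new = (old - {("Q1026", "C.P1813")}) | {("Q1026", "C.P1814")}

# Hook handler (diff-based): delete what disappeared, insert what appeared.
removed = old - new
added = new - old
after_hook = (old - removed) | added

# Job (INSERT IGNORE of every current usage): a set union, so rows that
# already exist are silently skipped.
after_job = after_hook | new

print(len(removed) + len(added))  # rows the hook handler writes: 2
print(len(new))                   # rows the job sends again: 392
print(after_job == after_hook)    # True: the job's re-insert changes nothing
```

Because the union with rows that are already present is a no-op, the final state is the same whether or not the job's re-insert runs; only the number of rows sent to the database differs (2 versus 392).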
_______________________________________________
Wikidata-bugs mailing list: https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
