Ladsgroup added a subscriber: alaa_wmde.
Ladsgroup added a comment.

  While trying to do T226818: Diff when updating wbc_entity_usage 
<https://phabricator.wikimedia.org/T226818> I found a very weird thing. 
`AddUsagesForPageJob` takes all entity usages of the given page, does some 
other work, and then adds all of the current entity usages to the 
`wbc_entity_usage` table with INSERT IGNORE. There is no diffing here; it 
just adds all of the current usages through 
`$this->usageUpdater->addUsagesForPage()`, which calls 
`$this->usageTracker->addUsedEntities()`. That raises the question of how 
(or whether) removed entity usages ever get removed from the table. I tested 
it locally and in production, and removed entity usages really do get 
removed; but how? Looking at the other uses of the `UsageUpdater` class, it 
turns out that `DataUpdateHookHandlers` calls 
`$this->usageUpdater->replaceUsagesForPage()`, which does proper diffing and 
brings `wbc_entity_usage` into the correct state. Then `AddUsagesForPageJob` 
comes in and does an INSERT IGNORE of the exact same rows that already exist 
in the database. This might be the cause of the deadlocks: both the hook 
handler (which I think runs as a deferred update) and the job try to update 
the same rows at the same time.
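  For clarity, the two write paths described above can be sketched roughly 
like this. This is a minimal Python sketch of the behaviour, not the actual 
Wikibase PHP; the function names mirror the methods mentioned above, but the 
modeling of usages as `(entity_id, aspect)` pairs is my simplification:

```python
# Sketch of the two code paths, assuming a usage is an (entity_id, aspect) pair.

def replace_usages_for_page(stored: set, current: set) -> tuple:
    """Hook-handler path (replaceUsagesForPage): diff the stored state
    against the current state and touch only the rows that changed."""
    to_delete = stored - current
    to_insert = current - stored
    return to_delete, to_insert

def add_usages_for_page(stored: set, current: set) -> set:
    """Job path (addUsedEntities): INSERT IGNORE every current row.
    Rows already present are silently skipped, but nothing is ever
    removed, and the database still has to check every submitted row."""
    return stored | current  # resulting table state: additions only

# With identical state on both sides, the diff path is a no-op, while the
# job path would still re-submit every row to the database.
to_delete, to_insert = replace_usages_for_page({("Q1026", "C.P1813")},
                                               {("Q1026", "C.P1813")})
print(to_delete, to_insert)  # set() set()
```

  The key point is visible in the second function: INSERT IGNORE alone can 
never shrink the table, which is why the diffing in the hook handler has to 
be the thing doing the removals.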
  
  One other important thing: the way the job adds the data to the database 
is quite inefficient. For example, one article that triggered this issue on 
Ukrainian Wikipedia has these entity usages:
  
    [email protected](ukwiki)> select * from wbc_entity_usage where 
eu_page_id = 1143221;
    +-----------+--------------+-----------+------------+
    | eu_row_id | eu_entity_id | eu_aspect | eu_page_id |
    +-----------+--------------+-----------+------------+
    |  67076587 | Q1026        | C.P1813   |    1143221 |
    |  67076588 | Q1026        | C.P1448   |    1143221 |
    |  67076589 | Q1026        | C.P1705   |    1143221 |
    
    (Lots of rows)
    
    +-----------+--------------+-----------+------------+
    392 rows in set (0.00 sec)
  
  But in most cases the entity usages change by only one or two rows. 
Imagine I changed just one entity usage of this page from `Q1026 C.P1813` to 
`Q1026 C.P1814`. The hook handler removes `Q1026 C.P1813` and adds 
`Q1026 C.P1814`. Then the job comes in and happily INSERT IGNOREs all 392 
rows again; the job could be dropped entirely and `wbc_entity_usage` would 
stay exactly the same. These redundant writes used to happen 300 times a 
second, and after T205045#5290536 
<https://phabricator.wikimedia.org/T205045#5290536> they went down to 30 
times per second.
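  To make the scale of the waste concrete, here is a hypothetical sketch of 
that scenario. Only the 392-row count and the two aspect IDs come from the 
example above; the padding rows are invented to reach the same row count:

```python
# Page with 392 usage rows where exactly one usage flips C.P1813 -> C.P1814.
stored = {("Q1026", "C.P1813"), ("Q1026", "C.P1448"), ("Q1026", "C.P1705")}
stored |= {("Q1026", f"C.P{n}") for n in range(2000, 2389)}  # pad to 392 rows

current = (stored - {("Q1026", "C.P1813")}) | {("Q1026", "C.P1814")}

removed = stored - current   # rows the hook handler deletes
added = current - stored     # rows the hook handler inserts

print(len(stored), len(current))  # 392 392
print(removed)                    # {('Q1026', 'C.P1813')}
print(added)                      # {('Q1026', 'C.P1814')}
print(len(current))               # 392 rows the job INSERT IGNOREs anyway
```

  The diff touches 2 rows; the job re-submits all 392.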
  
  I don't think we should drop the job completely: it does important work 
for the entity usage subscriptions that the hook handler does not seem to do 
(the job also prunes unused entity subscriptions, but that is unrelated to 
this). We should just drop the redundant INSERT IGNORE part.
  
  CC @Lydia_Pintscher, @hoo, @Addshore, @alaa_wmde, @Lucas_Werkmeister_WMDE: 
please double-check everything and tell me I'm not crazy. I checked 
everything twice, though.

TASK DETAIL
  https://phabricator.wikimedia.org/T205045


_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
