Lucas_Werkmeister_WMDE added a subscriber: ArthurTaylor.
Lucas_Werkmeister_WMDE added a comment.
@Michael, @ArthurTaylor and I looked into this issue some more and started
noticing some things that haven’t been discussed here yet AFAICT:
- The queries that deadlock always try to insert a bunch of `C.P%` entity
usages, for the same entity, on the same page. Example:
  INSERT IGNORE INTO `wbc_entity_usage` (eu_page_id,eu_aspect,eu_entity_id)
  VALUES
    (1918734,'C.P486','Q1059818'),
    (1918734,'C.P4394','Q1059818'),
    (1918734,'C.P696','Q1059818'),
    (1918734,'C.P1323','Q1059818'),
    (1918734,'C.P1693','Q1059818'),
    (1918734,'C.P1694','Q1059818'),
    (1918734,'C.P1402','Q1059818'),
    (1918734,'C.P7173','Q1059818')
- This example is from enwiki, Dec 15, 2023 @ 05:53:37.303, reqId
`55f9a1a6-f93e-440f-ae08-ad1ac111ba27`; all the values are for page ID 1918734
and entity ID Q1059818.
- The affected pages (e.g. page ID 1918734 on enwiki
<https://en.wikipedia.org/w/index.php?curid=1918734>) always have //both// a
`C` usage and these `C.P%` usages for the same entity. Semantically, this
shouldn’t happen, since `C` covers all `C.P%`, but there might be some logic
error in the code.
- Using the analytics replicas, and preferably only on a small wiki, you
can count the `C` rows that coexist with `C.%` rows for the same page and
entity using this query:
  SELECT COUNT(DISTINCT eu1.eu_row_id)
  FROM wbc_entity_usage eu1
  JOIN wbc_entity_usage eu2
    ON eu1.eu_entity_id = eu2.eu_entity_id
   AND eu1.eu_page_id = eu2.eu_page_id
  WHERE eu1.eu_aspect = 'C'
    AND eu2.eu_aspect LIKE 'C.%';
- The `C.P%` rows apparently get re-inserted when the affected page is purged
via the API (with forced links update): the second run of the query below,
after the purge, shows different `eu_row_id` values for them:
  mysql:[email protected] [enwiki]> SELECT * FROM wbc_entity_usage WHERE eu_page_id = 64091671 AND eu_entity_id = 'Q94595271';
  +-----------+--------------+-----------+------------+
  | eu_row_id | eu_entity_id | eu_aspect | eu_page_id |
  +-----------+--------------+-----------+------------+
  |  68680169 | Q94595271    | C         |   64091671 |
  | 297239401 | Q94595271    | C.P18     |   64091671 |
  | 297239402 | Q94595271    | C.P1960   |   64091671 |
  | 297239403 | Q94595271    | C.P2038   |   64091671 |
  | 297239399 | Q94595271    | C.P31     |   64091671 |
  | 297239400 | Q94595271    | C.P569    |   64091671 |
  |  70607769 | Q94595271    | D.en      |   64091671 |
  |  68680144 | Q94595271    | O         |   64091671 |
  |  68680148 | Q94595271    | S         |   64091671 |
  |  68680168 | Q94595271    | T         |   64091671 |
  +-----------+--------------+-----------+------------+
  10 rows in set (0.001 sec)
  mysql:[email protected] [enwiki]> SELECT * FROM wbc_entity_usage WHERE eu_page_id = 64091671 AND eu_entity_id = 'Q94595271';
  +-----------+--------------+-----------+------------+
  | eu_row_id | eu_entity_id | eu_aspect | eu_page_id |
  +-----------+--------------+-----------+------------+
  |  68680169 | Q94595271    | C         |   64091671 |
  | 297239474 | Q94595271    | C.P18     |   64091671 |
  | 297239475 | Q94595271    | C.P1960   |   64091671 |
  | 297239476 | Q94595271    | C.P2038   |   64091671 |
  | 297239472 | Q94595271    | C.P31     |   64091671 |
  | 297239473 | Q94595271    | C.P569    |   64091671 |
  |  70607769 | Q94595271    | D.en      |   64091671 |
  |  68680144 | Q94595271    | O         |   64091671 |
  |  68680148 | Q94595271    | S         |   64091671 |
  |  68680168 | Q94595271    | T         |   64091671 |
  +-----------+--------------+-----------+------------+
  10 rows in set (0.001 sec)
- The `C` row, on the other hand, kept its old row ID. But really, it should
have been removed.
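For anyone who wants to play with these observations without touching the replicas, here is a minimal sketch on a toy SQLite table. It is not the production schema: the unique index on (entity, aspect, page) is an assumption, SQLite's `INSERT OR IGNORE` stands in for MariaDB's `INSERT IGNORE`, the helper names are made up, and the `Q42` / `Q64` rows are invented filler. The delete-then-insert step is only a guess at what a purge might do to the `C.%` rows; it is there to show that such a sequence would produce exactly the row ID pattern seen above.

```python
import sqlite3

INSERT_IGNORE = (
    "INSERT OR IGNORE INTO wbc_entity_usage "
    "(eu_page_id, eu_aspect, eu_entity_id) VALUES (?, ?, ?)"
)

def make_toy_table():
    """Build an in-memory stand-in for wbc_entity_usage (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE wbc_entity_usage (
            eu_row_id    INTEGER PRIMARY KEY AUTOINCREMENT,
            eu_entity_id TEXT NOT NULL,
            eu_aspect    TEXT NOT NULL,
            eu_page_id   INTEGER NOT NULL,
            UNIQUE (eu_entity_id, eu_aspect, eu_page_id)
        )
    """)
    rows = [
        (64091671, "C", "Q94595271"),      # bare C plus C.P% below
        (64091671, "C.P18", "Q94595271"),
        (64091671, "C.P31", "Q94595271"),
        (64091671, "S", "Q94595271"),
        (1, "C", "Q42"),                   # invented: bare C only
        (2, "C.P31", "Q64"),               # invented: C.P% only
    ]
    conn.executemany(INSERT_IGNORE, rows)
    return conn

def count_redundant_c_rows(conn):
    """Same query as in the comment: C rows that coexist with C.% rows."""
    sql = """
        SELECT COUNT(DISTINCT eu1.eu_row_id)
        FROM wbc_entity_usage eu1
        JOIN wbc_entity_usage eu2
          ON eu1.eu_entity_id = eu2.eu_entity_id
         AND eu1.eu_page_id = eu2.eu_page_id
        WHERE eu1.eu_aspect = 'C'
          AND eu2.eu_aspect LIKE 'C.%'
    """
    return conn.execute(sql).fetchone()[0]

def row_ids(conn, page_id, entity_id):
    """Map aspect -> eu_row_id for one page and entity."""
    cur = conn.execute(
        "SELECT eu_aspect, eu_row_id FROM wbc_entity_usage "
        "WHERE eu_page_id = ? AND eu_entity_id = ?",
        (page_id, entity_id),
    )
    return dict(cur.fetchall())

def delete_then_reinsert(conn, page_id, entity_id, aspects):
    """Hypothetical purge model: drop the C.% rows, then INSERT IGNORE all."""
    conn.execute(
        "DELETE FROM wbc_entity_usage "
        "WHERE eu_page_id = ? AND eu_entity_id = ? AND eu_aspect LIKE 'C.%'",
        (page_id, entity_id),
    )
    conn.executemany(INSERT_IGNORE, [(page_id, a, entity_id) for a in aspects])
```

On the toy data, `count_redundant_c_rows()` finds one row (the `C` row of page 64091671). After `delete_then_reinsert(conn, 64091671, "Q94595271", ["C", "C.P18", "C.P31"])`, the `C` row keeps its ID because the insert is ignored, while the `C.P%` rows come back with fresh IDs.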
We haven’t managed to reproduce the issue elsewhere yet. Locally, an extra
`C` usage seems to be removed correctly when the page isn’t actually above the
`entityUsageModifierLimits`, and changing those limits seems to work as we
would expect. On Beta, pages with both `C` and `C.P%` usages seem to get their
`C.P%` usages removed instead – we’re not sure if this is because the affected
Beta pages (there are only about a dozen of them) genuinely use enough
different statements to exceed the limit, or if some other bug is going on.
(The connected items don’t have that many statements, but that doesn’t mean
much – statement usage is still tracked even if no such statement exists. So
it’s possible that the affected Beta pages use some templates and modules
imported from enwiki that genuinely ask for tons of statements and therefore
exceed the limit. But we’re not sure. A previously affected Beta page, which
now only has the `C` usage left after we issued a purge, is Berlin
<https://en.wikipedia.beta.wmflabs.org/w/index.php?curid=159418>.)
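The behaviour we would expect from the limits can be sketched roughly as follows. This is not the Wikibase implementation, only a model of the semantics described above: a bare aspect makes its modified variants redundant, and too many modified variants collapse into the bare aspect. The function name, the limit value in the examples, and the choice of "strictly more than the limit" as the threshold are all assumptions for illustration.

```python
def collapse_usages(aspects, modifier_limits):
    """Reduce one entity's usage aspects on one page to a minimal set.

    aspects: iterable of strings such as 'C', 'C.P31', 'D.en', 'S'.
    modifier_limits: dict like {'C': 33}, in the spirit of
        entityUsageModifierLimits (the values here are assumptions).
    """
    by_base = {}
    for aspect in aspects:
        base, _, modifier = aspect.partition(".")
        # An empty modifier marks the bare (unmodified) aspect.
        by_base.setdefault(base, set()).add(modifier)

    result = set()
    for base, modifiers in by_base.items():
        modified = modifiers - {""}
        limit = modifier_limits.get(base)
        over_limit = limit is not None and len(modified) > limit
        if "" in modifiers or over_limit:
            # The bare aspect already covers every modified variant.
            result.add(base)
        else:
            result.update(f"{base}.{m}" for m in modified)
    return result
```

Under this model, a parse that yields only a few `C.P%` usages returns just those, so a full replacement of the stored rows would drop a leftover `C`; a parse that exceeds the limit returns only `C`, which would match what we see on Beta if those pages genuinely use that many statements.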
Relevant places in the code include
`DataUpdateHookHandler::doLinksUpdateComplete()` and
`DataUpdateHookHandler::onParserCacheSaveComplete()`; the former replaces all
entity usages after a proper edit, while the latter only adds extra usages
(e.g. when a page is viewed in a different language on a multilingual wiki like
Commons). The latter is also what pushes the job that we see in this task
(`AddUsagesForPageJob`); the former doesn’t involve jobs.
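The difference between the two update styles can be modelled with plain sets. This is only a sketch of the behaviours as described here (replace everything versus add extra rows), not of the hook handlers themselves, and the function names are made up.

```python
def replace_usages(stored, parsed):
    """Full replacement: afterwards the stored usages are exactly `parsed`."""
    return set(parsed)

def add_usages(stored, parsed):
    """Add-only update: nothing that is already stored gets removed."""
    return set(stored) | set(parsed)

# Toy data: a bare C is stored, and the current parse asks for two statements.
stored = {"C", "S"}
parsed = {"C.P18", "C.P31", "S"}

after_replace = replace_usages(stored, parsed)  # the bare C is gone
after_add = add_usages(stored, parsed)          # C and C.P% now coexist
```

An add-only path cannot remove a stale row by construction, so it can leave `C` and `C.P%` side by side in this model. Whether that is how the production rows came about is not established by this sketch.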
Notably, it seems that `onParserCacheSaveComplete()` can be called //twice//
per edit, even locally. It’s also not clear how its calls relate to the call of
`doLinksUpdateComplete()`, timing-wise. The multiple calls might be a bug
somewhere. (Redundant parse? cf. T288707
<https://phabricator.wikimedia.org/T288707>)
Given that we don’t really understand the deadlock yet, we’ll see if we can
fix this logic error before trying the deadlock improvements from
T255706#9302521 <https://phabricator.wikimedia.org/T255706#9302521>.
TASK DETAIL
https://phabricator.wikimedia.org/T255706