Addshore added a comment.
In the pattern of odd rows that we are currently seeing in the DB, not only
the text row has been removed, but also the text in lang row.
mysql:[email protected] [wikidatawiki]> SELECT * FROM wbt_property_terms
  LEFT JOIN wbt_term_in_lang ON wbpt_term_in_lang_id = wbtl_id
  LEFT JOIN wbt_type ON wbtl_type_id = wby_id
  LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
  LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
  WHERE wby_name = 'label' AND wbx_text IS NULL
  ORDER BY wbpt_property_id;
+---------+------------------+----------------------+-----------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
| wbpt_id | wbpt_property_id | wbpt_term_in_lang_id | wbtl_id   | wbtl_type_id | wbtl_text_in_lang_id | wby_id | wby_name | wbxl_id | wbxl_language | wbxl_text_id | wbx_id | wbx_text |
+---------+------------------+----------------------+-----------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
|  325236 |              225 |            388713206 | 388713206 |            1 |            383127030 |      1 | label    |    NULL | NULL          |         NULL |   NULL | NULL     |
|  325246 |              433 |            388715670 | 388715670 |            1 |            379975720 |      1 | label    |    NULL | NULL          |         NULL |   NULL | NULL     |
+---------+------------------+----------------------+-----------+--------------+----------------------+--------+----------+---------+---------------+--------------+--------+----------+
2 rows in set (0.95 sec)
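For anyone wanting to try this kind of check locally, here is a minimal sketch of the same LEFT JOIN / IS NULL pattern against an in-memory SQLite database. The table definitions are heavily simplified stand-ins (assumptions, not the real wbt_* schema), and the IDs 100, 101, 10 and 11 are made up for illustration.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the wbt_* tables: only the
# columns needed by the join are modelled here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wbt_text (wbx_id INTEGER PRIMARY KEY, wbx_text TEXT);
CREATE TABLE wbt_text_in_lang (
    wbxl_id INTEGER PRIMARY KEY, wbxl_language TEXT, wbxl_text_id INTEGER);
CREATE TABLE wbt_term_in_lang (
    wbtl_id INTEGER PRIMARY KEY, wbtl_type_id INTEGER,
    wbtl_text_in_lang_id INTEGER);

INSERT INTO wbt_text VALUES (1, 'intact label');
INSERT INTO wbt_text_in_lang VALUES (10, 'en', 1);
-- Term 100 is intact; term 101 points at text in lang row 11,
-- which does not exist (mimicking the rows shown above).
INSERT INTO wbt_term_in_lang VALUES (100, 1, 10), (101, 1, 11);
""")

# LEFT JOIN keeps the term in lang row even when the joined rows are
# missing, so the missing side shows up as NULL (None in Python).
dangling = conn.execute("""
SELECT wbtl_id, wbtl_text_in_lang_id, wbxl_id, wbx_id
FROM wbt_term_in_lang
LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
WHERE wbx_text IS NULL
ORDER BY wbtl_id
""").fetchall()

print(dangling)
```

Only the term in lang row whose text in lang row is gone comes back, with NULLs for both the text in lang and text columns, which is the same shape as the two rows above.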
The same patch touches the same sort of code in `cleanTermInLangIds`, which
probably causes the same issue.
- `cleanTermInLangIds` is called with `$termInLangIds`, which contains the
  term in lang IDs that are not used in the property or item tables (this is
  correct)
  - Example: ID 999 (some ID from the edit that triggered the deletion in the
    case seen above)
- Text in lang IDs are then selected from `wbt_term_in_lang` where the text
  in lang ID is in `$potentiallyUnusedTextInLangIds`
  - `$potentiallyUnusedTextInLangIds` would contain all of the text in lang
    IDs for the term in lang 999, let's say 11, 12, 383127030
- All of the `$termInLangIds` are then deleted (this is correct)
  - term in lang ID 999 has been deleted
- Each of the `$potentiallyUnusedTextInLangIds` (which currently still looks
  fine) is then selected from `wbt_term_in_lang` a final time, and
  `$unusedTextInLangIds` is built up when no rows are found for the text in
  lang ID in the term in lang table (correct)
  - `$unusedTextInLangIds` now contains all of the text in lang IDs that are
    still in use, so this should be 11 and 12
- `$unusedTextInLangIds` are then selected from `wbt_term_in_lang` FOR UPDATE,
  setting `$stillUsedTextInLangIds` to the resulting IDs
  - `$stillUsedTextInLangIds` would then still include 11 and 12
- A diff then occurs, including things that are in
  `$potentiallyUnusedTextInLangIds` and not in `$stillUsedTextInLangIds`
  - so things that are in 11, 12, 383127030 and not in 11 and 12; thus
    383127030 is passed down for deletion when it should not be.
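The steps above can be modelled with plain sets. This is only a rough sketch of one reading of the flow, not the actual PHP in `cleanTermInLangIds`: the function names, the dict standing in for `wbt_term_in_lang`, and the extra rows 997, 998 and 1000 are all assumptions made for illustration, and the intermediate sets may not match the real code exactly. The point it tries to show is that the last re-check only looks at the narrowed list, while the diff is taken against the full list.

```python
def _narrow(term_in_lang, term_in_lang_ids):
    """Shared steps: collect candidates, delete the rows, narrow the list."""
    rows = dict(term_in_lang)  # wbtl_id -> wbtl_text_in_lang_id
    potentially_unused = {rows[i] for i in term_in_lang_ids}
    for i in term_in_lang_ids:
        del rows[i]
    referenced = set(rows.values())
    # Candidates that no remaining term in lang row points at.
    unused = {t for t in potentially_unused if t not in referenced}
    # Final re-check (the FOR UPDATE select), run on the narrowed list only.
    still_used = {t for t in unused if t in referenced}
    return potentially_unused, unused, still_used


def diff_against_full_list(term_in_lang, term_in_lang_ids):
    """Diff as described above: full candidate list minus still-used IDs."""
    potentially_unused, _unused, still_used = _narrow(
        term_in_lang, term_in_lang_ids)
    return potentially_unused - still_used


def diff_against_narrowed_list(term_in_lang, term_in_lang_ids):
    """One possible correction (an assumption): diff the narrowed list."""
    _potentially_unused, unused, still_used = _narrow(
        term_in_lang, term_in_lang_ids)
    return unused - still_used


# Hypothetical data: row 1000 still uses text in lang 383127030.
table = {997: 11, 998: 12, 999: 383127030, 1000: 383127030}
to_clean = [997, 998, 999]

full = diff_against_full_list(table, to_clean)
narrowed = diff_against_narrowed_list(table, to_clean)
print(sorted(full), sorted(narrowed))
```

In this model, 383127030 ends up in the deletion set whenever the diff uses the full list, even though row 1000 still refers to it. Diffing the narrowed list instead leaves it alone.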
TASK DETAIL
https://phabricator.wikimedia.org/T237984