Addshore added a comment.
Further investigating the case in discussed in T237984#5748184 <https://phabricator.wikimedia.org/T237984#5748184>, T237984#5748567 <https://phabricator.wikimedia.org/T237984#5748567>, T237984#5750348 <https://phabricator.wikimedia.org/T237984#5750348> looking at the transaction that happened. **TLDR; Scroll down the the "Got it" section for the location of the issue...** - 539 calls to `Wikibase\Lib\Store\Sql\TermSqlIndex::deleteTerms` were made (deleting all terms) - 539 ids selected for deletion in `Wikibase\Lib\Store\Sql\Terms\DatabaseItemTermStore::acquireAndInsertTerms` - 539 ids (537 UNIQUE) selected for deletion in `Wikibase\Lib\Store\Sql\Terms\DatabaseTermIdsCleaner::cleanTermInLangIds` - 539 ids (107 UNIQUE) selected for deletion in `Wikibase\Lib\Store\Sql\Terms\DatabaseTermIdsCleaner::cleanTextInLangIds` (at least this should be less, one of these deleted was 110010309 'van' (this id only occurred once in the list of ids to be deleted.)) TODO Perhaps the list of ids sent in these queries should be made unique before sending..... A full walkthrough of this bit of the code, highlighting some interesting bits, but still not finding the issue. - `DatabaseItemTermStore::storeTerms` - The method all of this "magic" is happening in - `DatabaseItemTermStore::acquireAndInsertTerms` - Nothing evil here... - Selects `wbit_term_in_lang_id` values from `wbt_item_terms` for the item ID to altered - Acquirers term ids (for things being inserted). Also generates `$termIdsToClean` which is a diff of what is in the table and what is desired, resulting in what to be removed. - Inserts terms that are needed but missing from the store - DELETE from `wbt_item_terms` all `$termIdsToClean` which is a collection of `wbit_term_in_lang_id` for the single `wbit_item_id` - Return the `$termIdsToClean` which is a collection of `wbit_term_in_lang_id` - `DatabaseItemTermStore::cleanTermsIfUnused` - Called with `$termIdsToClean` - Iterate through the `$termIdsToClean` checking both the `wbt_property_terms` and `wbt_item_terms` tables for usage **<- This will not detect the property usage of the term text, as the term is in a different language** - `$termIdsUnused` is the list of ids that are unused according to the last checks - Select these once more FOR UPDATE - `DatabaseTermIdsCleaner::cleanTermIds` - Called with all term ids to clean that have not been detected as used in any properties or items - `DatabaseTermIdsCleaner::cleanTermInLangIds` - Called with the same as the above method - Select text in lang ids for the term in lang rows we want to delete, set as `$potentiallyUnusedTextInLangIds` - DELETE the `wbt_term_in_lang` rows that we are cleaning up - For each `$potentiallyUnusedTextInLangIds` check if after the above delete there are still entries in `wbt_term_in_lang` indicating usage **<- Still will not detect the property usage of the term text, as the term text is in a different lang** - SELECT for update - `DatabaseTermIdsCleaner::cleanTextInLangIds` - Called with the 'unused' text in lang ids - Check the `wbt_text_in_lang` table looking at the same `$textInLangIds` for find texts we can also delete **< - Still won't find the usage of text by the property as the text in lang for the property is not in the list being checked** - DELETE the text in lang rows - foreach `$potentiallyUnusedTextIds` select from the `wbt_text_in_lang` table where `wbxl_text_id` matches the potentially unused id. If select === false, add to `$unusedTextIds` **<- This should find and not delete the text row used by the property** - SELECT `$unusedTextIds` for update - `DatabaseTermIdsCleaner::cleanTextIds` - called with `$unusedTextIds` - DELETE from `wbt_text` And some tests for the rough situation investigated above https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/559434/ but they all pass.. **Got it...** Following exactly where this textId came from before being listed for deletion: - `cleanTextInLangIds` selects some `$potentiallyUnusedTextIds` which will list 110010309, as this text was used in a term in the entity being deleted - Each ID is then checked for existence (after some deletes) in `wbt_text_in_lang`, which should have flagged the ID as still being used (by the property), and thus the ID should not be added to `$unusedTextIds`, as it is used - Everything in `$unusedTextIds` is then selected FOR UPDATE from `wbt_text_in_lang`, and `$stillUsedTextIds` is set, this **WILL NOT include 110010309**, as 110010309 is not in `$unusedTextIds` - A diff then happens between `$potentiallyUnusedTextIds` and `$stillUsedTextIds` to create the final list of text ids to be cleaned. - **At this point `$potentiallyUnusedTextIds` has the ID, `$stillUsedTextIds` does not, thus 110010309 will be marked for deletion..** - cleanTextIds is called, which does no further checking and the row is deleted. This code was last touched in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/547535/ which subtly changed the behaviour. The previous behaviour before this patch: - (same) `cleanTextInLangIds` selects some `$potentiallyUnusedTextIds` which will list 110010309, as this text was used in a term in the entity being deleted - Everything in `$potentiallyUnusedTextIds` is then selected FOR UPDATE from `wbt_text_in_lang`, this will include 110010309 as it is in `$potentiallyUnusedTextIds`, and `$stillUsedTextIds` is set, **this WILL include 110010309**, as 110010309 is not in `$potentiallyUnusedTextIds` and is still in the table - A diff then happens between `$potentiallyUnusedTextIds` and `$stillUsedTextIds` - **At this point both of these arrays have the ID, thus the ID is not returned for deletion.** - cleanTextIds is called, which does no further checking and the row is deleted. TASK DETAIL https://phabricator.wikimedia.org/T237984 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Addshore Cc: jcrespo, Marostegui, abian, JAnD, Ash_Crow, Addshore, PKM, Moebeus, alaa_wmde, VIGNERON, Aklapper, Lydia_Pintscher, Ladsgroup, Lea_Lacroix_WMDE, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, Iflorez, darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Mbch331
_______________________________________________ Wikidata-bugs mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
