Addshore added a comment.

  Further investigating the case in discussed in T237984#5748184 
<https://phabricator.wikimedia.org/T237984#5748184>, T237984#5748567 
<https://phabricator.wikimedia.org/T237984#5748567>, T237984#5750348 
<https://phabricator.wikimedia.org/T237984#5750348> looking at the transaction 
that happened.
  
  **TLDR; Scroll down the the "Got it" section for the location of the 
issue...**
  
  - 539 calls to `Wikibase\Lib\Store\Sql\TermSqlIndex::deleteTerms` were made 
(deleting all terms)
  - 539 ids selected for deletion in 
`Wikibase\Lib\Store\Sql\Terms\DatabaseItemTermStore::acquireAndInsertTerms`
  - 539 ids (537 UNIQUE) selected for deletion in 
`Wikibase\Lib\Store\Sql\Terms\DatabaseTermIdsCleaner::cleanTermInLangIds`
  - 539 ids (107 UNIQUE) selected for deletion in 
`Wikibase\Lib\Store\Sql\Terms\DatabaseTermIdsCleaner::cleanTextInLangIds` (at 
least this should be less, one of these deleted was 110010309 'van' (this id 
only occurred once in the list of ids to be deleted.))
  
  TODO Perhaps the list of ids sent in these queries should be made unique 
before sending.....
  
  A full walkthrough of this bit of the code, highlighting some interesting 
bits, but still not finding the issue.
  
  - `DatabaseItemTermStore::storeTerms` - The method all of this "magic" is 
happening in
    - `DatabaseItemTermStore::acquireAndInsertTerms` - Nothing evil here...
      - Selects `wbit_term_in_lang_id` values from `wbt_item_terms` for the 
item ID to altered
      - Acquirers term ids (for things being inserted). Also generates 
`$termIdsToClean` which is a diff of what is in the table and what is desired, 
resulting in what to be removed.
      - Inserts terms that are needed but missing from the store
      - DELETE from `wbt_item_terms` all `$termIdsToClean` which is a 
collection of `wbit_term_in_lang_id` for the single `wbit_item_id`
      - Return the `$termIdsToClean` which is a collection of 
`wbit_term_in_lang_id`
    - `DatabaseItemTermStore::cleanTermsIfUnused` - Called with 
`$termIdsToClean`
      - Iterate through the `$termIdsToClean` checking both the 
`wbt_property_terms` and `wbt_item_terms` tables for usage **<- This will not 
detect the property usage of the term text, as the term is in a different 
language**
      - `$termIdsUnused` is the list of ids that are unused according to the 
last checks
      - Select these once more FOR UPDATE
      - `DatabaseTermIdsCleaner::cleanTermIds` - Called with all term ids to 
clean that have not been detected as used in any properties or items
        - `DatabaseTermIdsCleaner::cleanTermInLangIds` - Called with the same 
as the above method
          - Select text in lang ids for the term in lang rows we want to 
delete, set as `$potentiallyUnusedTextInLangIds`
          - DELETE the `wbt_term_in_lang` rows that we are cleaning up
          - For each `$potentiallyUnusedTextInLangIds` check if after the above 
delete there are still entries in `wbt_term_in_lang` indicating usage **<- 
Still will not detect the property usage of the term text, as the term text is 
in a different lang**
          - SELECT for update
          - `DatabaseTermIdsCleaner::cleanTextInLangIds` - Called with the 
'unused' text in lang ids
            - Check the `wbt_text_in_lang` table looking at the same 
`$textInLangIds` for find texts we can also delete **< - Still won't find the 
usage of text by the property as the text in lang for the property is not in 
the list being checked**
            - DELETE the text in lang rows
            - foreach `$potentiallyUnusedTextIds` select from the 
`wbt_text_in_lang` table where `wbxl_text_id` matches the potentially unused 
id. If select === false, add to `$unusedTextIds` **<- This should find and not 
delete the text row used by the property**
            - SELECT `$unusedTextIds` for update
            - `DatabaseTermIdsCleaner::cleanTextIds` - called with 
`$unusedTextIds`
              - DELETE from `wbt_text`
  
  And some tests for the rough situation investigated above 
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/559434/ but 
they all pass..
  
  **Got it...**
  
  Following exactly where this textId came from before being listed for 
deletion:
  
  - `cleanTextInLangIds` selects some `$potentiallyUnusedTextIds` which will 
list 110010309, as this text was used in a term in the entity being deleted
  - Each ID is then checked for existence (after some deletes) in 
`wbt_text_in_lang`, which should have flagged the ID as still being used (by 
the property), and thus the ID should not be added to `$unusedTextIds`, as it 
is used
  - Everything in `$unusedTextIds` is then selected FOR UPDATE from 
`wbt_text_in_lang`, and `$stillUsedTextIds` is set, this **WILL NOT  include 
110010309**, as 110010309 is not in `$unusedTextIds`
  - A diff then happens between `$potentiallyUnusedTextIds` and 
`$stillUsedTextIds` to create the final list of text ids to be cleaned.
    - **At this point `$potentiallyUnusedTextIds` has the ID, 
`$stillUsedTextIds` does not, thus 110010309 will be marked for deletion..**
  - cleanTextIds is called, which does no further checking and the row is 
deleted.
  
  This code was last touched in 
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/547535/ 
which subtly changed the behaviour.
  
  The previous behaviour before this patch:
  
  - (same) `cleanTextInLangIds` selects some `$potentiallyUnusedTextIds` which 
will list 110010309, as this text was used in a term in the entity being deleted
  - Everything in `$potentiallyUnusedTextIds` is then selected FOR UPDATE from 
`wbt_text_in_lang`, this will include 110010309 as it is in 
`$potentiallyUnusedTextIds`, and `$stillUsedTextIds` is set, **this WILL 
include 110010309**, as 110010309 is not in `$potentiallyUnusedTextIds` and is 
still in the table
  - A diff then happens between `$potentiallyUnusedTextIds` and 
`$stillUsedTextIds`
    - **At this point both of these arrays have the ID, thus the ID is not 
returned for deletion.**
  - cleanTextIds is called, which does no further checking and the row is 
deleted.

TASK DETAIL
  https://phabricator.wikimedia.org/T237984

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: jcrespo, Marostegui, abian, JAnD, Ash_Crow, Addshore, PKM, Moebeus, 
alaa_wmde, VIGNERON, Aklapper, Lydia_Pintscher, Ladsgroup, Lea_Lacroix_WMDE, 
Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, Iflorez, 
darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, 
Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, 
Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, 
Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, 
Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, 
aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to