Lucas_Werkmeister_WMDE added a comment.
Found a new example, Still Life with Pie and Silver Ewer <https://www.wikidata.org/w/index.php?title=Q17524323&diff=1672132901&oldid=1672058773>. The merge added a German label, one Dutch alias, one English alias and a West Frisian description. The West Frisian description and the English alias exist in the term store, but the German label and the Dutch alias are missing, in an especially interesting manner:

```
MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, wby_name, wbtl_text_in_lang_id, wbxl_id
    FROM wbt_item_terms
    LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
    LEFT JOIN wbt_type ON wbtl_type_id = wby_id
    LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
    LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
    WHERE wbit_item_id IN (17524323) AND wbxl_id IS NULL;
+--------------+----------------------+----------+----------------------+---------+
| wbit_item_id | wbit_term_in_lang_id | wby_name | wbtl_text_in_lang_id | wbxl_id |
+--------------+----------------------+----------+----------------------+---------+
|     17524323 |            906991069 | NULL     |                 NULL |    NULL |
|     17524323 |            957282069 | alias    |            228497807 |    NULL |
+--------------+----------------------+----------+----------------------+---------+
2 rows in set (0.015 sec)
```

For the alias, the `wbt_text_in_lang` row was cleaned up, but the `wbt_term_in_lang` row remained behind; for the label, seemingly even the `wbt_term_in_lang` row is gone.

The logstash board <https://logstash.wikimedia.org/goto/4c6e1ce2f277dd795c5d5b4968d51c87> for this is unfortunately a bit long, 84 events in total (because each term ID gets a separate clean job scheduled; I don’t know why). Also, mysteriously, the “schedule deleteTermsOfEntity” log message seems to be missing, not just for this merge but in general: there are no instances of this message after midnight UTC yesterday/today.
(The last instances before then were at 23:57 UTC, so the cutoff is suspiciously close to midnight; see Logstash of this message four hours before and after midnight <https://logstash.wikimedia.org/goto/74573b4a79aa0eea25c35307aface569>.)

Searching for “schedule/running/ran CleanTermsIfUnusedJob” <https://logstash.wikimedia.org/goto/1c1b4408d1438d84d0ce23d2058beccc> for the two `term_in_lang` IDs in the SQL output above, there is only one set of messages: the one for ID `906991069` (the label). Looking for just those jobs plus all the non-job messages (link <https://logstash.wikimedia.org/goto/ccba38b354ed6ceb88a625e84c1619ad>), we get this timeline, in seconds after 13:24 UTC:

- 34.756 (mw1358): schedule saveTermsOfEntity for Q28060661 (the source item)
- 35.105 (mw1358): run saveTermsOfEntity for Q28060661
- 35.146 (mw1448): schedule saveTermsOfEntity for Q28060661
- 35.161 (mw1448): run saveTermsOfEntity for Q28060661
- 35.472 (mw1358): schedule CleanTermsIfUnusedJob for Q28060661
- 35.532 (mw1437): running CleanTermsIfUnusedJob for Q28060661
- 35.553 (mw1437): ran CleanTermsIfUnusedJob for Q28060661
- 35.561 (mw1358): schedule saveTermsOfEntity for Q17524323 (the target item)
- 35.835 (mw1358): run saveTermsOfEntity for Q17524323

The source item history <https://www.wikidata.org/w/index.php?title=Q28060661&action=history> also shows //three// edits as part of the merge, instead of the usual two (plus one on the target item) that you can see e.g. on the José Barreto source item <https://www.wikidata.org/w/index.php?title=Q112678118&action=history>: between “merged item into” (remove most data) and “redirected to” (turn into redirect), there is also “cleared an item” (remove four descriptions). This comes from a property of the merge gadget that I overlooked earlier: it doesn’t always just make a `wbmergeitems` API request.
- If `wbmergeitems` actually redirects the item, then the merge gadget is basically done, and all three edits come from that one API request: “merged item into” (remove all data from the source), “redirected to” (turn the source into a redirect) and “merged item from” (add data to the target).
- If `wbmergeitems` //doesn’t// redirect the item (I think this happens when there are conflicting descriptions, even though the `wbmergeitems` request is made with `ignoreconflicts: 'description'`), then I think `wbmergeitems` only makes two edits: “merged item into” (remove most data from the source) and “merged item from” (add data to the target). The merge gadget then follows this with two separate API requests: `wbeditentity` with `clear: true` (“cleared an item”, removing the descriptions from the source) and `wbcreateredirect` (“redirected to”, turning the source into a redirect).

We can see this in the different mw servers in the timeline: I think mw1358 served the original `wbmergeitems` request, mw1448 the `wbeditentity` (clear) request, and mw1437 ran a job. The “schedule deleteTermsOfEntity” log message that should correspond to the `wbcreateredirect` request seems to have gone missing, as already mentioned; otherwise we should see a fourth mw server.

This timeline claims that the final `saveTermsOfEntity` for the target item was scheduled and run //after// everything was done on the source item, including the additional API requests to clear and redirect the source item! I find this hard to believe right now: the merge gadget only makes those requests after `wbmergeitems` has returned, and I would think that at least the scheduling of the target item’s `saveTermsOfEntity` must happen before the API response is sent. (The mw1358 log messages all have the same request ID, so it’s not just coincidentally the same backend server, it really is the same request… but I wonder if the clock can be trusted?)

Also, I’m still wondering why there’s no sign of a “clean” job for the other `term_in_lang` ID.
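To double-check my reading of the SQL output at the top, here is a small Python/SQLite sketch that rebuilds a cut-down version of the term store tables (only the columns the query uses; the real MariaDB schema has more, e.g. `wbit_id`) with toy rows mirroring the two broken terms, and runs the same LEFT JOIN anti-join. The type IDs, the healthy description row and the `build_toy_store`/`find_dangling_terms` helpers are made up for illustration; this is not Wikibase code.

```python
import sqlite3

# Simplified term store schema: only the columns used by the diagnostic query.
SCHEMA = """
CREATE TABLE wbt_item_terms   (wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER);
CREATE TABLE wbt_term_in_lang (wbtl_id INTEGER PRIMARY KEY, wbtl_type_id INTEGER,
                               wbtl_text_in_lang_id INTEGER);
CREATE TABLE wbt_type         (wby_id INTEGER PRIMARY KEY, wby_name TEXT);
CREATE TABLE wbt_text_in_lang (wbxl_id INTEGER PRIMARY KEY, wbxl_language TEXT,
                               wbxl_text_id INTEGER);
CREATE TABLE wbt_text         (wbx_id INTEGER PRIMARY KEY, wbx_text TEXT);
"""

# Same joins as the MariaDB query, plus ORDER BY for stable output.
DANGLING_QUERY = """
SELECT wbit_item_id, wbit_term_in_lang_id, wby_name, wbtl_text_in_lang_id, wbxl_id
FROM wbt_item_terms
LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
LEFT JOIN wbt_type ON wbtl_type_id = wby_id
LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
WHERE wbit_item_id = ? AND wbxl_id IS NULL
ORDER BY wbit_term_in_lang_id
"""


def build_toy_store():
    """Hypothetical toy data shaped like the Q17524323 case."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO wbt_type VALUES (?, ?)",
                     [(1, "label"), (2, "description"), (3, "alias")])
    # A healthy description: every row of the chain exists.
    conn.execute("INSERT INTO wbt_text VALUES (1, 'example')")
    conn.execute("INSERT INTO wbt_text_in_lang VALUES (1, 'fy', 1)")
    conn.execute("INSERT INTO wbt_term_in_lang VALUES (1, 2, 1)")
    conn.execute("INSERT INTO wbt_item_terms VALUES (17524323, 1)")
    # Alias case: the term_in_lang row survives, its text_in_lang row is gone.
    conn.execute("INSERT INTO wbt_term_in_lang VALUES (957282069, 3, 228497807)")
    conn.execute("INSERT INTO wbt_item_terms VALUES (17524323, 957282069)")
    # Label case: even the term_in_lang row is gone.
    conn.execute("INSERT INTO wbt_item_terms VALUES (17524323, 906991069)")
    return conn


def find_dangling_terms(conn, item_id):
    """Return (term_in_lang_id, diagnosis) for each broken term of the item."""
    result = []
    for _item, til_id, _type, text_in_lang_id, _xl in conn.execute(DANGLING_QUERY, (item_id,)):
        if text_in_lang_id is None:
            # No wbt_term_in_lang row matched, so all of its columns are NULL.
            result.append((til_id, "term_in_lang missing"))
        else:
            # wbt_term_in_lang exists but points at a vanished wbt_text_in_lang row.
            result.append((til_id, "text_in_lang missing"))
    return result
```

With this toy data, `find_dangling_terms(build_toy_store(), 17524323)` reports the label ID as “term_in_lang missing” and the alias ID as “text_in_lang missing”, while the healthy description is filtered out by `wbxl_id IS NULL`. The helper checks `wbtl_text_in_lang_id` rather than `wby_name`, since a NULL type name alone could also mean a missing `wbt_type` row.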
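To make the two merge gadget branches concrete, here is a rough Python sketch of the request sequence as I understand it; it is not the gadget’s actual (JavaScript) code. The `api` callable, the `FakeApi` stand-in and the assumption that the `wbmergeitems` response carries a `redirected` flag are all hypothetical; the module names and the `fromid`/`toid`/`ignoreconflicts`, `clear` and `from`/`to` parameters are the ones mentioned above or standard for those modules, with tokens omitted.

```python
from typing import Callable, Dict, List

# Hypothetical API callable: takes request parameters, returns the decoded response.
ApiCall = Callable[[Dict[str, str]], dict]


def merge_like_gadget(api: ApiCall, from_id: str, to_id: str) -> None:
    """Sketch of the merge gadget's request sequence (assumed, not its real code)."""
    # Makes "merged item into" on the source and "merged item from" on the target,
    # plus "redirected to" when it can turn the source into a redirect itself.
    result = api({
        "action": "wbmergeitems",
        "fromid": from_id,
        "toid": to_id,
        "ignoreconflicts": "description",
    })
    # Assumption: the response says whether the source was redirected.
    if result.get("redirected"):
        return  # case 1: one request produced all three edits
    # Case 2: these requests are only sent after wbmergeitems has returned.
    api({"action": "wbeditentity", "id": from_id, "clear": "1", "data": "{}"})  # "cleared an item"
    api({"action": "wbcreateredirect", "from": from_id, "to": to_id})  # "redirected to"


class FakeApi:
    """Stand-in client: records each action and answers wbmergeitems."""

    def __init__(self, redirects: bool) -> None:
        self.redirects = redirects
        self.actions: List[str] = []

    def __call__(self, params: Dict[str, str]) -> dict:
        self.actions.append(params["action"])
        if params["action"] == "wbmergeitems":
            return {"success": 1, "redirected": int(self.redirects)}
        return {"success": 1}
```

Running `merge_like_gadget(FakeApi(redirects=False), "Q28060661", "Q17524323")` records `wbmergeitems`, `wbeditentity`, `wbcreateredirect` in that order. This is why the timeline ordering looks suspicious: anything logged while serving `wbmergeitems` should come before the clear and redirect requests.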
TASK DETAIL https://phabricator.wikimedia.org/T309445
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org