Lucas_Werkmeister_WMDE added a comment.
In T309445#8055711 <https://phabricator.wikimedia.org/T309445#8055711>, @Lucas_Werkmeister_WMDE wrote:

> Also, mysteriously, the “schedule deleteTermsOfEntity” log message seems to be missing – not just for this merge, but in general, there are no instances of this message after midnight UTC yesterday/today. (The last instances before then were at 23:57 UTC, so the cutoff is suspiciously close to midnight – see Logstash of this message four hours before+after midnight <https://logstash.wikimedia.org/goto/74573b4a79aa0eea25c35307aface569>.)

This is still happening – the “schedule deleteTermsOfEntity” log messages vanished starting on the 6th of July and haven’t come back since then. Logstash <https://logstash.wikimedia.org/goto/9f1c7ed1b9aebfbfa2fdd58f6c1157f1>

F35315354: image.png <https://phabricator.wikimedia.org/F35315354>

I don’t know what to make of this. It looks like the code still runs, in that redirects get their term_in_langs deleted.

---

Found another merge, 3 turn <https://www.wikidata.org/w/index.php?title=Q3945846&diff=1675942567&oldid=850742120>.
The English alias’ text_in_lang is gone from the term store:

  MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, wby_name, wbtl_text_in_lang_id, wbxl_id
      FROM wbt_item_terms
      LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
      LEFT JOIN wbt_type ON wbtl_type_id = wby_id
      LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
      LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
      WHERE wbit_item_id IN (3945846) AND wbxl_id IS NULL;
  +--------------+----------------------+----------+----------------------+---------+
  | wbit_item_id | wbit_term_in_lang_id | wby_name | wbtl_text_in_lang_id | wbxl_id |
  +--------------+----------------------+----------+----------------------+---------+
  |      3945846 |            960876949 | alias    |            955984243 |    NULL |
  +--------------+----------------------+----------+----------------------+---------+
  1 row in set (0.006 sec)

Logstash board <https://logstash.wikimedia.org/goto/1715411c720e8fed4f599a2d8231ef8e>: only 19 messages this time.

Note that the merged item (history <https://www.wikidata.org/w/index.php?action=history&title=Q20918681>) had had an English label and description added to it right before it was merged into the other item. (The other item conveniently didn’t receive any edits immediately before or after the merge.) The two separate edits happened at :00 and :01, whereas the merge took place after :27:

  MariaDB [wikidatawiki]> SELECT rev_timestamp
      FROM revision
      WHERE rev_page = (SELECT page_id FROM page WHERE page_namespace = 0 AND page_title = 'Q20918681')
      ORDER BY rev_timestamp DESC;
  +----------------+
  | rev_timestamp  |
  +----------------+
  | 20220712152628 |
  | 20220712152628 |
  | 20220712152627 |
  | 20220712152601 |
  | 20220712152600 |
  | 20181023231241 |
  | 20181023231235 |
  | 20150906094514 |
  +----------------+
  8 rows in set (0.001 sec)

Also, we’re again dealing with an extra API request to clear the item, due to a description conflict.
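As a rough sanity check on the timing, the gaps between the 2022 revisions can be computed from the rev_timestamp values in the query result above. This is only a minimal sketch: the helper names are made up for illustration, and it assumes the 14-digit YYYYMMDDHHMMSS format shown in the table.

```python
from datetime import datetime

# rev_timestamp values copied from the query result above (newest first).
REV_TIMESTAMPS = [
    "20220712152628",
    "20220712152628",
    "20220712152627",
    "20220712152601",
    "20220712152600",
    "20181023231241",
    "20181023231235",
    "20150906094514",
]


def parse_mw_timestamp(value: str) -> datetime:
    """Parse a 14-digit timestamp of the form YYYYMMDDHHMMSS."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


def gaps_in_seconds(timestamps):
    """Return the seconds between consecutive revisions, oldest to newest."""
    ordered = sorted(parse_mw_timestamp(t) for t in timestamps)
    return [int((b - a).total_seconds()) for a, b in zip(ordered, ordered[1:])]


# Only look at the revisions from the day of the merge.
recent = [t for t in REV_TIMESTAMPS if t.startswith("20220712")]
print(gaps_in_seconds(recent))  # [1, 26, 1, 0]
```

The 26-second gap separates the two manual edits from the three revisions made around the merge, which all fall within about one second of each other.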
There are four request IDs in logstash (note that jobs have the same request ID as the request that triggered them), which I’ll give nicknames so I can refer to them below:

- “addlabel”: request ID `7a16bb17-58ed-48db-9cb6-1ada9c7cfe49` – added the English label
- “adddescription”: request ID `dd5ec054-d8db-4f92-b6ed-687fa76dd039` – added the English description
- “mergeitems”: request ID `4ba56d96-123d-4047-b6b0-f01142c42e60` – tried to merge one item into the other, added the data to the target item, but did not redirect the source item due to the description conflict
- “clearitem”: request ID `8ff656c2-dfa9-4112-8bdd-74682b150486` – cleared the source item (removing the conflicting English description) to prepare it for being redirected to the target item

I assume there must have been a fifth request, “createredirect”, but its “schedule deleteTermsOfEntity” message did not get logged, and so there’s no trace of it in the `WikibaseTerms` channel.

The timeline of the log messages is:

- 15:26:01.032, “addlabel”: schedule saveTermsOfEntity for Q20918681
- 15:26:01.076, “addlabel”: run saveTermsOfEntity for Q20918681 (2 labels, 0 descriptions, 0 aliases)
- 15:26:01.478, “adddescription”: schedule saveTermsOfEntity for Q20918681
- 15:26:01.498, “adddescription”: run saveTermsOfEntity for Q20918681 (2 labels, 1 description, 0 aliases)
- 15:26:27.764, “mergeitems”: schedule saveTermsOfEntity for Q20918681
- 15:26:27.961, “mergeitems”: run saveTermsOfEntity for Q20918681 (0 labels, 1 description, 0 aliases)
- 15:26:27.969, “mergeitems”: schedule CleanTermsIfUnusedJob for Q20918681 (term_in_lang 56239649)
- 15:26:27.981, “mergeitems”: schedule CleanTermsIfUnusedJob for Q20918681 (term_in_lang 960876926)
- 15:26:28.011, “mergeitems”: running CleanTermsIfUnusedJob for Q20918681 (term_in_lang 56239649)
- 15:26:28.017, “mergeitems”: ran CleanTermsIfUnusedJob for Q20918681 (term_in_lang 56239649)
- 15:26:28.029, “mergeitems”: running CleanTermsIfUnusedJob for Q20918681 (term_in_lang 960876926)
- 15:26:28.040, “mergeitems”: ran CleanTermsIfUnusedJob for Q20918681 (term_in_lang 960876926)
- 15:26:28.077, “mergeitems”: schedule saveTermsOfEntity for Q3945846
- 15:26:28.204, “mergeitems”: run saveTermsOfEntity for Q3945846 (7 labels, 2 descriptions, 3 aliases)
- 15:26:28.301, “clearitem”: schedule saveTermsOfEntity for Q20918681
- 15:26:28.321, “clearitem”: run saveTermsOfEntity for Q20918681 (0 labels, 0 descriptions, 0 aliases)
- 15:26:28.326, “clearitem”: schedule CleanTermsIfUnusedJob for Q20918681 (term_in_lang 960876927)
- 15:26:28.393, “clearitem”: running CleanTermsIfUnusedJob for Q20918681 (term_in_lang 960876927)
- 15:26:28.407, “clearitem”: ran CleanTermsIfUnusedJob for Q20918681 (term_in_lang 960876927)

At least the requests that we have logs for are nicely sequential and not interleaved. Here are the mentioned term_in_lang IDs:

  MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, wbtl_id, wby_name, wbtl_text_in_lang_id, wbxl_id, wbxl_text_id, wbxl_language, wbx_id, wbx_text
      FROM wbt_item_terms
      LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
      LEFT JOIN wbt_type ON wbtl_type_id = wby_id
      LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
      LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
      WHERE wbtl_id IN (56239649, 960876926, 960876927) AND wbit_item_id IN (20918681, 3945846);
  +--------------+----------------------+----------+----------+----------------------+----------+--------------+---------------+----------+----------+
  | wbit_item_id | wbit_term_in_lang_id | wbtl_id  | wby_name | wbtl_text_in_lang_id | wbxl_id  | wbxl_text_id | wbxl_language | wbx_id   | wbx_text |
  +--------------+----------------------+----------+----------+----------------------+----------+--------------+---------------+----------+----------+
  |      3945846 |             56239649 | 56239649 | label    |             39332690 | 39332690 |     15585948 | fi            | 15585948 | Kolmonen |
  +--------------+----------------------+----------+----------+----------------------+----------+--------------+---------------+----------+----------+
  1 row in set (0.001 sec)

Only the Finnish label is left, the other two term_in_langs presumably got cleaned. But the term_in_lang that we saw as incomplete above – ID 960876949 – doesn’t occur in this log at all!

It looks like, whatever causes the “schedule deleteTermsOfEntity” log messages to vanish, actually causes //all// messages after that point to vanish? And this even propagates across jobs? I don’t understand how this is possible – but otherwise, I would expect some message in logstash about a `CleanTermsIfUnusedJob` with `target:960876949`, and I can’t find it.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331