Lucas_Werkmeister_WMDE added a comment.

  In T309445#8055711 <https://phabricator.wikimedia.org/T309445#8055711>, 
@Lucas_Werkmeister_WMDE wrote:
  
  > Also, mysteriously, the “schedule deleteTermsOfEntity” log message seems to 
be missing – not just for this merge, but in general, there are no instances of 
this message after midnight UTC yesterday/today. (The last instances before 
then were at 23:57 UTC, so the cutoff is suspiciously close to midnight – see 
Logstash of this message four hours before+after midnight 
<https://logstash.wikimedia.org/goto/74573b4a79aa0eea25c35307aface569>.)
  
  This is still happening – the “schedule deleteTermsOfEntity” log messages 
vanished starting on the 6th of July and haven’t come back since then. Logstash 
<https://logstash.wikimedia.org/goto/9f1c7ed1b9aebfbfa2fdd58f6c1157f1>
  F35315354: image.png <https://phabricator.wikimedia.org/F35315354>
  I don’t know what to make of this. It looks like the code still runs, in that 
redirects get their term_in_langs deleted.
  
  ---
  
  Found another merge, 3 turn 
<https://www.wikidata.org/w/index.php?title=Q3945846&diff=1675942567&oldid=850742120>.
 The English alias’ text_in_lang is gone from the term store:
  
    MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, wby_name, wbtl_text_in_lang_id, wbxl_id
        FROM wbt_item_terms
        LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
        LEFT JOIN wbt_type ON wbtl_type_id = wby_id
        LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
        LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
        WHERE wbit_item_id IN (3945846) AND wbxl_id IS NULL;
    +--------------+----------------------+----------+----------------------+---------+
    | wbit_item_id | wbit_term_in_lang_id | wby_name | wbtl_text_in_lang_id | wbxl_id |
    +--------------+----------------------+----------+----------------------+---------+
    |      3945846 |            960876949 | alias    |            955984243 |    NULL |
    +--------------+----------------------+----------+----------------------+---------+
    1 row in set (0.006 sec)
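
  For anyone who wants to play with this check outside production, here is a minimal sketch of the same LEFT JOIN / IS NULL orphan detection against an in-memory SQLite database. The table and column names are the ones from the query above, but the schema is cut down to just the joined columns, the type IDs (1 = label, 3 = alias) are an assumption, and the helper names `build_demo_db` and `find_orphans` are hypothetical. It illustrates the technique; it is not the real term store schema.

```python
import sqlite3

# Simplified stand-in for the term store: only the columns used in the joins.
# The real tables have more columns and indexes; type IDs 1 and 3 are assumed.
SCHEMA = """
CREATE TABLE wbt_item_terms (wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER);
CREATE TABLE wbt_term_in_lang (wbtl_id INTEGER PRIMARY KEY, wbtl_type_id INTEGER,
                               wbtl_text_in_lang_id INTEGER);
CREATE TABLE wbt_type (wby_id INTEGER PRIMARY KEY, wby_name TEXT);
CREATE TABLE wbt_text_in_lang (wbxl_id INTEGER PRIMARY KEY, wbxl_language TEXT,
                               wbxl_text_id INTEGER);
CREATE TABLE wbt_text (wbx_id INTEGER PRIMARY KEY, wbx_text TEXT);
"""

# Same joins as the production query; a NULL wbxl_id means the term_in_lang
# row points at a text_in_lang row that does not exist.
ORPHAN_QUERY = """
SELECT wbit_item_id, wbit_term_in_lang_id, wby_name, wbtl_text_in_lang_id, wbxl_id
FROM wbt_item_terms
LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
LEFT JOIN wbt_type ON wbtl_type_id = wby_id
LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
WHERE wbit_item_id = ? AND wbxl_id IS NULL
"""


def build_demo_db():
    """Create an in-memory database with one healthy and one broken term."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO wbt_type VALUES (?, ?)", [(1, "label"), (3, "alias")])
    # Healthy term: the Finnish label with its complete chain of rows.
    db.execute("INSERT INTO wbt_text VALUES (15585948, 'Kolmonen')")
    db.execute("INSERT INTO wbt_text_in_lang VALUES (39332690, 'fi', 15585948)")
    db.execute("INSERT INTO wbt_term_in_lang VALUES (56239649, 1, 39332690)")
    # Broken term: the alias whose text_in_lang row (955984243) is missing.
    db.execute("INSERT INTO wbt_term_in_lang VALUES (960876949, 3, 955984243)")
    db.executemany(
        "INSERT INTO wbt_item_terms VALUES (?, ?)",
        [(3945846, 56239649), (3945846, 960876949)],
    )
    return db


def find_orphans(db, item_id):
    """Return the item's terms whose text_in_lang row is gone."""
    return db.execute(ORPHAN_QUERY, (item_id,)).fetchall()


if __name__ == "__main__":
    for row in find_orphans(build_demo_db(), 3945846):
        print(row)
```

  With the demo data, only the alias row is reported, and the healthy Finnish label is not.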
  
  Logstash board 
<https://logstash.wikimedia.org/goto/1715411c720e8fed4f599a2d8231ef8e>: only 19 
messages this time. Note that the merged item (history 
<https://www.wikidata.org/w/index.php?action=history&title=Q20918681>) had had 
an English label and description added to it right before it was merged into 
the other item. (The other item conveniently didn’t receive any edits 
immediately before or after the merge.) The two separate edits happened at :00 
and :01, whereas the merge took place after :27:
  
    MariaDB [wikidatawiki]> SELECT rev_timestamp FROM revision
        WHERE rev_page = (SELECT page_id FROM page WHERE page_namespace = 0 AND page_title = 'Q20918681')
        ORDER BY rev_timestamp DESC;
    +----------------+
    | rev_timestamp  |
    +----------------+
    | 20220712152628 |
    | 20220712152628 |
    | 20220712152627 |
    | 20220712152601 |
    | 20220712152600 |
    | 20181023231241 |
    | 20181023231235 |
    | 20150906094514 |
    +----------------+
    8 rows in set (0.001 sec)
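
  To make the gaps between those revisions explicit, here is a small sketch that parses the `rev_timestamp` values from the result above (they are in the YYYYMMDDHHMMSS format). The function names are mine and purely illustrative.

```python
from datetime import datetime

# rev_timestamp values copied from the query result above (newest first).
REV_TIMESTAMPS = [
    "20220712152628", "20220712152628", "20220712152627",
    "20220712152601", "20220712152600",
    "20181023231241", "20181023231235", "20150906094514",
]


def parse_mw_timestamp(ts):
    """Parse a YYYYMMDDHHMMSS timestamp into a naive datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def gaps_in_seconds(timestamps):
    """Return the gaps between consecutive revisions, oldest to newest."""
    times = sorted(parse_mw_timestamp(ts) for ts in timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Only look at the revisions from the day of the merge.
    recent = [ts for ts in REV_TIMESTAMPS if ts.startswith("20220712")]
    print(gaps_in_seconds(recent))  # [1.0, 26.0, 1.0, 0.0]
```

  The 26-second gap separates the two manual edits from the cluster of revisions around the merge.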
  
  Also, we’re again dealing with an extra API request to clear the item, due to 
a description conflict. There are four request IDs in logstash (note that jobs 
have the same request ID as the request that triggered them), which I’ll give 
nicknames so I can refer to them below:
  
  - “addlabel”: request ID `7a16bb17-58ed-48db-9cb6-1ada9c7cfe49` – added the 
English label
  - “adddescription”: request ID `dd5ec054-d8db-4f92-b6ed-687fa76dd039` – added 
the English description
  - “mergeitems”: request ID `4ba56d96-123d-4047-b6b0-f01142c42e60` – tried to 
merge one item into the other, added the data to the target item, but did not 
redirect the source item due to the description conflict
  - “clearitem”: request ID `8ff656c2-dfa9-4112-8bdd-74682b150486` – cleared 
the source item (removing the conflicting English description) to prepare it 
for being redirected to the target item
  
  I assume there must have been a fifth request, “createredirect”, but its 
“schedule deleteTermsOfEntity” message did not get logged, and so there’s no 
trace of it in the `WikibaseTerms` channel.
  
  The timeline of the log messages is:
  
  - 15:26:01.032, “addlabel”: schedule saveTermsOfEntity for Q20918681
  - 15:26:01.076, “addlabel”: run saveTermsOfEntity for Q20918681 (2 labels, 0 
descriptions, 0 aliases)
  - 15:26:01.478, “adddescription”: schedule saveTermsOfEntity for Q20918681
  - 15:26:01.498, “adddescription”: run saveTermsOfEntity for Q20918681 (2 
labels, 1 description, 0 aliases)
  - 15:26:27.764, “mergeitems”: schedule saveTermsOfEntity for Q20918681
  - 15:26:27.961, “mergeitems”: run saveTermsOfEntity for Q20918681 (0 labels, 
1 description, 0 aliases)
  - 15:26:27.969, “mergeitems”: schedule CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 56239649)
  - 15:26:27.981, “mergeitems”: schedule CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 960876926)
  - 15:26:28.011, “mergeitems”: running CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 56239649)
  - 15:26:28.017, “mergeitems”: ran CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 56239649)
  - 15:26:28.029, “mergeitems”: running CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 960876926)
  - 15:26:28.040, “mergeitems”: ran CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 960876926)
  - 15:26:28.077, “mergeitems”: schedule saveTermsOfEntity for Q3945846
  - 15:26:28.204, “mergeitems”: run saveTermsOfEntity for Q3945846 (7 labels, 2 
descriptions, 3 aliases)
  - 15:26:28.301, “clearitem”: schedule saveTermsOfEntity for Q20918681
  - 15:26:28.321, “clearitem”: run saveTermsOfEntity for Q20918681 (0 labels, 0 
descriptions, 0 aliases)
  - 15:26:28.326, “clearitem”: schedule CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 960876927)
  - 15:26:28.393, “clearitem”: running CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 960876927)
  - 15:26:28.407, “clearitem”: ran CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 960876927)
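
  The ordering of that timeline can also be checked mechanically. The following is a rough sketch, with the (timestamp, nickname) pairs copied by hand from the list above; `request_runs` and `is_interleaved` are hypothetical helper names, not anything from Wikibase or Logstash.

```python
# (timestamp, request nickname) pairs copied from the timeline above.
TIMELINE = [
    ("15:26:01.032", "addlabel"), ("15:26:01.076", "addlabel"),
    ("15:26:01.478", "adddescription"), ("15:26:01.498", "adddescription"),
    ("15:26:27.764", "mergeitems"), ("15:26:27.961", "mergeitems"),
    ("15:26:27.969", "mergeitems"), ("15:26:27.981", "mergeitems"),
    ("15:26:28.011", "mergeitems"), ("15:26:28.017", "mergeitems"),
    ("15:26:28.029", "mergeitems"), ("15:26:28.040", "mergeitems"),
    ("15:26:28.077", "mergeitems"), ("15:26:28.204", "mergeitems"),
    ("15:26:28.301", "clearitem"), ("15:26:28.321", "clearitem"),
    ("15:26:28.326", "clearitem"), ("15:26:28.393", "clearitem"),
    ("15:26:28.407", "clearitem"),
]


def request_runs(timeline):
    """Collapse the time-ordered log into runs of consecutive messages per request."""
    runs = []
    # The fixed HH:MM:SS.mmm format sorts correctly as plain strings.
    for _, request in sorted(timeline):
        if runs and runs[-1][0] == request:
            runs[-1][1] += 1
        else:
            runs.append([request, 1])
    return [(request, count) for request, count in runs]


def is_interleaved(timeline):
    """Return True if any request shows up in more than one run."""
    names = [request for request, _ in request_runs(timeline)]
    return len(names) != len(set(names))


if __name__ == "__main__":
    print(request_runs(TIMELINE))
    print(is_interleaved(TIMELINE))
```

  Each request forms exactly one run here (2, 2, 10 and 5 messages, 19 in total), so nothing is interleaved in the messages that did get logged.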
  
  At least the requests that we have logs for are nicely sequential and not 
interleaved. Here are the mentioned term_in_lang IDs:
  
    MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, wbtl_id, wby_name, wbtl_text_in_lang_id, wbxl_id, wbxl_text_id, wbxl_language, wbx_id, wbx_text
        FROM wbt_item_terms
        LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
        LEFT JOIN wbt_type ON wbtl_type_id = wby_id
        LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
        LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
        WHERE wbtl_id IN (56239649, 960876926, 960876927) AND wbit_item_id IN (20918681, 3945846);
    +--------------+----------------------+----------+----------+----------------------+----------+--------------+---------------+----------+----------+
    | wbit_item_id | wbit_term_in_lang_id | wbtl_id  | wby_name | wbtl_text_in_lang_id | wbxl_id  | wbxl_text_id | wbxl_language | wbx_id   | wbx_text |
    +--------------+----------------------+----------+----------+----------------------+----------+--------------+---------------+----------+----------+
    |      3945846 |             56239649 | 56239649 | label    |             39332690 | 39332690 |     15585948 | fi            | 15585948 | Kolmonen |
    +--------------+----------------------+----------+----------+----------------------+----------+--------------+---------------+----------+----------+
    1 row in set (0.001 sec)
  
  Only the Finnish label is left; the other two term_in_langs were presumably 
cleaned up. But the term_in_lang that we saw as incomplete above (ID 960876949) 
doesn’t occur in this log at all!
  
  It looks like, whatever causes the “schedule deleteTermsOfEntity” log 
messages to vanish, actually causes //all// messages after that point to 
vanish? And this even propagates across jobs? I don’t understand how this is 
possible – but otherwise, I would expect some message in logstash about a 
`CleanTermsIfUnusedJob` with `target:960876949`, and I can’t find it.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331