Lucas_Werkmeister_WMDE added a comment.

  Found a new example, Still Life with Pie and Silver Ewer 
<https://www.wikidata.org/w/index.php?title=Q17524323&diff=1672132901&oldid=1672058773>.
 The merge added a German label, one Dutch and English alias each, and a West 
Frisian description; the West Frisian description and English alias exist in 
the term store, but the German label and Dutch alias are missing, in an 
especially interesting manner:
  
    MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, wby_name, wbtl_text_in_lang_id, wbxl_id
      FROM wbt_item_terms
      LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
      LEFT JOIN wbt_type ON wbtl_type_id = wby_id
      LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
      LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
      WHERE wbit_item_id IN (17524323) AND wbxl_id IS NULL;
    +--------------+----------------------+----------+----------------------+---------+
    | wbit_item_id | wbit_term_in_lang_id | wby_name | wbtl_text_in_lang_id | wbxl_id |
    +--------------+----------------------+----------+----------------------+---------+
    |     17524323 |            906991069 | NULL     |                 NULL |    NULL |
    |     17524323 |            957282069 | alias    |            228497807 |    NULL |
    +--------------+----------------------+----------+----------------------+---------+
    2 rows in set (0.015 sec)
  
  For the alias, the `wbt_text_in_lang` was cleaned up, but the 
`wbt_term_in_lang` remained behind; for the label, seemingly, even the 
`wbt_term_in_lang` is gone.
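  
  The two failure modes in that output can be told apart mechanically. The following is only a minimal sketch: it uses an in-memory SQLite database with a cut-down copy of the term store tables (just the columns the query above touches), and the `classify_orphans` helper, the type IDs and the "healthy" sample row are made up for illustration, not taken from Wikibase or production.

```python
import sqlite3

# Cut-down, illustrative schema: only the columns used by the query above.
SCHEMA = """
CREATE TABLE wbt_item_terms (wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER);
CREATE TABLE wbt_term_in_lang (
    wbtl_id INTEGER PRIMARY KEY, wbtl_type_id INTEGER, wbtl_text_in_lang_id INTEGER);
CREATE TABLE wbt_text_in_lang (wbxl_id INTEGER PRIMARY KEY, wbxl_text_id INTEGER);
"""

# Same idea as the MariaDB query: keep item terms whose join chain
# ends in a missing wbt_text_in_lang row.
QUERY = """
SELECT wbit_term_in_lang_id, wbtl_id
FROM wbt_item_terms
LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
WHERE wbit_item_id = ? AND wbxl_id IS NULL
"""


def classify_orphans(conn, item_id):
    """Map each dangling term_in_lang ID to the first missing link in the chain."""
    result = {}
    for term_in_lang_id, wbtl_id in conn.execute(QUERY, (item_id,)):
        if wbtl_id is None:
            # wbt_item_terms points at a wbt_term_in_lang row that is gone.
            result[term_in_lang_id] = "term_in_lang row missing"
        else:
            # wbt_term_in_lang exists but points at a deleted wbt_text_in_lang row.
            result[term_in_lang_id] = "text_in_lang row missing"
    return result


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO wbt_item_terms VALUES (?, ?)",
    [(17524323, 906991069), (17524323, 957282069), (17524323, 1)],
)
# Type IDs 1 and 3 are arbitrary placeholders; row 1 is an invented healthy term.
conn.executemany(
    "INSERT INTO wbt_term_in_lang VALUES (?, ?, ?)",
    [(957282069, 3, 228497807), (1, 1, 10)],
)
conn.execute("INSERT INTO wbt_text_in_lang VALUES (10, 100)")

orphans = classify_orphans(conn, 17524323)
for term_in_lang_id, problem in sorted(orphans.items()):
    print(term_in_lang_id, problem)
```

  With the sample rows, the label ID ends up in the "term_in_lang row missing" bucket and the alias ID in the "text_in_lang row missing" bucket, matching the two rows above.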
  
  The logstash board 
<https://logstash.wikimedia.org/goto/4c6e1ce2f277dd795c5d5b4968d51c87> for this 
is unfortunately a bit long, 84 events in total (each term ID gets a separate 
clean job scheduled, and I don’t know why). Also, mysteriously, the 
“schedule deleteTermsOfEntity” log message seems to be missing – not just for 
this merge, but in general, there are no instances of this message after 
midnight UTC yesterday/today. (The last instances before then were at 23:57 
UTC, so the cutoff is suspiciously close to midnight – see Logstash of this 
message four hours before+after midnight 
<https://logstash.wikimedia.org/goto/74573b4a79aa0eea25c35307aface569>.)
  
  Searching for “schedule/running/ran CleanTermsIfUnusedJob” 
<https://logstash.wikimedia.org/goto/1c1b4408d1438d84d0ce23d2058beccc> for the 
two `term_in_lang` IDs in the SQL output above, there is only one set of 
messages: that for the ID `906991069` (the label). Looking for just those jobs 
plus all the non-job messages (link 
<https://logstash.wikimedia.org/goto/ccba38b354ed6ceb88a625e84c1619ad>), we get 
this timeline, in terms of seconds after 13:24 UTC:
  
  - 34.756 (mw1358): schedule saveTermsOfEntity for Q28060661 (the source item)
  - 35.105 (mw1358): run saveTermsOfEntity for Q28060661
  - 35.146 (mw1448): schedule saveTermsOfEntity for Q28060661
  - 35.161 (mw1448): run saveTermsOfEntity for Q28060661
  - 35.472 (mw1358): schedule CleanTermsIfUnusedJob for Q28060661
  - 35.532 (mw1437): running CleanTermsIfUnusedJob for Q28060661
  - 35.553 (mw1437): ran CleanTermsIfUnusedJob for Q28060661
  - 35.561 (mw1358): schedule saveTermsOfEntity for Q17524323 (the target item)
  - 35.835 (mw1358): run saveTermsOfEntity for Q17524323
  
  The source item history 
<https://www.wikidata.org/w/index.php?title=Q28060661&action=history> also 
shows //three// edits as part of the merge there, instead of the usual two 
(plus one on the target item) that you can see e.g. on the José Barreto source 
item <https://www.wikidata.org/w/index.php?title=Q112678118&action=history>: 
Between “merged item into” (remove most data) and “redirected to” (turn into 
redirect), there is also “cleared an item” (remove four descriptions). This 
comes from a property of the merge gadget that I overlooked earlier – it 
doesn’t always just make a `wbmergeitems` API request.
  
  - If `wbmergeitems` actually redirects the item, then the merge gadget is 
basically done, and all three total edits – “merged item into” (remove all data 
from source), “redirected to” (turn source into redirect), “merged item from” 
(add data to target) – come from the one API request.
  - If `wbmergeitems` //doesn’t// redirect the item (I think this happens when 
there are conflicting descriptions, even though the `wbmergeitems` request is 
made with `ignoreconflicts: 'description'`), then I think `wbmergeitems` only 
makes two edits, “merged item into” (remove most data from source) and “merged 
item from” (add data to target); the merge gadget then follows this with two 
separate API requests: `wbeditentity` with `clear: true` for “cleared an item” 
(remove descriptions from source) and `wbcreateredirect` for “redirected to” 
(turn source into redirect).
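  
  The two branches can be sketched as follows. This is a Python illustration of the control flow as I understand it, not the gadget’s actual JavaScript: `merge_like_gadget` and `fake_api` are hypothetical names, tokens and edit summaries are left out, and the assumption that the `wbmergeitems` response carries a `redirected` flag should be checked against the API documentation.

```python
from typing import Any, Callable, Dict, List

ApiPost = Callable[[Dict[str, Any]], Dict[str, Any]]


def merge_like_gadget(api_post: ApiPost, from_id: str, to_id: str) -> List[str]:
    """Issue the requests described above and return the API actions used."""
    actions = ["wbmergeitems"]
    response = api_post({
        "action": "wbmergeitems",
        "fromid": from_id,
        "toid": to_id,
        "ignoreconflicts": "description",
    })
    # Assumption: the response says whether the source was turned into a redirect.
    if response.get("redirected"):
        return actions  # one request, three edits in total

    # Otherwise, clear what is left on the source item ...
    api_post({"action": "wbeditentity", "id": from_id, "clear": True, "data": "{}"})
    actions.append("wbeditentity")
    # ... and then turn it into a redirect with a third request.
    api_post({"action": "wbcreateredirect", "from": from_id, "to": to_id})
    actions.append("wbcreateredirect")
    return actions


def fake_api(redirects: bool) -> ApiPost:
    """Stand-in for the real API, only used to exercise both branches."""
    def post(params: Dict[str, Any]) -> Dict[str, Any]:
        if params["action"] == "wbmergeitems":
            return {"success": 1, "redirected": 1 if redirects else 0}
        return {"success": 1}
    return post


print(merge_like_gadget(fake_api(True), "Q112678118", "Q1"))
print(merge_like_gadget(fake_api(False), "Q28060661", "Q17524323"))
```

  The target ID `Q1` in the first call is a placeholder. The point is only that the second branch spans three separate HTTP requests, which may land on three different app servers.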
  
  We can see this in the different mw servers in the timeline: I think that 
mw1358 served the original `wbmergeitems` request, mw1448 the `wbeditentity` 
(clear), and mw1437 ran a job. The “schedule deleteTermsOfEntity” log message 
that should correspond to the `wbcreateredirect` request seems to have gone 
missing, as already mentioned, otherwise we should see a fourth mw server.
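  
  To make the per-server view easier to read, here is a small sketch that regroups the timeline above by server. The tuples are transcribed by hand from the list above (seconds after 13:24 UTC); `events_by_server` is just an illustrative helper, and the mapping of servers to API requests remains my guess.

```python
from collections import defaultdict

# (seconds after 13:24 UTC, server, log message, item), copied from the timeline.
TIMELINE = [
    (34.756, "mw1358", "schedule saveTermsOfEntity", "Q28060661"),
    (35.105, "mw1358", "run saveTermsOfEntity", "Q28060661"),
    (35.146, "mw1448", "schedule saveTermsOfEntity", "Q28060661"),
    (35.161, "mw1448", "run saveTermsOfEntity", "Q28060661"),
    (35.472, "mw1358", "schedule CleanTermsIfUnusedJob", "Q28060661"),
    (35.532, "mw1437", "running CleanTermsIfUnusedJob", "Q28060661"),
    (35.553, "mw1437", "ran CleanTermsIfUnusedJob", "Q28060661"),
    (35.561, "mw1358", "schedule saveTermsOfEntity", "Q17524323"),
    (35.835, "mw1358", "run saveTermsOfEntity", "Q17524323"),
]


def events_by_server(timeline):
    """Group events by server, keeping each group in timestamp order."""
    grouped = defaultdict(list)
    for seconds, server, message, item in sorted(timeline):
        grouped[server].append((seconds, message, item))
    return dict(grouped)


by_server = events_by_server(TIMELINE)
for server in sorted(by_server):
    print(server)
    for seconds, message, item in by_server[server]:
        print(f"  {seconds:.3f} {message} for {item}")
```

  Only three servers show up, and mw1358 is the only one that logs anything for the target item.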
  
  This timeline claims that the final `saveTermsOfEntity` for the target item 
was scheduled and run //after// everything was done on the source item – 
including the additional API requests to clear and redirect the source item! I 
find this hard to believe right now: the merge gadget only makes those requests 
after `wbmergeitems` has returned, and I would think that at least the 
scheduling of the target item `saveTermsOfEntity` must happen before the API 
response is sent. (The mw1358 requests all have the same request ID, so it’s 
not just coincidentally the same backend server, it really is the same request… 
but I wonder if the clock can be trusted?) Also, I’m still wondering why 
there’s no sign of a “clean” job for the other `term_in_lang` ID.
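  
  As a rough sanity check on the clock question: if the target item’s `saveTermsOfEntity` really was scheduled before the `wbmergeitems` response was sent, and the first mw1448 log line really belongs to the follow-up `wbeditentity` request (both are assumptions, see above), then the two timestamps put a lower bound on how far the clocks would have to disagree. `minimum_skew` is a made-up helper, and network latency is ignored, so the real skew would have to be somewhat larger.

```python
from decimal import Decimal

# Seconds after 13:24 UTC, taken from the timeline above.
TARGET_SAVE_SCHEDULED_MW1358 = Decimal("35.561")
CLEAR_SAVE_SCHEDULED_MW1448 = Decimal("35.146")


def minimum_skew(logged_first_event, logged_second_event):
    """Smallest clock disagreement needed if the first event truly came first.

    Returns zero when the logged timestamps are already in the expected order.
    """
    return max(Decimal("0"), logged_first_event - logged_second_event)


skew = minimum_skew(TARGET_SAVE_SCHEDULED_MW1358, CLEAR_SAVE_SCHEDULED_MW1448)
print(f"mw1448 would have to lag mw1358 by at least {skew} s")
```

  Under those assumptions the clocks would have to be off by at least 0.415 seconds.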

TASK DETAIL
  https://phabricator.wikimedia.org/T309445
