[Wikidata-bugs] [Maniphest] T309445: [Investigation] Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-26 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE updated the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: [Investigation] Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-26 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE closed subtask T313039: Remove debug logging for item 
terms storage after merging as Resolved.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: [Investigation] Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-14 Thread Manuel
Manuel renamed this task from "INVESTIGATION: Terms storage missing most item 
labels after merge, resulting in item IDs being shown" to "[Investigation] 
Terms storage missing most item labels after merge, resulting in item IDs being 
shown".

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, Manuel
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-13 Thread Manuel
Manuel edited projects, added Wikidata-Campsite; removed Wikidata Dev Team 
(Sprint-♾), Special:NewLexeme revival (Special:NewLexeme revival - sprint 11), 
Wikidata-Campsite (Team A Hearth ).
Manuel added a comment.


  Ah okay I see, yes that makes sense!
  
  Thank you, Lucas, for everything that you have put into this and for 
documenting it so well!
  
  Let's now do what you suggested and end the investigation at this point. I 
removed the task from the iteration boards so that I can keep the task open as 
a reference for the future.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, Manuel
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-13 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  No, I don’t think so. The merge gadget could in theory do it, but I’m not 
sure that would be a good idea, since doing the purge right after the merge 
probably doesn’t help with the race condition.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-13 Thread Manuel
Manuel added a comment.


  Is the "purge with links update" something we should do routinely after 
merges?

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, Manuel
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-12 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Trying something slightly different. I think the Still Life with Pie and 
Silver Ewer 
,
 inspected previously (T309445#8055711 
), is too confusing to be of 
much further use in its broken state; therefore, I decided to purge it. An API 
purge with `forcelinkupdate` set successfully fixed all the terms of the item:
  
MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, 
wby_name, wbtl_text_in_lang_id, wbxl_id FROM wbt_item_terms LEFT JOIN 
wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON 
wbtl_type_id = wby_id LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = 
wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wbit_item_id IN 
(17524323) AND wbxl_id IS NULL;
Empty set (0.002 sec)
  
  And the partial rows from before seem to have been properly cleaned up as 
well, there is no trace left of either the term_in_lang IDs or the text_in_lang 
ID:
  
MariaDB [wikidatawiki]> SELECT wbtl_id, wby_name, wbtl_text_in_lang_id, 
wbxl_id FROM wbt_term_in_lang LEFT JOIN wbt_type ON wbtl_type_id = wby_id LEFT 
JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON 
wbxl_text_id = wbx_id WHERE wbtl_id IN (906991069, 957282069);
Empty set (0.001 sec)

MariaDB [wikidatawiki]> SELECT * FROM wbt_term_in_lang WHERE 
wbtl_text_in_lang_id = 228497807;
Empty set (0.001 sec)
  
  So I think one thing I //am// confident to say (for a change!) is that **a 
purge with links update fixes the issue**; and I extrapolate from this that **a 
subsequent edit probably fixes the issue as well**. (Note: on other wikis, a 
“purge with links update” is equivalent to a “null edit”, but you can’t null 
edit an item. Also, a “UI purge”, via `?action=purge` in the normal URL, does 
not include a links update, and will not fix the issue.)

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-12 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  In T309445#8055711 , 
@Lucas_Werkmeister_WMDE wrote:
  
  > Also, mysteriously, the “schedule deleteTermsOfEntity” log message seems to 
be missing – not just for this merge, but in general, there are no instances of 
this message after midnight UTC yesterday/today. (The last instances before 
then were at 23:57 UTC, so the cutoff is suspiciously close to midnight – see 
Logstash of this message four hours before+after midnight 
.)
  
  This is still happening – the “schedule deleteTermsOfEntity” log messages 
vanished starting on the 6th of July and haven’t come back since then. Logstash 

  F35315354: image.png 
  I don’t know what to make of this. It looks like the code still runs, in that 
redirects get their term_in_langs deleted.
  
  ---
  
  Found another merge, 3 turn 
.
 The English alias’ text_in_lang is gone from the term store:
  
MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, 
wby_name, wbtl_text_in_lang_id, wbxl_id FROM wbt_item_terms LEFT JOIN 
wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON 
wbtl_type_id = wby_id LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = 
wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wbit_item_id IN 
(3945846) AND wbxl_id IS NULL;

+--+--+--+--+-+
| wbit_item_id | wbit_term_in_lang_id | wby_name | wbtl_text_in_lang_id | 
wbxl_id |

+--+--+--+--+-+
|  3945846 |960876949 | alias|955984243 |   
 NULL |

+--+--+--+--+-+
1 row in set (0.006 sec)
  
  Logstash board 
: only 19 
messages this time. Note that the merged item (history 
) had had 
an English label and description added to it right before it was merged into 
the other item. (The other item conveniently didn’t receive any edits 
immediately before or after the merge.) The two separate edits happened at :00 
and :01, whereas the merge took place after :27:
  
MariaDB [wikidatawiki]> SELECT rev_timestamp FROM revision WHERE rev_page = 
(SELECT page_id FROM page WHERE page_namespace = 0 AND page_title = 
'Q20918681') ORDER BY rev_timestamp DESC;
++
| rev_timestamp  |
++
| 20220712152628 |
| 20220712152628 |
| 20220712152627 |
| 20220712152601 |
| 20220712152600 |
| 20181023231241 |
| 20181023231235 |
| 20150906094514 |
++
8 rows in set (0.001 sec)
  
  Also, we’re again dealing with an extra API request to clear the item, due to 
a description conflict. There are four request IDs in logstash (note that jobs 
have the same request ID as the request that triggered them), which I’ll give 
nicknames so I can refer to them below:
  
  - “addlabel”: request ID `7a16bb17-58ed-48db-9cb6-1ada9c7cfe49` – added the 
English label
  - “adddescription”: request ID `dd5ec054-d8db-4f92-b6ed-687fa76dd039` – added 
the English description
  - “mergeitems”: request ID `4ba56d96-123d-4047-b6b0-f01142c42e60` – tried to 
merge one item into the other, added the data to the target item, but did not 
redirect the source item due to the description conflict
  - “clearitem”: request ID `8ff656c2-dfa9-4112-8bdd-74682b150486` – cleared 
the source item (removing the conflicting English description) to prepare it 
for being redirected to the target item
  
  I assume there must have been a fifth request, “createredirect”, but its 
“schedule deleteTermsOfEntity” message did not get logged, and so there’s no 
trace of it in the `WikibaseTerms` channel.
  
  The timeline of the log messages is:
  
  - 15:26:01.032, “addlabel”: schedule saveTermsOfEntity for Q20918681
  - 15:26:01.076, “addlabel”: run saveTermsOfEntity for Q20918681 (2 labels, 0 
descriptions, 0 aliases)
  - 15:26:01.478, “adddescription”: schedule saveTermsOfEntity for Q20918681
  - 15:26:01.498, “adddescription”: run saveTermsOfEntity for Q20918681 (2 
labels, 1 description, 0 aliases)
  - 15:26:27.764, “mergeitems”: schedule saveTermsOfEntity for Q20918681
  - 15:26:27.961, “mergeitems”: run saveTermsOfEntity for Q20918681 (0 labels, 
1 description, 0 aliases)
  - 15:26:27.969, “mergeitems”: schedule CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 56239649)
  - 15:26:27.981, “mergeitems”: schedule CleanTermsIfUnusedJob for Q20918681 
(term_in_lang 960876926)
  - 

[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-12 Thread karapayneWMDE
karapayneWMDE added a project: Wikidata Dev Team (Sprint-♾).

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, karapayneWMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-11 Thread ReleaseTaggerBot
ReleaseTaggerBot added a project: MW-1.39-notes (1.39.0-wmf.21; 2022-07-18).

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, ReleaseTaggerBot
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, Universal_Omega, maantietaja, 
ItamarWMDE, Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-06 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  (I had a quick look at mw1358 and mw1448, and their clocks seemed to be no 
more than 20 ms apart, not enough to fully explain the timeline being 
confusing. Also, we seem to be running `systemd-timesyncd` in production, with 
Wikimedia-specific time source servers. It doesn’t look like we can blame the 
confusing timeline on clock drift.)

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-06 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Found a new example, Still Life with Pie and Silver Ewer 
.
 The merge added a German label, one Dutch and English alias each, and a West 
Frisian description; the West Frisian description and English alias exist in 
the term store, but the German label and Dutch alias are missing, in an 
especially interesting manner:
  
MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, 
wby_name, wbtl_text_in_lang_id, wbxl_id FROM wbt_item_terms LEFT JOIN 
wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON 
wbtl_type_id = wby_id LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = 
wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wbit_item_id IN 
(17524323) AND wbxl_id IS NULL;

+--+--+--+--+-+
| wbit_item_id | wbit_term_in_lang_id | wby_name | wbtl_text_in_lang_id | 
wbxl_id |

+--+--+--+--+-+
| 17524323 |906991069 | NULL | NULL |   
 NULL |
| 17524323 |957282069 | alias|228497807 |   
 NULL |

+--+--+--+--+-+
2 rows in set (0.015 sec)
  
  For the alias, the `wbt_text_in_lang` was cleaned up, but the 
`wbt_term_in_lang` remained behind; for the label, seemingly, even the 
`wbt_term_in_lang` is gone.
  
  The logstash board 
 for this 
is unfortunately a bit long, 84 events in total (because each term ID gets a 
separate clean job scheduled, don’t know why). Also, mysteriously, the 
“schedule deleteTermsOfEntity” log message seems to be missing – not just for 
this merge, but in general, there are no instances of this message after 
midnight UTC yesterday/today. (The last instances before then were at 23:57 
UTC, so the cutoff is suspiciously close to midnight – see Logstash of this 
message four hours before+after midnight 
.)
  
  Searching for “schedule/running/ran CleanTermsIfUnusedJob” 
 for the 
two `term_in_lang` IDs in the SQL output above, there is only one set of 
messages: that for the ID `906991069` (the label). Looking for just those jobs 
plus all the non-job messages (link 
), we get 
this timeline, in terms of seconds after 13:24 UTC:
  
  - 34.756 (mw1358): schedule saveTermsOfEntity for Q28060661 (the source item)
  - 35.105: (mw1358) run saveTermsOfEntity for Q28060661
  - 35.146 (mw1448): schedule saveTermsOfEntity for Q28060661
  - 35.161 (mw1448): run saveTermsOfEntity for Q28060661
  - 35.472 (mw1358): schedule CleanTermsIfUnusedJob for Q28060661
  - 35.532 (mw1437): running CleanTermsIfUnusedJob for Q28060661
  - 35.553 (mw1437): ran CleanTermsIfUnusedJob for Q28060661
  - 35.561 (mw1358): schedule saveTermsOfEntity for Q17524323 (the target item)
  - 35.835 (mw1358): run saveTermsOfEntity for Q17524323
  
  The source item history 
 also 
shows //three// edits as part of the merge there, instead of the usual two 
(plus one on the target item) that you can see e.g. on the José Barreto source 
item : 
Between “merged item into” (remove most data) and “redirected to” (turn into 
redirect), there is also “cleared an item” (remove four descriptions). This 
comes from a property of the merge gadget that I overlooked earlier – it 
doesn’t always just make a `wbmergeitems` API request.
  
  - If `wbmergeitems` actually redirects the item, then the merge gadget is 
basically done, and all three total edits – “merged item into” (remove all data 
from source), “redirected to” (turn source into redirect), “merged item from” 
(add data to target) – come from the one API request.
  - If `wbmergeitems` //doesn’t// redirect the item – I think this happens when 
there are conflicting descriptions (though the `wbmergeitems` request is made 
with `ignoreconflicts: 'description'` – then I think `wbmergeitems` only makes 
two edits, “merged item into” (remove most data from source) and “merged item 
from” (add data to target); the merge gadget then follows this with separate 
API requests `wbeditentity` with `clear: true` “cleared an item” (remove 
descriptions from source) and `wbcreateredirect` “redirected to” (turn source 
into redirect).
  
  We can see this in the different mw servers in the timeline: I think that 
mw1358 served the original `wbmergeitems` request, mw1448 the `wbeditentity` 
(clear), and mw1437 ran a job. The “schedule 

[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-04 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, Maintenance_bot
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331, 
Hellket777, 786, Biggs657, Juan90264, Alter-paule, Beast1978, Un1tY, Hook696, 
Kent7301, joker88john, CucyNoiD, Gaboe420, Giuliamocci, Cpaulf30, Af420, 
Bsandipan, Lewizho99, Maathavan, Neuronton
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-04 Thread gerritbot
gerritbot added a comment.


  Change 810921 **merged** by jenkins-bot:
  
  [mediawiki/extensions/Wikibase@master] Add debug logging for term storage 
cleanup
  
  https://gerrit.wikimedia.org/r/810921

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, gerritbot
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Hellket777, Astuthiodit_1, 786, Biggs657, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, lucamauri, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-04 Thread gerritbot
gerritbot added a project: Patch-For-Review.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, gerritbot
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Hellket777, Astuthiodit_1, 786, Biggs657, Invadibot, 
maantietaja, Juan90264, Alter-paule, Beast1978, ItamarWMDE, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, lucamauri, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, 
QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Neuronton, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-04 Thread gerritbot
gerritbot added a comment.


  Change 810921 had a related patch set uploaded (by Lucas Werkmeister (WMDE); 
author: Lucas Werkmeister (WMDE)):
  
  [mediawiki/extensions/Wikibase@master] Add debug logging for term storage 
cleanup
  
  https://gerrit.wikimedia.org/r/810921

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, gerritbot
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-04 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Coming back to José Barreto for a moment, since that item hasn’t been edited 
yet and is still missing the terms: The missing alias («José Alberto Barreto», 
Spanish) is not only missing from the term store as a whole; it’s not even 
present in the innermost text table:
  
MariaDB [wikidatawiki]> SELECT * FROM wbt_text WHERE wbx_text = 'José 
Alberto Barreto';
Empty set (0.001 sec)
  
  (And just to confirm that this isn’t an input encoding issue with the é – the 
text without the middle name works:)
  
MariaDB [wikidatawiki]> SELECT * FROM wbt_text WHERE wbx_text = 'José 
Barreto';
+--+---+
| wbx_id   | wbx_text  |
+--+---+
| 18515003 | José Barreto  |
+--+---+
1 row in set (0.001 sec)
  
  But there is a trace of this alias in `wbt_term_in_lang` and 
`wbt_item_terms`! Success!!
  
MariaDB [wikidatawiki]> SELECT wbit_item_id, wbit_term_in_lang_id, 
wby_name, wbtl_text_in_lang_id, wbxl_id FROM wbt_item_terms LEFT JOIN 
wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON 
wbtl_type_id = wby_id LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = 
wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wbit_item_id IN 
(61827465) AND wbxl_id IS NULL;

+--+--+--+--+-+
| wbit_item_id | wbit_term_in_lang_id | wby_name | wbtl_text_in_lang_id | 
wbxl_id |

+--+--+--+--+-+
| 61827465 |951539057 | alias|946668473 |   
 NULL |

+--+--+--+--+-+
1 row in set (0.002 sec)
  
  So we must have some race condition here, where the `wbt_text_in_lang` (and 
`wbt_text`) gets cleaned up in parallel to its being used for a new 
`wbt_term_in_lang` (and `wbt_item_terms`) row.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-07-04 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Okay, I found another broken merge. Museo diocesano di Capua 

 gained an Italian alias through a merge, but in the entityterms API 

 (and in the term store) it’s missing. However, in logstash 
, there 
are only two `WikibaseTerms` events for this ID:
  
  - 09:58:28.607 UTC: schedule `saveTermsOfEntity` for Q21196221
  - 09:58:28.813 UTC: run `saveTermsOfEntity` for Q21196221
  
  In the full log events for this request ID 
, we see 
a few more events:
  
  - Transaction spent 9ms in writes, under the 3s limit
  - schedule `saveTermsOfEntity` for Q55386461
  - run `saveTermsOfEntity` for Q55386461
  - schedule `saveTermsOfEntity` for Q21196221
  - Possibly redundant parse
  - run `saveTermsOfEntity` for Q21196221
  - schedule `deleteTermsOfEntity` for Q55386461
  - Possibly redundant parse
  - Possibly redundant parse
  
  (In this order – i.e. the Q55386461 `saveTermsOfEntity` //runs// before the 
Q21196221 one is scheduled!)
  
  There is no trace of an error here. So I think we’re looking at two (or more) 
separate issues here: as mentioned in T309445#8039853 
, sometimes writing the 
terms fails due to a database error, and it’s a bug that we don’t retry it; but 
also, unrelatedly, sometimes the terms don’t end up in the term store even 
though `ItemTermStoreWriterAdapter::saveTermsOfEntity` does //not// detect any 
error.
  
  I wonder if this is due to the fact that we’re making several edits (remove 
data from old item, add data to new item, redirect old item to new item) in one 
API request? At least, I assume that explains why we’re seeing some secondary 
data updates run before others are scheduled, and I could imagine that this 
causes some confusion.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-30 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Although I //still// don’t see an exception corresponding to José Barreto 

 (from T309445#8023533 ), 
even when just looking through all the errors for that time regardless of item 
ID or other search terms.
  
  I guess I should look for some missing labels from merges after we deployed 
this logging, and see if there are any without corresponding log events.
  
  (If this is really due to locking errors as above, then I think in principle 
it could happen on any edit, but I guess merges make it much more likely that 
there’s a conflict around the same terms being removed somewhere and added 
elsewhere at the same time.)

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-30 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  In T309445#8028484 , 
@Lucas_Werkmeister_WMDE wrote:
  
  > …and while working on adding the logging, I suddenly find this:
  >
  > name=ItemTermStoreWriterAdapter::saveTermsOfEntity()
  > if ( $entity instanceof Item ) {
  > try {
  > $this->store->storeTerms( $entity->getId(), 
$entity->getFingerprint() );
  > return true;
  > } catch ( TermStoreException $ex ) {
  > return false;
  > }
  > }
  >
  > Small wonder there’s no trace of errors in Logstash…
  
  This turns out to be a red herring, there’s no code that actually throws this 
exception type. However, we can now see **uncaught exceptions** in logstash – 
they get logged to the normal mediawiki-errors board as well, we just didn’t 
find them before because they don’t include the item ID anywhere in the request 
URL or stack trace. These seem to happen in bursts – for example, 45 of them 
all between 5:19 and 5:26 AM today
  F35289127: image.png 
  – and the error seems to be consistent:
  
Error 1205: Lock wait timeout exceeded; try restarting transaction
Function: 
Wikibase\Lib\Store\Sql\Terms\DatabaseTermStoreWriterBase::acquireAndInsertTerms
Query: SELECT  wbit_term_in_lang_id AS `value`  FROM `wbt_item_terms`
WHERE wbit_item_id = 41443421   FOR UPDATE
  
  This can happen – I think we need to wrap the deferred update in a try/catch 
block and reschedule the update as a job if it fails. (And then rethrow the 
exception, so MediaWiki knows there was a DB error.) I guess when we wrote this 
code we might have assumed that MediaWiki did this automatically?
  
  (We might also reconsider if the `FOR UPDATE` there is necessary, but I think 
that would be a larger project to change.)

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-30 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE closed subtask T311307: Add debug logging for item terms 
storage after merging as Resolved.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-28 Thread karapayneWMDE
karapayneWMDE moved this task from Special:NewLexeme revival - sprint 10 to 
Special:NewLexeme revival - sprint 11 on the Special:NewLexeme revival board.
karapayneWMDE edited projects, added Special:NewLexeme revival 
(Special:NewLexeme revival - sprint 11); removed Special:NewLexeme revival 
(Special:NewLexeme revival - sprint 10).

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

WORKBOARD
  https://phabricator.wikimedia.org/project/board/5674/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE, karapayneWMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-27 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  In T309445#8023699 , 
@Lydia_Pintscher wrote:
  
  > Does any of the lists of potential merge candidates at 
https://www.wikidata.org/wiki/Help:Merge#Finding_items_to_merge help?
  
  Thanks, I might try some of those later.
  
  
  
  (Aside about localhost: I tried to make my wiki serve API requests in 
parallel more, because I now suspect this is a race condition. Currently, I can 
send requests in parallel and they’ll be processed in parallel, and after 
setting `output_buffering = 0;` in php.init and therefore letting MediaWiki 
taking care of buffering and sending a `Content-Length` response header, the 
browser will also perceive the API requests to have completed as soon as the 
main request is done, so that secondary data updates actually take place 
post-send as expected and can potentially run in parallel to subsequent API 
requests. But for some reason, a `sleep()` in a post-send update will still 
prevent //later// API requests from starting processing, so that post-send 
updates still won’t overlap with requests that came in after the first request 
finished sending the response. It’s possible that session_write_close() 
 should help 
with this, or at least that it’s related to sessions – I stopped looking into 
it at this point, and started experimenting on Test Wikidata instead.)
  
  It turns out the merge gadget doesn’t do as many different API things as I 
thought. It has some pre-merge checks and some error handling, but at the API 
level, the merge it does is just:
  
  1. wbmergeitems (with `ignoreconflicts=description`)
  2. purge (“// XXX: Do we even need to purge? Why?”)
  3. load target item
  
  There is no extra API request for blanking the item or anything – if the 
merge makes several edits, they all come from `wbmergeitems`.
  
  If you just create two new items and then merge them with `wbmergeitems` on 
Test Wikidata , and then 
immediately get their `entityterms`:
  
api = new mw.Api(); now = new Date().toISOString(); item1 = ( await 
api.postWithEditToken( { action: 'wbeditentity', new: 'item', summary: 
'T309445', data: JSON.stringify( { labels: { en: { language: 'en', value: 
`${now} (label)` } } } ) } ) ).entity.id; console.log( item1 ); item2 = ( await 
api.postWithEditToken( { action: 'wbeditentity', new: 'item', summary: 
'T309445', data: JSON.stringify( { descriptions: { en: { language: 'en', value: 
`${now} (description)` } } } ) } ) ).entity.id; console.log( item2 ); await 
api.postWithEditToken( { action: 'wbmergeitems', fromid: item2, toid: item1, 
summary: 'T309445' } ); await api.get( { action: 'query', prop: 'entityterms', 
wbetlanguage: 'en', titles: item1, formatversion: 2 } ).then( r => 
r.query.pages[ 0 ].entityterms );
  
  Then the `entityterms` will fairly often just have the terms of the original 
item (a `label`), without the merged terms of the other item (a `description`). 
But this resolves itself after a second or so (once the secondary data updates 
of the merge finish), and if you repeat the `entityterms` request it’ll show 
all the terms as expected.
  
  So I tried matching the merge gadget more closely, by also purging the page 
and then getting its HTML:
  
api = new mw.Api(); now = new Date().toISOString(); item1 = ( await 
api.postWithEditToken( { action: 'wbeditentity', new: 'item', summary: 
'T309445', data: JSON.stringify( { labels: { en: { language: 'en', value: 
`${now} (label)` } } } ) } ) ).entity.id; console.log( item1 ); item2 = ( await 
api.postWithEditToken( { action: 'wbeditentity', new: 'item', summary: 
'T309445', data: JSON.stringify( { descriptions: { en: { language: 'en', value: 
`${now} (description)` } } } ) } ) ).entity.id; console.log( item2 ); await 
api.postWithEditToken( { action: 'wbmergeitems', fromid: item2, toid: item1, 
summary: 'T309445' } ); await api.post( { action: 'purge', titles: item1 } ); 
await fetch( `https://test.wikidata.org/wiki/${item1}` ); await api.get( { 
action: 'query', prop: 'entityterms', wbetlanguage: 'en', titles: item1, 
formatversion: 2 } ).then( r => r.query.pages[ 0 ].entityterms );
  
  But with this version, the logged `entityterms` are actually always complete 
(at least every time I tried this). It looks like the purge + render gives the 
secondary data updates enough time to finish.
  
  In theory, I think both the merge and the page view could attempt to write 
terms to the term store. (Though I’m not sure if loading an item page that 
isn’t in the parser cache will actually cause the secondary data updates to run 
again.) However, the page view should always have the merged item state, thanks 
to MediaWiki’s ChronologyProtector, so there shouldn’t be any race condition 
there. The only possible race condition I can see at the moment is if someone 
//else// 

[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-27 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  …and while working on adding the logging, I suddenly find this:
  
  name=ItemTermStoreWriterAdapter::saveTermsOfEntity()
if ( $entity instanceof Item ) {
try {
$this->store->storeTerms( $entity->getId(), 
$entity->getFingerprint() );
return true;
} catch ( TermStoreException $ex ) {
return false;
}
}
  
  Small wonder there’s no trace of errors in Logstash…

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-24 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  The simplest explanation for this whole business would be that the secondary 
data updates somehow just don’t happen. I don’t know why this would be – of 
course it’s possible that they crash for some reason, but I’d expect to see 
something in logstash in that case, and I’m not seeing anything – but at this 
point, I think this is a theory worth either confirming or ruling out, by 
**adding some logging.** I suggest that we add a temporary log channel that 
captures messages at any level (like, for example, the `DuplicateParse` 
channel), and make `ItemHandler::getSecondaryDataUpdates()` and 
`ItemTermStoreWriterAdapter::saveTermsOfEntity()` log debug messages indicating 
that the secondary data updates were created and executed, with some basic 
information about the terms being stored (number of labels, descriptions, and 
aliases across all languages; we don’t have access to the revision ID at this 
point, unfortunately).

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-24 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  I tested it: a regular `index.php?action=purge` or `action.php?action=purge` 
doesn’t trigger secondary data updates, not even when you visit the page 
afterwards; `api.php?action=purge=1` does trigger secondary 
data updates. The merge gadget doesn’t set `forcelinkupdate`, so the secondary 
data updates shouldn’t get triggered by it (other than through the edit).

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-24 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE claimed this task.
Lucas_Werkmeister_WMDE moved this task from To Do to Doing on the 
Special:NewLexeme revival (Special:NewLexeme revival - sprint 10) board.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

WORKBOARD
  https://phabricator.wikimedia.org/project/board/5983/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-23 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  Does any of the lists of potential merge candidates at 
https://www.wikidata.org/wiki/Help:Merge#Finding_items_to_merge help?

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lydia_Pintscher
Cc: Lydia_Pintscher, karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, 
Aklapper, Moebeus, Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, 
Akuckartz, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-23 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Okay, but after 50 repeats of the code from earlier 
 (i.e. 100 items, 50 
merges), I still haven’t been able to reproduce the issue locally once. So now 
I’m back to thinking that the issue doesn’t happen with `wbmergeitems` alone, 
but is related to the combination of API requests the merge gadget makes.
  
  But if it doesn’t happen very often, then trying to find enough merge pairs 
on real Wikidata to test this with might be challenging – porting the merge 
gadget code  to work 
on a local wiki might be more promising.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, Aklapper, Moebeus, 
Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-23 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Recent merges:
  
  - Perdikas Palaeontological Museum 
:
 the merge added an English alias which //is// present in the term store.
  - Cecilio Angulo 
:
 the merge added a Dutch alias which //is// present in the term store.
  - José Barreto 
:
 the merge added a Spanish alias which //is not// present in the term store.
  - hydrosulfide 
:
 the merge added a Uzbek label which //is// present in the term store.
  - Kummerower See 
:
 the merge added a Serbian label which //is// present in the term store.
  
  So the issue still seems to happen occasionally (José Barreto), but not all 
the time, and in fact not even most of the time.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, Aklapper, Moebeus, 
Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-23 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Found and merged another pair 
,
 again unable to reproduce the issue:
  
  name=before merge
MariaDB [wikidatawiki]> SELECT wbit_item_id, wby_name, wbx_text, 
wbxl_language FROM wbt_item_terms JOIN wbt_term_in_lang ON wbit_term_in_lang_id 
= wbtl_id JOIN wbt_type ON wbtl_type_id = wby_id JOIN wbt_text_in_lang ON 
wbtl_text_in_lang_id = wbxl_id JOIN wbt_text ON wbxl_text_id = wbx_id WHERE 
wbit_item_id IN (111854511, 112670581);

+--+-+-+---+
| wbit_item_id | wby_name| wbx_text 

   | wbxl_language |

+--+-+-+---+
|111854511 | label   | Hum3D

   | en|
|111854511 | description | Hum3D sells high quality 3D models, which 
are all made by themselves. They have the biggest collection of cars on the 
Internet. | en|
|112670581 | label   | Hum3D

   | en|
|112670581 | label   | Hum3D

   | uk|

+--+-+-+---+
4 rows in set (0.004 sec)
  
  name=after merge
MariaDB [wikidatawiki]> SELECT wbit_item_id, wby_name, wbx_text, 
wbxl_language FROM wbt_item_terms JOIN wbt_term_in_lang ON wbit_term_in_lang_id 
= wbtl_id JOIN wbt_type ON wbtl_type_id = wby_id JOIN wbt_text_in_lang ON 
wbtl_text_in_lang_id = wbxl_id JOIN wbt_text ON wbxl_text_id = wbx_id WHERE 
wbit_item_id IN (111854511, 112670581);

+--+-+-+---+
| wbit_item_id | wby_name| wbx_text 

   | wbxl_language |

+--+-+-+---+
|111854511 | label   | Hum3D

   | en|
|111854511 | description | Hum3D sells high quality 3D models, which 
are all made by themselves. They have the biggest collection of cars on the 
Internet. | en|
|111854511 | label   | Hum3D

   | uk|

+--+-+-+---+
3 rows in set (0.001 sec)
  
  (Both had the same English label, so two of the rows got merged into one. But 
all the terms are still there.)
  
  Might be time to look at other recent merges and see if this issue still 
happens at all. Previously I was under the impression that it seemed to happen 
fairly reliably – I hadn’t found any merges where it didn’t happen…

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, Aklapper, Moebeus, 
Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-23 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  Managed to find a pair of duplicates via the recent changes tagged as 
sitelink change from connected wiki 

 (the dewiki and ptwiki articles were clearly translations of one another, with 
a left parenthesis missing in the same place in both) and merged them 

 using the merge gadget. I was //not// able to reproduce the issue.
  
  name=before merge
MariaDB [wikidatawiki]> SELECT wbit_item_id, wby_name, wbx_text, 
wbxl_language FROM wbt_item_terms JOIN wbt_term_in_lang ON wbit_term_in_lang_id 
= wbtl_id JOIN wbt_type ON wbtl_type_id = wby_id JOIN wbt_text_in_lang ON 
wbtl_text_in_lang_id = wbxl_id JOIN wbt_text ON wbxl_text_id = wbx_id WHERE 
wbit_item_id IN (112664122, 112671429);
+--+--+--+---+
| wbit_item_id | wby_name | wbx_text | wbxl_language |
+--+--+--+---+
|112664122 | label| Vyeyra   | de|
|112671429 | label| Vyeyra   | es|
|112671429 | label| Vyeyra   | pt|
+--+--+--+---+
3 rows in set (0.005 sec)
  
  name=after merge
MariaDB [wikidatawiki]> SELECT wbit_item_id, wby_name, wbx_text, 
wbxl_language FROM wbt_item_terms JOIN wbt_term_in_lang ON wbit_term_in_lang_id 
= wbtl_id JOIN wbt_type ON wbtl_type_id = wby_id JOIN wbt_text_in_lang ON 
wbtl_text_in_lang_id = wbxl_id JOIN wbt_text ON wbxl_text_id = wbx_id WHERE 
wbit_item_id IN (112664122, 112671429);
+--+--+--+---+
| wbit_item_id | wby_name | wbx_text | wbxl_language |
+--+--+--+---+
|112664122 | label| Vyeyra   | de|
|112664122 | label| Vyeyra   | es|
|112664122 | label| Vyeyra   | pt|
+--+--+--+---+
3 rows in set (0.001 sec)

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, Aklapper, Moebeus, 
Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-23 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.


  In T309445#7966722 , 
@Lucas_Werkmeister_WMDE wrote:
  
  > Tried reproducing this locally with the following snippet in the dev tools 
console … But didn’t succeed in reproducing the error, both labels are present 
in the terms store afterwards. It’s possible that this bug is related to some 
of the extra stuff that the merge gadget does.
  
  Haven’t been able to reproduce it on Test Wikidata using that wiki’s version 
of the Merge gadget either. I’m not sure how to test this on Wikidata – we’d 
need to have a source of duplicate items that we can test the merge behavior 
with (using the gadget vs. using Special:MergeItems vs. using the API).

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lucas_Werkmeister_WMDE
Cc: karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, Aklapper, Moebeus, 
Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-23 Thread ItamarWMDE
ItamarWMDE moved this task from Tech backlog to Special:NewLexeme revival - 
sprint 10 on the Special:NewLexeme revival board.
ItamarWMDE edited projects, added Special:NewLexeme revival (Special:NewLexeme 
revival - sprint 10); removed Special:NewLexeme revival.

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

WORKBOARD
  https://phabricator.wikimedia.org/project/board/5674/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: ItamarWMDE
Cc: karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, Aklapper, Moebeus, 
Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T309445: INVESTIGATION: Terms storage missing most item labels after merge, resulting in item IDs being shown

2022-06-14 Thread karapayneWMDE
karapayneWMDE renamed this task from "Terms storage missing most item labels 
after merge, resulting in item IDs being shown" to "INVESTIGATION: Terms 
storage missing most item labels after merge, resulting in item IDs being 
shown".
karapayneWMDE updated the task description.
karapayneWMDE set the point value for this task to "1".

TASK DETAIL
  https://phabricator.wikimedia.org/T309445

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: karapayneWMDE
Cc: karapayneWMDE, Addshore, Manuel, Lucas_Werkmeister_WMDE, Aklapper, Moebeus, 
Astuthiodit_1, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, 
lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org