[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2024-03-13 Thread Michael
Michael closed this task as "Resolved".
Michael claimed this task.
Michael added a comment.
Restricted Application added a project: User-Michael.


  As this has seen no new comments in over two years, I'm closing it as 
probably fixed per my previous comment. Please reopen if this issue persists.

TASK DETAIL
  https://phabricator.wikimedia.org/T297238



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2022-02-01 Thread Michael
Michael added a project: Wikibase change dispatching scripts to jobs.
Michael added a comment.


  I looked into this in the context of T299828:
  
  Originally, this indeed seems to have been caused by sometimes not 
scheduling the correct job when a new sitelink was added. That has been fixed, 
and the fix has been deployed since about January 20th. So for sitelinks added 
after that, this should not happen anymore.
  
  To fix pages from before that, they need a purge //with forced links 
update//. This can be done via the API. A null edit probably also works.
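  A minimal sketch of scripting that purge against the action API (the 
endpoint and the page title below are just examples; anonymous API purges work 
but may be rate-limited):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"  # the affected client wiki

# action=purge with forcelinkupdate re-runs the links update, which is what
# refreshes the langlinks table; a plain purge only drops the parser cache.
resp = requests.post(API, data={
    "action": "purge",
    "titles": "Category:Companies based in Edmonton",  # example page
    "forcelinkupdate": 1,
    "format": "json",
})
print(resp.json())
```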
  
  While this should work automatically for future sitelinks added to 
Wikidata, keep in mind that the job which triggers that refresh of links 
currently has a backlog of over **3 days**.
  
  Let me know if you find more recent examples of this, where the langlinks 
weren't updated after a few days.



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-08 Thread Huji
Huji added a comment.


  I like that plan. I find it way above my pay grade (of 0, as a volunteer) 
to implement.



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-08 Thread Ladsgroup
Ladsgroup added a comment.


  I can't answer that with 100% certainty, but I think it's mostly the 
organic growth of this part of MediaWiki. The only thing to keep in mind is 
that the ParserCache entry for each page gets evicted after a certain time 
(currently twenty days, seven days for talk pages), and it can be missing for 
lots of pages that don't get many views. So I suggest that the code look into 
the ParserCache first and, if the entry is missing (or expired, or not 
canonical, or whatever), still fall back to langlinks, as generating a 
ParserCache entry is expensive.
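  A rough, self-contained sketch of that suggested control flow (Python 
stand-ins for what would really be MediaWiki PHP internals; all helpers here 
are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheEntry:
    langlinks: dict
    is_canonical: bool = True
    is_expired: bool = False

# Hypothetical stand-ins for MediaWiki internals; they only exist to make the
# suggested lookup order concrete.
def parser_cache_get(page_id: int) -> Optional[CacheEntry]:
    return None  # pretend the entry was evicted

def query_langlinks_table(page_id: int) -> dict:
    return {"fa": "رده:شرکت‌های ادمنتن"}  # possibly stale, but cheap

def get_langlinks(page_id: int) -> dict:
    entry = parser_cache_get(page_id)
    if entry is not None and entry.is_canonical and not entry.is_expired:
        return entry.langlinks  # fresh and already parsed
    # Generating a ParserCache entry is expensive, so don't force a parse;
    # fall back to the (possibly stale) langlinks table instead.
    return query_langlinks_table(page_id)

print(get_langlinks(12345))
```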



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-08 Thread Huji
Huji added a comment.


  I appreciate your explanation.
  
  Is there a reason the API should not use ParserCache output?



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-08 Thread Ladsgroup
Ladsgroup added a comment.


  The web sidebar links come from the content of the ParserCache (which gets 
reparsed on edit/purge requests), but the API (backed by the `langlinks` 
table) only gets updated on edit, or on purge when its explicit option is set. 
In most cases it should work the same, and if Wikidata purges don't update 
langlinks, then it's a bug. (While, again, I can't help with fixing it, I can 
only explain how it works.)



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-08 Thread Huji
Huji added a project: MediaWiki-API.
Huji added a comment.


  In T297238#7556908, @Ladsgroup wrote:
  
  > Three reasons I can think of:
  >
  > - Loading an item is heavy and requires a lot of network bandwidth inside 
the DC. Plus it would put a lot of pressure on memcached and the databases. 
See this as a cache layer.
  > - Client wikis (sometimes) need to override those entries with `[[en:Foo]]`.
  
  Wouldn't all those apply to the Web version too? But the Web version 
correctly shows the interwiki links based on Wikidata. If the web version can 
do it efficiently, why wouldn't the API do the same?
  
  More explicitly stated: when I go to 
https://en.wikipedia.org/wiki/Category:Companies_based_in_Edmonton I see an 
interwiki link to the fa page "رده:شرکت‌های ادمنتن" (which is consistent with 
Wikidata), but when I go to 
https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Category:Companies%20based%20in%20Edmonton
 I get the old fa interwiki as described above. Is there a reason that the API 
returns a different response than the Web version? Is it even good practice?
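  A minimal Python version of that query (the standard `prop=langlinks` 
module; it reads from the `langlinks` table, which is why it can lag behind 
the sidebar):

```python
import requests

r = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "query",
    "prop": "langlinks",
    "titles": "Category:Companies based in Edmonton",
    "lllimit": "max",
    "format": "json",
})
for page in r.json()["query"]["pages"].values():
    for ll in page.get("langlinks", []):
        print(ll["lang"], ll["*"])  # "*" holds the target title (formatversion 1)
```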
  
  > - It allows more complex querying in Labs, e.g. articles that have a lot 
of interwiki links but none in language foo.
  
  This one is not relevant to the question about the API. I am not proposing 
that we do away with the `langlinks` table in MediaWiki altogether. I am 
saying that the Web and API versions of MediaWiki should use the same source 
of data (both Wikidata).



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-08 Thread Ladsgroup
Ladsgroup added a comment.


  Three reasons I can think of:
  
  - Loading an item is heavy and requires a lot of network bandwidth inside 
the DC. Plus it would put a lot of pressure on memcached and the databases. 
See this as a cache layer.
  - Client wikis (sometimes) need to override those entries with `[[en:Foo]]`.
  - It allows more complex querying in Labs, e.g. articles that have a lot of 
interwiki links but none in language foo (see the sketch below).
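  A sketch of that kind of Labs query against the wiki replicas (the host 
name and `~/replica.my.cnf` credentials are Toolforge conventions and are 
assumptions here):

```python
import pymysql

# Toolforge wiki-replica connection; adjust host/credentials for your setup.
conn = pymysql.connect(
    host="enwiki.analytics.db.svc.wikimedia.cloud",
    database="enwiki_p",
    read_default_file="~/replica.my.cnf",
)

# Articles with many interwiki links but none in language "fa" --
# the kind of query the local langlinks table makes cheap.
sql = """
    SELECT page_title, COUNT(*) AS n_links
    FROM page
    JOIN langlinks ON ll_from = page_id
    WHERE page_namespace = 0
    GROUP BY page_id, page_title
    HAVING n_links > 50 AND SUM(ll_lang = 'fa') = 0
    LIMIT 20
"""
with conn.cursor() as cur:
    cur.execute(sql)
    for title, n_links in cur.fetchall():
        print(title.decode(), n_links)
```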



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-08 Thread Huji
Huji added a comment.


  Potentially dumb question: is there a reason MediaWiki's API uses its own 
`langlinks` table (rather than Wikidata) as the data source? The wiki pages 
themselves show the interwiki links based on Wikidata; why doesn't the API use 
the same source?



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-08 Thread Ladsgroup
Ladsgroup added a comment.


  Since I'm pinged: yes, this is sorta expected, due to the CAP theorem. The 
more distributed we become and the more we scale, the bigger the discrepancy 
gets: jobs that trigger the refresh might fail, might not get queued (it used 
to be 1% of the time, now it's around 0.006%, still non-zero) and, most 
importantly, there will be some latency between the change and it being 
reflected in the Wikipedias. The most important thing is that we should have 
only one source of truth (the SSOT principle) to avoid corruption.
  
  All of that being said, we can definitely do better. In my last project 
before departing WMDE, I worked on the dispatching of changes from Wikidata to 
the Wikipedias (and co) and noticed that the time until changes get reflected 
in the wikis is quite small, but only for changes that don't require 
re-parsing of a page (e.g. injecting rc entries). The reparsing goes into the 
same queue as reparses caused by, for example, template changes (to be a bit 
more technical: the `htmlCacheUpdate` queue in the job runners), and that 
queue is quite busy (to put it mildly). Depending on the load, or whether a 
used template has been changed, it can take days to get to that change.
  
  So, to mitigate and reduce the latency:
  
  - Have a dedicated job for sitelink changes (and for the other sitelinks of 
that item) and put them in a dedicated lane.
    - Having dedicated queues for higher-priority work is more aligned with 
distributed-computing principles (look at the Tanenbaum book).
    - This is not as straightforward as it looks, due to the internals of 
Wikibase.
    - I'm not at WMDE to help move it forward.
  - Reduce the load on `htmlCacheUpdate`. The amount of load on it is 
unbelievable.
    - There are some tickets already; I remember T250205 off the top of my 
head.
    - I honestly don't know if people have really looked into why it's so 
massive.
    - A low-hanging fruit would be to avoid re-parsing transclusions when only 
content inside `<noinclude>` has been changed in the template, so that 
updating documentation would not cause a massive cascade of several million 
reparses.
    - Fixes in this direction improve the health and reliability of all of our 
systems, from the ParserCache to DB load, appservers, job runners, and edge 
caches. It's even greener.
    - But it's a big endeavor and will take years at least.
  - As a terrible band-aid, you can write a bot that listens to rc entries on 
your wiki with a source of wikidata and forces a reparse (action=purge) if the 
injected change is a sitelink change (see the sketch below).
    - A complicating factor is that an article can subscribe to the sitelink 
of another item and get notified of that sitelink's changes (it's quite common 
on Commons, pun intended). Make sure to avoid reparsing because of that.
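  A very rough sketch of such a band-aid bot (the `rctype=external` filter 
and the sitelink-detection heuristic on the edit summary are assumptions to 
verify against real rc entries before relying on them):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"  # the client wiki to watch
session = requests.Session()

def wikidata_changes(since):
    """Recent 'external' rc entries, i.e. changes injected from Wikidata."""
    r = session.get(API, params={
        "action": "query", "list": "recentchanges",
        "rctype": "external",
        "rcprop": "title|comment|timestamp",
        "rcend": since,  # walk backwards from now until `since`
        "rclimit": "max", "format": "json",
    })
    return r.json()["query"]["recentchanges"]

def purge_sitelink_changes(since):
    for rc in wikidata_changes(since):
        # Heuristic (an assumption): injected sitelink changes mention the
        # sitelink in their auto-generated summary. Also remember to skip pages
        # that merely subscribe to another item's sitelink.
        if "sitelink" not in rc.get("comment", "").lower():
            continue
        session.post(API, data={
            "action": "purge", "titles": rc["title"],
            "forcelinkupdate": 1, "format": "json",
        })

# purge_sitelink_changes("2021-12-01T00:00:00Z")  # example starting point
```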
  
  HTH



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-07 Thread Huji
Huji added a comment.


  Not to distract from the discussion of this specific case, but it seems like 
this is not an isolated case. T190667 from two years ago seems to be about a 
similar issue, and the answer offered there (namely "don't use the langlinks 
table") is invalid, because that is what MediaWiki itself uses for its API 
(which in turn is what Pywikibot uses to retrieve langlinks for a given page). 
Even T43387 from nine years ago seems related.
  
  Point being: do we have a process that randomly checks the consistency of 
the MediaWiki langlinks table against Wikidata to detect such anomalies?
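  A sketch of what such a spot check could look like for a single title, 
comparing the API's `langlinks` output against the connected Wikidata item's 
sitelinks (the site-ID-to-language mapping below is a rough assumption that 
only works for Wikipedias):

```python
import requests

def api_langlinks(title):
    """Langlinks as MediaWiki's API reports them (backed by the langlinks table)."""
    r = requests.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query", "prop": "langlinks", "titles": title,
        "lllimit": "max", "format": "json",
    }).json()
    page = next(iter(r["query"]["pages"].values()))
    return {ll["lang"]: ll["*"] for ll in page.get("langlinks", [])}

def wikidata_sitelinks(title):
    """Sitelinks as stored on the connected Wikidata item."""
    r = requests.get("https://www.wikidata.org/w/api.php", params={
        "action": "wbgetentities", "sites": "enwiki", "titles": title,
        "props": "sitelinks", "format": "json",
    }).json()
    entity = next(iter(r["entities"].values()))
    sitelinks = entity.get("sitelinks", {})
    # Rough mapping: "fawiki" -> "fa"; only meaningful for Wikipedia site IDs.
    return {s["site"].removesuffix("wiki"): s["title"]
            for s in sitelinks.values() if s["site"].endswith("wiki")}

def check(title):
    local, remote = api_langlinks(title), wikidata_sitelinks(title)
    remote.pop("en", None)  # the page itself is not a langlink
    for lang in sorted(set(local) | set(remote)):
        if local.get(lang) != remote.get(lang):
            print(f"{title}: mismatch for {lang!r}: "
                  f"langlinks={local.get(lang)!r} wikidata={remote.get(lang)!r}")

check("Category:Companies based in Edmonton")
```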



[Wikidata-bugs] [Maniphest] T297238: Wikidata edit did not update the langlinks tables on MediaWiki side

2021-12-07 Thread Huji
Huji renamed this task from "Pywikibot returns incorrect result when querying 
Wikidata" to "Wikidata edit did not update the langlinks tables on MediaWiki 
side".
