jayvdb created this task.
jayvdb added subscribers: jayvdb, Legoktm.
jayvdb added a project: pywikibot-core.
Restricted Application added subscribers: Aklapper, pywikipedia-bugs.

TASK DESCRIPTION
  (This may only be an obscure problem without much impact, but we need to be very sure that we know the limitations, and pywikibot developers and script writers need to be made aware of them.)
  
  We know that the links on a page (as exposed via the API) may be out of sync with the page text.  There isn't clear documentation explaining all the ways this can occur.
  
  Page.templatesWithParams has this warning as a comment:
  
  ```
          # WARNING: may not return all templates used in particularly
          # intricate cases such as template substitution
  ```
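  For context, a minimal sketch of how a script typically consumes this method (the site and page title are placeholders):

```
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Page(site, 'Example')  # placeholder title

# The result reflects what the MediaWiki parser recorded for this page, so
# templates introduced through substitution or other intricate constructs
# may be missing.
for template, params in page.templatesWithParams():
    print(template.title(), params)
```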
  
  Usually bots don't need 100% accurate information, but there are times when they need to be accurate; e.g. honouring {{nobots}} templates is a pretty hard requirement.  However, if {{nobots}} is only used on pages in ways that are always reported accurately, the botMayEdit function doesn't need to worry about the edge cases in the underlying MediaWiki parser.  Sticking with the {{nobots}} example, it is typically a top-level object in the parse tree, and not included in intricate templates or transcluded.  The documentation at https://en.wikipedia.org/wiki/Template:Bots is fairly clear that usage of the template should be simple and that bots may ignore the template if it is not used in a bot-friendly way.
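  A minimal sketch of the check a bot script would do (the site and page title are placeholders; botMayEdit() scans the page text rather than the link tables):

```
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Page(site, 'Example')  # placeholder title

# botMayEdit() honours {{bots}}/{{nobots}} in the page text; it is only
# dependable when the template is used in the simple, top-level way
# described at https://en.wikipedia.org/wiki/Template:Bots
if not page.botMayEdit():
    print('Skipping %s: bots are excluded' % page.title())
```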
  
  One example: parser functions are lazily evaluated (T10314), leading to bugs like {T20478}.
  
  It seems that MediaWiki also doesn't know that the links are out of date, so it isn't just an intentional deferral of an expensive database update.
  
  [[https://mwparserfromhell.readthedocs.org/en/latest/ | mwparserfromhell]] and the various regexes in pywikibot are also far from perfect, and are unlikely to ever support parser functions, so they can't be used to reliably get links.  And it wouldn't surprise me if these approaches also match links (templates, categories, etc.) in unevaluated portions of the wikicode, which means they would report links that do not exist in the rendered page.
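  A quick illustration of that last point (the template names are made up): mwparserfromhell returns templates from both branches of a #if, even though only one branch survives evaluation.

```
import mwparserfromhell

text = '{{#if:{{{1|}}}|{{Cite web}}|{{Citation needed}}}}'
code = mwparserfromhell.parse(text)

# filter_templates() is recursive by default, so it reports the #if itself
# plus the templates in *both* branches, regardless of which one the
# MediaWiki parser would actually expand.
for template in code.filter_templates():
    print(template.name)
```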
  
  Maybe the only way to make Page methods reliable (with respect to *used* links) is to have a parameter that causes PWB to do a `purge` with `forcelinkupdate` before using these API methods.
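  Roughly, and assuming Page.purge() forwards the flag to action=purge, that would look like this (placeholder site and title):

```
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Page(site, 'Example')  # placeholder title

# Ask MediaWiki to re-parse the page and refresh its link tables before
# querying the link-related API modules.
page.purge(forcelinkupdate=True)
templates = list(page.templates())
```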
  
  Another option is to use 
[[https://www.mediawiki.org/wiki/API:Parsing_wikitext|API parse]] to obtain the 
links.
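  A rough sketch of that approach, assuming pywikibot's api.Request accepts a parameters dict (placeholder title):

```
import pywikibot
from pywikibot.data import api

site = pywikibot.Site('en', 'wikipedia')

# action=parse reports the links/templates of the rendered revision,
# bypassing the possibly stale link tables.
request = api.Request(site=site, parameters={
    'action': 'parse',
    'page': 'Example',          # placeholder title
    'prop': 'links|templates',
})
data = request.submit()
print(data['parse']['templates'])
```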
  
  This may be a situation where we need to ensure the Pywikibot devs and Wikimedia ops / DBAs / devs are all on the same page before we proceed with a solution that works for Wikimedia sites.

TASK DETAIL
  https://phabricator.wikimedia.org/T101596

