jayvdb created this task.
jayvdb added subscribers: jayvdb, Legoktm.
jayvdb added a project: pywikibot-core.
Restricted Application added subscribers: Aklapper, pywikipedia-bugs.
TASK DESCRIPTION
(This may be only an obscure problem without much impact, but we need to be
very sure that we know the limitations, and pywikibot developers and script
writers need to be made aware of them.)
We know that the links on a page (as exposed via the API) may be out of sync
with the page. There isn't clear documentation explaining all the ways this
can occur.
Page.templatesWithParams has this warning as a comment:
```
# WARNING: may not return all templates used in particularly
# intricate cases such as template substitution
```
Usually bots don't need 100% accurate information, but there are times when
they do. e.g. honouring {{nobots}} templates is a pretty hard requirement.
However, if {{nobots}} is only used on pages in ways that are always detected
accurately, the botMayEdit function doesn't need to worry about the edge
cases in the underlying MediaWiki parser. Using the {{nobots}} example, it is
typically a top-level object in the parse tree, and not included in intricate
templates or transcluded. The documentation at
https://en.wikipedia.org/wiki/Template:Bots is fairly clear that usage of the
template should be simple, and that bots may ignore the template if it is not
used in a bot-friendly way.
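To make that contract concrete, here is a minimal sketch of a "bot-friendly" {{nobots}} check that only looks for the template used as a simple top-level call in the raw wikitext. The function name and regex are illustrative assumptions, not pywikibot's actual botMayEdit implementation:

```python
import re

# Matches a simple top-level {{nobots}} or {{nobots|...}} call.
# Deliberately does NOT try to handle substitution, nesting, or
# transclusion -- per Template:Bots, such usages need not be honoured.
NOBOTS_RE = re.compile(r'\{\{\s*nobots\s*(\|[^{}]*)?\}\}', re.IGNORECASE)

def bot_may_edit(wikitext: str) -> bool:
    """Return False if a simple top-level {{nobots}} is present."""
    return NOBOTS_RE.search(wikitext) is None
```

Anything fancier than this (e.g. {{nobots}} emitted by another template) falls into exactly the parser edge cases the rest of this task is about.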
One example: parser functions are lazily evaluated (T10314), leading to bugs
like {T20478}.
It seems that MediaWiki also doesn't know that the links are out of date, so
this isn't just intentional deferment of an expensive database update.
[[https://mwparserfromhell.readthedocs.org/en/latest/ | mwparserfromhell]]
and the various regexes in pywikibot are also far from perfect, and are
unlikely to ever support parser functions, so they can't be used as a way to
reliably get links. And it wouldn't surprise me if these approaches also
match links (templates, categories, etc.) in unevaluated portions of the
wikicode, which means they would report links which do not exist on the
rendered page.
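The false-positive problem is easy to demonstrate with a naive link regex (the pattern below is illustrative): it matches links inside an unevaluated parser-function branch, so it reports links the rendered page never produces.

```python
import re

# A naive [[target]] / [[target|label]] matcher, as a stand-in for the
# regex-based approaches discussed above.
LINK_RE = re.compile(r'\[\[([^\]|]+)(?:\|[^\]]*)?\]\]')

# The {{#if:}} condition here is empty, so MediaWiki would only render
# the second (else) branch -- yet the regex finds both links.
wikitext = '{{#if:|[[Only if nonempty]]|[[Fallback]]}}'
found = [m.group(1) for m in LINK_RE.finditer(wikitext)]
# found contains both 'Only if nonempty' and 'Fallback'
```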
Maybe the only way to make Page methods reliable (with respect to *used*
links) is to have a parameter that causes Pywikibot to do a `purge` with
`forcelinkupdate` before using these API methods.
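A sketch of what that would mean at the API level. `build_purge_request` is a hypothetical helper shown only so the raw request is visible; in practice Pywikibot would issue this via its own request machinery before calling e.g. linkshere/templates queries:

```python
def build_purge_request(title: str, forcelinkupdate: bool = True) -> dict:
    """Build action=purge parameters that refresh a page's link tables.

    With forcelinkupdate, MediaWiki re-parses the page and updates the
    links tables, so subsequent prop=links/templates queries reflect the
    current wikitext rather than stale data.
    """
    params = {'action': 'purge', 'titles': title, 'format': 'json'}
    if forcelinkupdate:
        params['forcelinkupdate'] = '1'
    return params
```

The obvious cost is that every "reliable" read becomes a write-ish, rate-limited purge, which is why Wikimedia ops input is needed (see below).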
Another option is to use
[[https://www.mediawiki.org/wiki/API:Parsing_wikitext|API parse]] to obtain the
links.
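For comparison, a sketch of the action=parse variant; `build_parse_request` is again a hypothetical helper, and the exact prop list is an assumption:

```python
def build_parse_request(title: str) -> dict:
    """Build action=parse parameters for a page.

    action=parse evaluates the current wikitext on demand, so the links,
    templates and categories it returns are what the parser actually
    produced -- independent of whether the stored links tables are stale.
    """
    return {
        'action': 'parse',
        'page': title,
        'prop': 'links|templates|categories',
        'format': 'json',
    }
```

This avoids mutating anything, at the price of a full parse per request.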
This may be a situation where we need to ensure the Pywikibot devs and
Wikimedia ops / dbs / devs are all on the same page before we proceed with a
solution that works for Wikimedia sites.
TASK DETAIL
https://phabricator.wikimedia.org/T101596
To: jayvdb
Cc: Legoktm, jayvdb, Aklapper, pywikipedia-bugs
_______________________________________________
pywikibot-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikibot-bugs