Multichill created this task.
Herald added subscribers: pywikibot-bugs-list, Aklapper.
TASK DESCRIPTION
I dusted off an old bot that used to work. It uses the following code to get
all items that use a certain property:
import pywikibot
from pywikibot import pagegenerators

repo = pywikibot.Site().data_repository()
ppage = pywikibot.PropertyPage(repo, u'Property:P197')
gen = pagegenerators.NamespaceFilterPageGenerator(
    pagegenerators.ReferringPageGenerator(
        ppage, withTemplateInclusion=False, onlyTemplateInclusion=False),
    namespaces=[0])
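For readers unfamiliar with the outer wrapper: conceptually, NamespaceFilterPageGenerator only drops pages whose namespace is not in the given list, so it cannot be the source of missing items. A minimal stand-alone sketch of that idea, using a hypothetical FakePage tuple in place of pywikibot.Page (an illustration, not pywikibot's actual implementation):

```python
from collections import namedtuple
from typing import Iterable, Iterator

# Hypothetical stand-in for pywikibot.Page: just a title and a namespace number.
FakePage = namedtuple("FakePage", ["title", "namespace"])


def namespace_filter(pages: Iterable[FakePage],
                     namespaces: Iterable[int]) -> Iterator[FakePage]:
    """Yield only the pages whose namespace is in `namespaces`.

    A filter like this can only drop pages, never add them, so any
    truncation has to happen in the generator it wraps.
    """
    allowed = set(namespaces)
    for page in pages:
        if page.namespace in allowed:
            yield page


pages = [FakePage("Q455649", 0), FakePage("User:Example", 2), FakePage("Q42", 0)]
print([p.title for p in namespace_filter(pages, [0])])  # ['Q455649', 'Q42']
```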
The bot used to get every item that uses the property, but now it seems to stop at about 500 items.
The last item I get is Q455649, which happens to be the last item on
https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Property:P197&limit=500
. So it looks like only the first batch of items is retrieved.
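One way to cross-check the count independently of pywikibot is to query the MediaWiki API's list=backlinks module directly. A minimal sketch that only builds the request URL (backlinks_url is a hypothetical helper). bllimit=max asks for the largest batch the account may fetch (500 for normal users, 5,000 for bots), and a response that has more results carries a "continue" object whose values go into the next request:

```python
from typing import Dict, Optional
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def backlinks_url(title: str, namespace: int = 0,
                  cont: Optional[Dict[str, str]] = None) -> str:
    """Build a list=backlinks query URL for `title` (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "blnamespace": namespace,
        "bllimit": "max",  # one batch: 500 for normal accounts
        "format": "json",
    }
    if cont:
        # Copy the "continue" object from the previous response verbatim.
        params.update(cont)
    return API_URL + "?" + urlencode(params)


print(backlinks_url("Property:P197"))
```

Following the "continue" values until a response no longer contains them should give the full list; if that total is well above 500, the items are being lost somewhere on the pywikibot side.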
pagegenerators.ReferringPageGenerator is supposed to return every referring page:
def ReferringPageGenerator(referredPage, followRedirects=False,
                           withTemplateInclusion=True,
                           onlyTemplateInclusion=False,
                           total=None, content=False):
    """Yield all pages referring to a specific page."""
    return referredPage.getReferences(
        follow_redirects=followRedirects,
        withTemplateInclusion=withTemplateInclusion,
        onlyTemplateInclusion=onlyTemplateInclusion,
        total=total, content=content)
The total argument is not set, so total=None is passed on to Page.getReferences():
def getReferences(self, follow_redirects=True, withTemplateInclusion=True,
                  onlyTemplateInclusion=False, redirectsOnly=False,
                  namespaces=None, total=None, content=False):
    """
    Return an iterator all pages that refer to or embed the page.

    If you need a full list of referring pages, use
    C{pages = list(s.getReferences())}

    @param follow_redirects: if True, also iterate pages that link to a
        redirect pointing to the page.
    @param withTemplateInclusion: if True, also iterate pages where self
        is used as a template.
    @param onlyTemplateInclusion: if True, only iterate pages where self
        is used as a template.
    @param redirectsOnly: if True, only iterate redirects to self.
    @param namespaces: only iterate pages in these namespaces
    @param total: iterate no more than this number of pages in total
    @param content: if True, retrieve the content of the current version
        of each referring page (default False)
    """
    # N.B.: this method intentionally overlaps with backlinks() and
    # embeddedin(). Depending on the interface, it may be more efficient
    # to implement those methods in the site interface and then combine
    # the results for this method, or to implement this method and then
    # split up the results for the others.
    return self.site.pagereferences(
        self,
        followRedirects=follow_redirects,
        filterRedirects=redirectsOnly,
        withTemplateInclusion=withTemplateInclusion,
        onlyTemplateInclusion=onlyTemplateInclusion,
        namespaces=namespaces,
        total=total,
        content=content
    )
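Each layer forwards total unchanged, and total=None is meant as "no limit". A minimal sketch of how such a parameter is commonly honoured in Python (limited is a hypothetical helper, not pywikibot's code): itertools.islice treats a stop value of None as unbounded, so the same code path covers both cases.

```python
import itertools
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def limited(items: Iterable[T], total: Optional[int] = None) -> Iterator[T]:
    """Yield at most `total` items; total=None means no limit at all."""
    return itertools.islice(items, total)


print(len(list(limited(range(1200)))))       # 1200: nothing is cut off
print(len(list(limited(range(1200), 500))))  # 500
```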
getReferences() in turn hands the work to Site.pagereferences(), still with total=None,
withTemplateInclusion=False and onlyTemplateInclusion=False:
def pagereferences(self, page, followRedirects=False, filterRedirects=None,
                   withTemplateInclusion=True,
                   onlyTemplateInclusion=False,
                   namespaces=None, total=None, content=False):
    """
    Convenience method combining pagebacklinks and page_embeddedin.

    @param namespaces: If present, only return links from the namespaces
        in this list.
    @type namespaces: iterable of basestring or Namespace key,
        or a single instance of those types. May be a '|' separated
        list of namespace identifiers.
    @raises KeyError: a namespace identifier was not resolved
    @raises TypeError: a namespace identifier has an inappropriate
        type such as NoneType or bool
    """
    if onlyTemplateInclusion:
        return self.page_embeddedin(page, namespaces=namespaces,
                                    filterRedirects=filterRedirects,
                                    total=total, content=content)
    if not withTemplateInclusion:
        return self.pagebacklinks(page, followRedirects=followRedirects,
                                  filterRedirects=filterRedirects,
                                  namespaces=namespaces,
                                  total=total, content=content)
(I skipped the last part.) With these arguments it should take the "if not withTemplateInclusion" branch, which calls Site.pagebacklinks().
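The branch selection in pagereferences() boils down to a small decision on the two template flags. A sketch of it (reference_source and its return labels are hypothetical; the final "both" case is an assumption standing in for the omitted last part, based on the docstring's "combining pagebacklinks and page_embeddedin"):

```python
def reference_source(withTemplateInclusion: bool = True,
                     onlyTemplateInclusion: bool = False) -> str:
    """Return which query pagereferences() would use for these flags."""
    if onlyTemplateInclusion:
        return "embeddedin"  # page_embeddedin() only
    if not withTemplateInclusion:
        return "backlinks"   # pagebacklinks() only
    return "both"            # assumed: links plus transclusions (omitted part)


# The bot's arguments select the plain backlinks path.
print(reference_source(withTemplateInclusion=False,
                       onlyTemplateInclusion=False))  # backlinks
```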
def pagebacklinks(self, page, followRedirects=False, filterRedirects=None,
                  namespaces=None, total=None, content=False):
    """Iterate all pages that link to the given page.

    @param page: The Page to get links to.
    @param followRedirects: Also return links to redirects pointing to
        the given page.
    @param filterRedirects: If True, only return redirects to the given
        page. If False, only return non-redirect links. If None, return
        both (no filtering).
    @param namespaces: If present, only return links from the namespaces
        in this list.
    @type namespaces: iterable of basestring or Namespace key,
        or a single instance of those types. May be a '|' separated
        list of namespace identifiers.
    @param total: Maximum number of pages to retrieve in total.
    @param content: if True, load the current content of each iterated page
        (default False)
    @raises KeyError: a namespace identifier was not resolved
    @raises TypeError: a namespace identifier has an inappropriate
        type such as NoneType or bool
    """
    bltitle = page.title(withSection=False).encode(self.encoding())
    blargs = {"gbltitle": bltitle}
    if filterRedirects is not None:
        blargs["gblfilterredir"] = (filterRedirects and "redirects" or
                                    "nonredirects")
    blgen = self._generator(api.PageGenerator, type_arg="backlinks",
                            namespaces=namespaces, total=total,
                            g_content=content, **blargs)
    if followRedirects:
        # links identified by MediaWiki as redirects may not really be,
        # so we have to check each "redirect" page and see if it
        # really redirects to this page
        # see fixed MediaWiki bug T9304
        redirgen = self._generator(api.PageGenerator,
                                   type_arg="backlinks",
                                   gbltitle=bltitle,
                                   gblfilterredir="redirects")
        genlist = {None: blgen}
        for redir in redirgen:
            if redir == page:
                # if a wiki contains pages whose titles contain
                # namespace aliases that existed before those aliases
                # were defined (example: [[WP:Sandbox]] existed as a
                # redirect to [[Wikipedia:Sandbox]] before the WP: alias
                # was created) they can be returned as redirects to
                # themselves; skip these
                continue
            if redir.getRedirectTarget() == page:
                genlist[redir.title()] = self.pagebacklinks(
                    redir, followRedirects=True,
                    filterRedirects=filterRedirects,
                    namespaces=namespaces,
                    content=content
                )
        return itertools.chain(*list(genlist.values()))
    return blgen
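pagebacklinks() maps the three-valued filterRedirects argument onto the gblfilterredir API parameter. Note that Page.getReferences() passes filterRedirects=redirectsOnly, so its default of False arrives here as "nonredirects" rather than as None (no filtering). A stand-alone sketch of that mapping, with a hypothetical helper name and the title encoding left out:

```python
from typing import Dict, Optional


def backlink_args(title: str,
                  filterRedirects: Optional[bool] = None) -> Dict[str, str]:
    """Build the generator arguments the way pagebacklinks() does."""
    args = {"gbltitle": title}
    if filterRedirects is not None:
        # True -> only redirects, False -> only non-redirect links
        args["gblfilterredir"] = "redirects" if filterRedirects else "nonredirects"
    return args


print(backlink_args("Property:P197"))
# {'gbltitle': 'Property:P197'}
print(backlink_args("Property:P197", False))
# {'gbltitle': 'Property:P197', 'gblfilterredir': 'nonredirects'}
```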
pagebacklinks() doesn't seem to contain any loop of its own, so it probably queries
https://www.wikidata.org/w/api.php?action=help&recursivesubmodules=1#query+backlinks
only once, returning a single batch of results. Maybe someone broke this when deprecating "step"?
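For reference, a query module only returns everything if the client keeps re-issuing the request with the "continue" values from each response until none come back. Since pagebacklinks() has no loop, that continuation would have to happen inside the generator object returned by self._generator(api.PageGenerator, ...). A self-contained sketch of the loop against a fake endpoint (fake_api, its offset-style continue token and the 1,200-item data set are made up for illustration; the real blcontinue value has a different format):

```python
from typing import Dict, Iterator, List, Optional

# Made-up data: 1,200 referring item IDs.
ALL_ITEMS = ["Q%d" % n for n in range(1, 1201)]
BATCH = 500  # per-request limit, like bllimit for a normal account


def fake_api(cont: Optional[Dict[str, str]] = None) -> dict:
    """Pretend list=backlinks endpoint returning one batch per call."""
    start = int(cont["blcontinue"]) if cont else 0  # fake offset token
    batch = ALL_ITEMS[start:start + BATCH]
    result = {"query": {"backlinks": [{"title": t} for t in batch]}}
    if start + BATCH < len(ALL_ITEMS):
        # More results remain: tell the client where to resume.
        result["continue"] = {"blcontinue": str(start + BATCH), "continue": "-||"}
    return result


def single_request() -> List[str]:
    """What you get without continuation: only the first batch."""
    return [p["title"] for p in fake_api()["query"]["backlinks"]]


def all_backlinks() -> Iterator[str]:
    """Follow "continue" until the API stops returning it."""
    cont = None
    while True:
        data = fake_api(cont)
        for page in data["query"]["backlinks"]:
            yield page["title"]
        if "continue" not in data:
            break
        cont = data["continue"]


print(len(single_request()), len(list(all_backlinks())))  # 500 1200
```

A count that stops at exactly one batch, as with Q455649 above, matches what the single_request() path produces.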
TASK DETAIL
https://phabricator.wikimedia.org/T129021
_______________________________________________
pywikibot-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikibot-bugs