Hi Daniel,

Changing the loop to the one below shows that the first problematic pageid is
28644448 <https://en.wikipedia.org/wiki/Special:Redirect/page/28644448>, whose
title is the character \x85 (U+0085, NEXT LINE).

    >>> for each_article in cat.articles(namespaces=(0)):
    ...     try:
    ...         print(each_article.title(withNamespace=True), each_article.pageid)
    ...     except pywikibot.exceptions.InvalidTitle:
    ...         print(each_article.pageid)
    ...         raise
    ...

str.strip() removes this character, resulting in an empty string, so the
exception is raised (page.py#L5666-L5670
<https://github.com/wikimedia/pywikibot/blob/16a31c88b67c7af1966ca00ed998db01f76c2adb/pywikibot/page.py#L5666-L5670>).

Regards,
JJ

On Mon, Jun 18, 2018 at 1:23 PM Daniel Glus <[email protected]> wrote:

> Hi all,
>
> I'm getting a strange InvalidTitle error while iterating through each of
> the articles in the English Wikipedia's "Unprintworthy redirects" category
> using the .articles() function.
>
> In particular, if you run this code:
>
>     import pywikibot
>     site = pywikibot.Site("en", "wikipedia"); site.login()
>     cat = pywikibot.Category(site, "Category:Unprintworthy redirects")
>     for each_article in cat.articles(namespaces=(0)):
>         print(each_article.title(withNamespace=True), each_article.pageid)
>
> Then it'll run for a while, printing out a bunch of titles and page IDs,
> and then crash:
>
>     Traceback (most recent call last):
>       File "/data/project/apersonbot/test-redir-bann.py", line 5, in <module>
>         print(each_article.title(withNamespace=True), each_article.pageid)
>       File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1446, in wrapper
>         return obj(*__args, **__kw)
>       File "/shared/pywikipedia/core/pywikibot/page.py", line 322, in title
>         title = self._link.canonical_title()
>       File "/shared/pywikipedia/core/pywikibot/page.py", line 5737, in canonical_title
>         if self.namespace != Namespace.MAIN:
>       File "/shared/pywikipedia/core/pywikibot/page.py", line 5698, in namespace
>         self.parse()
>       File "/shared/pywikipedia/core/pywikibot/page.py", line 5669, in parse
>         raise pywikibot.InvalidTitle("The link does not contain a page "
>     pywikibot.exceptions.InvalidTitle: The link does not contain a page title
>     CRITICAL: Closing network session.
>
> Any ideas? I don't think this is expected behavior, but I could be wrong.
>
> - Daniel
> _______________________________________________
> pywikibot mailing list
> pywikibot@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/pywikibot
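For anyone hitting the same error, the cause can be reproduced in plain Python without pywikibot: U+0085 (NEXT LINE) counts as whitespace for str.isspace() and str.strip(), so a title made only of that character strips to the empty string. The InvalidTitle class and check_title() function below are hypothetical stand-ins that only approximate the strip-then-reject behavior described above; they are not pywikibot's actual code.

```python
import string
import unicodedata

# U+0085 is NEXT LINE (NEL), a C1 control character (general category "Cc").
NEL = "\x85"


class InvalidTitle(Exception):
    """Hypothetical stand-in for pywikibot.exceptions.InvalidTitle."""


def check_title(raw_title):
    """Hypothetical, simplified empty-title check.

    Strips the title with str.strip() and rejects it if nothing is left,
    approximating the behavior described in this thread.
    """
    title = raw_title.strip()
    if not title:
        raise InvalidTitle("The link does not contain a page title")
    return title


# Python treats U+0085 as whitespace, so str.strip() removes it entirely.
print(unicodedata.category(NEL))  # Cc
print(NEL.isspace())              # True
print(repr(NEL.strip()))          # ''

# Stripping only ASCII whitespace (string.whitespace) leaves it in place.
print(repr(NEL.strip(string.whitespace)))  # '\x85'

# A title consisting only of U+0085 therefore fails the emptiness check.
try:
    check_title(NEL)
except InvalidTitle as exc:
    print("InvalidTitle:", exc)
```

The contrast with string.whitespace shows that the problem comes from str.strip() using Unicode's notion of whitespace rather than only ASCII spaces, tabs and newlines.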
