Hi Daniel,

Changing the loop as below shows that the first problematic page ID is
28644448 <https://en.wikipedia.org/wiki/Special:Redirect/page/28644448>,
whose title is the character \x85 (U+0085, NEXT LINE).

>>> for each_article in cat.articles(namespaces=0):
...     try:
...         print(each_article.title(withNamespace=True), each_article.pageid)
...     except pywikibot.exceptions.InvalidTitle:
...         print(each_article.pageid)
...         raise
...

str.strip() treats this character as whitespace and removes it, leaving an
empty string, so the exception is raised (page.py#L5666-L5670
<https://github.com/wikimedia/pywikibot/blob/16a31c88b67c7af1966ca00ed998db01f76c2adb/pywikibot/page.py#L5666-L5670>).
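You can confirm the stripping behaviour in a plain Python 3 interpreter. What follows is a minimal sketch. `parse_title` is a hypothetical, simplified stand-in for the strip-then-check-empty step. It is not pywikibot's actual code, and it raises ValueError instead of InvalidTitle so that it runs without pywikibot installed:

```python
# U+0085 (NEXT LINE) has Unicode bidirectional class B, so Python's
# str.isspace() reports it as whitespace and str.strip() removes it.
title = "\x85"
print(title.isspace())
print(repr(title.strip()))


def parse_title(raw):
    """Hypothetical stand-in for the title check in Link.parse().

    Only mimics "strip, then reject if nothing is left"; not pywikibot code.
    """
    stripped = raw.strip()
    if not stripped:
        raise ValueError("The link does not contain a page title")
    return stripped


try:
    parse_title(title)
except ValueError as exc:
    print("rejected:", exc)

# Non-ASCII code points that str.strip() would also remove, i.e. other
# characters that could produce the same error if a title consisted only of them.
non_ascii_ws = [hex(cp) for cp in range(0x80, 0x110000) if chr(cp).isspace()]
print(non_ascii_ws)
```

The last list is a rough way to see which other single-character titles would hit the same check.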

Regards,
JJ

On Mon, Jun 18, 2018 at 1:23 PM Daniel Glus <[email protected]> wrote:

> Hi all,
>
> I'm getting a strange InvalidTitle error while iterating through each of
> the articles in the English Wikipedia's "Unprintworthy redirects" category
> using the .articles() function.
>
> In particular, if you run this code:
>
> import pywikibot
> site = pywikibot.Site("en", "wikipedia"); site.login()
> cat = pywikibot.Category(site, "Category:Unprintworthy redirects")
> for each_article in cat.articles(namespaces=(0)):
>     print(each_article.title(withNamespace=True), each_article.pageid)
>
> Then it'll run for a while, printing out a bunch of titles and page IDs,
> and then crash:
>
> Traceback (most recent call last):
>   File "/data/project/apersonbot/test-redir-bann.py", line 5, in <module>
>     print(each_article.title(withNamespace=True), each_article.pageid)
>   File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1446, in wrapper
>     return obj(*__args, **__kw)
>   File "/shared/pywikipedia/core/pywikibot/page.py", line 322, in title
>     title = self._link.canonical_title()
>   File "/shared/pywikipedia/core/pywikibot/page.py", line 5737, in canonical_title
>     if self.namespace != Namespace.MAIN:
>   File "/shared/pywikipedia/core/pywikibot/page.py", line 5698, in namespace
>     self.parse()
>   File "/shared/pywikipedia/core/pywikibot/page.py", line 5669, in parse
>     raise pywikibot.InvalidTitle("The link does not contain a page "
> pywikibot.exceptions.InvalidTitle: The link does not contain a page title
> CRITICAL: Closing network session.
>
> Any ideas? I don't think this is expected behavior, but I could be wrong.
>
> - Daniel
> _______________________________________________
> pywikibot mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/pywikibot
>