XZise added a comment.
Okay, thank you, that helps a lot. Here are all the steps needed to understand what is happening: The page août <https://pt.wiktionary.org/w/index.php?title=ao%C3%BBt&oldid=1930450> on the Portuguese Wiktionary uses `{{urlencode:ao%FBt}}`. Our code searches the page text for templates to make sure the page is not protected against bot edits, and it picks up `{{urlencode:ao%FBt}}` as a template. It then tries to create a `Link` instance from it and, in doing so, tries to decode the percent encoding. That is why `urlencode:ao%FBt` is the text you got when you printed it.
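To make the failure concrete, here is a minimal sketch of the round trip the title text goes through (encode with the site's encoding, undo the percent encoding, decode again). It uses only the Python 3 standard library; `decode_percent_title` is a hypothetical helper made up for illustration, not pywikibot's actual code, and UTF-8 is assumed as the site's encoding.

```python
from urllib.parse import unquote_to_bytes


def decode_percent_title(title, encoding="utf-8"):
    """Hypothetical helper mimicking the encode / unquote / decode round trip."""
    # Step 1: encode with the site's encoding (plain ASCII in this case).
    raw = title.encode(encoding)
    # Step 2: undo the percent encoding; "%FB" becomes the single byte 0xFB.
    unquoted = unquote_to_bytes(raw)
    # Step 3: decode again; a lone 0xFB byte is not valid UTF-8, so this raises.
    return unquoted.decode(encoding)


try:
    decode_percent_title("urlencode:ao%FBt")
except UnicodeDecodeError as error:
    print("decoding failed:", error)

# A title that was percent-encoded from UTF-8 bytes survives the round trip.
print(decode_percent_title("ao%C3%BBt"))
```

The second call uses the same percent encoding that appears in the page URL, which is why the page title itself does not trigger the problem.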
And the rest is straightforward: it encodes the text using the site's encoding, handles the percent encoding, and then decodes the resulting bytes with the site's encoding again. That first turns `u'urlencode:ao%FBt'` into `b'urlencode:ao%FBt'` using UTF-8 (all characters are ASCII), then decodes the percent encoding to `b'urlencode:ao\xFBt'`, and finally tries to decode that as UTF-8, which fails because a lone `0xFB` byte is not valid UTF-8.
To fix this particular case it's possible to just fix the usage in the page (as you've already done <https://pt.wiktionary.org/w/index.php?title=ao%C3%BBt&diff=1994294&oldid=1930450>), since it doesn't make sense to percent-encode a percent-encoded string. But while the fault lies with whoever wrote that text and not really with pywikibot, I think we need to mitigate it. I don't think it's possible to get percent-encoded text from the API, as it will use `\u00FB` instead, so I think we could skip the percent decoding; it is probably only there because previous versions screen-scraped an HTML page, which might use %-encoded text. Alternatively, we should provide more sensible output that includes the original values, which would make it more obvious what went wrong if some page has the same problem in the future.
TASK DETAIL
https://phabricator.wikimedia.org/T111116
