XZise added a comment.
Ah, I don't think that is needed, because I think I know what is happening:
>>> from __future__ import unicode_literals
>>> import re
>>> re.sub('(?is)A', '', 'Ö'.encode('latin1'))
'\xd6'
>>> re.sub('(?is)A', '', 'ÖA'.encode('latin1'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xzise/.pyenv/versions/2.7.8/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
The error only appears when the substitution actually replaces something. In
my previous examples nothing was replaced, so it worked. But when there is a
match, the unicode replacement string has to be combined with the bytes input,
and Python 2 then tries to decode the bytes as ASCII, which fails. You can
verify this by editing the line where the error happens (according to your
previous errors that is line 647 of "core/scripts/reflinks.py"). Currently
it looks like this:
linkedpagetext = self.NON_HTML.sub('', linkedpagetext)
But it should work when it looks like this:
linkedpagetext = self.NON_HTML.sub(str(''), linkedpagetext)
I still need to figure out whether `linkedpagetext` is also `bytes` in Python 3,
but that fix will work at least in Python 2.
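
For the Python 3 question, here is a minimal sketch of a variant that should
behave the same on both versions, assuming `linkedpagetext` really is a byte
string there too. The pattern below is only a hypothetical stand-in for
`NON_HTML` (the real one is not quoted in this comment), and `strip_non_html`
is an illustrative helper name, not something from reflinks.py. The idea is to
make both the pattern and the replacement bytes literals, so no implicit
conversion between unicode and bytes is ever needed:

```python
import re

# Hypothetical stand-in for NON_HTML; the real pattern in reflinks.py
# is not shown here. It is compiled from a bytes literal on purpose.
NON_HTML = re.compile(b'(?is)<script.*?</script>')


def strip_non_html(data):
    """Remove all NON_HTML matches from a byte string.

    The replacement is b'' (bytes), matching the type of the input, so
    neither Python 2 nor Python 3 has to mix unicode and bytes.
    """
    return NON_HTML.sub(b'', data)


# 0xd6 is 'Ö' encoded as latin1, the byte from the traceback above.
page = b'<p>\xd6</p><script>x</script>'
cleaned = strip_non_html(page)
```

In Python 2, `b''` is the same as `str('')` even with `unicode_literals`
enabled, so this is equivalent to the fix above there.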
TASK DETAIL
https://phabricator.wikimedia.org/T94688
To: XZise
Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs