XZise added a comment.

Ah, I don't think that's needed, because I think I know what is happening:

  >>> from __future__ import unicode_literals
  >>> import re
  >>> re.sub('(?is)A', '', 'Ö'.encode('latin1'))
  '\xd6'
  >>> re.sub('(?is)A', '', 'ÖA'.encode('latin1'))
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/home/xzise/.pyenv/versions/2.7.8/lib/python2.7/re.py", line 151, in sub
      return _compile(pattern, flags).sub(repl, string, count)
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)

The error only appears when the substitution actually replaces something. In my 
previous examples nothing was replaced, so it worked. But when a match is 
replaced, the unicode replacement string (`''` is unicode because of 
`unicode_literals`) has to be combined with the bytes input. Python 2 then 
implicitly decodes the bytes as ASCII, which fails on byte 0xd6. You can verify 
this by editing the line where the error happens (according to your previous 
errors that is line 647 of "core/scripts/reflinks.py"). Currently it looks like 
this:

  linkedpagetext = self.NON_HTML.sub('', linkedpagetext)

But it should work when it looks like this:

  linkedpagetext = self.NON_HTML.sub(str(''), linkedpagetext)

I still need to figure out whether `linkedpagetext` is also `bytes` in 
Python 3, but that fix will work at least in Python 2.
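
For reference, here is a rough sketch of a variant that should not depend on 
the Python version: choose the empty replacement based on the type of the text 
being cleaned. The pattern and the helper name `empty_like` are made up for 
illustration (the real `NON_HTML` pattern is not shown here), and it assumes 
the pattern is compiled from bytes when the page text is bytes. In Python 3 a 
`str` replacement on a bytes subject raises a TypeError, so `str('')` alone 
would not help there if `linkedpagetext` turns out to be bytes.

```python
from __future__ import unicode_literals

import re

# Hypothetical stand-in for self.NON_HTML; the real pattern may differ.
NON_HTML = re.compile(br'(?is)<script[^>]*>.*?</script>')


def empty_like(text):
    """Return an empty replacement of the same type as text."""
    return b'' if isinstance(text, bytes) else ''


# Bytes input containing a non-ASCII byte (0xd6), as in the session above.
page = '\xd6<script>x</script>A'.encode('latin1')

# The replacement is bytes here, so nothing has to be decoded implicitly.
cleaned = NON_HTML.sub(empty_like(page), page)

# The step Python 2 performs implicitly when unicode and bytes are mixed:
# decoding the bytes as ASCII, which fails for 0xd6.
try:
    page.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

With this, `cleaned` keeps the 0xd6 byte untouched and only the matched part 
is removed.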


TASK DETAIL
  https://phabricator.wikimedia.org/T94688

To: XZise
Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs