XZise added a subscriber: jayvdb.
XZise added a comment.
Okay, looking at it, there may be several factors. For one, `unicode_literals`
(https://phabricator.wikimedia.org/rPWBC1e54a7d6886d56a21101900025038e25bab5ad03)
makes plain string literals (those without a prefix) unicode. This does not
directly affect the line in question, because that string was already unicode,
so nothing changed there. Additionally,
https://phabricator.wikimedia.org/rPWBCb44e59ae60a65bcba0e5fbc6d1941f1edcdc640c
added support for `Page` instances in `Request`. So instead of passing the
title with the section appended manually, the code now always passes the `Page`
instance and `Request` itself extracts the title. This extraction does not
happen when the parameters are set, though, but only on submit. On submit it
creates a new list of the parameters normalized to str/bytes, while the
original parameters (`Request._params`) are left untouched.
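To make the flow above concrete, here is a minimal sketch. `SketchPage` and `SketchRequest` are hypothetical stand-ins invented for illustration; this is not the pywikibot implementation, only the pattern of normalizing into a new mapping on submit while leaving `_params` alone.

```python
class SketchPage(object):
    """Hypothetical stand-in for a Page; it only carries a title."""

    def __init__(self, title):
        self._title = title

    def title(self):
        return self._title


class SketchRequest(object):
    """Hypothetical stand-in for Request (not the real class)."""

    def __init__(self, **params):
        # Keep the parameters exactly as they were passed in.
        self._params = dict((key, list(values))
                            for key, values in params.items())

    def submit(self):
        # Build a *new* mapping in which pages are replaced by titles.
        normalized = {}
        for key, values in self._params.items():
            normalized[key] = [
                value.title() if isinstance(value, SketchPage) else value
                for value in values
            ]
        return normalized


request = SketchRequest(titles=[SketchPage(u'Main Page')])
sent = request.submit()
```

After `submit()`, `sent` holds plain titles, but `request._params` still holds the page object, so any later code that formats `_params` ends up calling the page's `__repr__`.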
And now we are getting somewhere, because the line raising the error uses
`self._params` and thus might get a `Page` instance. `Page.__repr__` currently
returns bytes in Python 2, encoded in the console encoding. Now when you have
something like `u'%s' % (u'Ünicöde tëxt'.encode('latin1'))`, it tries to put
bytes into unicode (whether `unicode_literals` is used does not matter here; if
it is used, that just means you could remove the u-prefixes). Whenever Python 2
tries to put `bytes` into `unicode`, it decodes them using ASCII, which fails
here because several characters are outside the ASCII range.
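The failure can be reproduced by spelling out the decode step by hand. This is a minimal sketch, assuming the implicit conversion in Python 2 behaves like an explicit `.decode('ascii')` (ASCII being the default encoding there); the variable names are only for illustration.

```python
text = u'\xdcnic\xf6de t\xebxt'   # u'Ünicöde tëxt', written with escapes
raw = text.encode('latin1')       # bytes containing values above 0x7f

# Roughly what Python 2 does implicitly for u'%s' % raw
try:
    raw.decode('ascii')
except UnicodeDecodeError as error:
    failure = error
else:
    failure = None

# Decoding with the codec that matches the bytes avoids the problem.
formatted = u'%s' % raw.decode('latin1')
```

Here `failure` ends up holding the `UnicodeDecodeError`, while `formatted` equals the original text because the bytes were decoded with the codec they were encoded in.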
So the underlying issue is simply that `Page.__repr__` does not return an
ASCII-compatible string, which also leads to the failures seen in
https://phabricator.wikimedia.org/T95809. @jayvdb actually reverted the changes
related to `Page.__repr__` from the `unicode_literals` patch in
https://phabricator.wikimedia.org/rPWBC853e6b0bdce3e4fe60920c1c04f64d3a8eecde5e.
The `unicode_literals` variant is not appropriate either, though, as it
returned a `unicode` in Python 2, although at least that would not trigger a
decode.
I guess the most appropriate way would be to just do a `repr` on the title and
site, and then surround that with something to make it look like it does today.
The obvious disadvantage would be that the representation is not as readable
(especially on wikis not using the Latin alphabet), but it would conform to the
standard. The implementation in the `unicode_literals` patch wasn't that far
off. On Python 2.7 I get the following:
>>> u'Ünicöde tëxt'.encode('latin1').decode('unicode-escape')
u'\xdcnic\xf6de t\xebxt'
>>> repr(u'Ünicöde tëxt')
"u'\\xdcnic\\xf6de t\\xebxt'"
And the result in Python 3, which uses `str` (`unicode` in Python 2 terms) as
the result of `__repr__` (afaik the built-in `ascii()` is the variant of
`repr()` that returns an ASCII-compatible string), also looks quite similar:
>>> u'Ünicöde tëxt'.encode('latin1').decode('unicode-escape')
'Ünicöde tëxt'
>>> repr(u'Ünicöde tëxt')
"'Ünicöde tëxt'"
TASK DETAIL
https://phabricator.wikimedia.org/T103253
To: Xqt, XZise
Cc: jayvdb, gerritbot, XZise, Aklapper, Xqt, pywikibot-bugs-list, Malyacko,
P.Copp