New submission from Jan Kaliszewski:
It seems that the 'raw_unicode_escape' codec:
1) produces data that could be suitable for Python 2.x raw unicode string
literals (ur"...") but not for Python 3.x raw string literals (in Python 3.x
raw literals, \u... escapes are left uninterpreted, like any other backslash
sequence);
2) seems to be buggy anyway: characters in the range 128-255 (U+0080 to
U+00FF) are not escaped but emitted as raw 'latin-1' bytes (in Python 3.x this
is definitely a bug; and even in Python 2.x the feature is dubious, although
at least Py2's eval() and compile() functions officially accept
'latin-1'-encoded byte strings...).
Python 3.3:
>>> b = "zażółć".encode('raw_unicode_escape')
>>> literal = b'r"' + b + b'"'
>>> literal
b'r"za\\u017c\xf3\\u0142\\u0107"'
>>> eval(literal)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xf3 in position 8: invalid continuation byte
>>> b'\xf3'.decode('latin-1')
'ó'
>>> b = "zaż".encode('raw_unicode_escape')
>>> literal = b'r"' + b + b'"'
>>> literal
b'r"za\\u017c"'
>>> eval(literal)
'za\\u017c'
>>> print(eval(literal))
za\u017c
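The two behaviours shown in the session can be isolated per code point range.
The following is only a small sketch (the names `samples`, `encoded` and
`valid_utf8` are made up for illustration); it encodes one character from each
range and checks whether the latin-1 range byte is valid UTF-8:

```python
# One sample character per code point range (names are illustrative only).
samples = {
    "ascii": "a",            # U+0061, passed through unchanged
    "latin1": "\u00f3",      # U+00F3, emitted as a single raw byte
    "bmp": "\u017c",         # U+017C, emitted as a \uXXXX escape
    "astral": "\U0001f600",  # outside the BMP, emitted as a \UXXXXXXXX escape
}

encoded = {name: ch.encode("raw_unicode_escape") for name, ch in samples.items()}

for name, data in encoded.items():
    print(name, data)

# The raw byte produced for the latin-1 range is not valid UTF-8 on its own,
# which is why eval() on the bytes literal fails in Python 3.x.
try:
    encoded["latin1"].decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print("latin-1 range output is valid UTF-8:", valid_utf8)
```

Only the U+0080 to U+00FF range yields a non-ASCII byte; everything else is
either plain ASCII or an ASCII-only escape sequence.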
I believe that the 'raw_unicode_escape' codec should either be deprecated and
later removed, or be modified to accept only printable ASCII characters.
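One possible reading of the second option is a variant whose output contains
only printable ASCII, with every other character escaped. The sketch below is
an assumption about what such behaviour might look like, not an existing codec
or a proposed patch; `ascii_raw_unicode_escape` is a hypothetical helper name:

```python
def ascii_raw_unicode_escape(text):
    """Hypothetical helper: escape everything outside printable ASCII.

    Like the existing codec, it does not escape backslashes or quotes.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x7F:
            parts.append(ch)              # printable ASCII stays as-is
        elif cp <= 0xFFFF:
            parts.append("\\u%04x" % cp)  # BMP, including U+0080 to U+00FF
        else:
            parts.append("\\U%08x" % cp)  # outside the BMP
    return "".join(parts).encode("ascii")


escaped = ascii_raw_unicode_escape("zażółć")
print(escaped)

# The existing decoder already understands this output.
round_tripped = escaped.decode("raw_unicode_escape")
print(round_tripped == "zażółć")
```

With this variant the 'ó' becomes \u00f3 instead of the raw byte 0xf3, so the
result is pure ASCII and no longer depends on the source encoding.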
PS. Also, as a side note: neither 'raw_unicode_escape' nor 'unicode_escape'
escapes quotes (see issue #7615) -- shouldn't that at least be documented
explicitly?
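The quoting behaviour is easy to check. A minimal sketch (the sample strings
and the `results` name are chosen only for illustration):

```python
# Strings containing double and single quotes (illustrative samples).
samples = ['say "hi"', "it's"]

results = {}
for codec in ("unicode_escape", "raw_unicode_escape"):
    for text in samples:
        # Both codecs leave the quote characters untouched.
        results[(codec, text)] = text.encode(codec)
        print(codec, repr(text), "->", results[(codec, text)])
```

So the output of either codec cannot be pasted between quotes without a
separate quote-escaping step.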
----------
components: Library (Lib), Unicode
messages: 202505
nosy: ezio.melotti, haypo, zuo
priority: normal
severity: normal
status: open
title: The 'raw_unicode_escape' codec buggy + not appropriate for Python 3.x
versions: Python 3.4, Python 3.5
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue19539>
_______________________________________