New submission from Sean B. Palmer:

The following error is uncatchable:

>>> try: ur'\U0010FFFF'
... except UnicodeDecodeError: pass
...
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c in position 0: \Uxxxxxxxx out of range

This is in a narrow unicode build:

>>> sys.version_info, hex(sys.maxunicode)
((2, 5, 1, 'final', 0), '0xffff')

Of course the r in ur'...' is redundant in the test case above, but
there are cases in which it isn't...

>>> ur'\U0010FFFF\test'
u'\U0010ffff\\test'
- from a wide unicode build

>>> ur'\U0010FFFF\test'
UnicodeDecodeError: 'rawunicodeescape' codec can't decode byte 0x5c in position 0: \Uxxxxxxxx out of range
- from the narrow unicode build

The problem occurs with .decode('raw-unicode-escape') too.

>>> '\U0010FFFF\test'.decode('raw-unicode-escape')
Traceback (most recent call last): [&c.]

Most surprisingly of all, however, this problem doesn't occur when you
don't use a raw string:

>>> u'\U0010ffff\\test'
u'\U0010ffff\\test'

So there is at least a workaround for all cases, which is why this bug
is marked as Severity: minor. It did take a while to work out that what
manifests with ur mightn't apply to u, however; it's usually one's first
thought to think the bug is with you, not with python.

----------
components: Unicode
messages: 57710
nosy: sbp
severity: minor
status: open
title: UnicodeDecodeError that cannot be caught in narrow unicode builds
type: behavior
versions: Python 2.5

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1477>
__________________________________
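
A possible reason the except clause never fires, offered as an assumption rather than something the report confirms: the literal is decoded while the source is being compiled, before the try block starts running. The sketch below is written for Python 3, so there is no ur prefix and no narrow build; it uses \U00110000, which is out of range on every build, as a stand-in for the narrow-build failure. The names decode_raw_escape, compiles and SOURCE_WITH_TRY are hypothetical, chosen only for illustration.

```python
# Sketch for Python 3. The report concerns Python 2.5, where the literal
# would be written ur'...' and the limit depends on sys.maxunicode.


def decode_raw_escape(data):
    """Decode raw-unicode-escape bytes at run time; return None on failure."""
    try:
        return data.decode('raw-unicode-escape')
    except UnicodeDecodeError:
        # At run time the error is an ordinary exception and can be caught.
        return None


def compiles(source):
    """Return True if the source text compiles, False if it is rejected."""
    try:
        compile(source, '<sketch>', 'exec')
        return True
    except (SyntaxError, UnicodeDecodeError):
        return False


# A try/except wrapped around a bad literal. The literal is decoded when
# this text is compiled, so the except clause inside it never gets a chance.
SOURCE_WITH_TRY = (
    "try:\n"
    "    '\\U00110000'\n"
    "except Exception:\n"
    "    pass\n"
)

print(decode_raw_escape(b'\\U00110000\\test'))  # None: caught at run time
print(compiles(SOURCE_WITH_TRY))                # False: rejected at compile time
```

If that reading is right, deferring the decode to run time, as decode_raw_escape does, is another way to keep the failure inside a try block, alongside the non-raw u'...' workaround described above.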