[issue16979] Broken error handling in codecs.unicode_escape_decode()
Roundup Robot added the comment:

New changeset a242ac99161f by Serhiy Storchaka in branch '2.7':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/a242ac99161f

New changeset 084bec5443d6 by Serhiy Storchaka in branch '3.2':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/084bec5443d6

New changeset 086defaf16fe by Serhiy Storchaka in branch '3.3':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/086defaf16fe

New changeset 218da678bb8b by Serhiy Storchaka in branch 'default':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/218da678bb8b

----------
nosy: +python-dev

_______________________________________
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16979
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16979] Broken error handling in codecs.unicode_escape_decode()
Serhiy Storchaka added the comment:

Until subtests are added, an explicit call looks better to me. Once subtests are available, we will just add a subtest inside the helper function.
[issue16979] Broken error handling in codecs.unicode_escape_decode()
Changes by Serhiy Storchaka <storch...@gmail.com>:

----------
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
[issue16979] Broken error handling in codecs.unicode_escape_decode()
Serhiy Storchaka added the comment:

Ezio, is this a good factorization?

    def check(self, coder):
        # Build a checker bound to one coder function.
        def checker(input, expect):
            self.assertEqual(coder(input), (expect, len(input)))
        return checker

    def test_escape_decode(self):
        decode = codecs.unicode_escape_decode
        check = self.check(decode)
        check(b"[\\\n]", "[]")
        check(br'[\"]', '["]')
        check(br"[\']", "[']")
        # other 20 checks
        ...

The same goes for test_escape_encode and for the bytes escape decoder.
[issue16979] Broken error handling in codecs.unicode_escape_decode()
Ezio Melotti added the comment:

LGTM. If you want to push it even further, you could make a list of (input, expected) pairs and call check() in a loop. That way it will also be easier to refactor if/when we add subtests (#16997).
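The table-driven variant suggested above could look roughly like the sketch below. It assumes Python 3.4 or later, where subTest is available. The names EscapeDecodeTableTest, CASES and run_table_test() are hypothetical, chosen only for illustration, and the case list is a small subset rather than the set of checks from the committed patch.

```python
import codecs
import unittest

# (input bytes, expected text) pairs; an illustrative subset only.
CASES = [
    (b"[\\\n]", "[]"),       # backslash followed by a newline is dropped
    (br'[\"]', '["]'),
    (br"[\']", "[']"),
    (br"[\a]", "[\x07]"),
    (br"[\x41]", "[A]"),
    (br"\u20ac", "\u20ac"),
]


class EscapeDecodeTableTest(unittest.TestCase):
    def test_escape_decode(self):
        decode = codecs.unicode_escape_decode
        for data, expected in CASES:
            # Each pair is reported separately if it fails.
            with self.subTest(data=data):
                self.assertEqual(decode(data), (expected, len(data)))


def run_table_test():
    """Run the test case programmatically and return the TestResult."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(EscapeDecodeTableTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Keeping the pairs in a plain list means a new case is a one-line addition, and a failing pair is identified by its input bytes rather than by a line number.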
[issue16979] Broken error handling in codecs.unicode_escape_decode()
Changes by Serhiy Storchaka <storch...@gmail.com>:

Removed file: http://bugs.python.org/file28752/unicode_escape_decode_error_handling-3.4.patch
[issue16979] Broken error handling in codecs.unicode_escape_decode()
Serhiy Storchaka added the comment:

Here is a set of patches for all versions (patch for 3.4 updated).

----------
Added file: http://bugs.python.org/file28833/unicode_escape_decode_error_handling-2.7.patch
Added file: http://bugs.python.org/file28834/unicode_escape_decode_error_handling-3.2.patch
Added file: http://bugs.python.org/file28835/unicode_escape_decode_error_handling-3.3.patch
Added file: http://bugs.python.org/file28836/unicode_escape_decode_error_handling-3.4.patch

diff -r 5970c90dd8d1 Lib/test/test_codeccallbacks.py
--- a/Lib/test/test_codeccallbacks.py	Fri Jan 25 23:30:50 2013 +0200
+++ b/Lib/test/test_codeccallbacks.py	Sat Jan 26 00:51:30 2013 +0200
@@ -262,12 +262,12 @@
 
         self.assertEqual(
             "\\u3042\u3xxx".decode("unicode-escape", "test.handler1"),
-            u"\u3042[<92><117><51><120>]xx"
+            u"\u3042[<92><117><51>]xxx"
         )
 
         self.assertEqual(
             "\\u3042\u3xx".decode("unicode-escape", "test.handler1"),
-            u"\u3042[<92><117><51><120><120>]"
+            u"\u3042[<92><117><51>]xx"
         )
 
         self.assertEqual(
diff -r 5970c90dd8d1 Lib/test/test_codecs.py
--- a/Lib/test/test_codecs.py	Fri Jan 25 23:30:50 2013 +0200
+++ b/Lib/test/test_codecs.py	Sat Jan 26 00:51:30 2013 +0200
@@ -1786,6 +1786,84 @@
         self.assertEqual(srw.read(), u"\xfc")
 
 
+class UnicodeEscapeTest(unittest.TestCase):
+    def test_empty(self):
+        self.assertEqual(codecs.unicode_escape_encode(u""), ("", 0))
+        self.assertEqual(codecs.unicode_escape_decode(""), (u"", 0))
+
+    def test_raw_encode(self):
+        encode = codecs.unicode_escape_encode
+        for b in range(32, 127):
+            if b != ord('\\'):
+                self.assertEqual(encode(unichr(b)), (chr(b), 1))
+
+    def test_raw_decode(self):
+        decode = codecs.unicode_escape_decode
+        for b in range(256):
+            if b != ord('\\'):
+                self.assertEqual(decode(chr(b) + '0'), (unichr(b) + u'0', 2))
+
+    def test_escape_encode(self):
+        encode = codecs.unicode_escape_encode
+        self.assertEqual(encode(u'\t'), (r'\t', 1))
+        self.assertEqual(encode(u'\n'), (r'\n', 1))
+        self.assertEqual(encode(u'\r'), (r'\r', 1))
+        self.assertEqual(encode(u'\\'), (r'\\', 1))
+        for b in range(32):
+            if chr(b) not in '\t\n\r':
+                self.assertEqual(encode(unichr(b)), ('\\x%02x' % b, 1))
+        for b in range(127, 256):
+            self.assertEqual(encode(unichr(b)), ('\\x%02x' % b, 1))
+        self.assertEqual(encode(u'\u20ac'), (r'\u20ac', 1))
+        self.assertEqual(encode(u'\U0001d120'),
+                         (r'\U0001d120', len(u'\U0001d120')))
+
+    def test_escape_decode(self):
+        decode = codecs.unicode_escape_decode
+        self.assertEqual(decode("[\\\n]"), (u"[]", 4))
+        self.assertEqual(decode(r'[\"]'), (u'["]', 4))
+        self.assertEqual(decode(r"[\']"), (u"[']", 4))
+        self.assertEqual(decode(r"[\\]"), (ur"[\]", 4))
+        self.assertEqual(decode(r"[\a]"), (u"[\x07]", 4))
+        self.assertEqual(decode(r"[\b]"), (u"[\x08]", 4))
+        self.assertEqual(decode(r"[\t]"), (u"[\x09]", 4))
+        self.assertEqual(decode(r"[\n]"), (u"[\x0a]", 4))
+        self.assertEqual(decode(r"[\v]"), (u"[\x0b]", 4))
+        self.assertEqual(decode(r"[\f]"), (u"[\x0c]", 4))
+        self.assertEqual(decode(r"[\r]"), (u"[\x0d]", 4))
+        self.assertEqual(decode(r"[\7]"), (u"[\x07]", 4))
+        self.assertEqual(decode(r"[\8]"), (ur"[\8]", 4))
+        self.assertEqual(decode(r"[\78]"), (u"[\x078]", 5))
+        self.assertEqual(decode(r"[\41]"), (u"[!]", 5))
+        self.assertEqual(decode(r"[\418]"), (u"[!8]", 6))
+        self.assertEqual(decode(r"[\101]"), (u"[A]", 6))
+        self.assertEqual(decode(r"[\1010]"), (u"[A0]", 7))
+        self.assertEqual(decode(r"[\x41]"), (u"[A]", 6))
+        self.assertEqual(decode(r"[\x410]"), (u"[A0]", 7))
+        self.assertEqual(decode(r"\u20ac"), (u"\u20ac", 6))
+        self.assertEqual(decode(r"\U0001d120"), (u"\U0001d120", 10))
+        for b in range(256):
+            if chr(b) not in '\n"\'\\abtnvfr01234567xuUN':
+                self.assertEqual(decode('\\' + chr(b)),
+                                 (u'\\' + unichr(b), 2))
+
+    def test_decode_errors(self):
+        decode = codecs.unicode_escape_decode
+        for c, d in ('x', 2), ('u', 4), ('U', 4):
+            for i in range(d):
+                self.assertRaises(UnicodeDecodeError, decode,
+                                  "\\" + c + "0"*i)
+                self.assertRaises(UnicodeDecodeError, decode,
+                                  "[\\" + c + "0"*i + "]")
+                data = "[\\" + c + "0"*i + "]\\" + c + "0"*i
+                self.assertEqual(decode(data, "ignore"), (u"[]", len(data)))
+                self.assertEqual(decode(data, "replace"),
+                                 (u"[\ufffd]\ufffd", len(data)))
+
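The test_codeccallbacks.py hunk relies on a custom error handler that prints the byte values of the rejected span. A minimal Python 3 sketch of that idea follows. The handler name "demo.show_bytes" and the function show_bytes_handler are hypothetical, and the input uses a bytes literal because the test in the 2.7 patch decodes a str object instead.

```python
import codecs


def show_bytes_handler(exc):
    """Replace the rejected span with its byte values, e.g. [<92><117>]."""
    if not isinstance(exc, UnicodeDecodeError):
        raise TypeError("don't know how to handle %r" % exc)
    span = range(exc.start, exc.end)
    marks = "".join("<%d>" % exc.object[pos] for pos in span)
    # Returning exc.end asks the decoder to resume right after the span.
    return ("[%s]" % marks, exc.end)


codecs.register_error("demo.show_bytes", show_bytes_handler)

# A valid \u3042 escape followed by a truncated one (\u3 and then "xxx").
data = b"\\u3042\\u3xxx"
decoded = data.decode("unicode-escape", "demo.show_bytes")
print(ascii(decoded))
```

With the fix in place the reported span covers only the three bytes of the truncated escape (92, 117, 51), so the trailing "xxx" is left for normal decoding.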
[issue16979] Broken error handling in codecs.unicode_escape_decode()
New submission from Serhiy Storchaka:

The error handler in unicode_escape_decode() consumes one or more bytes that follow an illegal escape sequence.

>>> import codecs
>>> codecs.unicode_escape_decode(br'\u!@#', 'replace')
('�', 5)
>>> codecs.unicode_escape_decode(br'\u!@#$', 'replace')
('�@#$', 6)

raw_unicode_escape_decode() works correctly:

>>> codecs.raw_unicode_escape_decode(br'\u!@#', 'replace')
('�!@#', 5)
>>> codecs.raw_unicode_escape_decode(br'\u!@#$', 'replace')
('�!@#$', 6)

See also issue16975.

----------
assignee: serhiy.storchaka
components: Unicode
messages: 180077
nosy: ezio.melotti, serhiy.storchaka
priority: normal
severity: normal
stage: needs patch
status: open
title: Broken error handling in codecs.unicode_escape_decode()
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4
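The behaviour the report asks for can be checked with a short script. This is a sketch for a Python 3 interpreter that already contains the fix. The first input follows the pattern used in the test_decode_errors test from the patch, and the variable names are only illustrative.

```python
import codecs

decode = codecs.unicode_escape_decode

# Two truncated \u escapes: one inside brackets, one at the very end.
data = b"[\\u00]\\u00"

try:
    decode(data)  # the default error mode is "strict"
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

ignored = decode(data, "ignore")
replaced = decode(data, "replace")

# Input from the report: bytes after the bad escape should survive.
report_text, report_consumed = decode(br"\u!@#", "replace")

print(strict_raised, ignored, ascii(replaced))
print(ascii(report_text), report_consumed)
```

In each error mode the closing bracket and the bytes after the bad escape are kept, which is the property the original decoder violated.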
[issue16979] Broken error handling in codecs.unicode_escape_decode()
Serhiy Storchaka added the comment:

Here is a patch for 3.4. Patches for other versions will differ considerably.

----------
dependencies: +SystemError in codecs.unicode_escape_decode()
keywords: +patch
stage: needs patch -> patch review
Added file: http://bugs.python.org/file28752/unicode_escape_decode_error_handling-3.4.patch