[issue16979] Broken error handling in codecs.unicode_escape_decode()

2013-01-29 Thread Roundup Robot

Roundup Robot added the comment:

New changeset a242ac99161f by Serhiy Storchaka in branch '2.7':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/a242ac99161f

New changeset 084bec5443d6 by Serhiy Storchaka in branch '3.2':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/084bec5443d6

New changeset 086defaf16fe by Serhiy Storchaka in branch '3.3':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/086defaf16fe

New changeset 218da678bb8b by Serhiy Storchaka in branch 'default':
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
http://hg.python.org/cpython/rev/218da678bb8b

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16979
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16979] Broken error handling in codecs.unicode_escape_decode()

2013-01-29 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Until subtests are added, an explicit call looks better to me. When subtests 
are added, we will just add a subtest inside the helper function.

--




[issue16979] Broken error handling in codecs.unicode_escape_decode()

2013-01-29 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed




[issue16979] Broken error handling in codecs.unicode_escape_decode()

2013-01-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Ezio, is this a good factorization?

    def check(self, coder):
        def checker(input, expect):
            self.assertEqual(coder(input), (expect, len(input)))
        return checker

    def test_escape_decode(self):
        decode = codecs.unicode_escape_decode
        check = self.check(decode)
        check(b"[\\\n]", "[]")
        check(br'[\"]', '["]')
        check(br"[\']", "[']")
        # other 20 checks ...

And the same for test_escape_encode and for the bytes escape decoder.

--




[issue16979] Broken error handling in codecs.unicode_escape_decode()

2013-01-28 Thread Ezio Melotti

Ezio Melotti added the comment:

LGTM.
If you want to push it even further, you could make a list of (input, expected) 
pairs and call check() in a loop.  That way it will also be easier to refactor 
if/when we add subtests (#16997).
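
A minimal sketch of the table-driven variant suggested above, assuming a Python version where unittest provides subTest() (3.4 or later). The class name and the CASES list are hypothetical, and only a few of the checks from the patch are shown:

```python
import codecs
import unittest


class UnicodeEscapeDecodeTest(unittest.TestCase):
    # Hypothetical table of (input, expected) pairs; the real test has many more.
    CASES = [
        (b"[\\\n]", "[]"),    # backslash followed by newline is dropped
        (br'[\"]', '["]'),
        (br"[\']", "[']"),
        (br"[\x41]", "[A]"),
    ]

    def test_escape_decode(self):
        decode = codecs.unicode_escape_decode
        for data, expected in self.CASES:
            # Each pair is reported separately if it fails.
            with self.subTest(data=data):
                self.assertEqual(decode(data), (expected, len(data)))


if __name__ == "__main__":
    unittest.main()
```

With subTest(), a failing pair does not stop the loop, so the remaining pairs are still checked in the same run.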

--




[issue16979] Broken error handling in codecs.unicode_escape_decode()

2013-01-25 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Removed file: 
http://bugs.python.org/file28752/unicode_escape_decode_error_handling-3.4.patch




[issue16979] Broken error handling in codecs.unicode_escape_decode()

2013-01-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a set of patches for all versions (patch for 3.4 updated).

--
Added file: 
http://bugs.python.org/file28833/unicode_escape_decode_error_handling-2.7.patch
Added file: 
http://bugs.python.org/file28834/unicode_escape_decode_error_handling-3.2.patch
Added file: 
http://bugs.python.org/file28835/unicode_escape_decode_error_handling-3.3.patch
Added file: 
http://bugs.python.org/file28836/unicode_escape_decode_error_handling-3.4.patch

diff -r 5970c90dd8d1 Lib/test/test_codeccallbacks.py
--- a/Lib/test/test_codeccallbacks.py   Fri Jan 25 23:30:50 2013 +0200
+++ b/Lib/test/test_codeccallbacks.py   Sat Jan 26 00:51:30 2013 +0200
@@ -262,12 +262,12 @@
 
         self.assertEqual(
             "\\u3042\u3xxx".decode("unicode-escape", "test.handler1"),
-            u"\u3042[<92><117><51><120>]xx"
+            u"\u3042[<92><117><51>]xxx"
         )
 
         self.assertEqual(
             "\\u3042\u3xx".decode("unicode-escape", "test.handler1"),
-            u"\u3042[<92><117><51><120><120>]"
+            u"\u3042[<92><117><51>]xx"
         )
 
         self.assertEqual(
diff -r 5970c90dd8d1 Lib/test/test_codecs.py
--- a/Lib/test/test_codecs.py   Fri Jan 25 23:30:50 2013 +0200
+++ b/Lib/test/test_codecs.py   Sat Jan 26 00:51:30 2013 +0200
@@ -1786,6 +1786,84 @@
         self.assertEqual(srw.read(), u"\xfc")
 
 
+class UnicodeEscapeTest(unittest.TestCase):
+    def test_empty(self):
+        self.assertEqual(codecs.unicode_escape_encode(u""), ("", 0))
+        self.assertEqual(codecs.unicode_escape_decode(""), (u"", 0))
+
+    def test_raw_encode(self):
+        encode = codecs.unicode_escape_encode
+        for b in range(32, 127):
+            if b != ord('\\'):
+                self.assertEqual(encode(unichr(b)), (chr(b), 1))
+
+    def test_raw_decode(self):
+        decode = codecs.unicode_escape_decode
+        for b in range(256):
+            if b != ord('\\'):
+                self.assertEqual(decode(chr(b) + '0'), (unichr(b) + u'0', 2))
+
+    def test_escape_encode(self):
+        encode = codecs.unicode_escape_encode
+        self.assertEqual(encode(u'\t'), (r'\t', 1))
+        self.assertEqual(encode(u'\n'), (r'\n', 1))
+        self.assertEqual(encode(u'\r'), (r'\r', 1))
+        self.assertEqual(encode(u'\\'), (r'\\', 1))
+        for b in range(32):
+            if chr(b) not in '\t\n\r':
+                self.assertEqual(encode(unichr(b)), ('\\x%02x' % b, 1))
+        for b in range(127, 256):
+            self.assertEqual(encode(unichr(b)), ('\\x%02x' % b, 1))
+        self.assertEqual(encode(u'\u20ac'), (r'\u20ac', 1))
+        self.assertEqual(encode(u'\U0001d120'),
+                         (r'\U0001d120', len(u'\U0001d120')))
+
+    def test_escape_decode(self):
+        decode = codecs.unicode_escape_decode
+        self.assertEqual(decode("[\\\n]"), (u"[]", 4))
+        self.assertEqual(decode(r'[\"]'), (u'["]', 4))
+        self.assertEqual(decode(r"[\']"), (u"[']", 4))
+        self.assertEqual(decode(r"[\\]"), (ur"[\]", 4))
+        self.assertEqual(decode(r"[\a]"), (u"[\x07]", 4))
+        self.assertEqual(decode(r"[\b]"), (u"[\x08]", 4))
+        self.assertEqual(decode(r"[\t]"), (u"[\x09]", 4))
+        self.assertEqual(decode(r"[\n]"), (u"[\x0a]", 4))
+        self.assertEqual(decode(r"[\v]"), (u"[\x0b]", 4))
+        self.assertEqual(decode(r"[\f]"), (u"[\x0c]", 4))
+        self.assertEqual(decode(r"[\r]"), (u"[\x0d]", 4))
+        self.assertEqual(decode(r"[\7]"), (u"[\x07]", 4))
+        self.assertEqual(decode(r"[\8]"), (ur"[\8]", 4))
+        self.assertEqual(decode(r"[\78]"), (u"[\x078]", 5))
+        self.assertEqual(decode(r"[\41]"), (u"[!]", 5))
+        self.assertEqual(decode(r"[\418]"), (u"[!8]", 6))
+        self.assertEqual(decode(r"[\101]"), (u"[A]", 6))
+        self.assertEqual(decode(r"[\1010]"), (u"[A0]", 7))
+        self.assertEqual(decode(r"[\x41]"), (u"[A]", 6))
+        self.assertEqual(decode(r"[\x410]"), (u"[A0]", 7))
+        self.assertEqual(decode(r"\u20ac"), (u"\u20ac", 6))
+        self.assertEqual(decode(r"\U0001d120"), (u"\U0001d120", 10))
+        for b in range(256):
+            if chr(b) not in '\n"\'\\abtnvfr01234567xuUN':
+                self.assertEqual(decode('\\' + chr(b)),
+                                  (u'\\' + unichr(b), 2))
+
+    def test_decode_errors(self):
+        decode = codecs.unicode_escape_decode
+        for c, d in ('x', 2), ('u', 4), ('U', 4):
+            for i in range(d):
+                self.assertRaises(UnicodeDecodeError, decode,
+                                  "\\" + c + "0"*i)
+                self.assertRaises(UnicodeDecodeError, decode,
+                                  "[\\" + c + "0"*i + "]")
+                data = "[\\" + c + "0"*i + "]\\" + c + "0"*i
+                self.assertEqual(decode(data, "ignore"), (u"[]", len(data)))
+                self.assertEqual(decode(data, "replace"),
+                                 (u"[\ufffd]\ufffd", len(data)))
+

[issue16979] Broken error handling in codecs.unicode_escape_decode()

2013-01-16 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

The error handler in unicode_escape_decode() consumes one or more bytes 
after an illegal escape sequence.

>>> import codecs
>>> codecs.unicode_escape_decode(br'\u!@#', 'replace')
('�', 5)
>>> codecs.unicode_escape_decode(br'\u!@#$', 'replace')
('�@#$', 6)

raw_unicode_escape_decode() works correctly:

>>> codecs.raw_unicode_escape_decode(br'\u!@#', 'replace')
('�!@#', 5)
>>> codecs.raw_unicode_escape_decode(br'\u!@#$', 'replace')
('�!@#$', 6)

See also issue16975.
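
For comparison, here is a small sketch that runs both decoders on the same input. The helper name decode_both is hypothetical. On an interpreter that includes the fix for this issue, the two results should agree for these samples; on an unfixed one, unicode_escape_decode() drops the bytes shown above.

```python
import codecs


def decode_both(data, errors="replace"):
    """Decode data with both escape decoders and return the two results."""
    return (
        codecs.unicode_escape_decode(data, errors),
        codecs.raw_unicode_escape_decode(data, errors),
    )


if __name__ == "__main__":
    for sample in (br"\u!@#", br"\u!@#$"):
        escaped, raw = decode_both(sample)
        # Print both results side by side so any eaten bytes are visible.
        print(sample, escaped, raw)
```

Only the truncated escape itself should be replaced; every byte after it should survive in the output.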

--
assignee: serhiy.storchaka
components: Unicode
messages: 180077
nosy: ezio.melotti, serhiy.storchaka
priority: normal
severity: normal
stage: needs patch
status: open
title: Broken error handling in codecs.unicode_escape_decode()
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4




[issue16979] Broken error handling in codecs.unicode_escape_decode()

2013-01-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch for 3.4. Patches for other versions will differ considerably.

--
dependencies: +SystemError in codecs.unicode_escape_decode()
keywords: +patch
stage: needs patch -> patch review
Added file: 
http://bugs.python.org/file28752/unicode_escape_decode_error_handling-3.4.patch
