[issue20132] Many incremental codecs don’t handle fragmented data

Martin Panter Wed, 17 Dec 2014 18:26:06 -0800

Martin Panter added the comment:

The “unicode-escape” and “utf-7” cases affect the more widely used
TextIOWrapper interface:


>>> from io import BytesIO, TextIOWrapper
>>> TextIOWrapper(BytesIO(br"\u2013" * 2000), "unicode-escape").read(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/encodings/unicode_escape.py", line 26, in decode
    return codecs.unicode_escape_decode(input, self.errors)[0]
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 8190-8191: truncated \uXXXX escape
>>> w = TextIOWrapper(BytesIO(), "utf-7")
>>> w.writelines("\xA9\xA9")  # Write one character at a time
>>> w.detach().getvalue()
b'+AKk-+AKk-'
>>> r = TextIOWrapper(BytesIO(b"+" + b"AAAAAAAA" * 100000 + b"-"), "utf-7")
>>> r.read(1)  # Short delay as all 800 kB are decoded to read one character
'\x00'
>>> r.buffer.tell()
800002

For UTF-7 decoding to work optimally, I think the decoder should need to
buffer only a few bytes of pending input, rather than the entire base64 run.
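
One way to check a codec for this kind of problem is to feed the same bytes to
its incremental decoder in small pieces and compare the result with one-shot
decoding. The following is only a minimal sketch: decode_fragmented and
survives_fragmentation are hypothetical helper names, not part of the standard
library, and the only real API relied on is codecs.getincrementaldecoder.

```python
import codecs


def decode_fragmented(data, encoding, chunk_size=1):
    """Decode data by feeding it to an incremental decoder in small chunks.

    Hypothetical helper for illustration; not a standard library function.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for start in range(0, len(data), chunk_size):
        pieces.append(decoder.decode(data[start:start + chunk_size]))
    # final=True asks the decoder to flush whatever input it is still holding.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


def survives_fragmentation(data, encoding, chunk_size=1):
    """Return True if chunked decoding matches one-shot decoding."""
    expected = data.decode(encoding)
    try:
        return decode_fragmented(data, encoding, chunk_size) == expected
    except UnicodeDecodeError:
        # The decoder rejected a fragment that is valid as part of the whole.
        return False


# UTF-8 is used here as a baseline codec whose incremental decoder copes
# with multi-byte characters split across chunks.
sample = "\u2013\xa9".encode("utf-8")
print(survives_fragmentation(sample, "utf-8"))
```

Calling survives_fragmentation with "unicode-escape" or "utf-7" and the byte
strings from the session above exercises the cases discussed here. The result
may depend on the Python version, so no output is shown for those.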

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20132>
_______________________________________