[issue23614] Opaque error message on UTF-8 decoding to surrogates

Serhiy Storchaka Sun, 08 Mar 2015 15:01:42 -0700

Serhiy Storchaka added the comment:

The UTF-8 codec can't decode byte 0xed because 0xed alone is not a valid UTF-8 
sequence and the following byte is not a valid continuation byte for it.


The UTF-8 codec can produce errors of three types:

* "invalid start byte". When the byte is not a valid start byte of a UTF-8 
sequence (valid start bytes are %x00-7F and %xC2-F4).
* "invalid continuation byte". When the byte that follows an unfinished UTF-8 
sequence is not a valid continuation byte (the validity depends on the 
previous byte).
* "unexpected end of data". When there are no bytes after an unfinished UTF-8 
sequence.
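
A minimal sketch of how each of the three errors can be triggered in Python 3. 
The helper name utf8_error_reason is hypothetical and used only for 
illustration; the "reason" attribute of UnicodeDecodeError is standard.

```python
def utf8_error_reason(data: bytes):
    """Return the decoder's error reason for `data`, or None if it decodes.

    Hypothetical helper for illustration; not part of the codec API.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# 0x80 is a continuation byte, so it cannot start a sequence.
print(utf8_error_reason(b"\x80"))          # invalid start byte

# 0xED starts a 3-byte sequence, but after 0xED only 0x80-0x9F may follow;
# 0xA0 would encode a surrogate, so it is rejected.
print(utf8_error_reason(b"\xed\xa0\x80"))  # invalid continuation byte

# 0xE2 0x82 is the beginning of a 3-byte sequence with the last byte missing.
print(utf8_error_reason(b"\xe2\x82"))      # unexpected end of data
```

The second case is the one from this issue: the message names 0xed as the 
byte that can't be decoded, although the byte that breaks the sequence is the 
one that follows it.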

----------
nosy: +serhiy.storchaka

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23614>
_______________________________________