Nick Coghlan added the comment:
Following up here after rejecting #15622 as invalid.
The "unicode" codes in PEP 3118 need to be seriously rethought before any
related changes are made in the struct module.
1. The 'c' and 's' codes are currently used for raw bytes data (represented as
bytes objects at the Python layer). This means the 'c' code cannot be used as
described in PEP 3118 in a world with strict binary/text separation.
2. Any format codes for UCS1, UCS2 and UCS4 are more usefully modelled on 's'
than they are on 'c' (so that repeat counts create longer strings rather than
lists of strings that each contain a single code point).
3. Given some of the other proposals in PEP 3118, it seems more useful to
define an embedded text format as "S{<encoding>}".
UCS1 would then be "S{latin-1}", UCS2 would be approximated as "S{utf-16}", and
UCS4 would be "S{utf-32}"; arbitrary encodings would also be supported.
Packing would implicitly encode from text to bytes, while unpacking would
implicitly decode bytes to text. As with 's', a length mismatch in the encoded
form would be an error.
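
To make the points above concrete, here is a rough sketch in Python. The first
part only exercises the existing struct module and shows that 'c' and 's' both
deal in bytes, and how repeat counts differ between them. The second part
emulates a single "S{<encoding>}" field on top of 's'. The names pack_text and
unpack_text are hypothetical helpers invented for this illustration; nothing
like them (or an "S{...}" code) exists in struct today, and the strict length
check is just one reading of the proposal.

```python
import struct

# Existing behaviour: 'c' and 's' both produce bytes objects, never str.
# A repeat count on 'c' yields that many one-byte items ...
chars = struct.unpack("3c", b"abc")      # (b"a", b"b", b"c")
# ... while a count on 's' yields a single bytes object of that length.
(string,) = struct.unpack("3s", b"abc")  # b"abc"


# Hypothetical emulation of one "S{<encoding>}" field occupying `size` bytes.
def pack_text(encoding, size, text):
    """Encode text and pack it into exactly `size` bytes (assumed semantics)."""
    data = text.encode(encoding)
    if len(data) != size:
        # Assumption: a length mismatch in the encoded form is an error.
        raise struct.error(
            "encoded text is %d bytes, field is %d" % (len(data), size)
        )
    return struct.pack("%ds" % size, data)


def unpack_text(encoding, size, buffer):
    """Unpack `size` bytes and decode them back to text (assumed semantics)."""
    (data,) = struct.unpack("%ds" % size, buffer)
    return data.decode(encoding)
```

Under these assumptions, unpack_text("latin-1", 4, pack_text("latin-1", 4,
"caf\u00e9")) round-trips the text, while pack_text("latin-1", 3, "caf\u00e9")
raises struct.error because the encoded form is four bytes long.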
----------
nosy: +ncoghlan
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue3132>
_______________________________________