Victor Stinner <[email protected]> wrote:
> > 'c' -> UCS1
> > 'u' -> UCS2
> > 'w' -> UCS4
>
> A Unicode string is an array of code points. Another approach is to
> expose such a string as an array of uint8/uint16/uint32 integers. I
> don't know if you expect to get a character / a substring when you
> read the buffer of a string object. Using Python 3.2, I get:
>
> >>> memoryview(b"abc")[0]
> b'a'
>
> ... but using Python 3.3 I get a number :-)

Yes, that changed because officially (see the struct module) the format
is 'B', unsigned bytes, which are integers in struct module syntax:
>>> unsigned_bytes = memoryview(b"abc")
>>> unsigned_bytes.format
'B'
>>> char_array = unsigned_bytes.cast('c')
>>> char_array.format
'c'
>>> char_array[0]
b'a'
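The same session can be run as a standalone script on Python 3.3 or later, where indexing a memoryview follows its struct format code. The two struct.unpack calls at the end are an addition (not part of the session above), included only to show that the struct module reads the byte b"a" the same two ways.

```python
import struct

# A bytes object exports its buffer with format 'B' (unsigned char),
# so indexing the memoryview yields ints on Python 3.3+.
view = memoryview(b"abc")
print(view.format)   # B
print(view[0])       # 97

# Casting to 'c' (char) makes each item a length-1 bytes object instead.
chars = view.cast('c')
print(chars.format)  # c
print(chars[0])      # b'a'

# The struct module uses the same two interpretations.
print(struct.unpack('B', b"a"))  # (97,)
print(struct.unpack('c', b"a"))  # (b'a',)
```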

Possibly the uint8/uint16/uint32 integer approach that you mention
would make more sense.
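A minimal sketch of what the integer approach could look like from Python, assuming the array is built from the string's code points rather than exported from CPython's internal storage (str itself does not support the buffer protocol, so memoryview("abc") raises TypeError). The helper code_point_array is hypothetical; it picks the narrowest unsigned width that holds every code point, mirroring the UCS1/UCS2/UCS4 split quoted above.

```python
import array


def code_point_array(s: str) -> array.array:
    """Hypothetical helper: expose s as an array of unsigned ints of the
    narrowest width (1, 2 or 4 bytes) that holds all of its code points."""
    max_cp = max(map(ord, s), default=0)
    if max_cp <= 0xFF:
        typecode = 'B'  # uint8, i.e. the UCS1 case
    elif max_cp <= 0xFFFF:
        typecode = 'H'  # uint16, i.e. the UCS2 case
    else:
        # uint32 for the UCS4 case: 'I' is 4 bytes on common platforms,
        # 'L' is guaranteed to be at least 4 bytes.
        typecode = 'I' if array.array('I').itemsize >= 4 else 'L'
    return array.array(typecode, map(ord, s))


# Reading the buffer then gives integers, as with memoryview(b"abc")[0].
for text in ("abc", "caf\u00e9", "\u20ac", "\U0001F600"):
    view = memoryview(code_point_array(text))
    print(repr(text), view.format, view.itemsize, view[0])
```

The width choice is the design point here: a consumer reading the buffer only needs view.format and view.itemsize to interpret the items, and never gets characters or substrings back.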
Stefan Krah
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev