Stefan Krah added the comment:

Nick's comment in msg167963 got me thinking. Indeed, in NumPy the 'U' specifier is similar to the struct module's 's' format code, only for UCS4. So I'm questioning whether the current semantics of 'u' and 'w' used by array.array were ever intended by the PEP authors:

>>> import numpy
>>> nd = numpy.array(["A", "B"], dtype='U')
>>> nd
array(['A', 'B'], dtype='<U1')
>>> nd.tostring()
b'A\x00\x00\x00B\x00\x00\x00'
>>>
>>> nd = numpy.array(["ABC", "D"], dtype='U')
>>> nd
array(['ABC', 'D'], dtype='<U3')
>>> nd.tostring()
b'A\x00\x00\x00B\x00\x00\x00C\x00\x00\x00D\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>>

Internally, in NumPy 'U' is always UCS4, and the data type is a fixed-length string whose length is that of the longest initializer element.

NumPy's use of 'U' seems vastly more useful for arrays than the behavior of array.array:

>>> array.array('u', ['A', 'B'])
array('u', 'AB')
>>> array.array('u', ['ABC', 'D'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: array item must be unicode character

In NumPy, arrays of words are possible; with array.array they are not.

An additional thought: The convention in the struct module is to use uppercase for unsigned types. So one possibility would be to use 'C', 'U' and 'W', where '3C' would denote the same as '3s', except for UCS1 instead of bytes.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15625>
_______________________________________
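
For illustration only, here is a stdlib-only sketch of the fixed-width UCS4 layout that the NumPy session above produces for dtype='<U3', next to the NUL padding that struct's 's' code already does for bytes. The helper names pack_fixed_ucs4 and unpack_fixed_ucs4 are hypothetical; they are not part of NumPy, struct, or array, and this is not a proposal for how a 'U' format code would have to be implemented.

```python
import struct


def pack_fixed_ucs4(words, byteorder="<"):
    """Pack strings as fixed-width UCS4 records, NUL-padded to the longest word.

    Hypothetical helper: mimics the layout seen for NumPy's '<U3' above.
    Returns (width_in_characters, packed_bytes).
    """
    codec = "utf-32-le" if byteorder == "<" else "utf-32-be"
    width = max((len(w) for w in words), default=0)
    record_size = 4 * width  # UCS4 uses 4 bytes per character
    records = [w.encode(codec).ljust(record_size, b"\x00") for w in words]
    return width, b"".join(records)


def unpack_fixed_ucs4(buf, width, byteorder="<"):
    """Inverse of pack_fixed_ucs4 (hypothetical helper).

    Trailing NUL characters are treated as padding and stripped, so a
    string that really ends in '\\x00' does not round-trip.
    """
    if width == 0:
        return []
    codec = "utf-32-le" if byteorder == "<" else "utf-32-be"
    step = 4 * width
    return [
        buf[i:i + step].decode(codec).rstrip("\x00")
        for i in range(0, len(buf), step)
    ]


width, buf = pack_fixed_ucs4(["ABC", "D"])
words = unpack_fixed_ucs4(buf, width)

# struct's 's' code pads short byte strings with NULs in the same way.
padded = struct.pack("3s", b"D")
```

With the input ["ABC", "D"], buf has the same 24 bytes as the nd.tostring() output above, and padded is b'D\x00\x00', which is the bytes-level analogue that a '3C' or '3U' code would generalize.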