Martin v. Löwis wrote:
> Travis E. Oliphant wrote:
>> In this case, the 'kind' does not specify how large the data-type is.
>> You can have 'u1', 'u2', 'u4', etc.
>>
>> The same is true with Unicode. You can have 10-character unicode
>> elements, 20-character, etc. But, we have to be clear about what a
>> "character" is in the data-format.
>
> That is certainly confusing. In u1, u2, u4, the digit seems to indicate
> the size of a single value (1 byte, 2 bytes, 4 bytes). Right? Yet,
> in U20, it does *not* indicate the size of a single value but of an
> array? And then, it's not the size, but the number of elements?
>

Good point. In NumPy, unicode support was added "in parallel" with string
arrays, where there is no such ambiguity. So, yes, it's true that the
unicode case is a special case.

The other way to handle it would be to describe the code-point size
(i.e. 'U1', 'U2', 'U4' for UCS-1, UCS-2, UCS-4) and then have the length
be encoded as an "array" of those types.

This was not the direction we took with NumPy (which is what I'm using as
a reference) because I wanted Unicode and string arrays to look the same,
and I thought of strings differently.

How to handle unicode data-formats could definitely be improved.
Suggestions are welcome.

-Travis
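
To make the two readings of the digit concrete, here is a minimal sketch in
plain Python. It is not NumPy's implementation: the names FormatInfo,
describe and describe_alternative are hypothetical, and the 4-byte default
for codepoint_bytes assumes UCS-4 storage (a UCS-2 build would use 2).

```python
import re
from typing import NamedTuple


class FormatInfo(NamedTuple):
    kind: str        # single-letter kind code, e.g. 'u', 'S', 'U'
    count: int       # the number written after the kind
    unit: str        # what that number counts
    item_bytes: int  # total bytes occupied by one element


_TYPESTR = re.compile(r"([A-Za-z])(\d+)")


def describe(typestr: str, codepoint_bytes: int = 4) -> FormatInfo:
    """Interpret a type string the way the thread describes NumPy doing it.

    For 'U' the number counts characters; for every other kind it is
    taken as the size of one element in bytes. codepoint_bytes is an
    assumption about the storage width of one character.
    """
    match = _TYPESTR.fullmatch(typestr)
    if match is None:
        raise ValueError(f"unrecognized type string: {typestr!r}")
    kind, number = match.group(1), int(match.group(2))
    if kind == "U":
        return FormatInfo(kind, number, "characters", number * codepoint_bytes)
    return FormatInfo(kind, number, "bytes", number)


def describe_alternative(typestr: str, length: int) -> FormatInfo:
    """Hypothetical alternative from the thread: for 'U' the digit is the
    code-point size ('U1', 'U2', 'U4') and the length is given separately,
    as an array of that many code points."""
    match = _TYPESTR.fullmatch(typestr)
    if match is None or match.group(1) != "U":
        raise ValueError(f"expected 'U1', 'U2' or 'U4', got {typestr!r}")
    width = int(match.group(2))
    return FormatInfo("U", length, "characters", length * width)


if __name__ == "__main__":
    for code in ("u1", "u2", "u4", "S20", "U20"):
        print(code, describe(code))
    print("U4 x 20", describe_alternative("U4", 20))
```

Under these assumptions 'u4' and 'S20' are both sized in bytes, while 'U20'
is the only case where the number is a character count: 80 bytes with
4-byte code points, 40 with 2-byte ones. The alternative spelling puts the
code-point width in the type string and leaves the length to the array
shape.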