Neal Norwitz wrote:

> See http://python.org/sf/1454485 for the gory details. Basically if
> you create a unicode array (array.array('u')) and try to append an
> 8-bit string (ie, not unicode), you can crash the interpreter.
>
> The problem is that the string is converted without question to a
> unicode buffer. Within unicode, it assumes the data to be valid, but
> this isn't necessarily the case. We wind up accessing an array with a
> negative index and boom.
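[A minimal sketch of the scenario from the report, assuming a current Python 3 interpreter where the 'u' typecode still exists (it is deprecated there); on such interpreters the bad append is refused with TypeError rather than crashing, which is essentially the behaviour the report asks for:]

```python
import array

# Build a unicode array, then try to append an 8-bit (bytes) string,
# as described in the bug report.
a = array.array('u', 'abc')

try:
    a.append(b'x')      # not a unicode character
    rejected = False
except TypeError:
    rejected = True

print(rejected)         # the append is refused instead of crashing
```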
There are several problems combined here, which might need discussion:

- Why does the 'u#' converter use the buffer interface if available? It
  should just support Unicode objects. The buffer object makes no promise
  that the buffer actually is meaningful UCS-2/UCS-4, so u# shouldn't
  guess that it is. (FWIW, it currently truncates the buffer size to the
  next-smaller multiple of sizeof(Py_UNICODE), and silently so.) I think
  that part should just go: u# should be restricted to unicode objects.

- Should Python guarantee that all characters in a Unicode object are
  between 0 and sys.maxunicode? Currently, it is possible to create
  Unicode strings with either negative or very large Py_UNICODE elements.

- If the answer to the last question is no (i.e. if it is intentional
  that a unicode object can contain arbitrary Py_UNICODE values): should
  Python then guarantee that Py_UNICODE is an unsigned type?

Regards,
Martin

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
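[Editorial note: the range invariant in the second question can be checked from Python level. A small sketch, assuming a Python 3 interpreter, where str construction via chr() enforces the 0..sys.maxunicode bound at creation time:]

```python
import sys

# On Python 3.3+, sys.maxunicode is always 0x10FFFF (the largest
# Unicode code point); chr() rejects anything beyond it.
print(hex(sys.maxunicode))

try:
    chr(sys.maxunicode + 1)     # out of range
    bound_enforced = False
except ValueError:
    bound_enforced = True

print(bound_enforced)           # the upper bound is enforced
```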