STINNER Victor <[email protected]> added the comment:
> Unicode objects are NUL-terminated, but only very external APIs
> rely on this (e.g. code using the Windows Unicode API).
All Py_UNICODE_str*() functions rely on the NUL character. They are useful when
porting a function from bytes (char*) to unicode (PyUnicodeObject): the two APIs
are very close. They could be avoided by adding new functions that take the
string length explicitly.
All calls that pass a Py_UNICODE* as a wchar_t* to the Windows wide character
API (the *W functions) also rely on the NUL character, and the Python core uses
many of these functions. If we stopped writing the NUL character, each such call
would require creating a temporary copy of the string with a NUL terminator
appended. That is not efficient, especially for long strings.
And there is the problem of all third party modules (written in C) relying on
the NUL character.
I think that we have good reasons not to remove the NUL character. So I think
that we can continue to accept that the unicode[length] character can be read,
e.g. implement text.startswith("ab") as "p = PyUnicode_AS_UNICODE(text); if
(p[0] == 'a' && p[1] == 'b') ..." without checking the length of text.
Whether the NUL character or the length is used as the termination condition
doesn't really matter. I just see one advantage to the NUL character: it is
faster in some cases.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8821>
_______________________________________