On 04/10/2011 20:09, "Martin v. Löwis" wrote:
On 04.10.11 19:50, Antoine Pitrou wrote:
On Tue, 04 Oct 2011 19:49:09 +0200
"Martin v. Löwis"<mar...@v.loewis.de> wrote:

+ result = PyUnicode_New(slicelength, PyUnicode_MAX_CHAR_VALUE(self));

This is incorrect: the maxchar of the slice might be smaller than the
maxchar of the input string.

I thought that heuristic would be good enough. I'll try to fix it.

No - strings must always be in the canonical form.
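To illustrate the point about slices, here is a minimal sketch in plain C. It does not use the CPython API: the helpers slice_max_char and canonical_width are hypothetical names, and code points are held in a uint32_t array for simplicity. It only shows that the narrowest storage for a slice depends on the characters in the slice, not on the maximum character of the source string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython function): largest code point in
   buf[start .. start+len). Returns 0 for an empty slice. */
static uint32_t slice_max_char(const uint32_t *buf, size_t start, size_t len)
{
    uint32_t maxchar = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[start + i] > maxchar)
            maxchar = buf[start + i];
    }
    return maxchar;
}

/* Hypothetical helper: bytes per character of the narrowest storage
   able to hold maxchar (1, 2 or 4, as in the PEP 393 layouts). */
static int canonical_width(uint32_t maxchar)
{
    if (maxchar < 0x100)
        return 1;
    if (maxchar < 0x10000)
        return 2;
    return 4;
}
```

For a string such as "A€BC", the whole string needs 2 bytes per character, but the slice "BC" needs only 1, so reusing the source string's maxchar for the slice would produce a non-canonical result.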

I added a check in _PyUnicode_CheckConsistency() (debug mode) to ensure that newly created strings always use the most efficient storage.

For example, PyUnicode_RichCompare considers strings unequal if they
have different kinds. As a consequence, your slice
result may not compare equal to a canonical variant of itself.
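The consequence described here can be modelled with a toy comparison. This is only a sketch of the behaviour described in this message, not the actual PyUnicode_RichCompare code: toy_str and toy_equal are hypothetical, and the characters are kept in a uint32_t array regardless of the claimed kind.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified string object: `kind` is the storage width
   (1, 2 or 4) the object claims; chars are stored widened for brevity. */
typedef struct {
    int kind;
    size_t length;
    const uint32_t *chars;
} toy_str;

/* Equality with a kind shortcut: different kinds are assumed to mean
   different contents, which only holds if every string is canonical. */
static int toy_equal(const toy_str *a, const toy_str *b)
{
    if (a->kind != b->kind)
        return 0;
    if (a->length != b->length)
        return 0;
    for (size_t i = 0; i < a->length; i++) {
        if (a->chars[i] != b->chars[i])
            return 0;
    }
    return 1;
}
```

With this shortcut, a string "ab" stored with kind 2 compares unequal to the same text stored with kind 1, even though the characters are identical.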

I see this as a micro-optimization. IMO we should *not* rely on this assumption: we cannot expect every developer of third-party modules to write perfect code, and some (lazy developers!) may prefer to pass a fixed maximum character (e.g. 0xFFFF).

To be able to rely on such an assumption, we would have to make sure that strings are in canonical form (always check before using a string?). But that would slow down Python, because you have to scan the whole string to get its maximum character (see my change in _PyUnicode_CheckConsistency).

I would prefer to drop this micro-optimization and tolerate non-canonical strings (strings not using the most efficient storage).

Even if PEP 393 is fully backward compatible (except that PyUnicode_AS_UNICODE and PyUnicode_AsUnicode may now return NULL), it is already a big change (developers may want to move to the new API to benefit from PEP 393), and very few developers understand Unicode correctly.

It's safer to treat PEP 393 as a best-effort method. Hopefully, most (or all?) strings created by Python itself are in canonical form.

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev