On 9/15/06, Jim Jewett wrote:
> There should be only one reference to a string until it is constructed,
> and after that, its data should be immutable. Recoding that results
> in different bytes should not be in-place. Either it returns a new
> string (no problem) or it doesn't change the databuffer-and-encoding
> pointer until the new databuffer is fully constructed.

Yes, but then having, say, a Latin-1 string, and repeatedly using it in places where UTF-16 is needed, causes you to repeat the decoding operation. The optimization becomes a pessimization. Here I'm imagining things like taking len(s) of a UTF-8 string, or s==u where u happens to be UTF-16. You only have to do this once or twice per string to start losing.

Also, having two different classes of strings means fewer felicitous cases where x==y is True and the check is just a pointer comparison. This might matter in dictionaries: imagine a dictionary created as a literal and then used to look up key strings read from a file.

> [Nick Coghlan wrote:]
> > [...] the
> > application is free to decouple the "reading" and "decoding" steps, and just
> > transfer raw bytes between the streams.
>
> So adding boilerplate to treat text as bytes "for efficiency" may
> become a standard recipe? Not so good.

I'm sure this will happen to the same degree that it's become a standard recipe in Java and C# (both of which lack polymorphic whatzits). Which is to say, not at all.

-j

_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
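
A minimal sketch of the repeated-decoding cost described above, for illustration only. EncodedText is a hypothetical class invented here (it is not an existing or proposed Python type); it keeps the original bytes plus an encoding name and counts how often a full decode runs. The cache flag is likewise an assumption, standing in for the "keep the decoded form around" alternative.

```python
class EncodedText:
    """Hypothetical string that stores raw bytes plus their encoding."""

    def __init__(self, data: bytes, encoding: str, cache: bool = False):
        self._data = data
        self._encoding = encoding
        self._cache_enabled = cache
        self._decoded = None    # only filled in when caching is enabled
        self.decode_count = 0   # number of full decode passes so far

    def _text(self) -> str:
        # Reuse the cached decoded form if there is one.
        if self._decoded is not None:
            return self._decoded
        self.decode_count += 1
        text = self._data.decode(self._encoding)
        if self._cache_enabled:
            self._decoded = text
        return text

    def __len__(self) -> int:
        # The length in characters is not the byte length for UTF-8 or
        # UTF-16, so this sketch has to decode to answer.
        return len(self._text())

    def __eq__(self, other):
        # Different encodings mean different bytes, so no pointer or
        # byte comparison shortcut is possible here.
        if isinstance(other, EncodedText):
            return self._text() == other._text()
        if isinstance(other, str):
            return self._text() == other
        return NotImplemented

    def __hash__(self) -> int:
        # Hash like the equivalent str so dictionary lookups agree.
        return hash(self._text())


latin = EncodedText("café".encode("latin-1"), "latin-1")
utf16 = EncodedText("café".encode("utf-16"), "utf-16")
cached = EncodedText("café".encode("latin-1"), "latin-1", cache=True)

for _ in range(3):
    assert latin == utf16 and len(latin) == 4
    assert cached == utf16 and len(cached) == 4

print(latin.decode_count, cached.decode_count)
```

Without the cache, every len() or == pays for another decode; with it, the string pays once but then holds two representations, which is the trade-off under discussion.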
