Josiah Carlson wrote:
> For me, having recently remembered what was in a unicode string, and
> verifying it by checking the source, the question in my mind is whether
> we want to stick with the same 2-representation implementation (default
> encoding and UTF-16 or UCS-4 depending on build), or go with more or
> fewer representations.

I would personally like to see a Python API that operates on code
points, with support for 17 planes. I also think that efficient
indexing is important.

> We can reduce memory consumption by using a single representation,
> whether it be constant or variable based on content, though in some
> cases (utf-16, ucs-4) we would lose the 'native' single-segment char (C
> char) buffer interface.

I don't think reducing memory consumption is that important on current
hardware. Java and .NET have demonstrated that you can build "real"
applications with that approach.

There are trade-offs, of course. I personally think the best trade-off
would be a two-byte representation, along with a flag telling whether
the string contains any surrogate pairs. Indexing and length would be
constant-time if there are no surrogates, and linear-time if there are.

> After re-reading the source, and thinking a bit more, about my only
> real concern is memory use of Python 3.x. The current implementation
> works, so I'm +1 on keeping it "as is", but I'm also +0 on some
> implementation that would reduce memory use (with limited, if any
> slowdown) for as many platforms as possible, not any higher because
> changing the underlying implementation would be a PITA.

I think supporting multiple representations at run-time would really be
terrible. Any API of the "give me the data" kind would either have to
expose the choice of representation, or perform a copy. Either
alternative would produce many programming errors in extension modules.

Regards,
Martin
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
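
The two-byte-plus-flag trade-off described in the message can be sketched in Python roughly as follows. This is only an illustration, not anything from the CPython source: the class name `FlaggedUTF16` and its attributes `units` and `has_surrogates` are hypothetical, it handles integer indices only, and it assumes the input contains no lone surrogates.

```python
from array import array


class FlaggedUTF16:
    """Hypothetical string type: UTF-16 code units plus a surrogate flag."""

    def __init__(self, text):
        units = array("H")          # unsigned 16-bit code units
        has_surrogates = False
        for ch in text:
            cp = ord(ch)
            if cp >= 0x10000:       # outside the BMP: encode as a pair
                cp -= 0x10000
                units.append(0xD800 + (cp >> 10))
                units.append(0xDC00 + (cp & 0x3FF))
                has_surrogates = True
            else:                   # assumption: no lone surrogates here
                units.append(cp)
        self.units = units
        self.has_surrogates = has_surrogates

    def __len__(self):
        # Constant time without surrogates, linear time with them.
        if not self.has_surrogates:
            return len(self.units)
        return sum(1 for u in self.units if not 0xDC00 <= u <= 0xDFFF)

    def __getitem__(self, index):
        n = len(self)
        if index < 0:
            index += n
        if not 0 <= index < n:
            raise IndexError("string index out of range")
        if not self.has_surrogates:
            return chr(self.units[index])       # direct lookup
        # Slow path: walk from the start, skipping two units per pair.
        pos = 0
        for _ in range(index):
            pos += 2 if 0xD800 <= self.units[pos] <= 0xDBFF else 1
        hi = self.units[pos]
        if 0xD800 <= hi <= 0xDBFF:
            lo = self.units[pos + 1]
            return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
        return chr(hi)
```

A string such as `FlaggedUTF16("abc")` stays on the fast path, while one containing a character outside the BMP sets the flag once at construction and pays the linear cost on every `len()` and index operation.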