On Mon, 24 Jan 2011 21:17:34 +0100 "Martin v. Löwis" <mar...@v.loewis.de> wrote:

> I have been thinking about Unicode representation for some time now.
> This was triggered, on the one hand, by discussions with Glyph Lefkowitz
> (who complained that his server app consumes too much memory), and Carl
> Friedrich Bolz (who profiled Python applications to determine that
> Unicode strings are among the top consumers of memory in Python).
> On the other hand, this was triggered by the discussion on supporting
> surrogates in the library better.
>
> I'd like to propose PEP 393, which takes a different approach,
> addressing both problems simultaneously: by getting a flexible
> representation (one that can be either 1, 2, or 4 bytes), we can
> support the full range of Unicode on all systems, but still use
> only one byte per character for strings that are pure ASCII (which
> will be the majority of strings for the majority of users).
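[The flexible representation quoted above picks the narrowest per-character width that can hold every code point in a given string. A minimal Python sketch of that selection rule, for illustration only — the function name is mine, and the real PEP 393 implementation lives in CPython's C code, not in Python:]

```python
def narrowest_kind(s):
    """Return the per-character byte width (1, 2, or 4) that a
    PEP-393-style flexible representation could use for s.
    Illustrative sketch only, not CPython's actual code."""
    m = max(map(ord, s), default=0)  # highest code point in the string
    if m < 0x100:
        return 1  # pure ASCII / Latin-1 range: one byte per character
    if m < 0x10000:
        return 2  # everything fits in the BMP: two bytes per character
    return 4      # astral code points present: four bytes per character

print(narrowest_kind("hello"))     # 1
print(narrowest_kind("h\xe9llo"))  # 1 (U+00E9 still fits in one byte)
print(narrowest_kind("\u65e5\u672c\u8a9e"))  # 2 (BMP CJK characters)
print(narrowest_kind("\U0001F600"))          # 4 (astral code point)
```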
For this kind of experiment, I think a concrete attempt at implementing it
(together with performance/memory-savings numbers) would be much more useful
than an abstract proposal. It is hard to judge the concrete effects of the
changes you are proposing, even though they might (or might not) make sense
in theory.

For example, you are adding a lot of constant overhead to every unicode
object, even very small ones, which might be detrimental. Also, accessing
the unicode object's payload can become quite a bit more cumbersome. Only
implementing can tell how workable this is in practice.

Regards

Antoine.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
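[The concrete memory numbers asked for above could be gathered with something as simple as `sys.getsizeof`, which reports each string object's per-object footprint, constant overhead included. A small sketch of such a measurement, my own illustration rather than anything from the thread:]

```python
import sys

# Print the in-memory size of a few representative strings, so the
# constant per-object overhead and the per-character cost can be
# compared directly.
for s in ("", "abc", "a" * 1000, "\u65e5\u672c\u8a9e"):
    print(f"{s[:10]!r:>14} -> {sys.getsizeof(s)} bytes")
```

Running this on interpreters before and after a proposed change is one way to turn "might be detrimental" into an actual number.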