On 24/08/2011 11:22, Glenn Linderman wrote:
> c) mostly ASCII (utf8) with clever indexing/caching to be efficient
> d) UTF-8 with clever indexing/caching to be efficient
> I see neither a need nor a means to consider these.
>
> The discussion about "mostly ASCII" strings seems convincing that there
> could be a significant space savings if such were implemented.

Antoine's optimization in the UTF-8 decoder has been removed. It doesn't change the memory footprint; it just makes creating the Unicode object slower.

When you decode a UTF-8 string:

 - "abc" string uses "latin1" (8 bits) units
 - "aé" string uses "latin1" (8 bits) units <= cool!
 - "a€" string uses UCS2 (16 bits) units
 - "a\U0010FFFF" string uses UCS4 (32 bits) units
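
A minimal sketch of the rule in the list above, assuming the representation discussed here (which became PEP 393 in CPython 3.3): the unit width is the narrowest of 8/16/32 bits that holds the string's largest code point. storage_unit_bits() and measured_bytes_per_char() are hypothetical helpers written for illustration only; the second one is CPython-specific and infers bytes per character from sys.getsizeof by comparing two strings that differ only in length.

```python
import sys

def storage_unit_bits(s: str) -> int:
    """Hypothetical helper: width in bits of one code unit, chosen as the
    narrowest of 8/16/32 bits that can hold the largest code point of s."""
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:    # U+0000..U+00FF: 1 byte per unit ("latin1")
        return 8
    if max_cp < 0x10000:  # U+0100..U+FFFF: 2 bytes per unit (UCS2)
        return 16
    return 32             # U+10000..U+10FFFF: 4 bytes per unit (UCS4)

def measured_bytes_per_char(ch: str) -> int:
    """Hypothetical, CPython-only: the fixed object header cancels out when
    subtracting the sizes of two strings that differ by 1000 characters."""
    small, big = ch * 1000, ch * 2000
    return (sys.getsizeof(big) - sys.getsizeof(small)) // 1000

for text in ["abc", "a\u00e9", "a\u20ac", "a\U0010FFFF"]:
    print(ascii(text), storage_unit_bits(text), "bits")
```

On other implementations (e.g. PyPy) sys.getsizeof may not reflect this layout, so only the first helper describes the rule itself.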

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev