Re: [Python-Dev] PEP 393 Summer of Code Project

Victor Stinner Wed, 24 Aug 2011 11:03:11 -0700

On 24/08/2011 11:22, Glenn Linderman wrote:

c) mostly ASCII (utf8) with clever indexing/caching to be efficient
d) UTF-8 with clever indexing/caching to be efficient

I see neither a need nor a means to consider these.


The discussion about "mostly ASCII" strings seems convincing that there
could be a significant space savings if such were implemented.

Antoine's optimization in the UTF-8 decoder has been removed. It doesn't change the memory footprint; without it, creating the Unicode object is just slower.


When you decode a UTF-8 string, the width of each stored unit depends on the largest code point in the result:

 - "abc" string uses "latin1" (8 bits) units
 - "aé" string uses "latin1" (8 bits) units <= cool!
 - "a€" string uses UCS2 (16 bits) units
 - "a\U0010FFFF" string uses UCS4 (32 bits) units
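The rule above can be sketched in Python. This is a minimal illustration, assuming CPython 3.3 or later (where PEP 393 landed); the helper names pep393_kind and bytes_per_char are hypothetical, written only for this example. The first function picks the unit width from the largest code point; the second estimates the real per-character cost by comparing sys.getsizeof for two string lengths, so the fixed object header cancels out.

```python
import sys


def pep393_kind(s: str) -> int:
    """Hypothetical helper: unit width in bytes (1, 2 or 4) for s under PEP 393.

    The width is chosen from the largest code point in the string.
    """
    max_cp = max(map(ord, s), default=0)
    if max_cp <= 0xFF:
        return 1  # "latin1" storage (ASCII is a subset)
    if max_cp <= 0xFFFF:
        return 2  # UCS2
    return 4      # UCS4


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Hypothetical helper: measured bytes per character on CPython.

    Subtracting the sizes of two strings of different lengths removes
    the constant header, leaving n * unit width.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


for text in ["abc", "a\u00e9", "a\u20ac", "a\U0010FFFF"]:
    print(ascii(text), pep393_kind(text), bytes_per_char(text[-1]))
```

On CPython both columns should agree: 1, 1, 2 and 4 bytes per character for the four example strings. Other implementations (e.g. PyPy) use different string representations, so the sys.getsizeof figures are only meaningful for CPython.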

Victor
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev