On 12/15/2012 11:27 AM Armin Rigo wrote:
Hi,
On Sat, Dec 15, 2012 at 7:51 AM, Maciej Fijalkowski <fij...@gmail.com> wrote:
And ASPW does the same, right? I understand the general need for UTF8;
I just didn't find it in this particular query.
Fwiw, I wonder again if we couldn't have all our unicode strings
internally be UTF8 instead of 2- or 4-byte strings. This would mean
a W_UTF8UnicodeObject class that has both a reference to the RPython
string and some optional extra data to make it faster to locate the
n'th character or the total unicode length. (We discussed it on IRC
some time ago.)
A bientôt,
Armin.
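
To make that UTF8 idea concrete, here is a rough sketch. It is not PyPy's
actual W_UTF8UnicodeObject; the names Utf8String, INDEX_STEP and char_at are
invented for illustration. The data lives in one UTF-8 byte string, and an
optional offset table is built lazily so locating the n'th character only
needs a short forward walk rather than a scan from the start.

    # Sketch only: invented names, not PyPy internals.
    class Utf8String(object):
        INDEX_STEP = 64  # record a byte offset every 64 code points

        def __init__(self, utf8_bytes):
            self._bytes = utf8_bytes   # the underlying (RPython-level) byte string
            self._length = None        # total number of code points, computed lazily
            self._offsets = None       # optional extra data: byte offsets

        def __len__(self):
            if self._length is None:
                self._length = len(self._bytes.decode('utf-8'))
            return self._length

        def _build_offsets(self):
            # Remember the byte offset of every INDEX_STEP'th code point.
            offsets = []
            pos = 0
            for i, ch in enumerate(self._bytes.decode('utf-8')):
                if i % self.INDEX_STEP == 0:
                    offsets.append(pos)
                pos += len(ch.encode('utf-8'))
            self._offsets = offsets

        def char_at(self, n):
            # Jump to the nearest recorded offset, then walk forward at most
            # INDEX_STEP - 1 characters.  (A real implementation would step
            # over the UTF-8 bytes directly instead of decoding the tail.)
            if self._offsets is None:
                self._build_offsets()
            start = self._offsets[n // self.INDEX_STEP]
            tail = self._bytes[start:].decode('utf-8')
            return tail[n % self.INDEX_STEP]
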
Since
>>> for i in range(256): assert chr(i).decode('latin1') == unichr(i)
I wonder whether something could be gained by having an alternative
internal unicode representation in the form of latin1 8-bit byte strings.
ISTM a lot of English-speaking and western European locales would hardly
ever need anything else, and generating the code to tag and use/transform
the alternative representations would be an internal optimization matter.
I suppose some apps could well end up with 8-, 16-, and 32-bit unicode
strings and UTF8 all coexisting under the hood, but only when actually needed.
Regards,
Bengt Richter