[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

STINNER Victor Thu, 24 Feb 2011 08:24:51 -0800

STINNER Victor <victor.stin...@haypocalc.com> added the comment:

I think that the normalization function in unicodeobject.c (only used for
internal functions) can skip any character other than a-z, A-Z and 0-9.
Something like:


>>> import re
>>> def normalize(name): return re.sub("[^a-z0-9]", "", name.lower())
... 
>>> normalize("UTF-8")
'utf8'
>>> normalize("ISO-8859-1")
'iso88591'
>>> normalize("latin1")
'latin1'

So ISO-8859-1, ISO8859-1, LATIN-1, latin1, UTF-8, utf8, etc. will be normalized
to iso88591, latin1 and utf8.
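
The same rule can be written without the re module, as a plain loop over the
characters, which is closer in shape to what a C helper might do. This is only
a sketch: normalize_ascii is a hypothetical name, not the function in
unicodeobject.c, and it assumes that only ASCII letters need lowercasing.

```python
import string

# Characters that survive normalization: a-z and 0-9.
_KEEP = frozenset(string.ascii_lowercase + string.digits)

def normalize_ascii(name):
    """Lowercase ASCII letters and drop everything that is not a-z or 0-9."""
    out = []
    for ch in name:
        if "A" <= ch <= "Z":
            ch = chr(ord(ch) + 32)  # ASCII-only lowercase, no locale involved
        if ch in _KEEP:
            out.append(ch)
    return "".join(out)

# Hypothetical usage: spelling variants collapse to one key each.
for alias in ("ISO-8859-1", "ISO8859-1", "LATIN-1", "latin1", "UTF-8", "utf8"):
    print(alias, "->", normalize_ascii(alias))
```

Each pair of spellings ends up on the same key (iso88591, latin1, utf8), so a
lookup table only needs one entry per key.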

I don't know of any encoding name in which a character outside a-z, A-Z, 0-9
means anything special. But I don't know all encoding names! :-)
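
One way to test that assumption against the names Python itself knows is to
normalize every key in the stdlib alias table and look for keys that end up
pointing at more than one codec. This is a rough sketch: find_collisions is a
hypothetical helper, and the result depends on the alias table of the
interpreter it runs on, so no output is shown.

```python
import re
from collections import defaultdict
from encodings.aliases import aliases  # stdlib table: alias -> codec name

def normalize(name):
    # Same rule as in the comment: lowercase, keep only a-z and 0-9.
    return re.sub("[^a-z0-9]", "", name.lower())

def find_collisions(mapping):
    """Return {normalized_name: set_of_codecs} for names that become ambiguous."""
    targets = defaultdict(set)
    for alias, codec in mapping.items():
        targets[normalize(alias)].add(codec)
    return {key: codecs for key, codecs in targets.items() if len(codecs) > 1}

# Hypothetical check against the aliases shipped with this interpreter.
for key, codecs in sorted(find_collisions(aliases).items()):
    print(key, "->", sorted(codecs))
```

An empty result would only say that the registered aliases are safe under this
rule. It says nothing about encoding names that Python does not list.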

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11303>
_______________________________________