On 10/5/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Of course, a C version could use the same approach as
> the unicodedatabase module: that of compressed lookup
> tables...
>
> http://aggregate.org/TechPub/lcpc2002.pdf
>
> genccodec.py anyone ?
>
I once wrote a test codec for single-byte character sets to evaluate
algorithms for use in CJKCodecs (it's not a direct implementation of
what you've mentioned, though). I just ported it to unicodeobject (as
attached). It shows noticeably better results than the charmap codecs:

% python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)" "s.decode('iso8859-1')"
10 loops, best of 3: 96.7 msec per loop
% ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)" "s.decode('iso8859_10_fc')"
10 loops, best of 3: 22.7 msec per loop
% ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)" "s.decode('utf-8')"
100 loops, best of 3: 18.9 msec per loop

(Note that it has neither documentation nor good error handling
yet. :-)

Hye-Shik
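
As a rough illustration of the compressed-lookup-table idea for a
single-byte charset (not the attached patch, and in Python rather than
C), the following sketch builds a flat 256-entry decode table and a
two-level encode table keyed on the high and low bytes of the code
point. The function names, the UNMAPPED sentinel, and the choice of
iso8859_10 from the standard library as the source mapping are all
assumptions made for the example.

```python
# Sketch only: a two-level lookup table for a single-byte charset.
# This is NOT the attached fastmapcodec.diff; all names are hypothetical.

CHARSET = "iso8859_10"   # assumed: any single-byte codec shipped with Python
BLOCK = 256              # code points covered by one second-level block
UNMAPPED = -1            # sentinel for "no mapping"


def build_decode_table(charset):
    """Return a 256-entry list: byte value -> code point, or UNMAPPED."""
    table = []
    for b in range(256):
        try:
            table.append(ord(bytes([b]).decode(charset)))
        except UnicodeDecodeError:
            table.append(UNMAPPED)
    return table


def build_encode_tables(decode_table):
    """Return (index, blocks); index maps (code point >> 8) to a block number."""
    blocks = [[UNMAPPED] * BLOCK]   # block 0 is the shared "empty" block
    index = {}
    for byte, cp in enumerate(decode_table):
        if cp == UNMAPPED:
            continue
        hi, lo = cp >> 8, cp & 0xFF
        if hi not in index:
            index[hi] = len(blocks)
            blocks.append([UNMAPPED] * BLOCK)
        blocks[index[hi]][lo] = byte
    return index, blocks


def decode(data, decode_table):
    """Decode bytes with one table lookup per input byte."""
    out = []
    for b in data:
        cp = decode_table[b]
        if cp == UNMAPPED:
            raise ValueError("undefined byte 0x%02x" % b)
        out.append(chr(cp))
    return "".join(out)


def encode(text, index, blocks):
    """Encode text with two table lookups per character."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        byte = blocks[index.get(cp >> 8, 0)][cp & 0xFF]
        if byte == UNMAPPED:
            raise ValueError("cannot encode U+%04X" % cp)
        out.append(byte)
    return bytes(out)


DECODE_TABLE = build_decode_table(CHARSET)
INDEX, BLOCKS = build_encode_tables(DECODE_TABLE)
```

Only the high bytes that actually occur in the charset get their own
256-entry block; every other high byte falls through to the shared
empty block, which is what keeps the encode side small.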
Attachment: fastmapcodec.diff