Tony Nelson wrote:
>>For decoding it should be sufficient to use a unicode string of
>>length 256. u"\ufffd" could be used for "maps to undefined". Or the
>>string might be shorter and byte values greater than the length of
>>the string are treated as "maps to undefined" too.
>
> With Unicode using more than 64K codepoints now, it might be more forward
> looking to use a table of 256 32-bit values, with no need for tricky
> values.

You might be missing the point. \ufffd is REPLACEMENT CHARACTER,
which would indicate that the byte with that index is really unused
in that encoding.

> Encoding can be made fast using a simple hash table with external chaining.
> There are max 256 codepoints to encode, and they will normally be well
> distributed in their lower 8 bits. Hash on the low 8 bits (just mask), and
> chain to an area with 256 entries. Modest storage, normally short chains,
> therefore fast encoding.

This is what is currently done: a hash map with 256 keys. You are
complaining about the performance of that algorithm. The issue of
external chaining is likely irrelevant: there likely are no
collisions, even though Python uses open addressing.

>>...I suggest instead just /caching/ the translation in C arrays stored
>>with the codec object. The cache would be invalidated on any write to the
>>codec's mapping dictionary, and rebuilt the next time anything was
>>translated. This would maintain the present semantics, work with current
>>codecs, and still provide the desired speed improvement.

That is not implementable. You cannot catch writes to the dictionary.

> Note that this caching is done by new code added to the existing C
> functions (which, if I have it right, are in unicodeobject.c). No
> architectural changes are made; no existing codecs need to be changed;
> everything will just work

Please try to implement it. You will find that you cannot. I don't
see how regenerating/editing the codecs could be avoided.

Regards,
Martin
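
For illustration only, the decoding table discussed above (a string of at most 256 characters, with \ufffd standing for "maps to undefined") and the dictionary-based encoding direction could be sketched in pure Python roughly as follows. All names here (decode_charmap, build_encoding_map, encode_charmap, TOY_TABLE) are hypothetical and are not taken from CPython or from any existing codec; the real work would be done in C.

```python
# Sketch only: every name below is hypothetical, not CPython's implementation.
UNDEFINED = "\ufffd"  # marker for "maps to undefined", as proposed in the thread


def decode_charmap(data: bytes, table: str) -> str:
    """Decode bytes by indexing into a string of at most 256 characters."""
    out = []
    for pos, byte in enumerate(data):
        # Bytes at or beyond the end of a short table count as undefined too.
        ch = table[byte] if byte < len(table) else UNDEFINED
        if ch == UNDEFINED:
            raise ValueError(
                f"byte 0x{byte:02x} at position {pos} maps to <undefined>"
            )
        out.append(ch)
    return "".join(out)


def build_encoding_map(table: str) -> dict:
    """Invert the decoding table into a character -> byte dictionary."""
    return {ch: byte for byte, ch in enumerate(table) if ch != UNDEFINED}


def encode_charmap(text: str, encoding_map: dict) -> bytes:
    """Encode text with one dictionary lookup per character."""
    try:
        return bytes(encoding_map[ch] for ch in text)
    except KeyError as exc:
        raise ValueError(
            f"character {exc.args[0]!r} maps to <undefined>"
        ) from None


# Toy table: ASCII for 0x00-0x7f, the euro sign at 0x80, nothing above that.
TOY_TABLE = "".join(chr(i) for i in range(128)) + "\u20ac"
TOY_ENCODING_MAP = build_encoding_map(TOY_TABLE)
```

With this toy table, decode_charmap(b"A\x80", TOY_TABLE) yields "A" followed by the euro sign, and any byte above 0x80 is rejected because the table is shorter than 256 entries. Since the table and the encoding map are both derived from one source, a codec built this way would have to be regenerated rather than patched through a mutable dictionary.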