Martin v. Löwis wrote:
> Walter Dörwald wrote:
>
>> OK, here's a patch that implements this enhancement to
>> PyUnicode_DecodeCharmap(): http://www.python.org/sf/1313939
>
> Looks nice!
>
>> Creating the decoding_map as a string should probably be done by
>> gencodec.py directly. This way the first import of the codec would be
>> faster too.
>
> Hmm. How would you represent the string in source code? As a Unicode
> literal? With \u escapes,

Yes, simply by outputting repr(decoding_string).

> or in a UTF-8 source file?

This might become unreadable if your editor can't detect the coding header.

> Or as a UTF-8
> string, with an explicit decode call?

This is another possibility, but it is unreadable too. We could add the real codepoints as comments, though.

> I like the current dictionary style for being readable, as it also
> adds the Unicode character names into comments.

We could use

decoding_string = (
    u"\u009c" # 0x0004 -> U+009C: CONTROL
    u"\u0009" # 0x0005 -> U+0009: HORIZONTAL TABULATION
    ...
)

However, the current approach has the advantage that only those byte values that differ from the identity mapping have to be specified.

Bye,
   Walter Dörwald
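
To make the trade-off concrete, here is a minimal sketch of how the two representations relate. It is written for current Python 3 rather than the Python of this thread, and the names build_decoding_string and decode are hypothetical helpers for illustration, not what gencodec.py or the patch actually emit. The two map entries are the ones from the example above.

```python
# Sparse dictionary style: only bytes that differ from the identity
# mapping are listed (the two entries from the example above).
decoding_map = {
    0x04: 0x009C,  # CONTROL
    0x05: 0x0009,  # HORIZONTAL TABULATION
}


def build_decoding_string(sparse_map):
    """Expand a sparse byte -> code point dict into a 256-character table.

    Bytes missing from the dict fall back to the identity mapping.
    Hypothetical helper, not part of gencodec.py.
    """
    return "".join(chr(sparse_map.get(byte, byte)) for byte in range(256))


def decode(data, table):
    """Decode bytes by indexing into the table, one character per byte."""
    return "".join(table[byte] for byte in data)


decoding_string = build_decoding_string(decoding_map)

# This repr is the kind of literal a generator could write into the
# codec's source file.
print(repr(decoding_string[:8]))
print(repr(decode(bytes([0x41, 0x04, 0x05]), decoding_string)))
```

The string form has to spell out all 256 positions, while the dictionary only needs the entries that deviate from the identity mapping, which is the advantage of the current approach noted above.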