M.-A. Lemburg wrote:
> Walter Dörwald wrote:
>
>>Martin v. Löwis wrote:
>>
>>>M.-A. Lemburg wrote:
>>>
>>>>I've checked in a whole bunch of newly generated codecs
>>>>which now make use of the faster charmap decoding variant added
>>>>by Walter a short while ago.
>>>>
>>>>Please let me know if you find any problems.
>>>
>>>I think we should work on eliminating the decoding_map variables.
>>>There are some codecs which rely on them being present in other codecs
>>>(e.g. koi8_u.py is based on koi8_r.py); however, this could be updated
>>>to use, say
>>>
>>>decoding_table = codecs.update_decoding_map(koi8_r.decoding_table, {
>>>    0x00a4: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
>>>    0x00a6: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
>>>    0x00a7: 0x0457, # CYRILLIC SMALL LETTER YI (UKRAINIAN)
>>>    0x00ad: 0x0491, # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
>>>    0x00b4: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
>>>    0x00b6: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
>>>    0x00b7: 0x0407, # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
>>>    0x00bd: 0x0490, # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
>>>})
>>>
>>>With all these cross-references gone, the decoding_maps could also go.
>
> I just left them in because I thought they wouldn't do any harm
> and might be useful in some applications.
>
> Removing them where not directly needed by the codec would not
> be a problem.

Recreating them is quite simple via dict(enumerate(decoding_table)),
so I think we should remove them.

>>Why should koi8_u.py be defined in terms of koi8_r.py anyway? Why not put
>>a complete decoding_table into koi8_u.py?
>
> KOI8-U is not available as a mapping on ftp.unicode.org and
> I only recreated codecs from the mapping files available
> there.

OK, so we'd need something that creates a new decoding table from an old
one plus changes, i.e. something like:

def update_decoding_table(table, new):
    # Copy the table into a mutable list of characters.
    table = list(table)
    # Overwrite each changed byte position with its new character.
    for (key, value) in new.iteritems():
        table[key] = unichr(value)
    return u"".join(table)

>>I'd like to suggest a small cosmetic change: gencodec.py should output
>>byte values with two hex digits instead of four. This makes it easier to
>>see what is a byte value and what is a codepoint. And it would make
>>grepping for stuff simpler.
>
> True.
>
> I'll rerun the creation with the above changes sometime this
> week.

Great, thanks!

Bye,
   Walter Dörwald

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
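
For readers trying this on a current interpreter, here is a minimal sketch of the helper discussed above, ported to Python 3 (the version in the message uses the Python 2 names iteritems and unichr). Some assumptions: the name update_decoding_table is only the one proposed in this thread, not an existing function in the codecs module, and the base table is built by decoding all 256 byte values with the koi8_r codec rather than by importing the table from the codec module.

```python
import codecs


def update_decoding_table(table, changes):
    """Return a copy of a 256-character decoding table with some bytes remapped.

    `changes` maps byte values (0-255) to Unicode code points.
    Hypothetical helper: the name comes from this thread, not from codecs.
    """
    chars = list(table)
    for byte, codepoint in changes.items():
        chars[byte] = chr(codepoint)
    return "".join(chars)


# Base table: KOI8-R defines every byte value, so decoding each byte once
# yields a complete 256-character table.
koi8_r_table = bytes(range(256)).decode("koi8_r")

# The eight KOI8-U differences quoted earlier in the thread.
koi8_u_table = update_decoding_table(koi8_r_table, {
    0xA4: 0x0454,  # CYRILLIC SMALL LETTER UKRAINIAN IE
    0xA6: 0x0456,  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    0xA7: 0x0457,  # CYRILLIC SMALL LETTER YI (UKRAINIAN)
    0xAD: 0x0491,  # CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
    0xB4: 0x0404,  # CYRILLIC CAPITAL LETTER UKRAINIAN IE
    0xB6: 0x0406,  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    0xB7: 0x0407,  # CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
    0xBD: 0x0490,  # CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
})

# The derived table can be passed directly to the charmap decoder.
text, consumed = codecs.charmap_decode(b"\xa4\xb4", "strict", koi8_u_table)

# A byte -> code point dict in the style of the old decoding_map can be
# rebuilt from the table whenever it is needed.
decoding_map = {byte: ord(char) for byte, char in enumerate(koi8_u_table)}
```

One detail worth noting: dict(enumerate(decoding_table)) maps each byte to a one-character string, while the generated decoding_map dicts hold integer code points, so the sketch applies ord() to each character to get the integer form.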