Hye-Shik Chang added the comment:

I have generated compressed mapping tables in several ways.
I extracted the mapping data into individual files and reorganized them, either by translating them into Python source code or by archiving them into a zip file. The following table shows the results, in kilobytes (also available at http://spreadsheets.google.com/pub?key=pWRBaY2ZM7mRgddF0Itd2IA ):

                  none  minimal   MSjk  MSall  current
    Text             0      207    312    342      570
    Data           904      696    592    562      333
    raw-py        3006     2392   2016   1932      996
    zip-py         720      496    416    384      304
    raw-pyc        952      734    624    590      346
    zip-pyc        560      384    336    304      240
    Text+zip-pyc   560      591    648    646      810
    raw-both      3954     3124   2638   2520     1340
    zip-both      1248      864    736    672      512
    zip-bare       560      384    336    304      240
    tarbz2-bare    496      352    320    304      240

Columns represent which mapping files are separated into external files. In "none", no mapping is left as static const C data, while in the "current" column only the new cns11643 mappings are extracted. The "minimal" set keeps the major character set for each country in static C data and moves the others out. "MSjk" additionally keeps some MS codepages of Japan and Korea, and "MSall" keeps all MS codepage extensions in static const C data. We may fix the list of character sets that remain as C data, or let users pick the sets using a configure option.

"Text" is the portion that remains in static const C data, which is where all the current mapping tables are. As discussed when CJKCodecs was integrated into Python, this data can be shared across processes in a system and is efficient, but it can't easily be compressed or reorganized by users for redistribution. "Data" is the externally managed mapping tables.

The "raw-py" row shows the total volume of the mapping tables as Python source code. "raw-pyc" shows the compiled (pyc) version of the mapping tables. "zip-py" and "zip-pyc" are zip-compressed archives of "raw-py" and "raw-pyc", respectively. Those can be imported using the Python zipimport machinery. "zip-bare" and "tarbz2-bare" show the volume of the archived raw mapping table files, as their names suggest.

We have 560KB of mapping tables in the CJKCodecs part of Python.
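For readers unfamiliar with the zipimport route mentioned for the "zip-py" and "zip-pyc" rows, here is a minimal sketch of the idea. The module name `demo_mapping_table`, the archive name `mappings.zip`, and the two-entry table are hypothetical toy values chosen for illustration; they are not the layout used for the measurements above.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Hypothetical module name and toy table (assumptions for illustration only);
# the real CJKCodecs mapping tables are far larger and structured differently.
MODULE_NAME = "demo_mapping_table"
MODULE_SOURCE = "DECODE_MAP = {0xA1A1: 0x3000, 0xA1A2: 0xFF0C}\n"


def build_archive(directory):
    """Write the toy mapping module into a deflate-compressed zip archive."""
    archive_path = os.path.join(directory, "mappings.zip")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MODULE_NAME + ".py", MODULE_SOURCE)
    return archive_path


def load_table(archive_path):
    """Import the mapping module from the archive through the zipimport hook."""
    sys.path.insert(0, archive_path)  # a zip file on sys.path is handled by zipimport
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(archive_path)
    return module.DECODE_MAP


with tempfile.TemporaryDirectory() as workdir:
    table = load_table(build_archive(workdir))
    print(hex(table[0xA1A1]))  # prints 0x3000
```

The sketch stores source (.py) in the archive, which corresponds to the "zip-py" row; storing precompiled .pyc files instead would correspond to "zip-pyc".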
If we choose "zip-pyc" with the "minimal" set, the binary distribution will be just as big as before even if we include the CNS11643 character set, and pythonXY.dll will get smaller by 363KB.

What do you think about this scheme? Any other ideas for compression?

__________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2066>
__________________________________
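The size figures in this message can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied from the "Text" and "zip-pyc" rows of the table, and reading the 363KB saving as the difference between the "current" and "minimal" Text sizes is an assumption about how that figure was derived.

```python
# Sizes in kilobytes, copied from the "Text" and "zip-pyc" rows of the table.
COLUMNS = ["none", "minimal", "MSjk", "MSall", "current"]
TEXT = dict(zip(COLUMNS, [0, 207, 312, 342, 570]))
ZIP_PYC = dict(zip(COLUMNS, [560, 384, 336, 304, 240]))


def combined_size(column):
    """Static C data plus the zipped .pyc archive (the "Text+zip-pyc" row)."""
    return TEXT[column] + ZIP_PYC[column]


def dll_shrinkage(column, baseline="current"):
    """Assumed reading: static C data removed relative to the baseline column."""
    return TEXT[baseline] - TEXT[column]


for column in COLUMNS:
    print(column, combined_size(column), dll_shrinkage(column))
```

For the "minimal" column this gives a combined size of 591KB and a reduction of 363KB in static C data, which matches the "Text+zip-pyc" row and the pythonXY.dll figure quoted above.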