Steffen, FYI, Unicode 7.0, when it comes out, will have another entire bicameral (casing) script added to it: Warang Citi. And when Old Hungarian is finally published, at some point after Unicode 7.0, that will be *another* bicameral script added. It is unlikely that those two will be the last. And those are in addition to the continual trickle of case pairs added to already existing bicameral scripts like Latin and Cyrillic.
It is a false economy for a general Unicode library implementation to be overly clever about how it compresses tables, such as casing tables. That approach can get you into trouble when something else is added to the standard which breaks your initial assumptions.

If you want to do this kind of thing, my suggestion would be instead to do a two-step process: first implement a general table which can always be easily updated based on new additions to UnicodeData.txt (and/or SpecialCasing.txt and CaseFolding.txt, depending on what kind of case tables you are implementing), and which doesn't worry too much about table size. Then write a *separate* optimization step which can compress your generic table format into a more compact format. If you do it that way, your adaptation to future additions to the standard can be much more robust, while still optimizing for minimal table size.

--Ken

> I have been able to compress all lower-, upper- and titlecase
> mappings, simple and extended (no conditions yet) of Unicode 6.2
> into a 260 entry binary search array.
> I'm not with this project at the moment, but looking at the
> alloc/Pipeline.html it *could* be that those few characters alone
> will add maybe 10 (sorry..) more slots, if the presence of SMALL
> or CAPITAL indicates they'll be Lt/Lu/Ll or will have an entry in
> `SpecialCasing.txt'.
> I hope that this wonderful thing that is the UCS will not become
> blurred -- memory size is still a concern for some people.
> (Reading how the process works doesn't give a lot of hope, yet
> that is what came to my mind.)
> Ciao,
>
> --steffen
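
For illustration only, the two-step process suggested in the reply above could be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions, not anyone's actual implementation: the names build_generic_table, compress and to_lower are hypothetical, the input is a handful of lines in UnicodeData.txt format rather than the full file, only simple lowercase mappings (field 13) are handled, and the run-based encoding is just one possible compression scheme.

```python
import bisect

# A few lines in UnicodeData.txt format; a real build would read the whole file.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;
0043;LATIN CAPITAL LETTER C;Lu;0;L;;;;;N;;;;0063;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
0391;GREEK CAPITAL LETTER ALPHA;Lu;0;L;;;;;N;;;;03B1;
0392;GREEK CAPITAL LETTER BETA;Lu;0;L;;;;;N;;;;03B2;
"""


def build_generic_table(text):
    """Step 1 (hypothetical helper): a plain dict {code point: lowercase mapping}.

    No cleverness here, so new data files can simply be re-parsed.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split(";")
        # Field 13 is the simple lowercase mapping; skip lines without one.
        if len(fields) < 15 or not fields[13]:
            continue
        table[int(fields[0], 16)] = int(fields[13], 16)
    return table


def compress(table):
    """Step 2 (hypothetical helper): merge consecutive code points that share
    the same delta into (start, end, delta) runs, sorted by start."""
    runs = []
    for cp in sorted(table):
        delta = table[cp] - cp
        if runs and runs[-1][1] == cp - 1 and runs[-1][2] == delta:
            runs[-1] = (runs[-1][0], cp, delta)  # extend the current run
        else:
            runs.append((cp, cp, delta))  # start a new run
    return runs


def to_lower(runs, cp):
    """Binary search the compact runs; unmapped code points map to themselves."""
    starts = [start for start, _, _ in runs]
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and runs[i][0] <= cp <= runs[i][1]:
        return cp + runs[i][2]
    return cp


generic = build_generic_table(SAMPLE)   # easy to regenerate for each release
compact = compress(generic)             # separate, replaceable optimization
print(len(generic), "generic entries ->", len(compact), "runs")
print(hex(to_lower(compact, 0x42)))
```

The point of keeping the two functions separate is that only compress depends on assumptions about how mappings are distributed; if a later version of the standard breaks those assumptions, the generic table is still correct and only the optimization step needs revisiting.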

