andreas palsson wrote:
>
> Hi.
>
> I would just like to know if someone could give me a tip on how to
> structure all the unicode-information in memory?
>
> All the UNIDATA does contain quite a bit of information and I can't see
> any obvious method of which is memory-efficient and gives fast access.

You might want to evaluate some of the open source libraries mentioned under "Enabled Products" on the Unicode site.

For my own lib (http://www.let.uu.nl/~Theo.Veenker/personal/projects/ucp/) I've created a separate table builder tool for each property or mapping. The tools organize the data in planes, and for each plane all possible trie setups are evaluated (about 80 combinations of one-, two-, or three-stage tables). The cheapest setup is then used.

This still requires over 230 KB to store all data (except character names and comments) from the following files: UnicodeData.txt, EastAsianWidth.txt, LineBreak.txt, ArabicShaping.txt, Scripts.txt, Blocks.txt, SpecialCasing.txt, CaseFolding.txt, BidiMirroring.txt, PropList.txt, DerivedCoreProperties.txt, DerivedNormalizationProperties.txt, and DerivedJoiningType.txt. For some mappings I've stored 32-bit code points where 16 bits would have been enough, but I decided that API uniformity is more important than memory efficiency.

I wouldn't bother too much about memory efficiency; it's irrelevant these days. Even your mobile phone has enough memory to store all Unicode data 10 to 20 times over. The same goes for lookup speed: all you have to do to get it fast is wait (a few seasons).

Theo
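
To make the "try several table setups per plane and keep the cheapest" idea above concrete, here is a minimal sketch in Python. It is not the code of the ucp table builders: it covers only two-stage tables, the function names (build_two_stage, lookup, cost, cheapest_setup) are hypothetical, the cost model (two bytes per stage-1 entry, one byte per stage-2 entry) is an assumption, and the property data is a toy example rather than real UnicodeData.txt content.

```python
def build_two_stage(values, shift):
    """Split `values` into blocks of 2**shift entries and share identical blocks.

    Returns (index, blocks): index[i] is the number of the unique block used
    for the i-th chunk, and blocks is all unique blocks laid end to end.
    """
    block_size = 1 << shift
    if len(values) % block_size != 0:
        raise ValueError("table length must be a multiple of the block size")
    index = []   # stage 1: one entry per chunk of the plane
    blocks = []  # stage 2: concatenated unique blocks
    seen = {}    # block contents -> block number
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        if block not in seen:
            seen[block] = len(seen)
            blocks.extend(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, shift, cp):
    """Return the property value for offset `cp` within the plane."""
    mask = (1 << shift) - 1
    return blocks[(index[cp >> shift] << shift) | (cp & mask)]


def cost(index, blocks):
    """Assumed cost model: 2 bytes per stage-1 entry, 1 byte per stage-2 entry."""
    return 2 * len(index) + len(blocks)


def cheapest_setup(values, shifts=range(2, 13)):
    """Build a table for every candidate block size and keep the smallest one."""
    best = None
    for shift in shifts:
        index, blocks = build_two_stage(values, shift)
        size = cost(index, blocks)
        if best is None or size < best[0]:
            best = (size, shift, index, blocks)
    return best


# Toy property for one plane: 1 for A-Z, 2 for a-z, 0 everywhere else.
plane = [0] * 0x10000
for cp in range(0x41, 0x5B):
    plane[cp] = 1
for cp in range(0x61, 0x7B):
    plane[cp] = 2

size, shift, index, blocks = cheapest_setup(plane)
print("block size:", 1 << shift, "estimated bytes:", size)
print("property of 'q':", lookup(index, blocks, shift, ord("q")))
```

A three-stage table applies the same deduplication once more to the stage-1 index, and a one-stage table is just the flat array. Comparing the estimated sizes of all candidates per plane is what makes the selection automatic.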

