One of the Dublin papers talks about how this is done in ICU: http://www.unicode.org/iuc/iuc21/a347.html
Mark
—————
Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com

----- Original Message -----
From: "Geoffrey Waigh" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, April 21, 2002 03:28
Subject: Re: unidata is big

> > I would just like to know if someone could give me a tip on how to
> > structure all the Unicode information in memory?
> >
> > All the UNIDATA does contain quite a bit of information and I can't see
> > any obvious method that is memory-efficient and gives fast access.
>
> a) you see if there is a Unicode-friendly library you can use that already
> does this for you.
>
> b) you write a program to parse the file and extract what your application
> needs. With clever data encoding you can pack most of the fields of
> UNIDATA into a very tight space. Long ago in the Unicode conference
> proceedings somebody illustrated how they used trie structures to
> efficiently build the lookup tables - the boring parts of the encoding
> space have shorter branches than the areas where every code point is
> different from its neighbour.
>
> Geoffrey
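To make the trie idea in the quote concrete, here is a minimal sketch in Python of a two-stage table, the simplest form of such a structure. It is not ICU's actual data layout; the 128-code-point block size, the class name TwoStageTable and the use of Python's unicodedata.category as the property source are illustrative assumptions. The high bits of a code point select an entry in an index array, which points at a block of per-code-point values; identical blocks (long unassigned or private-use stretches, for example) are stored only once, which is where the space savings come from.

```python
import unicodedata
from typing import Callable, Dict, List, Tuple

# Assumed block size: 2**7 = 128 code points per data block.
BLOCK_SHIFT = 7
BLOCK_SIZE = 1 << BLOCK_SHIFT
BLOCK_MASK = BLOCK_SIZE - 1
MAX_CODE_POINT = 0x10FFFF


class TwoStageTable:
    """Hypothetical two-stage table: index[cp >> SHIFT] gives a block offset in data."""

    def __init__(self, prop: Callable[[int], str]) -> None:
        # Distinct property values are numbered so data blocks hold small ints.
        self.values: List[str] = []
        value_ids: Dict[str, int] = {}
        # Maps a block's contents to its offset in self.data, so duplicates are shared.
        block_ids: Dict[Tuple[int, ...], int] = {}
        self.index: List[int] = []
        self.data: List[int] = []

        # 0x110000 is a multiple of BLOCK_SIZE, so every block is complete.
        for start in range(0, MAX_CODE_POINT + 1, BLOCK_SIZE):
            block: List[int] = []
            for cp in range(start, start + BLOCK_SIZE):
                value = prop(cp)
                if value not in value_ids:
                    value_ids[value] = len(self.values)
                    self.values.append(value)
                block.append(value_ids[value])
            key = tuple(block)
            if key not in block_ids:
                # First time we see this block: append it and remember its offset.
                block_ids[key] = len(self.data)
                self.data.extend(key)
            self.index.append(block_ids[key])

    def lookup(self, cp: int) -> str:
        # Stage 1: high bits pick the block; stage 2: low bits pick the entry.
        offset = self.index[cp >> BLOCK_SHIFT]
        return self.values[self.data[offset + (cp & BLOCK_MASK)]]


if __name__ == "__main__":
    table = TwoStageTable(lambda cp: unicodedata.category(chr(cp)))
    print(table.lookup(ord("A")))  # Lu
    print("index entries:", len(table.index), "data entries:", len(table.data))
```

A real table generator would more likely read UnicodeData.txt directly rather than go through a runtime library, and could add a third stage or tune the block size by measuring the resulting table sizes.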

