Marco Cimarosti wrote:
>
> John Aurelio Cowan wrote:)
> > Marco Cimarosti scripsit:
> > > Talking about the format of mapping tables, I always
> > > wondered why not using ranges. In the case of ISO
> > > 8859-11, the table would become as compact as
> > > three lines:
> >
> > Well, that wins for 8859-1 and 8859-11 and ISCII-88, where Unicode
> > copied existing layouts precisely. But it wouldn't help other 8859-x
> > much if at all,
>
> All 8859 tables would be more succinct.
>
> Non-Latin sections use contiguous ranges of letters in alphabetical order
> or, however, in the same order used by Unicode; this is also true for most
> other non-ISO charsets.
>
> Latin sections are a worse case, but they still benefit slightly, because
> characters shared with Latin-1 stay in the same positions.
>
> > and it requires binary search rather than direct
> > array access, which would be a terrible lossage in CJK, where the
> > real costs are.
>
> I agree. In the case of CJK it simply doesn't pay.
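
For readers who want to see what a range-based table with binary search might look like, here is a minimal Python sketch. The three ranges follow the published ISO 8859-11 layout as I understand it; they are not necessarily the three lines the quoted message had in mind, and the names `RANGES`, `STARTS` and `range_to_unicode` are made up for illustration.

```python
from bisect import bisect_right

# Each entry: (first_byte, last_byte, first_code_point).
# Assumed layout of ISO 8859-11: ASCII/C1/NBSP are identity-mapped,
# and the two Thai runs map onto U+0E01.. and U+0E3F.. respectively.
RANGES = [
    (0x00, 0xA0, 0x0000),
    (0xA1, 0xDA, 0x0E01),
    (0xDF, 0xFB, 0x0E3F),
]
STARTS = [first for first, _, _ in RANGES]


def range_to_unicode(byte):
    """Map a legacy byte to a code point, or None if it is unassigned."""
    # Binary search for the last range whose first byte is <= byte.
    i = bisect_right(STARTS, byte) - 1
    if i < 0:
        return None
    first, last, base = RANGES[i]
    if byte > last:
        return None  # falls in a gap between ranges
    return base + (byte - first)
```

With only three ranges the search is trivial; the cost raised in the quoted objection shows up when a table needs thousands of ranges, as a CJK table would.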
If I may add my two cents: IMO, using search algorithms to reduce table size doesn't pay in any case. If one uses fast one- or two-stage lookup tables for both mappings (legacy to Unicode and vice versa), then most tables require about 3 KB or less of storage; CJK encodings need roughly 10 to 30 times that. Compared to the 256 MB in a typical PC, each lookup table would consume about 0.001% (or 0.01-0.03% for CJK) of main memory.

My point is that it is better to concentrate on processing speed than on table footprint.

Theo
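
To make the one/two-stage idea concrete, here is a rough Python sketch, not a description of any particular converter. It assumes a block size of 256 entries, uses -1 as an "unmapped" marker, and borrows Python's built-in `iso8859_11` codec only as a convenient source of mapping data; all function and variable names are hypothetical. In C the same structure would be plain byte or 16-bit arrays.

```python
BLOCK = 256      # entries per second-stage block (an assumption)
UNMAPPED = -1    # marker for "no mapping"


def build_forward(codec):
    """One-stage table: legacy byte -> code point, by direct indexing."""
    table = []
    for b in range(256):
        try:
            table.append(ord(bytes([b]).decode(codec)))
        except UnicodeDecodeError:
            table.append(UNMAPPED)
    return table


def build_reverse(forward):
    """Two-stage table: code point -> legacy byte (BMP only)."""
    blocks = [[UNMAPPED] * BLOCK]        # block 0 is shared by all empty pages
    stage1 = [0] * (0x10000 // BLOCK)    # page number -> block index
    for byte, cp in enumerate(forward):
        if cp == UNMAPPED:
            continue
        page = cp // BLOCK
        if stage1[page] == 0:            # first hit on this page: allocate
            blocks.append([UNMAPPED] * BLOCK)
            stage1[page] = len(blocks) - 1
        blocks[stage1[page]][cp % BLOCK] = byte
    return stage1, blocks


def to_legacy(stage1, blocks, cp):
    """Two array accesses, no searching."""
    if not 0 <= cp < 0x10000:
        return UNMAPPED
    return blocks[stage1[cp // BLOCK]][cp % BLOCK]


FORWARD = build_forward("iso8859_11")
STAGE1, BLOCKS = build_reverse(FORWARD)
```

For a single-byte charset like this one, only a handful of second-stage blocks should ever be allocated (here, the shared empty block plus one per Unicode page actually used), which is where the small footprint comes from. Every lookup in either direction is plain indexing.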

