On 8/19/2011 2:35 PM, Jukka K. Korpela wrote:
20.8.2011 0:07, Doug Ewell wrote:

Of course, 2.1 billion characters is also overkill, but the advent of
UTF-16 was how we ended up with 17 planes.

And now we think that a little over a million is enough for everyone, just as they thought in the late 1980s that 16 bits would be enough for everyone.


The difference is that these early plans were based on rigorously *not* encoding certain characters, and on using combining methods or variation selectors much more aggressively. That might have been feasible, except for the need to migrate existing software and to have Unicode-based systems play nicely in a world where other character sets had different ideas of what constitutes a character.

Allowing thousands of characters for compatibility reasons, more than ten thousand precomposed characters, and many other kinds of characters and symbols not originally on the radar still has not inflated the numbers all that much. The count stands at roughly double the original 64K goal, after more than twenty years of steady accumulation.

Was the original concept of being able to shoehorn the world into sixteen bits overly aggressive? Probably, because the estimates had always been that there are about a quarter million written "elements". If you took the current repertoire and applied code-space-saving techniques in hindsight, you might be able to create something that "fits" into 16 bits. But it would end up using strings for many things that are now single characters.
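
To illustrate what "strings instead of single characters" means in practice, here is a small sketch in Python using the standard unicodedata module (the choice of example character is mine, purely for illustration):

    import unicodedata

    # A precomposed character occupies a single code point...
    precomposed = "\u00E9"                      # e with acute, LATIN SMALL LETTER E WITH ACUTE
    # ...but the same text can equally be written as a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", precomposed)
    print(len(precomposed), len(decomposed))    # 1 2
    print([hex(ord(c)) for c in decomposed])    # ['0x65', '0x301']

A repertoire that relied on such decompositions everywhere would need far fewer code points, at the cost of making many of today's single characters into two- or three-element strings.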

But the numbers, so far, suggest that this original estimate of a quarter million, rough as it was, is rather accurate. Over twenty years of encoding characters have not been enough to exceed it.

The million-plus code points are therefore a much more comfortable "limit": from the beginning they assume a ceiling with ample headroom (as opposed to the "can we fit the world in this shoebox" approach of the earlier designs).
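
For reference, a back-of-the-envelope sketch (in Python, as my own illustration, not anything from the original design documents) of how the UTF-16 surrogate mechanism arrives at that ceiling:

    # How UTF-16 surrogate pairs yield 17 planes of code points.
    bmp = 0x10000                            # 65,536 code points in the Basic Multilingual Plane
    high_surrogates = 0x400                  # 1,024 lead surrogates
    low_surrogates = 0x400                   # 1,024 trail surrogates
    supplementary = high_surrogates * low_surrogates   # 1,048,576 = 16 supplementary planes
    total = bmp + supplementary              # 1,114,112 code points, U+0000..U+10FFFF
    usable = total - (high_surrogates + low_surrogates) # minus the surrogate range itself
    print(total, usable)                     # 1114112 1112064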

So, no, the two cases are not really comparable.

A./

