On 8/19/2011 2:35 PM, Jukka K. Korpela wrote:
20.8.2011 0:07, Doug Ewell wrote:

Of course, 2.1 billion characters is also overkill, but the advent of
UTF-16 was how we ended up with 17 planes.

And now we think that a little over a million is enough for everyone, just as they thought in the late 1980s that 16 bits would be enough for everyone.


The difference is that these early plans were based on rigorously *not* encoding certain characters, and on using combining methods or variation selectors much more aggressively. That might have been feasible, except for the need to migrate existing software and to have Unicode-based systems play nicely in a world where other character sets had different ideas of what constitutes a character.

Allowing thousands of characters for compatibility reasons, more than ten thousand precomposed characters, and many other kinds of characters and symbols not originally on the radar still has not inflated the numbers all that much. The count stands at roughly double the original 64K goal, after more than twenty years of steady accumulation.

Was the original concept of being able to shoehorn the world into sixteen bits overly aggressive? Probably, because the estimates had always been that there are about a quarter million written "elements". If you took the current repertoire and applied code-space-saving techniques in hindsight, you might be able to create something that "fits" into 16 bits. But it would end up using strings for many things that are now single characters.
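
To illustrate what "strings instead of single characters" means in practice, here is a small sketch in Python using the standard unicodedata module (the choice of example character is mine, purely for illustration):

    import unicodedata

    # A precomposed character occupies a single code point...
    precomposed = "\u00E9"                      # e with acute, LATIN SMALL LETTER E WITH ACUTE
    # ...but the same text can equally be written as a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", precomposed)
    print(len(precomposed), len(decomposed))    # 1 2
    print([hex(ord(c)) for c in decomposed])    # ['0x65', '0x301']

A repertoire that relied on such decompositions everywhere would need far fewer code points, at the cost of making many of today's single characters into two- or three-element strings.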

But the numbers, so far, suggest that this original estimate of a quarter million, rough as it was, is rather accurate. Over twenty years of encoding characters have not been enough to exceed it.

The million-plus code points are therefore a much more comfortable "limit": from the beginning they assume a ceiling with ample headroom (as opposed to the "can we fit the world in this shoebox" approach of the earlier designs).
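
For reference, a back-of-the-envelope sketch (in Python, as my own illustration, not anything from the original design documents) of how the UTF-16 surrogate mechanism arrives at that ceiling:

    # How UTF-16 surrogate pairs yield 17 planes of code points.
    bmp = 0x10000                            # 65,536 code points in the Basic Multilingual Plane
    high_surrogates = 0x400                  # 1,024 lead surrogates
    low_surrogates = 0x400                   # 1,024 trail surrogates
    supplementary = high_surrogates * low_surrogates   # 1,048,576 = 16 supplementary planes
    total = bmp + supplementary              # 1,114,112 code points, U+0000..U+10FFFF
    usable = total - (high_surrogates + low_surrogates) # minus the surrogate range itself
    print(total, usable)                     # 1114112 1112064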

So, no, the two cases are not really comparable.

A./

