Peter Kirk suggested:

> I am suggesting that the best way to get the job done properly is to lay
> the conceptual foundation properly first, instead of trying to build a
> structure on a foundation which doesn't match...
Part of the problem that I think some people are having here, including Peter, is that they are ascribing the wrong level to the Unicode Standard itself. The Unicode Standard is a character encoding standard. What it standardizes are the numerical codes for representing abstract characters (plus quite a number of related things having to do with character properties and algorithms for manipulating characters in text to do various things; see the brief sketch in the postscript below).

The Unicode Standard is *NOT* a standard for the theory or process of character encoding. It does not spell out the rules whereby character encoding committees are constrained in their process, nor does it lay down specifications that would allow anyone to follow some recipe in determining what "thing" is a separate script and what is not, nor what "entity" is an appropriate candidate for encoding as a character and what is not.

Ultimately, *those* kinds of determinations are made by the character encoding committees, based on argumentation made in proposals, by proponents and opponents, and in the context of legacy practice, potential cost/benefit tradeoffs for existing and prospective implementations, commitments made to stability, and so on. They do not consist of the encoding committees -- either one of them -- turning to the Unicode Standard, page whatever, or ISO/IEC 10646, page whatever, to find the rule which determines what the answer is. In fact the answers evolve over time, because the demands on the standard evolve, the implementations evolve, and the impact of the dead hand of legacy itself changes over time.

It is all well and good for people to point out the dynamic nature of scripts themselves -- their historic connections and change over time, which often make it notably difficult to determine whether to encode particular instantiations at particular times in history as a "script" in the character encoding standard. But I would suggest that people bring an equivalently refined historical analysis to the process of character encoding itself. We are dealing with a *very* complex set of conflicting requirements here for the UCS, and attempting a level of coverage over the entire history of writing systems in the world. Even *cataloging* the world's writing systems is immensely controversial -- let alone trying to hammer some significant set of "historical nodes" into a set of standardized encoded characters that can assist in digital representation of plain text content of the world's accumulated and prospective written heritage.

Contrary to what Peter is suggesting, I think it is putting the cart before the horse to expect a standard theory of script encoding to precede the work to actually encode characters for the scripts of the world. The Unicode Standard will turn out the way it does, with all its limitations, warts, and blemishes, because of a decades-long historical process of decisions made by hundreds of people, often interacting under intense pressure. Future generations of scholars will study it and point out its errors. Future generations of programmers will continue to use it as a basis for information processing, and will continue to program around its limitations. And I expect that *THEN* a better, comprehensive theory of script and symbol encoding for information processing will be developed.
And some future generation of information technologists will rework the Unicode encoding into a new standard of some sort, compatible with then-existing "legacy" Unicode practice, but avoiding most of the garbage, errors, and 8-bit compatibility practice that we currently have to live with, for hundreds of accumulated (and accumulating) reasons.

--Ken
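P.S. For concreteness about what the standard itself *does* pin down, here is a minimal sketch using Python's standard unicodedata module, which is generated from the Unicode Character Database. The three characters are arbitrary examples; the point is simply that what gets standardized is a code point plus a bundle of properties attached to an abstract character, not a theory of scripts:

    import unicodedata

    # Three arbitrary examples: LATIN CAPITAL LETTER A, LATIN SMALL LETTER C
    # WITH CEDILLA, and DEVANAGARI LETTER A.  For each one, print what the
    # standard actually assigns: a code point, a character name, a general
    # category, and a bidirectional class -- all drawn from the Unicode
    # Character Database.
    for ch in "A\u00E7\u0905":
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch):35}"
              f"category={unicodedata.category(ch)}  "
              f"bidi={unicodedata.bidirectional(ch)}")

Running that prints, for example, "U+0905  DEVANAGARI LETTER A ... category=Lo  bidi=L". Nothing in those standardized data tells you *why* Devanagari is a separate script or why that letter merited a code point; those determinations were made by the committees, in the ways described above.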

