On 09/12/2000 12:18:37 PM Michael Everson wrote:

>I think there are codes given to entities in the Ethnologue list that aren't
>languages in the sense that we need to identify languages in IT and in
>Bibliography (which is what the codes are for).

Perhaps there is a cat that needs to be let out of the bag here. ISO 639 codes were primarily intended for bibliographic purposes. Gary and I point out in our paper that the needs of that sector do not necessarily correspond to the general needs of IT, particularly for language-specific processing. A tag that denotes a group of languages serves no useful purpose for most language-specific processes. For example, if all you know about the language of some information object is that it is an Athapascan language, you can't spell-check that information.

The intro to ISO 639 claims that the standard is intended to serve the needs of a variety of sectors, but in its current state it is failing to adequately serve some. We're not arguing that it is of no use, but it is an open question whether bibliographic codes were the best starting point for general IT use. Regardless, we have them, and they are already in use. The important question then is how to move forward to find something that will serve all sectors of IT.

Furthermore, we would contend that the categories enumerated in the Ethnologue by and large *are* the categories that need to be identified for general IT purposes. In the majority of cases, the distinctions made are those that would be needed to successfully spell-check, for example. (We acknowledge that that is not true in all cases; for example, Chinese spelling would cross multiple languages, and alternate English spellings are needed for what would generally be considered one language. But these are the exceptions, not the norm.)

>I think that it is not
>mature for International Standardization. It is a work in progress, subject
>to change.

As such it is a living document.
Change is needed as the objects described change and as our knowledge of the objects changes. This is no less true of several ISO standards: 10646, 3166,... It is especially true of 639: for example, currently if someone wants to tag a document containing Hopi text, they would need to use the tag nai "North American Indian (other)". Suppose in two years' time there is a specific code for Hopi added to ISO 639-2; consider what happens to that existing data: it is now *incorrectly* tagged (not just sub-optimally tagged), because nai no longer includes Hopi since that now has its own code. Every time a new code is added to ISO 639, the meaning of some existing codes changes. That is at least as serious a concern as any that a person would likely encounter with changes to the Ethnologue, and it is probably more serious.

Please don't assume that carefulness in defining ISO 639 will avoid problems. It already has inescapable problems. We need to understand those problems and learn to manage them, and that will be made rather easier if we quickly expand to include a comprehensive enumeration of modern languages. Yes, that will not solve all problems, but it will be a beneficial move forward.

>I don't see what the hurry is. Make a list of 100 languages that you *need*
>codes for urgently. Make a list of another 100 after that. Encode languages
>that you *really* need codes for. That's what I mean by saying "just
>because it's in the list doesn't mean it should get a code".

Considering only those languages in which we have been involved, SIL has an immediate need for a couple of thousand codes. But we know that many others have similar large-scale needs that collectively include the entire Ethnologue list. There are *lots* of people asking for this, not just me, not just SIL.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>

