Doug wrote:

> Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
>
> > Unicode 4.0 will be quite specific: P14 tags are "reserved for
> > use with particular protocols requiring their use" is what the
> > text will say more or less.
>
> I didn't know the question of what to do about Plane 14 language tags
> had already been resolved.
>
> If that is the case, it might make sense to add an explanatory note to
> the Public Review item on Plane 14 tags, or simply to remove the item.

The issue up for public review, as it states, is about formal *deprecation* of the Plane 14 Language Tags.

The UTC already has consensus on limiting the use and contexts of use of the language tag characters. Such language was written into Unicode 3.1:

"The [language tag] characters... provide a mechanism for language tagging in Unicode plain text. <emphasis>However, the use of these characters is strongly discouraged.</emphasis> The characters in this block are reserved for use with special protocols. They are <emphasis>not</emphasis> to be used in the absence of such protocols, or with <emphasis>any</emphasis> protocols that provide alternate means for language tagging, such as HTML or XML. The requirement for language information embedded in plain text data is often overstated. ...

"Because of the extra implementation burden, language tags should be avoided in plain text unless language information is required and it is known that the receivers of the text will properly recognize and maintain the tags...

"Language tags should also be avoided wherever higher-level protocols, such as a rich-text format, HTML or MIME, provide language attributes."

This language is carried forward, as with the rest of the Unicode 3.1 and Unicode 3.2 text, into the consolidated text of Version 4.0 of the standard.

The UTC also long ago approved UTR #20, which states that language tags...

"...were solely included for the benefit of those Internet protocols, such as ACAP, which require a standard mechanism for marking language in UTF-8 strings, and at the same time to avoid the use of other tagging schemes that relied on specific details of the encoding form used."

So what we are talking about here is not opening up again the wonderful world of what language tag characters are good for, and broadening their use.

The issue on the table is:

Because the UTC has determined that the use of language tag characters is to be strongly discouraged, and is limited in any case to very particular protocols, should the UTC take one step further and declare them formally *deprecated*?

The result of the latter decision would be to add a statement to that effect in the block description in Unicode 4.0 for the language tag characters, and to add the code points U+E0001, U+E0020..U+E007F to the list of code points which get the Deprecated property in PropList.txt.

That's it. That's what is on the table for comment and eventual decision by the UTC.

My personal opinion? The whole debate about deprecation of language tag characters is a frivolous distraction from other technical matters of greater import, and things would be just fine with the current state of the documentation. But, if formal deprecation by the UTC is what it would take to get people to stop advocating more use of the language tags after the UTC has long determined that their use is strongly discouraged, then so be it.

--Ken
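
For readers who want to see what the PropList.txt change described above would amount to mechanically, here is a minimal Python sketch. It assumes the usual "code point or range ; property # comment" line layout of PropList.txt. The two sample lines are written for illustration only and are not copied from any released data file, and the helper names (`parse_property_ranges`, `has_property`, `find_tag_characters`) are hypothetical.

```python
# Illustrative lines in the PropList.txt style: "range ; Property # comment".
# They are written for this sketch, not taken from a released PropList.txt.
SAMPLE_PROPLIST = """\
# Hypothetical entries for the Plane 14 language tag characters
E0001         ; Deprecated # LANGUAGE TAG
E0020..E007F  ; Deprecated # TAG SPACE..CANCEL TAG
"""


def parse_property_ranges(text, wanted):
    """Return inclusive (start, end) code point ranges listed for `wanted`."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop any trailing comment
        if not data:
            continue  # blank line or comment-only line
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != wanted:
            continue
        start, _, end = fields[0].partition("..")
        # A single code point is treated as a one-element range.
        ranges.append((int(start, 16), int(end or start, 16)))
    return ranges


def has_property(code_point, ranges):
    """True if code_point falls inside any of the parsed ranges."""
    return any(low <= code_point <= high for low, high in ranges)


DEPRECATED = parse_property_ranges(SAMPLE_PROPLIST, "Deprecated")


def find_tag_characters(text):
    """List (index, 'U+XXXX') for each character covered by DEPRECATED."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if has_property(ord(ch), DEPRECATED)
    ]
```

A receiver that wanted to flag these characters in incoming plain text could call `find_tag_characters` on each string and decide what to do with any hits; whether to strip, ignore, or report them is a policy choice this sketch does not make.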

