In a message dated 2002-02-04 9:07:22 Pacific Standard Time, [EMAIL PROTECTED] writes:
>> In plain text, I think that plane 14 language tags could be used
>
> It seems to me that such usage confuses the meaning of "plain text". Use
> of the plane 14 tagging characters to indicate language would be markup
> -- metadata that is separate from the content and that has some impact on
> how the content should be processed.

I'm afraid this is one place where Peter and I are forever destined to disagree. While Plane 14 tags do perform a markup-like function -- just as the directional overrides and variation selectors do -- they are discrete Unicode characters, and so, by definition, they are plain text. From TUS 3.0, page 16: "The Unicode Standard encodes plain text."

> It's just a coincidence that the
> markup uses distinct characters from the content.

It's not a coincidence at all. Plane 14 in general, and the specific code points in particular, were intentionally chosen to ensure that the tag characters would not conflict with any other characters.

In HTML, the string "<span lang="xh">" -- a sequence of ordinary ASCII characters -- has a special, higher-level meaning that is defined by the markup language. In another context, that string might not have the same meaning; another string might convey that meaning, or there might not be any such markup available.

By contrast, the Unicode sequence U+E0001 U+E0078 U+E0068 has only one meaning, defined by the character encoding standard as clearly as it defines the letter A (if not more so).

-Doug Ewell
 Fullerton, California
 (address will soon change to [EMAIL PROTECTED])
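The sequence U+E0001 U+E0078 U+E0068 is simply the LANGUAGE TAG character followed by the tag forms of "x" and "h". Here is a minimal Python sketch of how such a sequence is built and read back. It assumes the Plane 14 layout in which U+E0001 is LANGUAGE TAG and the tag characters U+E0020..U+E007E mirror printable ASCII 0x20..0x7E at an offset of 0xE0000. The function names `language_tag` and `decode_language_tag` are hypothetical, chosen only for illustration.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000    # assumed offset: tag char = ASCII code point + 0xE0000


def language_tag(lang: str) -> str:
    """Hypothetical helper: build a Plane 14 language-tag sequence for an ASCII code."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("tag content must be printable ASCII")
    # Shift each ASCII character up into the Plane 14 tag range.
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def decode_language_tag(seq: str) -> str:
    """Hypothetical helper: recover the ASCII language code from a tag sequence."""
    if not seq or ord(seq[0]) != LANGUAGE_TAG:
        raise ValueError("sequence does not start with U+E0001 LANGUAGE TAG")
    # Shift each tag character back down to its ASCII counterpart.
    return "".join(chr(ord(c) - TAG_OFFSET) for c in seq[1:])


if __name__ == "__main__":
    seq = language_tag("xh")
    print(" ".join(f"U+{ord(c):04X}" for c in seq))  # U+E0001 U+E0078 U+E0068
    print(decode_language_tag(seq))                   # xh
```

Every code point the sketch produces lies outside the ASCII range, so, unlike the characters of "<span lang="xh">", none of them can be mistaken for ordinary text content.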

