In a message dated 2002-02-04 9:07:22 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

>> In plain text, I think that plane 14 language tags could be used
>
> It seems to me that such usage confuses the meaning of "plain text". Use 
> of the plane 14 tagging characters to indicate language would be markup 
> -- metadata that is separate from the content and that has some impact on 
> how the content should be processed.

I'm afraid this is one place where Peter and I are forever destined to 
disagree.  While Plane 14 tags do perform a markup-like function -- just as 
the directional overrides and variation selectors do -- they are discrete 
Unicode characters, and so, by definition, they are plain text.  From TUS 
3.0, page 16:  "The Unicode Standard encodes plain text."

> It's just a coincidence that the 
> markup uses distinct characters from the content.

It's not a coincidence at all.  Plane 14 in general, and the specific code 
points in particular, were intentionally chosen to ensure that the tag 
characters would not conflict with any other characters.

In HTML, the string "<span lang="xh">" -- a sequence of ordinary ASCII 
characters -- has a special, higher-level meaning that is defined by the 
markup language.  In another context, that string might not have the same 
meaning; another string might convey that meaning, or there might not be any 
such markup available.

By contrast, the Unicode sequence U+E0001 U+E0078 U+E0068 has only one 
meaning, defined by the character encoding standard as clearly as it defines 
the letter A (if not more so).
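
To make the mapping concrete, here is a minimal Python sketch of how that sequence could be built. It assumes the usual Plane 14 scheme: U+E0001 LANGUAGE TAG, followed by tag characters at an offset of 0xE0000 from the printable ASCII characters they mirror. The names language_tag, TAG_BASE and LANGUAGE_TAG are mine, chosen for illustration, and are not taken from any library.

```python
# Illustrative sketch only: the helper and constant names are hypothetical.
TAG_BASE = 0xE0000            # assumed offset: tag character = 0xE0000 + ASCII value
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG


def language_tag(code: str) -> str:
    """Return LANGUAGE TAG followed by the tag-character form of `code`."""
    # Tag characters mirror printable ASCII only (U+0020..U+007E).
    if not (code.isascii() and code.isprintable()):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in code)


tagged = language_tag("xh")
print(" ".join(f"U+{ord(c):04X}" for c in tagged))
# prints: U+E0001 U+E0078 U+E0068
```

Nothing in the result overlaps with ordinary text, so a process that does not care about language can skip every code point in the tag range without having to parse anything.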

-Doug Ewell
 Fullerton, California
 (address will soon change to [EMAIL PROTECTED])
