Re: Plain text (from Re: Avoidance variants)

Ken Whistler Fri, 27 Mar 2015 11:37:06 -0700


On 3/27/2015 8:15 AM, William_J_G Overington wrote:

Or you could just redefine "&" and "<" as

----

That encapsulates what I do not like about using markup other than in very 
precise limited circumstances such as designing a web page.

The characters have defined meanings in Unicode: HTML changes those meanings 
for the purpose of writing web page source code.

This represents a fundamental misunderstanding of what Unicode character
encoding is all about. I realize that William is unlikely to be deterred
from his project to incorporate various functions into what he conceives
of as plain text, but in hopes of preventing other folk from following
him down the garden path, let's consider this scenario further.

The Unicode Standard specifies the character encoding for:

U+003C "<" LESS-THAN SIGN
U+003E ">" GREATER-THAN SIGN

That specification clearly *identifies* the characters and their code
points. The code charts give the representative glyphs, to help in the
identification. And the Unicode Character Database provides precise
specification of character properties for these characters (as for all
others), to assist in uniform and correct implementations.
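For anyone who wants to see what that identification amounts to in
practice, here is a minimal sketch using Python's standard `unicodedata`
module, which exposes a subset of the Unicode Character Database. The
helper name `describe` is just an illustration, and the values reported
depend on the UCD version bundled with the interpreter.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Return a few UCD-derived facts about a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "general_category": unicodedata.category(ch),
        "bidi_mirrored": bool(unicodedata.mirrored(ch)),
    }

# Identity and properties only; nothing here says what the
# characters "mean" in any particular use.
for ch in "<>":
    print(describe(ch))
```

Note what is absent: the database records a name, a code point and
properties, but no entry for "start of tag" or "streaming operator".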

What the Unicode Standard does *not* do is define the *meanings* of
these characters, in the sense of their meaning in use. That is entirely
up to the people who use them, and more particularly, to people or
agencies or committees or whoever decides to apply such characters in
particular orthographies, formal syntax definitions, conventions, or
whatever.

Examples:

1. if a < b and c > 0 then ac < bc

Here we have a simple algebraic expression, with ">" meaning 'is greater
than' and "<" meaning 'is less than'. Talk to the mathematicians for
exact meaning and usage.


2. <i>a</i>

Here we have the "<" and ">" being used as start and end markers of tags
in a markup scheme for text. Furthermore, the entire strings "<i>" and
"</i>" have further defined meaning as start and end of italic style
runs. Talk to W3C for exact meaning and usage.

3. ==> look here <==

Here we have a common ASCII plain text convention for use of "<" and ">"
as arrowheads for constructed arrows. Talk to... well, whoever writes
plain text email these days for exact meaning and usage.

4. Following is some quoted plain text email:

> -R
>
>>
>> -- Ken
>>
>> On Dec 7, 2011, at 6:41 AM, Richard COOK wrote:
>>
>>> On Dec 6, 2011, at 12:19 PM, Ken Lunde wrote:
>>>
>>>> Richard,

Here we have another common ASCII plain text convention for use of ">" --
but this time it indicates both quotation and indentation. Repetition
of use of the ">" indicates repeated re-quotation and further indentation.
Talk to the implementers of plain text email clients for exact meaning
and usage.


5. cout << "hello!" ;

Here we have an instance from C++ program text, where two "<" in
sequence represent a streaming operator. Talk to the documenters
of the C++ standard for exact meaning and usage.

6. template<class T>

Here we have a *different* instance from C++ program text, which
looks a little like HTML tags, but is not. In this case we are using
"<" and ">" again as paired delimiters ("angle brackets"), but the
syntax and interpretation is distinct. This is not a "tag". Talk to
the documenters of the C++ standard for exact meaning and usage.

7. <someb...@somewhere.com>

This is a convention used in email and other contexts, where 003C
and 003E used as paired delimiters (angle brackets) mark off an
email address or a URL. This might look like the HTML usage, but
it isn't. This isn't a tag. Talk to the implementers of email clients
and similar software for exact meaning and usage.

8. Jean a dit : << Je veux le faire. >>

Oops, here we have something different again. This is a *substitution*
use of 003C and 003E to emulate proper French guillemet punctuation
marks. Poor guy doesn't have guillemets on his keyboard -- what is
he gonna do?!
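
To make the substitution in example 8 concrete, here is a small Python
sketch that maps the ASCII fallback back to real guillemets (U+00AB and
U+00BB). The function name `restore_guillemets` is made up for this
illustration, and the blanket replacement is deliberately naive: it is
only appropriate when the caller already knows the text is French prose.

```python
LEFT_GUILLEMET = "\u00AB"   # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
RIGHT_GUILLEMET = "\u00BB"  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

def restore_guillemets(text: str) -> str:
    """Replace the ASCII fallback pairs "<<" and ">>" with guillemets.

    Assumption: the input is known to be French prose. The function
    cannot tell a quotation mark fallback from any other use of the
    same two characters.
    """
    return text.replace("<<", LEFT_GUILLEMET).replace(">>", RIGHT_GUILLEMET)

print(restore_guillemets("Jean a dit : << Je veux le faire. >>"))

# The same replacement applied to example 5 corrupts the program text,
# because there "<<" is an operator, not a quotation mark.
print(restore_guillemets('cout << "hello!" ;'))
```

The second call is the interesting one: nothing in the characters
themselves says which reading applies. That knowledge lives in the
convention, outside the character encoding.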

I'm sure people could come up with many other examples in this vein.
The point of the long-winded exemplification is that characters
"mean" what people use them to "mean". As long as the *identity*
of the character is not in question and the code points are correctly
used and transmitted, then the plain text conformance requirements
of the Unicode Standard have been met.
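
A quick way to check that claim is to look at the code points directly.
The following Python sketch collects several of the example strings
above (the list and the helper `angle_code_points` are only for
illustration) and confirms that every one of them uses the same two
code points and survives a UTF-8 round trip unchanged.

```python
EXAMPLES = [
    "if a < b and c > 0 then ac < bc",        # mathematics
    "<i>a</i>",                               # markup
    "==> look here <==",                      # ASCII arrows
    ">> -- Ken",                              # email quotation
    'cout << "hello!" ;',                     # C++ streaming operator
    "template<class T>",                      # C++ template syntax
    "Jean a dit : << Je veux le faire. >>",   # guillemet fallback
]

def angle_code_points(text: str) -> set:
    """Return the code points of every "<" or ">" found in text."""
    return {f"U+{ord(ch):04X}" for ch in text if ch in "<>"}

for s in EXAMPLES:
    # Encoding and decoding must preserve the text exactly.
    assert s.encode("utf-8").decode("utf-8") == s
    print(sorted(angle_code_points(s)), s)
```

Seven different conventions, and at the character level there is
nothing to distinguish them: it is U+003C and U+003E throughout.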

And this is precisely as it should be. Just as it is not the business
of the Unicode Standard to dictate to anyone how they should spell
text, it is also not the business of the Unicode Standard to limit
or otherwise constrain what conventions of interpretation and/or
what additional layers of syntactic complexity (whether mathematics,
markup, or anything else) people build on top of text characters.
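
As an illustration of such a layer, here is a hedged Python sketch of
the quotation convention from example 4. The helpers `quote_depth` and
`unquote` are invented for this note and do not reproduce any particular
email client's rules; they simply treat a leading run of ">" as the
quotation depth.

```python
def quote_depth(line: str) -> int:
    """Count the run of ">" characters at the very start of a line."""
    return len(line) - len(line.lstrip(">"))

def unquote(line: str) -> str:
    """Strip the leading ">" run and the spaces that follow it."""
    return line.lstrip(">").lstrip(" ")

quoted = [
    "> -R",
    ">> -- Ken",
    ">>> On Dec 6, 2011, at 12:19 PM, Ken Lunde wrote:",
    ">>>> Richard,",
]

for line in quoted:
    print(quote_depth(line), repr(unquote(line)))

# The convention is positional: the arrow line from example 3 contains
# ">" but does not start with it, so its depth is 0.
print(quote_depth("==> look here <=="))
```

The whole convention sits in a few lines of application code. The
character encoding needed no change to support it.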


That use should not act as an Aunt Sally argument for stopping the addition of 
additional Unicode characters into regular Unicode.

Adding some additional characters for producing italics, bold and maybe colour 
as well into regular Unicode so that the facilities available for use in plain 
text format are extended, would, in my opinion, be a good thing.


It would be a bad thing. As Asmus has noted in this thread, proposals of
this ilk are dead on arrival at the UTC, because they do not understand
the appropriate layering of text processing. Just because a distinction
is *in* text, does not mean that it should be, ipso facto, defined in plain
text or encoded in characters.

--Ken

