On 08/28/2018 04:26 AM, William_J_G Overington via Unicode wrote:
> Hi
> Mark E. Shoulson wrote:
>> I'm not sure what the advantage is of using circled characters instead of plain old ascii.
> My thinking is that "plain old ascii" might be used in the text encoded in the file. Sometimes a file containing Private Use Area characters is a mix of regular Unicode Latin characters with just a few Private Use Area characters mixed in with them. So my suggestion of using circled characters is for disambiguation purposes. The circled characters in the PUAINFO sequence would not be displayed if a special software program were being used to read in the text file, then act upon the information that is encoded using the circled characters.

What if circled characters are used in the text encoded in the file?  They're characters too; people use them and all.  Whenever you designate some characters to be used in a way outside their normal meaning, you have the problem of how to use them *with* their normal meaning.  So there are various escaping schemes and all.  So in XML, all characters have their normal meanings, except <, >, and &, which mean something special and change the interpretations of other nearby characters (so "bold" is a word in English that appears in the text, but "<bold>" is part of an instruction to the renderer that doesn't appear in the text).  And the price is that those three characters have to be expressed differently (&lt; &gt; &amp;).

I don't really see what you gain by branding some large swath of Unicode ("circled characters") as "special" and not meaning their usual selves, and for that matter making these hard-to-type characters *necessary* for using your scheme, when you could do something like what XML does, and say "everything between < and > is to be interpreted specially, and there, these characters have the following meanings" and then have some other way of expressing those two reserved characters.  (Not saying you need to do it XML's way, but something like that: reserve a small number of characters that have to be escaped, not some huge chunk.)
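
To make the XML-style trade-off concrete, here is a minimal Python sketch of escaping a small reserved set.  It assumes only the three reserved characters mentioned above; the names `ESCAPES`, `escape`, and `unescape` and the sample string are hypothetical, chosen just for illustration, and this is not a design for any particular markup format.

```python
# Illustrative sketch only: these names are hypothetical, not part of any proposal.
# Only three characters are reserved; everything else keeps its normal meaning.
ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def escape(text: str) -> str:
    """Rewrite the three reserved characters so they can appear as plain text."""
    # "&" must be handled first, otherwise the "&" introduced by
    # "&lt;" and "&gt;" would itself be escaped a second time.
    for raw in ("&", "<", ">"):
        text = text.replace(raw, ESCAPES[raw])
    return text


def unescape(text: str) -> str:
    """Inverse of escape()."""
    # "&amp;" is undone last, so "&amp;lt;" turns back into "&lt;", not "<".
    for raw in ("<", ">", "&"):
        text = text.replace(ESCAPES[raw], raw)
    return text


# Circled characters pass through untouched, so they stay usable as ordinary text.
sample = "Ⓐ is ordinary text here, and so is 1 < 2 & 3 > 2"
encoded = escape(sample)
assert unescape(encoded) == sample
```

The cost is confined to three characters that are easy to type, and any other character, circled or not, needs no special treatment.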
> My thinking is that using this method just adds some encoded information at the start of the text file and does not require the whole document to become designated as a file conformant to a particular markup format.

That's another way of saying that this is a markup format which accepts a large variety of plain texts.  Because you ARE talking about making a "particular markup format," just a different and new one.

I guess there's not even any reason for me to argue the point, though, since it is up to you how to design your markup language, and you can take advice (or not) from anyone you like.  Draw up some design, find some interested people, start a discussion, and work it out.  (But not here; this list is for discussing Unicode.)

~mark
