Leonardo Boiko <leoboiko at gmail dot com> wrote:

There’s some information lost when we render our “plain text” as ancient text. Similarly, there’s some information lost when we render handwritten text, typeset text, or computer “rich text” to plain text. It seems to me these two losses are different only in degree, not in kind.

I think you've proven that rich text can convey information beyond what plain text can convey. I don't dispute that. I disagree that it disproves the existence of plain text as the foundation upon which rich text is built.

The whole premise of reading and writing is that we look below the surface to the identity of the letters and the meaning of the words.

No, the whole premise of reading and writing is to represent language, which is spoken, in a visual manner. Nothing to do with letters; letters are just tools for representing language.

Well, of course that was a gross oversimplification of the complex processes of reading and writing. For that matter, reading (for example) an allegory requires yet another level of interpretation, to understand the story behind the story. And of course the reference to "letters" isn't quite right for languages like Chinese anyway.

But to imply that because text always has a specific appearance, determining the underlying plain text is an artificial process that was imposed on us by computers seems wrong. We (meaning "readers of alphabetic scripts, at least Latin and Cyrillic") learn to recognize letters at an early age, but quickly run into additional glyphs we don't recognize, like certain cursive uppercase letters (especially G and Q) and the two-tier vs. one-tier lowercase a and g. Then we find out they are different forms of the same letter, and learn to read them the same, and that is the essence of "plain text"—the underlying letters behind potentially differing glyphs.

You cannot read without re-creating sound images in your head.

Is this true?

In any case, perhaps what is needed is not an FAQ page on plain text, which raises all these philosophical questions, but rather one on what is within scope for Unicode and what is out of scope. Of course, that may be asking too much as well; both the Unicode Standard and the Principles and Practices document already attempt to answer this question, but run into conflicts with existing practice on short-lived symbols. Maybe (though I don't personally believe so) the concept of "plain text" has become so passé that William's variation selectors for swash e's, and additional ligatures, and weather reporting codes, and Portable Interpretable Object Code may one day be considered "within scope" for Unicode.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s

