On Aug 7, 2010, at 10:40 AM, Doug Ewell wrote:

> I'd like to see an FAQ page on "What is Plain Text?" written primarily by UTC 
> officers.  That might go a long way toward resolving the differences between 
> William's interpretation of what plain text is, which people like me think is 
> too broad, and mine, which some people have said is too narrow.
> 

Well, we do have <http://www.unicode.org/faq/ligature_digraph.html#10> and the 
related FAQs.

The basic idea is that "plain text" is the minimum amount of information to 
process the given language in a "normal" way.  FOR EXAMPLE, ALTHOUGH ENGLISH 
CAN BE WRITTEN IN ALL-CAPS, IT USUALLY ISN'T, AND DOING IT LOOKS WRONG.  We 
therefore have both upper- and lower-case letters for English.  On the other 
hand, although English *is* usually written with some facility to provide 
emphasis, different media have different ways of providing that facility 
(asterisks, underlining, italicizing), and English written without any of these 
looks perfectly fine.  
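
To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical representation of styled text as a list of (text, emphasized) runs; the run format and the function names are illustrative only and are not part of any Unicode specification. Each medium marks emphasis its own way, while the plain-text rendering simply drops the emphasis and the sentence still reads correctly.

```python
# Hypothetical rich-text representation: a list of (text, emphasized) runs.
Run = tuple[str, bool]

def to_email(runs: list[Run]) -> str:
    """Mark emphasis with asterisks, as is common in plain-text email."""
    return "".join(f"*{t}*" if em else t for t, em in runs)

def to_html(runs: list[Run]) -> str:
    """Mark emphasis with HTML <i> elements (no escaping, for brevity)."""
    return "".join(f"<i>{t}</i>" if em else t for t, em in runs)

def to_plain(runs: list[Run]) -> str:
    """Keep only the characters; the emphasis attribute is discarded."""
    return "".join(t for t, _ in runs)

runs = [("English ", False), ("is", True), (" usually written with emphasis.", False)]
print(to_email(runs))
print(to_html(runs))
print(to_plain(runs))
```

The emphasis lives in the run attributes, not in the characters, which is why all three renderings share the same underlying plain text.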

Arabic, by contrast, absolutely must have some way of allowing for different 
letter shapes in different contexts, or it simply looks wrong, so Arabic "plain 
text" must have some facility for that, either by explicitly having different 
characters for the different shapes the letters take, or by providing a default 
layout algorithm that defines them.
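
In practice Unicode encodes one nominal character per Arabic letter and leaves the choice of contextual shape to rendering. The separately encoded isolated, final, initial, and medial forms in the Arabic Presentation Forms-B block exist for compatibility, and compatibility normalization (NFKC) folds them back to the nominal letter. A short check with Python's standard `unicodedata` module, using ARABIC LETTER BEH (U+0628) as the example:

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the nominal (shape-independent) character

# Contextual forms of BEH from the Arabic Presentation Forms-B block:
# isolated, final, initial, medial.
forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for ch in forms:
    folded = unicodedata.normalize("NFKC", ch)
    # Print the code point, its name, its compatibility decomposition,
    # and the code point it normalizes to.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{unicodedata.decomposition(ch)} -> U+{ord(folded):04X}")
```

Each of the four forms carries a compatibility tag such as `<initial>` and normalizes to U+0628, so the shape distinction is presentation rather than plain-text content.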

Beyond rendering, there are also considerations as to the minimal amount of 
information necessary for other text-based processes, such as sorting, 
searching, and text-to-speech.
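
As a small illustration of a process needing less information than display does: plain text keeps the case distinction English requires, but a search often ignores it. A minimal sketch using Python's built-in `str.casefold()`, which applies Unicode full case folding; the function name `caseless_contains` is hypothetical.

```python
def caseless_contains(haystack: str, needle: str) -> bool:
    """True if needle occurs in haystack when case differences are ignored."""
    # str.casefold() applies Unicode full case folding, so "ß" matches "SS".
    return needle.casefold() in haystack.casefold()

print(caseless_contains("FOR EXAMPLE, ALTHOUGH ENGLISH", "english"))
print(caseless_contains("Straße", "STRASSE"))
```

Both calls match: the case information is still present in the text for rendering, but the search simply does not use it.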

Yes, there are issues which end up being judgment calls, and it's easy to come 
up with cases where you can't really capture the full semantic intent of the 
author without what Unicode calls "rich text."  My favorite example is "The 
Mouse's Tale" in _Alice in Wonderland_.   Plain text isn't intended to capture 
all the nuances of the original's semantics, but to provide at the least a very 
close approximation.

Variation selectors are intended to cover cases where rendering needs more 
information than other processes, such as searching, do (Mongolian is the 
main example), or cases where different user communities disagree on whether 
two forms must be unified or must be deunified.
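
One way to see the "more information for rendering than for searching" point is a search that treats variation selectors as ignorable, so that a base character followed by a selector still matches the bare character. This is a sketch under that assumption, not the matching rules of any particular standard or library; the function names are hypothetical. The ranges are the standard variation selectors and the Mongolian free variation selectors.

```python
# Code point ranges of variation selectors (inclusive).
VARIATION_SELECTORS = [
    (0x180B, 0x180D),    # MONGOLIAN FREE VARIATION SELECTOR ONE..THREE
    (0x180F, 0x180F),    # MONGOLIAN FREE VARIATION SELECTOR FOUR (Unicode 14.0)
    (0xFE00, 0xFE0F),    # VARIATION SELECTOR-1..16
    (0xE0100, 0xE01EF),  # VARIATION SELECTOR-17..256
]

def is_variation_selector(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VARIATION_SELECTORS)

def search_key(text: str) -> str:
    """Drop variation selectors: in this sketch they affect only rendering."""
    return "".join(ch for ch in text if not is_variation_selector(ch))

def vs_insensitive_contains(haystack: str, needle: str) -> bool:
    return search_key(needle) in search_key(haystack)

# MONGOLIAN LETTER A followed by FVS1 still matches the bare letter.
mongolian_a_fvs1 = "\u1820\u180B"
print(vs_insensitive_contains(mongolian_a_fvs1, "\u1820"))
```

The selector stays in the stored text so a renderer can pick the intended glyph; only the search key discards it.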

=====
Hoani H. Tinikini
John H. Jenkins