From: "Peter Constable" <[EMAIL PROTECTED]>
> A software product could assign every single PUA codepoint to mean some
> kind of formatting instruction, and insert these into the text like
> markup. In that case, a user's PUA characters will be re-interpreted by
> that software as formatting instructions. Is that product conformant?
> Yes. Is it useful? Not for that user.

With a very simple transcoder, you could remap all HTML markup, including the supplementary line ends used inside markup, onto 256 PUA code points. You would get a file that contains ALL the HTML markup but still complies with the Unicode plain-text definition. Rendering it back to HTML would use a reverse filter, which would create an HTML file without any PUA characters, so it would be rendered correctly.

The only problem is that PUAs have no defined rendering, and Unicode does not specify ranges of PUAs for distinct uses, with distinct but predefined _default_ character properties: why isn't there a range for Mn diacritics, a range for ideographic letters or symbols, and a range for ignorable formatting controls (all of them with combining class 0)? At least it would have allowed applications and renderers to behave correctly even in the absence of support for those PUAs, by using a correct _default_ rendering, instead of just displaying narrow white boxes, or nothing...

I don't know why this would break anything: documents could still use PUAs the way they want, with their own semantics and behavior. But suggesting distinct ranges for the default behavior would be a real bonus to help applications adopt a coherent behavior when faced with unknown or unspecified PUAs.
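To make the transcoder described above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a finished tool: "markup" is taken to mean anything between `<` and `>`, each UTF-8 byte of a tag is mapped onto one of the 256 code points U+E000..U+E0FF, and the input is assumed to contain no characters from that range already. The names `markup_to_pua` and `pua_to_markup` are hypothetical.

```python
import re

# Assumption: use the first 256 code points of the BMP Private Use Area.
PUA_BASE = 0xE000

# Assumption: "markup" is anything from '<' to the next '>',
# including any line ends that occur inside the tag.
TAG_RE = re.compile(r"<[^>]*>")

# A run of characters taken from the 256 reserved PUA code points.
PUA_RUN_RE = re.compile("[\ue000-\ue0ff]+")


def markup_to_pua(html: str) -> str:
    """Replace every tag with PUA characters, one per UTF-8 byte of the tag."""
    def encode(match: re.Match) -> str:
        tag_bytes = match.group(0).encode("utf-8")
        return "".join(chr(PUA_BASE + b) for b in tag_bytes)
    return TAG_RE.sub(encode, html)


def pua_to_markup(text: str) -> str:
    """Reverse filter: turn each run of PUA characters back into its tags."""
    def decode(match: re.Match) -> str:
        tag_bytes = bytes(ord(c) - PUA_BASE for c in match.group(0))
        return tag_bytes.decode("utf-8")
    return PUA_RUN_RE.sub(decode, text)


if __name__ == "__main__":
    source = '<p class="x">Caf\u00e9 <b>bold</b></p>'
    encoded = markup_to_pua(source)
    # The text content is untouched; only the tags became PUA characters.
    print(repr(encoded))
    print(pua_to_markup(encoded) == source)
```

The round trip only works if the original document does not itself use U+E000..U+E0FF, which is exactly the conflict Peter describes: once a process claims PUA code points for its own markup, a user's own PUA characters in that range get re-interpreted.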

