William Overington <WOverington at ngo dot globalnet dot co dot uk> wrote:
> For example, a recent experiment, documented in the archives of this
> list as The Respectfully Experiment, shows that there is now new
> evidence about the facts regarding the encoding of code points for
> ligatures...

and <Peter_Constable at sil dot org> responded:

> Also, I don't recall posts from this list detailing the experiment
> referred to above, which admittedly leaves me at a disadvantage.

<sigh /> OK, here are the details. I'm reluctant to admit having been part of this "experiment," since it is now being presented as evidence to support the proliferation of private-use ligatures. But anyway:

In late May, William began suggesting that code points in the Private Use Area be assigned to Latin ligatures, so that they could be represented in plain text without the use of smart fonts, ZWJ sequences, etc., which he claims are only available to users of "the very latest" hardware and software. In particular, William proposed the PUA code point U+E707 for the Latin "ct" ligature, and that particular ligature was used as an example throughout the thread.

On 2002-05-31, I wrote a response which ended "Respectfully, Doug," except that I used William's code point U+E707 in place of the letters "ct." My intent was that everyone on the Unicode list, including William, would see "Respe<black box>fully," thus demonstrating the lack of interoperability of this PUA solution. Only users of a font that happened to contain William's PUA character would see the ct ligature, and I didn't think any such font existed.

Much to my surprise, however, James Kass had modified his private version of the Code2000 font to include William's ct ligature at U+E707, and he was using it to read my message, so out of everyone on the list, he alone did see the ligature.

William observed that I had sent, and James had received, a ct ligature at U+E707, based not on any private arrangement but on our mutual (and coincidental) use of William's code point. He latched onto this chain of events as proof that end-user publication of PUA code points was a success, and named it "The Respectfully Experiment," despite my protests that the whole incident was a freak accident. (I think "ct" was the only one of William's "golden" ligatures for which James had provided a Code2000 glyph.)

William continued:

> ... because it has now been realized that such code points can now be
> used in conjunction with ZWJ type mechanisms of advanced font
> technology formats as an alternative method of coding to assist
> people with less than the latest equipment, such code points for
> ligatures working in conjunction with advanced font technology rather
> than as an alternative to it which is the way that such code points
> were regarded when a decision not to encode any more of them without
> a strong case was taken, though even for that decision there was
> provision for the general thrust of that decision to be overridden in
> the light of future evidence.

and Peter replied:

> ... my knowledge of the Unicode standard and of advanced font
> technologies leaves me rather puzzled about how "...code points [for
> ligatures] can now be used in conjunction with ZWJ type mechanisms
> of advanced font technology...": codepoints for ligatures are
> unnecessary because of advanced font technologies. ZWJ does not work
> in conjunction with encoded ligatures because encoded ligatures
> aren't needed; and if they existed, ZWJ would not particularly
> interact with them in any usage that has been described as part of
> the Unicode standard.
I don't think William really meant "in conjunction with" in the sense that ZWJ would be applied directly to precomposed ligatures. He meant in the same document, or maybe not even that; maybe just that both types of ligation (precomposed and ZWJ) would be conformant to Unicode.

There is no new revelation here, though. Nothing "has now been realized" that wasn't already known. Because of the Latin compatibility ligatures already encoded from U+FB00 through U+FB06, it has always been possible to encode (say) an "fi" ligature using either the ZWJ method or the precomposed ligature at U+FB01, or both according to whimsy; a short sketch below spells out the alternatives. (Or it might just magically appear without any special encoding at all, as John Jenkins could point out.) It makes no difference whether a PUA code point or a standardized Unicode code point is used for the precomposed ligature. Nothing about "The Respectfully Experiment" -- and no appeals that users of 80386 PCs must be able to reproduce 18th-century ligatures in plain text -- will serve as the "evidence" that William is waiting for to overturn the decision not to encode any more precomposed ligatures.

> There is at present a barrier, which I feel might be called the
> markup barrier, which is acting as a barrier to progress.

This is not a barrier; it's just a distinction. People who are VERY familiar with the issues have drawn a line between:

(a) plain text, which represents content, and
(b) markup, which represents formatting.

Reasonable people can differ as to exactly where the line should be drawn, and there are indeed some examples where Unicode has tiptoed over the line. But most of those examples are now either deprecated or "strongly discouraged."

> I wonder whether the markup barrier is some absolute barrier or
> whether it is just a temporary thing which exists in people's minds

It's not the law of gravity. It's a decision that has been made by humans who have studied the issues and drawn a line.

> Is the markup barrier absolute or is it just that the markup barrier
> is regarded as being an absolute barrier because of a fallacy of
> reasoning in that whereas
>
> 1. markup is useful in some circumstances,
>
> 2. markup provides the opportunity to encode a system without a need
> to have additional code points,
>
> 3. markup does not have a requirement to have meanings assigned to
> code points by a standards committee,
>
> that those reasons, set against a historical background, have led to
> a view that code points cannot be used for things for which markup is
> presently used, notwithstanding that there seems to be nothing in the
> definition of character that is being used that would seem to go
> against using code points directly for such meanings as 36 POINT and
> GREEN.

You know what? Several of us in technology fields have this nasty habit of trying to embrace ANY change that promises even a 0.01% improvement, regardless of the costs involved in making such a change. I see it all the time in software development, where perfectly good structured-programming techniques are scorned if they are not sufficiently object-oriented, and yesterday's One True Way of C++ is now being cast aside in favor of C#. I see the same effect here, where despite the completely adequate mechanisms we have today for encoding rich text -- HTML, XML, proprietary formats like Word, even old-fashioned RTF -- someone thinks it could all be done better with character encoding.
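Here is the sketch I promised above, for anyone who wants to see the ligature alternatives spelled out. It's a minimal illustration in (anachronistically modern) Python, using only the standard unicodedata module; keep in mind that the meaning of U+E707 in the last part is William's private assignment, not anything the Unicode Standard defines.

    import unicodedata

    # Two conformant ways to encode an "fi" ligature in plain text:

    # 1. The ZWJ method: base letters joined by U+200D ZERO WIDTH
    #    JOINER.  A smart font *may* render the pair with a ligature
    #    glyph; a dumb font simply shows "fi".
    fi_zwj = "f\u200Di"

    # 2. The precomposed compatibility ligature U+FB01 LATIN SMALL
    #    LIGATURE FI.  NFKC normalization folds it back to plain "fi",
    #    so the two spellings compare equal after normalization.
    fi_precomposed = "\uFB01"
    assert unicodedata.normalize("NFKC", fi_precomposed) == "fi"

    # By contrast, a PUA code point carries no standardized meaning.
    # Only a font that happens to map a ct-ligature glyph to U+E707
    # (such as James's modified Code2000) will display this string as
    # intended; everyone else sees "Respe<black box>fully".
    respectfully = "Respe\uE707fully"

Nothing in that sketch is new: both ligature encodings have been available for as long as the compatibility ligatures have been in the Standard.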
Well, there's a pretty substantial installed base of the existing technologies, and I'm sorry if this sounds like "stifling progress," but it's not enough for a new solution to be simply "better"; it must be "better enough." The benefit has to justify the cost. Even if there are advantages to encoding greenness and 36-point-ness with standardized character codes instead of with existing markup methods (which I'm still not convinced is true), the advantages simply aren't great enough to displace the existing methods. What we have right now is not broken.

By contrast, Unicode itself IS "better enough" than the previously existing hodgepodge of character encodings to justify making the switch. (Well, maybe not for everyone yet, as the parallel Shift-JIS thread shows.)

> It would be a very interesting exercise for people to discuss
> exactly, precisely what Unicode is not, giving detailed reasoning for
> each such claim rather than simply saying that that is how it is,
> because if suggested formal limitations of the scope of what could be
> encoded start to be suggested then it may be that with its usual
> vigour that this discussion forum would result in counter examples
> which would make many of the suggested limitations obsolete.

It would be more than just an exercise; it would be a great idea, an excellent addition to the Unicode Web site, and one of the most on-topic threads I've seen in the 6 years I've been on this list. But please bear in mind, first and foremost, that there ARE some things the Unicode Standard is "for" and some things it is not "for."

Please follow my advice and that of others on this list, more expert than I am, and read the Standard and the Technical Reports that have been mentioned. That will, hopefully, give you the background you need to make intelligent contributions to the discussions.

-Doug Ewell
 Fullerton, California

