[Still off-topic, but I'm hopeful that progress can be made, so am continuing a little farther]
On 09/27/2002 10:26:36 AM "William Overington" wrote:

>>XML is the way to go.
>
>Maybe, maybe not. The issue of U+003C being used to mean LESS-THAN SIGN in
>documents which mix ordinary text and markup may or may not, depending upon
>the application, be a problem.

It really isn't a problem. XML provides other means to represent that character when it is needed as part of the content rather than as part of the markup. It is the job of an XML parser to sort that out, and there are various freely available XML parsers that all handle this without a hitch. Someone made reference to MathML, which is a markup language built on XML (XML is a spec for building markup languages). Clearly mathematicians need to be able to represent this character within content, and the special use of U+003C for markup in XML was not seen as an obstacle in any way.

Your proposed markup convention would also need a parser to identify the pieces in a stream of data. If someone wants to use U+2604 in content, you would probably need some indirect way to represent it in a data stream. (E.g., one can imagine a hypothetical message "My favourite Unicode character is P1" into which someone might want to insert the COMET character.) So, I expect you'll have to deal with the same problem anyway. But this parser doesn't yet exist; some software developer will have to create it. On the other hand, XML parsers exist today. If you had been pursuing an XML-based approach, you might already be testing live prototypes rather than discussing a hypothetical system.

Also, in an earlier message, you mentioned that you wanted to be able to use this messaging system on the Web. And, of course, you want to be able to represent U+003C directly in content. Did you realise that those two are contradictory? HTML has the same heredity as XML (both derive from SGML). It also uses U+003C for markup, and provides the same alternative means to represent that character as part of content.

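To make the point about indirect representation concrete, here is a minimal sketch in Python. It assumes the standard-library modules `xml.sax.saxutils` and `xml.etree.ElementTree`, which are just one freely available parser among many and are not named in the original discussion. The element name `message` is hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Content that contains U+003C as an ordinary LESS-THAN SIGN.
content = "if a < b then stop"

# escape() rewrites '<' as '&lt;' (and '&' as '&amp;', '>' as '&gt;').
escaped = escape(content)

# "message" is a hypothetical element name, chosen only for this sketch.
document = "<message>" + escaped + "</message>"

# The parser separates markup from content and returns the original character.
recovered = ET.fromstring(document).text

# A numeric character reference works for any character,
# here U+003C and U+2604 COMET.
by_reference = ET.fromstring("<message>a &#x3C; b &#x2604;</message>").text

print(escaped)
print(recovered == content)
```

The author of the content never has to avoid U+003C; the escaping happens when the data stream is written, and the parser undoes it when the stream is read.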
So, if one of the contexts within which you want your system to work is the Web, then you're going to have to deal with indirect representation of U+003C anyway. Since it's already a magic character, why not let it be the magic character for your proposed protocol?

XML really *is* the way to go. Please believe us. You don't need to believe me; believe Tex, Ken, Marco and the others who have offered you this recommendation. They really are among the most well-informed contributors to this list.

BTW, my mail client (Lotus Notes, for better or worse) reports what time in *my* time zone an author wrote the given message. Such reporting of time in international communications is problematic; time zones need to be stated explicitly. We discovered this quite a while ago after scheduling a tele-conference; the half of the dept. in the UK assumed the time they saw was Dallas time (or maybe they suggested the time and we were reading it), but Notes had silently done a time zone conversion.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>

