Re: Keys. (derives from Re: Sequences of combining characters.)

Barry Caplan Sat, 28 Sep 2002 10:11:38 -0700

At 12:23 PM 9/27/2002 +0100, William Overington wrote:
>Are you perhaps trying to make a deduction by the fallacy of the
>undistributed middle, along the following lines.
>
>William's need is a markup system.
>XML is a markup system.
>
>William's need is XML.


I think what is being suggested is not nearly so obvious as that. It is more along the 
lines of:

William's need is a product of which data interchange is a key feature.
Said product needs an architecture and a business model.
Data interchange happens both externally and internally within the program.
The business model chosen may indeed require a non-XML system.
XML data interchange is better supported than any proprietary system.
If non-XML is chosen for the outside system, it should be converted to XML as early as 
possible for inbound, and as late as possible for outbound interchange, in order to 
capitalize on XML tools.

Of course, if the system is closed on the outside, and useful, it will be quickly 
duplicated by someone using open interchange formats anyway, but advice on how 
to handle that situation only comes at a price :)


>I am simply saying that XML, as I understand it, does not suit my specific
>need.

It may be that you don't understand your need well enough to understand why XML for 
outside interchange is an extremely strong contender.

>text cannot be used directly.  For me, that is a major limitation of XML.

Why is it a "major limitation" of XML? Have not over a million applications and web 
sites already been implemented using XML technology? Is there a record of anyone ever 
griping about this limitation at all?

>legacy issue of which I do not want to have the problem with my research in
>language translation and distance education. 

How so? A single line of code will automatically escape any characters as needed.
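
As a minimal sketch of that one line, here in Python (my choice of language for 
illustration; `escape` and `unescape` come from the standard library's 
`xml.sax.saxutils` module, and the sample string is made up):

```python
from xml.sax.saxutils import escape, unescape

raw = "if a < b && b > c"

# The "single line": replaces &, < and > with their XML entity references.
escaped = escape(raw)

# The reverse direction, for when the text is read back out of the XML.
restored = unescape(escaped)

print(escaped)   # if a &lt; b &amp;&amp; b &gt; c
print(restored)  # if a < b && b > c
```

Most XML libraries do this for you when they serialize, so the author of the text 
rarely has to call it by hand.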


> Maybe one day Unicode will
>encode special XML opening and closing angle brackets so that XML can
>operate without that problem.  

This is not up to Unicode to decide; it is XML's choice to specify the way its tags 
are constructed. XML's family tree, starting with SGML (or earlier for all I know) and 
going through HTML, pretty much constrains it. Millions of people know <> as the tag 
delimiter. Earlier markup languages used a . (PERIOD) as the first character of a line 
as a delimiter - I think RTF is of this heritage. When was the last time someone 
mentioned they were creating or editing an RTF file compared to *ML?

>However, as XML uses the U+003C character in
>that manner at the moment, for me it is a problem and it has led me to use
>the key method using a comet circumflex key.

Instead of typing a trivial escape character in the rare case of a < in the content, 
you want to force people to type weird Unicode characters in every tag?


>Also, I do not need to have all those " characters and = characters and /
>characters within messages.

Have you not thought the problem all the way through? Why on earth would you want your 
message creators typing raw XML anyway? You are going to need some other UI, right? 
And that "message editor" can generate the XML, complete with escapes, using existing 
code you can have for free. This frees your time from having to create your own wheel 
and maintain it.
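
A rough sketch of what such a message editor might do behind the UI, assuming Python 
and its standard `xml.etree.ElementTree` module. The `build_message` helper, the 
`message` element and the `lang` attribute are hypothetical names of mine, not part 
of any existing format:

```python
import xml.etree.ElementTree as ET


def build_message(lang: str, body: str) -> str:
    """Hypothetical helper: wrap text typed into the editor in a <message> element."""
    msg = ET.Element("message", {"lang": lang})
    msg.text = body  # plain text; the library escapes it during serialization
    return ET.tostring(msg, encoding="unicode")


# The user types ordinary text, including characters that are special in XML.
typed = 'Is 3 < 5? "Yes" & so is 2.'
xml_text = build_message("en", typed)
print(xml_text)

# Parsing the result gives back exactly what was typed.
parsed = ET.fromstring(xml_text)
print(parsed.text == typed)
```

The person writing the message never sees a tag, a quote mark or an equals sign; 
those only exist in the serialized form.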


>Well, U+2604 U+0302 U+20E3 is not ridiculous.  It is entirely permissible
>within the Unicode specification.  

He is not saying it is ridiculous because it is not within the specification. He is 
saying it is ridiculous because the development community as a whole (a very large 
whole), both closed source and open source advocates, is rallied around XML as a 
basis for data interchange. If you ever wanted to move your comet files to another 
system, or create them from data in an existing system (such as Trados or another 
translation memory), you will need a two-way XML<->Comet converter anyway. Why bother?

>you think it ridiculous then maybe that is good evidence of its originality
>as a piece of creativity.  

I am sure it will create a pretty glyph. But software creation is about way more than 
pretty glyphs.

>A comet circumflex key could be viewed as a piece
>of original art.  I specifically designed it so as to be a design which
>involves an inventive leap so as to produce something new and unexpected,
>which someone "skilled in the art" would not produce as the application of
>skill in the existing art without invention, yet which would display
>properly using an all-Unicode font.

This sounds a lot like you are planning to trademark or patent a character. I would 
personally travel to the ends of the earth to testify that all possible combining 
sequences are described as prior art in the description of how to create them in the 
Unicode specification and thus can never be proprietary. Now if you want to have a 
graphic artist draw a logo of a comet with a box around it, that is your prerogative. 
But the idea that combining characters in any fashion is somehow proprietary is not 
ridiculous, it is just a waste of time. In case you think otherwise, I can write a 
5-line Perl program to run on a spare machine that will create prior art of every 
possible combination of characters. I can let it run forever and hook it to a web 
server to make it visible too.
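
To give an idea of how little code that takes, here is a sketch in Python rather than 
Perl. The function names, the code point range scanned, and the use of general 
category "M" to pick out combining marks are my assumptions for illustration:

```python
import itertools
import unicodedata


def combining_marks(start=0x0300, stop=0x2100):
    """Code points in [start, stop) whose general category is Mn, Mc or Me."""
    return [chr(cp) for cp in range(start, stop)
            if unicodedata.category(chr(cp)).startswith("M")]


def sequences(base, marks, max_marks):
    """Yield base followed by every ordered run of 1..max_marks marks."""
    for n in range(1, max_marks + 1):
        for combo in itertools.product(marks, repeat=n):
            yield base + "".join(combo)


marks = combining_marks()
comet = "\u2604"  # COMET, the base character from the thread

# Take only the first few; left alone, the generator walks every sequence.
for seq in itertools.islice(sequences(comet, marks, 2), 3):
    print(" ".join(f"U+{ord(ch):04X}" for ch in seq))
```

With the marks restricted to U+0302 and U+20E3, the sequence U+2604 U+0302 U+20E3 
is one of the six results.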

>An added bonus of using the comet circumflex key is that documents
>containing comet circumflex codes do not necessarily need to contain any
>characters from the Latin alphabet.

Why is this a bonus, let alone an added one? I have a 4 year old niece just learning 
the "Latin alphabet" and as far as I can tell it hasn't changed since I learned it. 
There is no U+003C character in that alphabet.

In fact, the bonus of using U+003C as a delimiter (along with other XML delimiters) is 
that they are in every legacy encoding, meaning if no Unicode tools are available for 
editing, a regular text editor can be used and the conversion to Unicode can happen 
later.
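
A quick way to check this, sketched in Python. The list of encodings is just a sample 
of ASCII-compatible legacy encodings I picked for illustration, using the codec names 
Python's standard library knows them by:

```python
# A sample of ASCII-compatible legacy encodings (an illustrative selection).
LEGACY_ENCODINGS = [
    "ascii", "latin-1", "cp1252", "iso8859_7", "koi8_r",
    "shift_jis", "euc_jp", "big5", "gb2312",
]

# Encode the tag delimiter in each encoding and record the resulting bytes.
results = {name: "<".encode(name) for name in LEGACY_ENCODINGS}

for name, raw in results.items():
    print(f"{name:10} {raw.hex()}")
```

Every entry in this sample comes out as the single byte 3c, which is why a tag typed 
in a plain text editor survives a later conversion to Unicode.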

Your method requires Unicode support and fonts (not the same thing) at the editing 
stage, which is not realistic unless you want to limit your community to a few of your 
closest friends, so to speak.

No one is suggesting such a system can't be built, only that its usefulness would be 
strongly limited for a lot of very good reasons. As others have noted, I concur that 
this is not really a Unicode issue per se, but a software design issue.

Barry Caplan
