Andrew Patterson wrote:
>> One of Andrew's issues I don't think matters too much - the quoting of
>> &amp;; this is because & is used to quote the '&' character. However, I
>> agree that using a more uniform '\'-based quoting approach will be
>> clearer, and make for easier parser construction. So let's say that we
>> will go for the \uHHHH \UHHHHHHHH approach.
>>
>
> Are you saying that the \u quoting will be used instead of the
> XML quoting or in addition to? If you are saying the first, please
> ignore the following rant :)
>
I am following your original suggestion, to replace the current XML
quoting rules with \u and \U (since we already use \ to quote anyway,
and as you point out, the & stuff is ugly).

> I still think the & is needlessly confusing and pointless. My
> issues are:
>
> 1) it is completely non-obvious - as an ADL user I would never expect
> to use the XML quoting rules in the string definition in ADL because
> ADL is clearly not an XML document. Sure, it has bits that are like
> XML, but if you want it to be XML, then go the whole way. More
> importantly, for one of the target groups of ADL, the clinicians, it
> is a behaviour that I imagine could confuse them. They have never
> heard of XML quoting rules and hence may just type in strings like
> "term code meaning pain to head & chest" in their ADL strings.
> Now this may be mitigated by the fact that they will often
> be editing ADL in a tool, but if ADL is only going to be edited
> by tools we should drop the human parseable format and
> do the whole thing in XML.
>
agreed. I personally don't see XML as useful other than as a purely
literal transfer syntax, i.e. a serialisation of objects. ADL is an
abstract syntax, which is both readable by humans, and for which
abstract parsers can be written; the parser that can read the XML form
(which will be supported fairly soon, but is completely unreadable) is a
pure object serialiser/deserialiser, not a language parser.
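A minimal sketch of how a parser might expand the \uHHHH / \UHHHHHHHH
escapes discussed above. This is an illustration only, not the agreed
ADL grammar: the function name decode_adl_string is hypothetical, and it
assumes the only other escapes are \\ and \" (any other backslash
sequence is left untouched).

```python
import re

# Assumed escape set, for illustration only:
#   \uHHHH, \UHHHHHHHH, \\ and \"
_ESCAPE = re.compile(r'\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|\\|")')


def decode_adl_string(body: str) -> str:
    """Expand backslash escapes in the body of a quoted string."""

    def expand(match: re.Match) -> str:
        token = match.group(1)
        if token[0] in "uU":
            # chr() raises ValueError for code points above U+10FFFF.
            return chr(int(token[1:], 16))
        return token  # an escaped backslash or quotation mark

    # The scan is left to right, so an escaped backslash is consumed
    # before a following "u0041" could be mistaken for an escape.
    return _ESCAPE.sub(expand, body)


print(decode_adl_string(r"pain to head \u0026 chest"))  # pain to head & chest
```

With this kind of rule a literal & needs no quoting at all, so a string
typed as "pain to head & chest" passes through unchanged.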
> 2) It is a pain to implement - now every ADL parser needs to
> have an XML entity converter built in as well - which entities are
> included - just the XML ones (&lt; &gt; &amp;)? What about the
> HTML/SGML ones (&acute; &grave;)?? Does every ADL
> implementation need to have the table of standard Unicode
> names built in to be able to parse strings? Do angle brackets
> need to be quoted - they do in XML but that is because they
> have special meaning. Yet within ADL strings they don't. Of course,
> the two characters that do need to be quoted are the \ and the
> quotation mark. Are these quoted in XML? Not by default, and
> so now the XML programmers are confused :)
>
yes, I also agree with this ;-)

>
>> choose the allowable encoding names (UTF-8 is the default in openEHR for
>> true unicode; the other will presumably be ISO-8859-1); we then need to
>> specify which encoding is assumed for an ADL file with no encoding
>> marker; I propose that it is UTF-8, since we already have "cracked" that
>> problem, and we say that it is only ISO-8859-1 if it actually says so.
>> This might sound odd, but remember UTF-8 is a proper superset of ASCII
>> anyway, so for all us western language people wondering if our files
>> will look funny, they won't. However, we could do it the other way round
>> - I don't see any terribly strong arguments one way or the other.
>>
>
> I think you are right that it should default to UTF-8. I am not sure
> of the correct way of putting the encoding marker in - if it's a
> standard archetype field then the parser is obviously well into
> parsing the file before it finds out what encoding the file is in?
> Which then invalidates encodings such as UTF-16 because it would be
> impossible to write even the first "archetype" keyword in such a way
> that the parser could parse it.
>
It probably has to be on the first line, which is easy enough to deal
with. At this stage, I think it's reasonable to just allow UTF-8 and
ISO-8859-1 only.
UTF-16 et al need byte order markers at the start of the file (which
removes the need for the encoding indicator in the file I guess); but
let's not go there yet.

> I actually don't feel too strongly that ADL needs to be 7-bit safe
> (i.e. I would be happy with UTF-8 as the default and leave it at that
> - still including the \uxxxx rules to allow the insertion of characters
> that are hard to _edit_, but assume UTF-8 can be read/transported).
> Is there any web/email transport mechanism in existence now that
> can't pass through an 8-bit stream untouched? Even more so, is there
> any modern environment that can't parse UTF-8?? (keeping in mind
> that this is not saying that openEHR systems won't have to exchange
> data with old legacy systems, but I doubt the openEHR system will be
> sending the legacy systems ADL files to parse??)
>
well, Notepad and gvim on Windows don't get it right... but that may
just be display...
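
A rough sketch of the "marker on the first line, otherwise UTF-8" idea.
The marker syntax used here (encoding=NAME somewhere on the first line)
and the function name decode_adl_bytes are assumptions made purely for
illustration; no marker syntax has been agreed in this thread. It works
because both permitted encodings are supersets of ASCII, so the first
line can be scanned as raw bytes before the encoding is known.

```python
import re

# Only the two encodings proposed above are accepted.
ALLOWED = {"utf-8": "utf-8", "iso-8859-1": "iso-8859-1"}

# Hypothetical marker syntax: "encoding=NAME" anywhere on the first line.
_MARKER = re.compile(rb"encoding\s*=\s*([A-Za-z0-9_-]+)")


def decode_adl_bytes(data: bytes) -> str:
    """Decode a whole ADL file, defaulting to UTF-8 when no marker is found."""
    first_line = data.split(b"\n", 1)[0]
    match = _MARKER.search(first_line)
    name = match.group(1).decode("ascii").lower() if match else "utf-8"
    if name not in ALLOWED:
        raise ValueError(f"unsupported encoding: {name}")
    return data.decode(ALLOWED[name])
```

A file with no marker is read as UTF-8, a file that says ISO-8859-1 is
read as such, and anything else (UTF-16, for instance) is rejected
rather than guessed at.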
- thomas

