Heath Frankel wrote:
> Adam,
>
>> i) The present situation/structure is dangerous.
>
> You need to get a better tool, Oxygen never splits an element value over
> multiple lines or adds whitespace. A tool that automatically does this is
> dangerous. I used to use XMLSpy and never experienced this, but after
> hearing this I am glad I was convinced to move to Oxygen.

I like Oxygen but
A) XMLSpy is our standard tool.
B) http://www.oxygenxml.com/xml_pretty_print.html
C) Anything doing pretty print (including Oxygen) does the same thing.

To quote from the Oxygen XML page above:

"Although writing documents with no indentation is a perfectly acceptable
practice, it makes editing difficult and is error prone. It also makes the
identification of exact error positions difficult. Formatting and Indenting,
also called "Pretty Print", enables the XML documents to be neatly arranged
in a manner that is consistent and promotes easier reading."

>> ii) Pretty-print is the norm & even the ADL is pretty printed and has
>> adopted a similar method to cope.
>
> Sure, but the tool should never add whitespace to a value, that is not the
> norm, it is simply wrong.

Not true. See above wrt Oxygen XML's view. I can quote you the relevant
sections from the XML docs, e.g.
http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/
http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace

>> iii) The solution simplifies the XML in terms of both processing and human
>> readability.
>
> I do not see this at all, in fact your solution breaks much processing which
> is derived directly from the Archetype Model.

Why is that?

> Microsoft uses a lot of XML documents in its products and many of them use
> elements to contain values. In fact if you go to W3Schools you will see
> the majority of examples using element values, and this is a resource
> teaching the basics of XML.

For instructional documents aimed at those learning XML it is nice and simple.
If, however, you are looking to create a bulletproof serialization in XML
where the values matter, then it is a poor design.

>> iv) The solution shrinks the file sizes.
>
> Turning an element value into an attribute with name value saves a very
> minimal set of characters, I find it hard to see how you save a third. In
> some cases you might save a third (such as lower_included) but in others your
> solution actually increases the size.
> Take your example of lower and upper, a
> start tag of 5 characters, add the angle brackets and you have 7 characters.
> Using your solution, you have the attribute name of value, which is 5 plus 2
> quotes, an equals sign and a space between the tag and the attribute,
> totalling 9 characters.

Run the XSLT on a set of files so as to get a reasonable average. I have done
so on the NHS ones; it is about 2/3rds.

> In the case of occurrences (or DV_INTERVALs in general), I think we should
> treat the unbounded and included properties as attributes because they
> provide meta data about how to interpret the real data, lower and upper.
> You will never utilise the unbounded and included values in isolation, they
> are always used in conjunction with the lower and upper. So I would suggest
> a change as follows:
>
> <occurrences>
>   <lower included="true">1</lower>
>   <upper unbounded="true"/>
> </occurrences>

How about in a template, e.g.

<Items archetype_id="openEHR-EHR-CLUSTER.symptom.v2"
       path="/data[at0001]/events[at0002]/data[at0003]/items[at0005]/items"
       xsi:type="CLUSTER">

vs say in an archetype, where the same thing would be shown as:

<archetype_id><value>openEHR-EHR-ACTION.procedure.v1draft</value></archetype_id>

So are templates wrong & archetypes right, or vice versa?

> The included and unbounded attributes exist for both lower and upper with
> default values of false. Due to the openEHR assertions, you will never need
> more than 1 attribute on each element as included and unbounded cannot be
> both true.
>
> The thing is, if we start entertaining these kinds of changes we will end up
> in endless debates based on the religious beliefs of XML style.

This is not about style, it's about safety. I have been involved in many large
scale XML projects. I have seen this before & it ends up with ugly situations.
You cannot assume whitespace will not be added, as it is legitimate to pretty
print a document.
If you are serious about a singular value, it goes in an attribute.

> XML is just
> another computer language, all computer professionals have different styles
> when using those languages. There is no right and wrong style, just
> guidelines, but these are usually employed for consistency purposes
> assisting the readability, not that one style is more readable than another.
> Currently, the schema is as consistent as you will ever get.
>
> If anything is going to be changed, then the representation of INTERVAL is
> probably the only candidate (there may be another one or two in similar
> vein, meta data assisting in the interpretation of the value).
>
> Regards
>
> Heath

Then at each stage involving the use of archetypes and templates you are going
to have to build in text normalization routines as per:
http://www.w3.org/TR/xpath#function-normalize-space
&
http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/

"[Definition:] The *normalized value* of an element or attribute information
item is an initial value
<http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/#key-iv> whose white
space, if any, has been normalized according to the value of the whiteSpace
facet
<http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/datatypes.html#rf-whiteSpace>
of the simple type definition used in its validation
<http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/#key-vn>:

*preserve*
    No normalization is done, the value is the normalized value
    <http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/#key-nv>
*replace*
    All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return)
    are replaced with #x20 (space).
*collapse*
    Subsequent to the replacements specified above under *replace*,
    contiguous sequences of #x20s are collapsed to a single #x20, and initial
    and/or final #x20s are deleted."
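To illustrate the three whiteSpace values quoted above, here is a minimal Python sketch. The function name normalize_value and the sample string are my own assumptions for illustration; this is not code from any XML library, only a rough rendering of the rules as the definition states them.

```python
import re


def normalize_value(initial: str, white_space: str = "collapse") -> str:
    """Hypothetical helper that mimics the XML Schema whiteSpace facet."""
    if white_space == "preserve":
        # No normalization: the initial value is returned untouched.
        return initial
    if white_space not in ("replace", "collapse"):
        raise ValueError(f"unknown whiteSpace value: {white_space!r}")
    # replace: tab, line feed and carriage return each become one space.
    replaced = re.sub(r"[\t\n\r]", " ", initial)
    if white_space == "replace":
        return replaced
    # collapse: squeeze runs of spaces, then drop leading/trailing spaces.
    return re.sub(r" +", " ", replaced).strip(" ")


# An element value as a pretty printer might leave it (assumed sample).
pretty_printed = "\n    openEHR-EHR-ACTION.procedure.v1draft\n  "

for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(normalize_value(pretty_printed, mode)))
```

Only the collapse mode gets back the bare identifier, which is why every consumer of element values would need a step like this.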
Also:
http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace
&
http://www.w3.org/TR/REC-xml/#NT-S

"2.3 Common Syntactic Constructs

This section defines some symbols used widely in the grammar.

S <http://www.w3.org/TR/REC-xml/#NT-S> (white space) consists of one or more
space (#x20) characters, carriage returns, line feeds, or tabs.

White Space
[3] S ::= (#x20 | #x9 | #xD | #xA)+

*Note:* The presence of #xD in the above production is maintained purely for
backward compatibility with the First Edition
<http://www.w3.org/TR/1998/REC-xml-19980210>. As explained in *2.11
End-of-Line Handling* <http://www.w3.org/TR/REC-xml/#sec-line-ends>, all #xD
characters literally present in an XML document are either removed or
replaced by #xA characters before any other processing is done. The only way
to get a #xD character to match this production is to use a character
reference in an entity value literal."

Adam

Note that that means that you would almost certainly have to specify collapse
& that no values could ever start or end with a space or contain more than one
contiguous space.
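A small Python sketch of the point about element values versus attributes, and of the cost of relying on collapse. It uses the standard xml.etree.ElementTree parser; the two sample documents and the collapse helper are assumptions made up for illustration, with the archetype id borrowed from the example earlier in this thread.

```python
import re
import xml.etree.ElementTree as ET

EXPECTED = "openEHR-EHR-ACTION.procedure.v1draft"

# Assumed sample: an element value after a pretty printer has wrapped it.
ELEMENT_STYLE = """<archetype_id>
  <value>
    openEHR-EHR-ACTION.procedure.v1draft
  </value>
</archetype_id>"""

# Assumed sample: the same value carried in an attribute, also re-indented.
ATTRIBUTE_STYLE = """<archetype_id
    value="openEHR-EHR-ACTION.procedure.v1draft"/>"""


def collapse(text: str) -> str:
    """Hypothetical stand-in for whiteSpace=collapse normalization."""
    return re.sub(r"[ \t\n\r]+", " ", text).strip(" ")


element_value = ET.fromstring(ELEMENT_STYLE).find("value").text
attribute_value = ET.fromstring(ATTRIBUTE_STYLE).get("value")

# The element value now carries the indentation; the attribute does not.
print(repr(element_value))
print(repr(attribute_value))

# collapse repairs the element value...
print(collapse(element_value) == EXPECTED)

# ...but it also rewrites values whose spacing was intentional.
print(repr(collapse(" two  spaces ")))
```

The last line shows the restriction: once collapse is applied everywhere, a value with a leading space or a double space cannot be represented.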

