Suggestion wrt XML Archetypes & Templates

Adam Flinton Fri, 30 Nov 2007 15:15:48 +0000

Heath Frankel wrote:
> Adam,
>
>   
>> i) The present situation/structure is dangerous.
>>     
>
> You need to get a better tool, Oxygen never splits an element value over
> multiple lines or adds whitespace.  A tool that automatically does this is
> dangerous.  I used to use XMLSpy and never experienced this, but after
> hearing this I am glad I was convinced to move to Oxygen.
>
>   
I like Oxygen, but:

A) XMLSpy is our standard tool
B) http://www.oxygenxml.com/xml_pretty_print.html
C) Anything doing pretty print (inc Oxygen) does the same things.

To quote from the oxygen xml page above:

"Although writing documents with no indentation is a perfectly 
acceptable practice, it makes editing difficult and is error prone. It 
also makes the identification of exact error positions difficult. 
Formatting and Indenting, also called "Pretty Print", enables the XML 
documents to be neatly arranged in a manner that is consistent and 
promotes easier reading."



>> ii) Pretty-print is the norm & even the ADL is pretty printed and has adopted
>> a similar method to cope.
>>     
> Sure, but the tool should never add whitespace to a value, that is not the
> norm, it is simply wrong.
>  
>   

Not true.

See above wrt Oxygen XML's view. I can quote you the relevant sections 
from the XML docs e.g.

http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/

http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace

>> iii) The solution simplifies the XML in terms of both processing and human
>> readability.
>>     
> I do not see this at all, in fact your solution breaks much processing which
> is derived directly from the Archetype Model.
>
>   
Why is that?

> Microsoft uses a lot of XML documents in its products and many of them use
> elements to contain values.  In fact if you go to W3Schools you will see
> the majority of examples using element values, and this is a resource
> teaching the basics of XML.
>  
>   
For instructional documents aimed at those learning XML it is nice and 
simple.

If, however, you are looking to create a bulletproof serialization in XML 
where the values matter, then it is a poor design.


>> iv) The solution shrinks the file sizes.
>>     
> Turning an element value into an attribute with name value saves a very
> minimal set of characters, I find it hard to see how you save a third.  In
> some cases you might save a third (such as lower_included) but in others your
> solution actually increases the size.  Take your example of lower and upper, a
> start tag of 5 characters, add the angle brackets and you have 7 characters.
> Using your solution, you have the attribute name of value, which is 5 plus 2
> quotes, an equals sign and a space between the tag and the attribute,
> totalling 9 characters.  
>
>   
Run the XSLT on a set of files so as to get a reasonable average.  I have 
done so on the NHS ones: it is about 2/3rds.
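
The per-element arithmetic can be checked directly. A minimal sketch in Python, assuming the attribute form is written as an empty element carrying a `value` attribute; that shape is an assumption for illustration, since the XSLT output itself is not reproduced in this message:

```python
# Element form vs. an ASSUMED attribute form (hypothetical shape:
# an empty element with a single "value" attribute).
pairs = {
    "lower": ("<lower>1</lower>", '<lower value="1"/>'),
    "lower_included": (
        "<lower_included>true</lower_included>",
        '<lower_included value="true"/>',
    ),
}

# Character counts for each serialization: (element form, attribute form).
sizes = {name: (len(e), len(a)) for name, (e, a) in pairs.items()}

for name, (element_len, attribute_len) in sizes.items():
    ratio = attribute_len / element_len
    print(f"{name}: element={element_len} attribute={attribute_len} ratio={ratio:.2f}")
```

Short tag names come out slightly larger and long tag names noticeably smaller, which is why only an average over a real set of files gives a meaningful figure.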

> In the case of occurrences (or DV_INTERVALs in general), I think we should
> treat the unbounded and included properties as attributes because they
> provide meta data about how to interpret the real data, lower and upper.
> You will never utilise the unbounded and included values in isolation, they
> are always used in conjunction with the lower and upper.  So I would suggest
> a change as follows:
>
>         <occurrences>
>             <lower included="true" >1</lower>
>             <upper unbounded="true"/>
>         </occurrences> 
>   

How about in a template e.g.

<Items archetype_id="openEHR-EHR-CLUSTER.symptom.v2" 
path="/data[at0001]/events[at0002]/data[at0003]/items[at0005]/items" 
xsi:type="CLUSTER">

vs say in an archetype where the same thing would be shown as:

<archetype_id><value>openEHR-EHR-ACTION.procedure.v1draft</value></archetype_id>

So are templates wrong & archetypes right or vice versa?

> The included and unbounded attributes exist for both lower and upper with
> default values of false.  Due to the openEHR assertions, you will never need
> more than 1 attribute on each element as included and unbounded cannot be
> both true. 
>
> The thing is, if we start entertaining these kinds of changes we will end up
> in endless debates based on the religious beliefs of XML style. 
This is not about style, it's about safety.

I have been involved in many large scale XML projects. I have seen this 
before & it ends up with ugly situations. You cannot assume whitespace 
will not be added, as it is legitimate to pretty print a document.

If you are serious about a singular value it goes in an attribute.
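
To illustrate the risk, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The re-indented element form stands in for what a pretty printer might produce, and the `value` attribute on `archetype_id` is a hypothetical shape used only for comparison:

```python
import xml.etree.ElementTree as ET

EXPECTED_ID = "openEHR-EHR-ACTION.procedure.v1draft"

# Element form after a (hypothetical) pretty printer has wrapped the value
# onto its own indented line.
element_doc = """<archetype_id>
    <value>
        openEHR-EHR-ACTION.procedure.v1draft
    </value>
</archetype_id>"""

# Attribute form (assumed shape). Line breaks and indentation sit between
# the tag name and the attribute, outside the quoted value.
attribute_doc = """<archetype_id
    value="openEHR-EHR-ACTION.procedure.v1draft"
/>"""

element_value = ET.fromstring(element_doc).find("value").text
attribute_value = ET.fromstring(attribute_doc).get("value")

# The element text now carries the surrounding newlines and indentation.
print(repr(element_value))
print(element_value == EXPECTED_ID)
# It only matches once the consumer strips it.
print(element_value.strip() == EXPECTED_ID)
# The attribute value is unaffected by the layout of the tag.
print(attribute_value == EXPECTED_ID)
```

Every consumer of the element form has to remember the strip step; the attribute form does not depend on it.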



>  Xml is just
> another computer language, all computer professionals have different styles
> when using those languages.  There is no right and wrong style, just
> guidelines, but these are usually employed for consistency purposes
> assisting the readability, not that one style is more ready than another.
> Currently, the schema is as consistent as you will ever get.  
>
> If anything is going to be changed, then the representation of INTERVAL is
> probably the only candidate (there may be another one or two in similar
> vein, meta data assisting in the interpretation of the value).
>
> Regards
>
> Heath
>
>   
Then at each stage involving the use of Archetypes and templates you are 
going to have to build in text normalization routines as per:

http://www.w3.org/TR/xpath#function-normalize-space

&

http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/

"[Definition:]  The *normalized value* of an element or attribute 
information item is an ·initial value· 
<http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/#key-iv> whose white 
space, if any, has been normalized according to the value of the 
whiteSpace facet 
<http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/datatypes.html#rf-whiteSpace>
of the simple type definition used in its ·validation· 
<http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/#key-vn>:

*preserve*
   No normalization is done, the value is the ·normalized value·
   <http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/#key-nv>

*replace*
   All occurrences of |#x9| (tab), |#xA| (line feed) and |#xD|
   (carriage return) are replaced with |#x20| (space).

*collapse*
   Subsequent to the replacements specified above under *replace*,
   contiguous sequences of |#x20|s are collapsed to a single |#x20|,
   and initial and/or final |#x20|s are deleted."
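
The three facet values quoted above can be sketched as a small routine. This is an illustrative Python version of the kind of normalization step each consumer would need, not code from any openEHR tool; the function name `normalize_whitespace` is made up for the example:

```python
import re


def normalize_whitespace(value: str, facet: str = "collapse") -> str:
    """Apply the XML Schema whiteSpace facet to a raw string value."""
    if facet == "preserve":
        # No normalization at all.
        return value
    # replace: tab, line feed and carriage return each become one space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # collapse: runs of spaces shrink to one, leading/trailing ones go.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace facet: {facet!r}")


raw = "\n    openEHR-EHR-CLUSTER.symptom.v2\n  "
print(repr(normalize_whitespace(raw, "preserve")))
print(repr(normalize_whitespace(raw, "replace")))
print(repr(normalize_whitespace(raw, "collapse")))
```

Note that collapse is lossy: a value that legitimately contains two adjacent spaces, or a leading or trailing space, does not survive it.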

Also:

http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace

&

http://www.w3.org/TR/REC-xml/#NT-S

"


      2.3 Common Syntactic Constructs

This section defines some symbols used widely in the grammar.

S <http://www.w3.org/TR/REC-xml/#NT-S> (white space) consists of one or 
more space (#x20) characters, carriage returns, line feeds, or tabs.


          White Space

[3]     |S|        ::=          |(#x20 | #x9 | #xD | #xA)+|

*Note:*

The presence of #xD in the above production is maintained purely for 
backward compatibility with the First Edition 
<http://www.w3.org/TR/1998/REC-xml-19980210>. As explained in *2.11 
End-of-Line Handling* <http://www.w3.org/TR/REC-xml/#sec-line-ends>, all 
#xD characters literally present in an XML document are either removed 
or replaced by #xA characters before any other processing is done. The 
only way to get a #xD character to match this production is to use a 
character reference in an entity value literal."

Adam


Note that that means that you would almost certainly have to specify 
collapse & that no values could ever start or end with a space or 
contain more than one contiguous space.





