Suggestion wrt XML Archetypes & Templates

Adam Flinton Thu, 29 Nov 2007 14:45:03 +0000

Dear All,

1) Problem statement
2) Solution
3) Points to Note
4) XSLT Sheet
5) Summary


1) Problem statement

I have been writing an openEHR publishing & QA routine for the NHS. It 
is basically an Ant build which includes running XSLT tasks.

There is a problem with the current structure of the XML archetypes & 
templates: values are held as the text() child of an element, & 
sometimes as the text() child of a <value> child of that element.

This is dangerous & (IMHO) wrong.
The reasons are:
A) a single value of that sort should be contained in an attribute.
B) It leads to a world of pain wrt "pretty-print"/indentation.

As an example, XMLSpy will automatically pretty print XML because that 
makes it readable to the (human) reader. Equally, XSLT sheets often use 
indent="yes" in the output declaration:

    <xsl:output method="xml" version="1.0" encoding="utf-8"
        indent="yes" />

 
Firstly it means that what looks like
<rm_type_name>
                            ELEMENT
</rm_type_name>

is actually:

&#xA;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;ELEMENT&#xA;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;&#x9;
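
The same effect can be seen outside XMLSpy. Below is a minimal sketch, 
assuming Python's standard xml.etree.ElementTree parser and a 
hand-indented fragment (two tabs rather than the thirteen above, purely 
to keep it short). It is an illustration only, not part of the attached 
sheet.

```python
import xml.etree.ElementTree as ET

# A hand-indented fragment, shaped like the output of a pretty-printer
# that puts the text on its own line (indent depth is illustrative only).
pretty = "<rm_type_name>\n\t\tELEMENT\n</rm_type_name>"
compact = "<rm_type_name>ELEMENT</rm_type_name>"

pretty_text = ET.fromstring(pretty).text
compact_text = ET.fromstring(compact).text

# The parser hands back the indentation, because it is part of the text node.
print(repr(pretty_text))
print(repr(compact_text))
print(pretty_text == compact_text)
```

The two documents look the same to a human but do not carry the same 
value.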

As a really quick example of this, get an XML Archetype, open it in 
XMLSpy, press save, now open it in the Ocean Archetypes editor.

Admire the way the text now is all over the place and has empty square 
boxes for the line endings (i.e. &#xA;)

Now try and save as an ADL.

If you save as an XML, the formatting etc is retained. Basically open up 
an xml archetype in XMLSpy, click save and you have a corrupt archetype.

Before people decry pretty printing per se, bear in mind that:

i)  a single long string is not readable
ii) the ADL is pretty printed, i.e. your ADL files do not come as one 
long string but are formatted in much the same way as XML is pretty 
printed. The ADL takes care of this in basically the same way I am going 
to suggest that the XML does, i.e. by delimiting the value:

description = <"Clinical description of the meconium">
vs
description = Clinical description of the meconium
or
description =
                                Clinical description of the meconium

etc.

Any XSLT which tries to extract values from the present structure must 
engage in code such as:

       <!-- select= keeps the quote marks out of the variable's value -->
       <xsl:variable name="tab" select="'&#x9;'" />
       <xsl:variable name="nl" select="'&#xA;'" />
       <xsl:variable name="v_rm_type_name_no_pp"
           select="translate(translate($v_rm_type_name/text(),$tab,''),$nl,'')" />

& that in itself is dangerous, as some editor might put in other 
formatting chars (e.g. spaces or carriage returns) which are not being 
filtered out.
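
To illustrate how easily such a filter leaks, here is a rough Python 
equivalent of the nested translate() calls above. strip_tab_nl is a 
made-up helper name for this sketch, not part of any tool. It removes 
tabs and line feeds only, so a value indented with spaces still comes 
out wrong:

```python
def strip_tab_nl(text):
    """Mimic translate(translate(text, TAB, ''), NL, ''): delete tabs and line feeds only."""
    return text.replace("\t", "").replace("\n", "")

tab_indented = "\n\t\tELEMENT\n"
space_indented = "\n        ELEMENT\n"

# Tab indentation is cleaned up as intended.
print(repr(strip_tab_nl(tab_indented)))
# Space indentation survives the filter, so the value is still polluted.
print(repr(strip_tab_nl(space_indented)))
```

Every new formatting character means another translate() wrapped 
around the last one.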

2) Solution:

Instead of using a text child, any value should go in a value attribute e.g.
<items  id="description">
Clinical description of the meconium
</items>

becomes:

  <items value="Clinical description of the meconium" id="description"/>
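
As a rough sketch of that rewrite (in Python rather than XSLT, and not 
the attached sheet), the function below moves an element's text, or the 
text of a lone <value> child, into a value attribute. text_to_value is a 
hypothetical name. Trimming leading and trailing whitespace is my 
assumption about what counts as formatting, and mixed content is not 
handled.

```python
import xml.etree.ElementTree as ET

def text_to_value(elem):
    """Move text (or a lone <value> child's text) into a value attribute, in place."""
    children = list(elem)
    # Case 1: <x><value>text</value></x>  becomes  <x value="text"/>
    if len(children) == 1 and children[0].tag == "value" and len(children[0]) == 0:
        text = (children[0].text or "").strip()
        elem.remove(children[0])
        elem.text = None
        if text:
            elem.set("value", text)
        return
    # Case 2: a leaf element with text  becomes  <x value="text"/>
    if not children:
        text = (elem.text or "").strip()  # assumption: outer whitespace is formatting
        elem.text = None
        if text:
            elem.set("value", text)
        return
    # Otherwise recurse; whitespace between child elements is left alone.
    for child in children:
        text_to_value(child)

items = ET.fromstring(
    '<items id="description">\nClinical description of the meconium\n</items>'
)
text_to_value(items)
print(ET.tostring(items, encoding="unicode"))

term = ET.fromstring("<terminology_id>\n  <value>ISO_639-1</value>\n</terminology_id>")
text_to_value(term)
print(ET.tostring(term, encoding="unicode"))
```

Once the value sits in an attribute, re-indenting the file can no 
longer change it.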

3) Points to Note:

A) The result is actually closer to the adl e.g.

         <items code="at0061">
            <items value="Clinical description of the meconium" id="description"/>
            <items value="Description" id="text"/>
         </items>
         <items code="at0062">
            <items value="Colour of meconium" id="description"/>
            <items value="Colour" id="text"/>
         </items>

vs

                ["at0061"] = <
                    description = <"Clinical description of the meconium">
                    text = <"Description">
                >
                ["at0062"] = <
                    description = <"Colour of meconium">
                    text = <"Colour">
                >

B) The files are approximately two thirds the size of the originals. This 
could be reduced further by using a shorter attribute name (e.g. val or 
even v).

C) The Archetypes are much more readable to the average human e.g.

<details>
         <language>
            <terminology_id value="ISO_639-1"/>
            <code_string value="en"/>
         </language>
         <purpose value="To describe body fluids and secretions"/>
         <use/>
         <misuse/>
</details>

vs:

<details>
         <language>
            <terminology_id>
               <value>ISO_639-1</value>
            </terminology_id>
            <code_string>en</code_string>
         </language>
         <purpose>To describe body fluids and secretions</purpose>
         <use/>
         <misuse/>
</details>

or
        <occurrences>
            <lower_included value="true"/>
            <upper_included value="true"/>
            <lower_unbounded value="false"/>
            <upper_unbounded value="false"/>
            <lower value="1"/>
            <upper value="1"/>
        </occurrences>

vs:

        <occurrences>
            <lower_included>true</lower_included>
            <upper_included>true</upper_included>
            <lower_unbounded>false</lower_unbounded>
            <upper_unbounded>false</upper_unbounded>
            <lower>1</lower>
            <upper>1</upper>
        </occurrences>
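
To give a feel for the processing side, here is a minimal sketch of 
reading the attribute form with Python's standard library. 
read_occurrences is a hypothetical helper, and the typing (lower/upper 
as integers, the rest as booleans) is my assumption for the sake of the 
example. A missing upper value on an unbounded interval is not handled.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_FORM = """
<occurrences>
    <lower_included value="true"/>
    <upper_included value="true"/>
    <lower_unbounded value="false"/>
    <upper_unbounded value="false"/>
    <lower value="1"/>
    <upper value="1"/>
</occurrences>
"""

def read_occurrences(xml_text):
    """Return the occurrences block as a dict of typed Python values."""
    root = ET.fromstring(xml_text.strip())
    parsed = {}
    for child in root:
        value = child.get("value")  # no whitespace filtering needed
        if child.tag in ("lower", "upper"):
            parsed[child.tag] = int(value)
        else:
            parsed[child.tag] = (value == "true")
    return parsed

occurrences = read_occurrences(ATTRIBUTE_FORM)
print(occurrences)
```

The indentation between the child elements never reaches the values, 
however the file has been formatted.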

4) XSLT Sheet

I have attached a mini-XSLT sheet which takes a template or XML 
Archetype & renders it into this format.

Run the XSLT with Saxon, as Xalan shows how fragile the current 
situation is: it picks up the "pretty-print" chars as text children & 
puts them in where there is no text child except the formatting chars.

5) Summary

i) The present situation/structure is dangerous.
ii) Pretty-print is the norm & even the ADL is pretty printed and has 
adopted a similar method to cope.
iii) The solution simplifies the XML in terms of both processing and 
human readability.
iv) The solution shrinks the file sizes.



Yours

Adam Flinton

-------------- next part --------------
A non-text attachment was scrubbed...
Name: setTextAsVal.xslt
Type: text/xml
Size: 1588 bytes
Desc: not available
URL: 
<http://lists.openehr.org/mailman/private/openehr-technical_lists.openehr.org/attachments/20071129/0dd558cc/attachment.xslt>
