Dear All,
1) Problem statement
2) Solution
3) Points to Note
4) XSLT Sheet
5) Summary
1) Problem statement
I have been writing an OpenEHR publishing & QA routine for the NHS. It is
basically an Ant build that includes running XSLT tasks.
There is a problem with the current structure of the XML archetypes &
templates: a value is held sometimes as the text() child of an element &
sometimes as the text() child of a value child of that element.
This is dangerous & (IMHO) wrong, for two reasons:
A) A single value of that sort should be contained in an attribute.
B) It leads to a world of pain wrt "pretty-print"/indentation.
As an example, XMLSpy will automatically pretty-print XML because that
makes it readable to the (human) reader. Equally, XSLT sheets often use
indent="yes" in the output declaration:
<xsl:output method="xml" version="1.0" encoding="utf-8"
indent="yes" />
Firstly it means that what looks like
<rm_type_name>
ELEMENT
</rm_type_name>
actually has this text value, line breaks and tabs included:

													ELEMENT
												
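To see the effect outside XMLSpy, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The two sample strings and the variable names are made up for illustration; the sketch only shows that an indented element's text value carries the line breaks and tabs with it.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the same element, compact and after pretty-printing.
compact = "<rm_type_name>ELEMENT</rm_type_name>"
pretty = "<rm_type_name>\n\t\t\tELEMENT\n\t\t</rm_type_name>"

# .text is the text child of the element, exactly as the parser saw it.
compact_text = ET.fromstring(compact).text
pretty_text = ET.fromstring(pretty).text

print(repr(compact_text))           # the bare value
print(repr(pretty_text))            # the value plus the indentation around it
print(compact_text == pretty_text)  # False: the two documents differ in content
```

Any consumer that compares or displays the text child as-is will therefore see two different values for what a human reads as the same document.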
As a really quick example of this, get an XML archetype, open it in
XMLSpy, press save, then open it in the Ocean Archetype Editor.
Admire the way the text is now all over the place and has empty square
boxes for the line endings (i.e. the line-feed characters).
Now try to save it as ADL.
If you save it as XML, the formatting etc. is retained. Basically, open
an XML archetype in XMLSpy, click save and you have a corrupt archetype.
Before people decry pretty-printing per se, bear in mind that:
i) A single long string is not readable.
ii) The ADL is pretty-printed, i.e. your ADL files do not come as one
long string but are formatted in much the same way as pretty-printed
XML. The ADL takes care of this in basically the same way I am going
to suggest that the XML does, i.e.
description = <"Clinical description of the meconium">
vs
description = Clinical description of the meconium
or
description =
Clinical description of the meconium
etc.
Any XSLT which tries to extract values from the present structure must
resort to code such as:
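<!-- Character references keep the quote marks out of the variable values,
     so translate() removes only the tab and the line feed. -->
<xsl:variable name="tab" select="'&#x9;'" />
<xsl:variable name="nl" select="'&#xA;'" />
<xsl:variable name="v_rm_type_name_no_pp"
select="translate(translate($v_rm_type_name/text(),$tab,''),$nl,'')" />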
& that in itself is dangerous, as some editor might put in other
formatting chars (spaces used for indentation, for example) which are not
being filtered out.
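The fragility can be sketched in a few lines of Python. This is an illustration under assumptions, not part of the publishing routine: the sample value and both helper names are hypothetical, and the first helper merely mimics what the nested translate() calls do (delete tabs and line feeds, nothing else).

```python
# Hypothetical value as saved by an editor that indents with spaces, not tabs.
raw = "\n    ELEMENT\n  "

def strip_tab_and_newline(value):
    """Mimic the nested translate() calls: delete only tab and line feed."""
    return value.replace("\t", "").replace("\n", "")

def strip_surrounding_whitespace(value):
    """Remove every kind of leading and trailing whitespace."""
    return value.strip()

filtered = strip_tab_and_newline(raw)
cleaned = strip_surrounding_whitespace(raw)

print(repr(filtered))  # the indentation spaces are still attached
print(repr(cleaned))   # only the value remains
```

In XSLT itself, normalize-space() is the usual tool for this, though it also collapses runs of whitespace inside the value. Either way the consumer has to guess which characters are formatting and which are data.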
2) Solution:
Instead of using a text child, any value should go in a value attribute e.g.
<items id="description">
Clinical description of the meconium
</items>
becomes:
<items value="Clinical description of the meconium" id="description"/>
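As a rough illustration of the conversion, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not the attached stylesheet: the function name text_to_value_attribute is hypothetical, and the sketch assumes that only leaf elements carry values. It does not fold a value child element into its parent.

```python
import xml.etree.ElementTree as ET

def text_to_value_attribute(element):
    """Move the text of every leaf element into a 'value' attribute, in place."""
    for child in element:
        text_to_value_attribute(child)
    text = (element.text or "").strip()  # drop pretty-print whitespace
    if len(element) == 0 and text:
        element.set("value", text)
        element.text = None              # element now serialises as empty

# Hypothetical input, indented the way a pretty-printer would leave it.
source = '<items id="description">\n\tClinical description of the meconium\n</items>'
root = ET.fromstring(source)
text_to_value_attribute(root)

result = ET.tostring(root, encoding="unicode")
print(result)
```

Once the value sits in an attribute, re-indenting the file changes only whitespace between elements, which no longer touches the value.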
3) Points to Note:
A) The result is actually closer to the adl e.g.
<items code="at0061">
<items value="Clinical description of the meconium"
id="description"/>
<items value="Description" id="text"/>
</items>
<items code="at0062">
<items value="Colour of meconium" id="description"/>
<items value="Colour" id="text"/>
</items>
vs
["at0061"] = <
description = <"Clinical description of the meconium">
text = <"Description">
>
["at0062"] = <
description = <"Colour of meconium">
text = <"Colour">
>
B) The files are approximately two thirds the size of the originals. This
could be reduced further by using a shorter attribute name (e.g. val or
even v).
C) The Archetypes are much more readable to the average human e.g.
<details>
<language>
<terminology_id value="ISO_639-1"/>
<code_string value="en"/>
</language>
<purpose value="To describe body fluids and secretions"/>
<use/>
<misuse/>
</details>
vs:
<details>
<language>
<terminology_id>
<value>ISO_639-1</value>
</terminology_id>
<code_string>en</code_string>
</language>
<purpose>To describe body fluids and secretions</purpose>
<use/>
<misuse/>
</details>
or
<occurrences>
<lower_included value="true"/>
<upper_included value="true"/>
<lower_unbounded value="false"/>
<upper_unbounded value="false"/>
<lower value="1"/>
<upper value="1"/>
</occurrences>
vs:
<occurrences>
<lower_included>true</lower_included>
<upper_included>true</upper_included>
<lower_unbounded>false</lower_unbounded>
<upper_unbounded>false</upper_unbounded>
<lower>1</lower>
<upper>1</upper>
</occurrences>
4) XSLT Sheet
I have attached a mini-xslt sheet which takes a template or XML
archetype & renders it into this format.
Run the XSLT with Saxon rather than Xalan. Xalan shows how fragile the
current situation is: it picks up the "pretty-print" chars as text
children & puts them in as values even where an element has no text child
except the formatting chars.
5) Summary
i) The present situation/structure is dangerous.
ii) Pretty-print is the norm & even the ADL is pretty printed and has
adopted a similar method to cope.
iii) The solution simplifies the XML in terms of both processing and
human readability.
iv) The solution shrinks the file sizes.
Yours
Adam Flinton
-------------- next part --------------
A non-text attachment was scrubbed...
Name: setTextAsVal.xslt
Type: text/xml
Size: 1588 bytes
Desc: not available
URL:
<http://lists.openehr.org/mailman/private/openehr-technical_lists.openehr.org/attachments/20071129/0dd558cc/attachment.xslt>