Suggestion wrt XML Archetypes & Templates

Thomas Beale Tue, 11 Dec 2007 13:35:47 +0000

Adam Flinton wrote:
>>     
>
> I would like though to enquire wrt the rationale of containing _id info 
> in a separate <value/> element.
>
> If you are being consistent
> instead of :
>
>        <terminology_id>
>            <value>ISO_639-1</value>
>        </terminology_id>
>
> it should be simply:
>
>        <terminology_id>ISO_639-1</terminology_id>
>
>
> or <terminology_id value="ISO_639-1"/>
>
>   
Adam,
when you say it 'should' be - either pulled up a level, with an object 
attribute removed, OR represented as an XML attribute - what is the 
driver? Is it semantic (you think there is something wrong with the 
representation of the object structure defined by the specification) or 
is it to do with space/signal-to-noise (either of the last two 
methods uses fewer characters)?


The current form is the result of a direct machine-performed object 
serialisation process - in other words, it simply follows the same rules 
used for transforming any object data into XML. Your suggestion (I presume) 
is a special case of the general idea of representing all so-called 
basic types (Strings, Integers, dates etc) as XML attributes rather than 
as XML elements. But we have only just discussed and agreed that long 
text strings (especially those containing Unicode, backslash quoting and 
whitespace) should be XML elements.
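
As a minimal sketch of what "the same rules for any object" could look 
like, here is one generic routine in Python. The class names 
(TerminologyId, CodePhrase) and the helper to_element are hypothetical 
stand-ins chosen for illustration, not the actual serialiser; the point 
is only that the nested <value> element falls out of the rule rather 
than being designed by hand.

```python
from dataclasses import dataclass, fields, is_dataclass
import xml.etree.ElementTree as ET


@dataclass
class TerminologyId:
    # Hypothetical stand-in for an identifier type with one string field.
    value: str


@dataclass
class CodePhrase:
    # Hypothetical container, used here only to show nesting.
    terminology_id: TerminologyId
    code_string: str


def to_element(name, obj):
    """Apply one rule everywhere: each field becomes a child element."""
    elem = ET.Element(name)
    if is_dataclass(obj):
        for f in fields(obj):
            elem.append(to_element(f.name, getattr(obj, f.name)))
    else:
        # Basic types (strings, integers, ...) become element text.
        elem.text = str(obj)
    return elem


language = CodePhrase(TerminologyId("ISO_639-1"), "en")
xml_text = ET.tostring(to_element("language", language), encoding="unicode")
print(xml_text)
```

Because TerminologyId is an object with a field called value, the rule 
emits <terminology_id><value>...</value></terminology_id> with no 
knowledge of what a terminology id is.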

As I have said before, what I think is most important is regular 
encoding from data to and from XML, so that a) software is as simple and 
clean as possible and b) changes are not needed due to particular 
content (i.e. data). Now, ideally we would minimise use of bandwidth / 
space with the representation as well. The problem is that XML is pretty 
poorly designed for efficiently representing data, and has a poor 
signal-to-noise ratio. Making data serialise in a way that is either 
'more aesthetic' or smaller always implies more complex software 
containing exceptional rules. Further, although XML isn't well designed 
for data representation, in its original design 'attributes' were 
intended for meta-data items, rather than 'data'. Whether this 
distinction needs to be retained in the XML we are talking about here is 
a question.

So the question is: at what level do we include exceptional processing 
to reduce space wastage, since this complicates the software? How much 
do we compromise the intended semantics of XML, where attributes are 
designed for holding meta-data (including real meta-data, e.g. things 
like xsi:type etc)?
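
To make "exceptional processing" concrete, here is a small Python sketch 
of an attribute-preferring rule that still keeps long or awkward strings 
as elements. The cut-off MAX_ATTR_LEN and the helper names are 
assumptions invented for this example; nothing here is a proposed 
schema.

```python
import xml.etree.ElementTree as ET

MAX_ATTR_LEN = 40  # assumed cut-off, chosen only for illustration


def fits_in_attribute(text):
    """Exceptional rule: only short, single-line strings may become attributes."""
    return len(text) <= MAX_ATTR_LEN and not any(c in text for c in "\n\t\\")


def add_basic_value(parent, name, text):
    # The output shape now depends on the content, not only on the type.
    if fits_in_attribute(text):
        parent.set(name, text)
    else:
        ET.SubElement(parent, name).text = text


short = ET.Element("terminology_id")
add_basic_value(short, "value", "ISO_639-1")

long_ = ET.Element("description")
add_basic_value(long_, "value", "First line\nsecond line with a \\ backslash")

short_xml = ET.tostring(short, encoding="unicode")
long_xml = ET.tostring(long_, encoding="unicode")
print(short_xml)
print(long_xml)
```

The same logical field now lands in two different places depending on 
the data, so every reader of the XML has to check both the attribute and 
the child element.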

Any idea of saving space has to be done on the basis of a study of high 
volumes of representatively diverse data. Saving 10 bytes is not 
interesting, but saving 10Gb/minute in a large data processing system 
is. I will go out on a limb and say that 'style' has no place in good 
engineering; only engineering criteria do - correctness, performance, 
maintainability etc.
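
As a back-of-envelope starting point only, the three forms from Adam's 
message can be compared byte for byte in Python. This covers a single 
field with whitespace stripped, which is an assumption of the sketch; it 
is no substitute for measuring large volumes of representative data.

```python
# The three representations quoted above, with indentation removed.
candidates = {
    "nested element": "<terminology_id><value>ISO_639-1</value></terminology_id>",
    "pulled-up text": "<terminology_id>ISO_639-1</terminology_id>",
    "xml attribute": '<terminology_id value="ISO_639-1"/>',
}

# Size of each form when encoded as UTF-8.
sizes = {label: len(xml.encode("utf-8")) for label, xml in candidates.items()}
baseline = sizes["nested element"]

for label, size in sizes.items():
    print(f"{label}: {size} bytes, {baseline - size} saved per occurrence")
```

Multiplying the per-occurrence saving by the real frequency of such 
fields in production data is what would show whether the change is worth 
the extra software complexity.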

With all that in mind - if the community wants to make the appropriate 
analysis of data and propose a more space-efficient schema, I am not 
against it. But the needs of correctness (= patient safety) must be 
satisfied.

- thomas beale
