Thomas Beale wrote:
> Adam Flinton wrote:
>>
>>> Other limitations on using XML - it's a no-show for enterprise scale
>>> databases or information processing. All that wasted space starts to
>>> count when you have to buy two £20,000 high availability RAID disk
>>> arrays instead of one... plus the bandwidth wastage when there are
>>> millions of messages rather than just a few. Yes, binary compression
>>> helps, but it just shifts the disk and bandwidth loss to the CPU.
>>> There are many better ways to represent data for large-scale
>>> deployments than XML (even the dADL syntax from ADL does 100% better
>>> in space, and represents all object-oriented constructs
>>> unambiguously).
>>
>> You have got to be kidding me on this one.
>
> No kidding. You just have to do the maths. Two years ago I was in a CfH
> meeting where it was made clear that the volumetrics on the projected
> number x size of HL7v3 prescription (and related) messages were going
> to blow out the budget for telecomms expenditure by over 50%. Database
> volume estimates made by Oracle for the Spine, on the basis of
> whole-of-UK HL7 XML messages, were shown to be simply uneconomic. In
> our own EHR product we have had to resort to various kinds of
> compression, which impact on performance, and we have to have larger
> disc arrays than would be needed if the data were represented in a
> more efficient way. Our engineers are currently looking at replacing
> XML altogether in the persistence layer.

You are mistaking a format for a design. I agree entirely that HL7 is vastly inflated; however, that is largely down to the huge levels of duplicate information. E.g. simply to send a "hello world" message in HL7 you need an astonishing array of elements/objects etc.
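To make the verbosity point concrete, here is a rough sketch comparing the same "hello world" payload in a minimal attribute-based document and in a deeply nested, element-per-level style. The nested wrapper below is invented for illustration (it is not real CDA/HL7 markup); only the relative sizes matter:

```python
# The same payload, two designs. The verbose form is a made-up
# stand-in for the kind of wrapper hierarchy HL7v3/CDA imposes.
minimal = '<a b="hello world"/>'

verbose = (
    '<ClinicalDocument xmlns="urn:example:cda">'
    '<component><structuredBody><component><section>'
    '<text><paragraph>hello world</paragraph></text>'
    '</section></component></structuredBody></component>'
    '</ClinicalDocument>'
)

# 11 bytes of payload, 9 bytes of markup in the minimal form;
# the nested form spends an order of magnitude more on markup alone.
print(len(minimal), len(verbose), round(len(verbose) / len(minimal), 1))
```

And a real CDA instance carries far more mandatory structure than this toy wrapper, which is the point of the "for a laugh" exercise below.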
For a laugh, create a blank CDA document carrying the text "hello world" and see how large it is. But that has nothing to do with XML per se, as I could create one as small as <a b="hello world"/>.

What is more, you are being a little disingenuous here: when I proposed using a value attribute to hold element values, rather than an element's text child, you were not very keen, despite the fact that doing so shrank the file sizes by a third and led to greater consistency in accessing such values.

There are a variety of design patterns in XML, such as the use of attributes, careful layout with respect to element order, etc., which can reduce file sizes and increase the speed of data access. I.e. if all you want from a message is some routing information, so as to direct it to the right place, that information goes in one of the first structures (e.g. an attribute on the root element) and you use a streaming parser such as SAX.

Wrt file sizes per se, though, you're doomed, as it flies in the face of progress: medical imaging used to be a 2D X-ray; now it's a 3D multi-slice CAT scan in vast detail.

Wrt ETP in particular... I held views on that which stemmed from my time in "big retail": the only "medical" part of the entire thing was the clinician's initial choice of drug. After that it was a standard stock control/management issue, and frankly I used to try and hide when clinicians/clinical IT types were there telling the likes of the large supermarkets how to do stock control.

> Of course organisations that have unlimited budgets may not notice
> this, but for everyone else it's important.
>
>> Having done XML messaging in very large retail systems (major
>> supermarket chains in the EU & US), mobile phone systems, the home
>> office/criminal justice system & now the CfH... you simply have got
>> to be kidding.
>
> It's not the size of the organisation that matters, it's the amount of
> data generated and the rate of data generation.
> The more data (and users), the more disk space / bandwidth / CPU (if
> using compression/decompression) you need. Fact of life - if your data
> gets big enough, you will hit a wall just using XML.

Fact of life - if your data gets big enough, you will hit a wall, period. Two quick examples:

A) Medical imaging: how will the current network cope with CAT scans etc. being transmitted, let alone stored? But hey, these are binary files, not XML, so from what you say they'll be no problem.

B) Codesets. I created a system called TRUD (http://www.google.co.uk/search?hl=en&q=CFH+TRUD&btnG=Google+Search&meta=) whereby we distribute our codesets. The management is done via XML: you register, and have to have a SOAP server capable of receiving update notifications. You then pass through various stages as a subscriber:

  > New update available & it's here: URL
  < OK, downloading now
  < OK, downloaded
  < OK, processing
  < OK, processed
  < OK, ready to go

and so on, until all subscribers are ready to go, at which point:

  > Go live on datetime
  < OK, will do.

We are talking about large files, and we had to set up a completely different network system of FTP servers etc. to cope with the download demands.

So what would you suggest? Shrink SNOMED? Distribute each update to every doctor's surgery? Or... provide some central servers which (using verbose, but standard, XML) serve up the value sets etc. on demand?

So, to recap:

A) HL7 is verbose. Don't confuse that with XML per se.
B) Large data volumes are a fact of life wrt medical IT. Get used to it and plan how to mitigate it (e.g. wrt TRUD/SNOMED/terminologies per se).

>> Ummmmmm... where to even start... oh yes, how about:
>>
>> "XML is a very commonly used standard, with thousands of tools in
>> existence, from routing engines to processing engines to parsers to
>> database layers..."
>>
>> or maybe:
>>
>> "XML is THE standard in enterprise-level messaging systems, with
>> standards such as SOAP, ebXML, OAGi BODs, etc."
>>
>> or maybe:
>>
>> "XML integrates easily with existing web infrastructures by use of
>> such mechanisms as AJAX, REST, JSON etc."
>>
>> Now wrt ADL...
>>
>> Adam
>
> Although the above are just marketing statements, the standards do
> indeed exist - how else would industry even start to cope with XML? My
> point is that it is so horribly inefficient that there is a general
> need for solutions (and these are emerging) to the concomitant costs
> of very large amounts of data. That's why all the binary XML work, and
> the work on alternative representations - once you get over a certain
> size, you have to use something else, or a serious compression
> approach.

Zip works extremely well, due to the repetitive nature of XML (especially if you tune the window size): e.g. a repeated string of <AVeryLongElementName can be compressed down to a few bytes. Either way, the file size actually comes down to a conjunction of the amount of data to be sent and the design of the document type required to hold that data.

E.g. at one stage we looked at various means of slimming down the messages, and the top one wrt HL7 was to have a templated instance containing the information/structures which would not change from message to message, plus an "on the wire" message containing only the information/structure which would or could change. Upon arrival, the sink system could either use that message as-is, or, if it required a "complete" HL7 message, stitch the two together and then process the "whole" message. This resulted in massive message-size decreases, but... no one ever said that implementers' concerns were any concern of HL7... so we have instead pushed the new XML ITS (XML ITS R2), which includes folding etc. and incorporates many of these ideas.

However, the fact remains that if you have 10 MB of data to send, then neither ADL nor XML will reduce that.
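The zip point above is easy to demonstrate with Python's zlib (DEFLATE, the same algorithm family used by zip); the element name and record count below are invented for illustration:

```python
import zlib

# A repetitive XML document: the same long element name repeated many
# times, as in a typical batch of structurally similar messages.
record = '<AVeryLongElementNameIndeed code="123" value="456"/>'
doc = '<batch>' + record * 1000 + '</batch>'

# Level 9 = maximum compression; repeated runs of markup collapse to
# short back-references into the sliding window.
packed = zlib.compress(doc.encode(), 9)
ratio = len(doc) / len(packed)
print(f'{len(doc)} bytes -> {len(packed)} bytes ({ratio:.0f}x)')
```

Note that this also illustrates Thomas's caveat: the saved disk and bandwidth are paid for in CPU time at both ends of the wire.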
Wrt persistent storage: Ronald Bourret and I wrote the first usable XML <-> Java objects <-> SQL engine, called XML-DBMS, back in Y2K: http://www.rpbourret.com/xmldbms/

It is bi-directional (you can create SQL structures to store existing XML structures, or create XML from existing SQL structures), formed the basis for the DB2 and SQL Server XML storage engines, and influenced the Oracle one.

http://sourceforge.net/projects/xmldbms/

You will notice I am still a project admin, though I have not done much with it for a while now. Ron went on to write the IBM Redbook on the issue: http://www.redbooks.ibm.com/abstracts/sg246994.html - it is worth a read.

> - thomas
>
> BTW you only need one good open source parser for any language on each
> platform. I'm not quite sure of the value of having 20 competing ones,
> all with different sets of bugs, maintainers and conformance
> statements...

Which is why we use multiple schema-checking engines in our schema checker.

Adam
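XML-DBMS itself is written in Java; as a rough illustration of the idea it pioneered, bi-directional mapping between XML structures and relational rows, here is a hedged, stdlib-only sketch (the element, table and column names are invented, not XML-DBMS's actual mapping language):

```python
import sqlite3
import xml.etree.ElementTree as ET

xml_doc = """
<patients>
  <patient id="p1"><name>Ada</name></patient>
  <patient id="p2"><name>Grace</name></patient>
</patients>
"""

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE patient (id TEXT PRIMARY KEY, name TEXT)')

# XML -> SQL: shred each <patient> element into a row
# (attribute -> key column, child element text -> data column).
for el in ET.fromstring(xml_doc).iter('patient'):
    conn.execute('INSERT INTO patient VALUES (?, ?)',
                 (el.get('id'), el.findtext('name')))

# SQL -> XML: rebuild an equivalent document from the rows.
root = ET.Element('patients')
for pid, name in conn.execute('SELECT id, name FROM patient ORDER BY id'):
    p = ET.SubElement(root, 'patient', id=pid)
    ET.SubElement(p, 'name').text = name

serialized = ET.tostring(root, encoding='unicode')
print(serialized)
```

A real engine like XML-DBMS drives this shredding from a declarative mapping document rather than hand-written loops, but the round trip is the same in principle.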

