Thomas Beale wrote:
> Adam Flinton wrote:
>>
>>> Other limitations on using XML - it's a no-show for enterprise scale
>>> databases or information processing. All that wasted space starts to
>>> count when you have to buy two £20,000 high availability RAID disk
>>> arrays instead of one... plus the bandwidth wastage when there are
>>> millions of messages rather than just a few. Yes, binary compression
>>> helps, but it just shifts the disk and bandwidth loss to the CPU.
>>> There are many better ways to represent data for large-scale
>>> deployments than XML (even the dADL syntax from ADL does 100% better
>>> in space, and represents all object-oriented constructs
>>> unambiguously).
>>
>> You have got to be kidding me on this one.
>
> No kidding. You just have to do the maths. Two years ago I was in a CfH
> meeting where it was made clear that the volumetrics on the projected
> number x size of HL7v3 prescription (and related) messages were going
> to blow out the budget for telecomms expenditure by over 50%. Database
> volume estimates made by Oracle for the Spine, on the basis of
> whole-of-UK HL7 XML messages, were shown to be simply uneconomic. In
> our own EHR product we have had to resort to various kinds of
> compression, which impact on performance, and we have to have larger
> disc arrays than would be needed if the data were represented in a
> more efficient way. Our engineers are currently looking at replacing
> XML altogether in the persistence layer.

You are mistaking a format for a design. I agree entirely that HL7 is vastly inflated; however, that is largely down to the huge levels of duplicate information. E.g. simply to send a "hello world" message in HL7 you need an astonishing array of elements/objects etc.
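To make the verbosity point concrete, here is a rough sketch comparing the same "hello world" payload in a minimal attribute-based document and in a deeply nested, element-per-level style. The nested wrapper below is invented for illustration (it is not real CDA/HL7 markup); only the relative sizes matter:

```python
# The same payload, two designs. The verbose form is a made-up
# stand-in for the kind of wrapper hierarchy HL7v3/CDA imposes.
minimal = '<a b="hello world"/>'

verbose = (
    '<ClinicalDocument xmlns="urn:example:cda">'
    '<component><structuredBody><component><section>'
    '<text><paragraph>hello world</paragraph></text>'
    '</section></component></structuredBody></component>'
    '</ClinicalDocument>'
)

# 11 bytes of payload, 9 bytes of markup in the minimal form;
# the nested form spends an order of magnitude more on markup alone.
print(len(minimal), len(verbose), round(len(verbose) / len(minimal), 1))
```

And a real CDA instance carries far more mandatory structure than this toy wrapper, which is the point of the "for a laugh" exercise below.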
For a laugh, create a blank CDA document carrying the text "hello world" and see how large it is. But that has nothing to do with XML per se, as I could create one as small as <a b="hello world"/>.

What is more, you are being a little disingenuous here: when I proposed using a value attribute to hold element values, rather than an element's text child, you were not very keen, despite the fact that doing so shrank the file sizes by a third and led to greater consistency in accessing such values.

There are a variety of design patterns in XML, such as the use of attributes, careful layout with respect to element order, etc., which can reduce file sizes and increase the speed of data access. I.e. if all you want from a message is some routing information, so as to direct it to the right place, that information goes in one of the first structures (e.g. an attribute on the root element) and you use a streaming parser such as SAX.

Wrt file sizes per se, though, you're doomed, as it flies in the face of progress: medical imaging used to be a 2D X-ray; now it's a 3D multi-slice CAT scan in vast detail.

Wrt ETP in particular... I held views on that which stemmed from my time in "big retail": the only "medical" part of the entire thing was the clinician's initial choice of drug. After that it was a standard stock control/management issue, and frankly I used to try and hide when clinicians/clinical IT types were there telling the likes of the large supermarkets how to do stock control.

> Of course organisations that have unlimited budgets may not notice
> this, but for everyone else it's important.
>
>> Having done XML messaging in very large retail systems (major
>> supermarket chains in the EU & US), mobile phone systems, the home
>> office/criminal justice system & now the CfH... you simply have got
>> to be kidding.
>
> It's not the size of the organisation that matters, it's the amount of
> data generated and the rate of data generation.
> The more data (and users), the more disk space / bandwidth / CPU (if
> using compression/decompression) you need. Fact of life - if your data
> gets big enough, you will hit a wall just using XML.

Fact of life - if your data gets big enough, you will hit a wall, period. Two quick examples:

A) Medical imaging: how will the current network cope with CAT scans etc. being transmitted, let alone stored? But hey, these are binary files, not XML, so from what you say they'll be no problem.

B) Codesets. I created a system called TRUD (http://www.google.co.uk/search?hl=en&q=CFH+TRUD&btnG=Google+Search&meta=) whereby we distribute our codesets. The management is done via XML: you register, and have to have a SOAP server capable of receiving update notifications. You then pass through various stages as a subscriber:

  > New update available & it's here: URL
  < OK, downloading now
  < OK, downloaded
  < OK, processing
  < OK, processed
  < OK, ready to go

and so on, until all subscribers are ready to go, at which point:

  > Go live on datetime
  < OK, will do.

We are talking about large files, and we had to set up a completely different network system of FTP servers etc. to cope with the download demands.

So what would you suggest? Shrink SNOMED? Distribute each update to every doctor's surgery? Or... provide some central servers which (using verbose, but standard, XML) serve up the value sets etc. on demand?

So, to recap:

A) HL7 is verbose. Don't confuse that with XML per se.
B) Large data volumes are a fact of life wrt medical IT. Get used to it and plan how to mitigate it (e.g. wrt TRUD/SNOMED/terminologies per se).

>> Ummmmmm... where to even start... oh yes, how about:
>>
>> "XML is a very commonly used standard, with thousands of tools in
>> existence, from routing engines to processing engines to parsers to
>> database layers..."
>>
>> or maybe:
>>
>> "XML is THE standard in enterprise-level messaging systems, with
>> standards such as SOAP, ebXML, OAGi BODs, etc."
>>
>> or maybe:
>>
>> "XML integrates easily with existing web infrastructures by use of
>> such mechanisms as AJAX, REST, JSON etc."
>>
>> Now wrt ADL...
>>
>> Adam
>
> Although the above are just marketing statements, the standards do
> indeed exist - how else would industry even start to cope with XML? My
> point is that it is so horribly inefficient that there is a general
> need for solutions (and these are emerging) to the concomitant costs
> of very large amounts of data. That's why all the binary XML work, and
> the work on alternative representations - once you get over a certain
> size, you have to use something else, or a serious compression
> approach.

Zip works extremely well, due to the repetitive nature of XML (especially if you tune the window size): e.g. a repeated string of <AVeryLongElementName can be compressed down to a few bytes. Either way, the file size actually comes down to a conjunction of the amount of data to be sent and the design of the document type required to hold that data.

E.g. at one stage we looked at various means of slimming down the messages, and the top one wrt HL7 was to have a templated instance containing the information/structures which would not change from message to message, plus an "on the wire" message containing only the information/structure which would or could change. Upon arrival, the sink system could either use that message as-is, or, if it required a "complete" HL7 message, stitch the two together and then process the "whole" message. This resulted in massive message-size decreases, but... no one ever said that implementers' concerns were any concern of HL7... so we have instead pushed the new XML ITS (XML ITS R2), which includes folding etc. and incorporates many of these ideas.

However, the fact remains that if you have 10 MB of data to send, then neither ADL nor XML will reduce that.
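The zip point above is easy to demonstrate with Python's zlib (DEFLATE, the same algorithm family used by zip); the element name and record count below are invented for illustration:

```python
import zlib

# A repetitive XML document: the same long element name repeated many
# times, as in a typical batch of structurally similar messages.
record = '<AVeryLongElementNameIndeed code="123" value="456"/>'
doc = '<batch>' + record * 1000 + '</batch>'

# Level 9 = maximum compression; repeated runs of markup collapse to
# short back-references into the sliding window.
packed = zlib.compress(doc.encode(), 9)
ratio = len(doc) / len(packed)
print(f'{len(doc)} bytes -> {len(packed)} bytes ({ratio:.0f}x)')
```

Note that this also illustrates Thomas's caveat: the saved disk and bandwidth are paid for in CPU time at both ends of the wire.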
Wrt persistent storage: Ronald Bourret and I wrote the first usable XML <-> Java objects <-> SQL engine, called XML-DBMS, back in Y2K: http://www.rpbourret.com/xmldbms/

It is bi-directional (you can create SQL structures to store existing XML structures, or create XML from existing SQL structures), formed the basis for the DB2 and SQL Server XML storage engines, and influenced the Oracle one.

http://sourceforge.net/projects/xmldbms/

You will notice I am still a project admin, though I have not done much with it for a while now. Ron went on to write the IBM Redbook on the issue: http://www.redbooks.ibm.com/abstracts/sg246994.html - it is worth a read.

> - thomas
>
> BTW you only need one good open source parser for any language on each
> platform. I'm not quite sure of the value of having 20 competing ones,
> all with different sets of bugs, maintainers and conformance
> statements...

Which is why we use multiple schema-checking engines in our schema checker.

Adam
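XML-DBMS itself is written in Java; as a rough illustration of the idea it pioneered, bi-directional mapping between XML structures and relational rows, here is a hedged, stdlib-only sketch (the element, table and column names are invented, not XML-DBMS's actual mapping language):

```python
import sqlite3
import xml.etree.ElementTree as ET

xml_doc = """
<patients>
  <patient id="p1"><name>Ada</name></patient>
  <patient id="p2"><name>Grace</name></patient>
</patients>
"""

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE patient (id TEXT PRIMARY KEY, name TEXT)')

# XML -> SQL: shred each <patient> element into a row
# (attribute -> key column, child element text -> data column).
for el in ET.fromstring(xml_doc).iter('patient'):
    conn.execute('INSERT INTO patient VALUES (?, ?)',
                 (el.get('id'), el.findtext('name')))

# SQL -> XML: rebuild an equivalent document from the rows.
root = ET.Element('patients')
for pid, name in conn.execute('SELECT id, name FROM patient ORDER BY id'):
    p = ET.SubElement(root, 'patient', id=pid)
    ET.SubElement(p, 'name').text = name

serialized = ET.tostring(root, encoding='unicode')
print(serialized)
```

A real engine like XML-DBMS drives this shredding from a declarative mapping document rather than hand-written loops, but the round trip is the same in principle.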

