Hi Tim, I will leave the XML vs. ADL aspects for others to discuss and address your comments re namespacing.
I think we all agree that openEHR artefacts need proper identification control, almost certainly something like namespacing. Some draft suggestions have been made and are carried in documentation on the openEHR website. Where I would take issue is the implication that there is some sort of sociological approach to force people to use the archetypes in the international CKM space. As an editor of these archetypes but also a consumer, I can assure you that I have no hesitation in using, or recommending the use of, local archetypes wherever and whenever necessary. The only issue is that people must be aware of potential name conflicts, which I resolve by adding some kind of simple suffix to the archetype ID, e.g. OBSERVATION.blood_pressure_simi.v1. Of course namespacing would be a better way of doing this, but I see this as a technical issue, and I do not recognise the kind of top-down mentality that you suggest is a blocker.

Of course we would like to develop international archetypes of sufficient quality and general applicability that developers can pick these up without needing to redevelop locally. Apart from enhancing interoperability, there are clear efficiencies in re-using something that has already been developed. For every suggestion, like yours, that we are trying to enforce a top-down process, we get twice as many implementers asking why there is not an international archetype for x, y, z. IMO both perspectives are valid.

I think the point re the deficiencies in archetype naming is well made, is non-contentious and is being addressed. It is easy to work around, in my experience, and is certainly not a blocker to their inclusion in real-world implementations. It is absolutely not a blocker to the development and use of local, regional or national archetypes where the international equivalents are felt to be unsuitable, whether for technical or sociological reasons.
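[Editor's note] A rough sketch of how such IDs could be handled mechanically. The regex below is hypothetical and deliberately simplified, not the normative openEHR identifier grammar; the "namespace::" prefix shown is one of the draft styles under discussion, and all names are invented for illustration:

```python
import re

# Hypothetical, simplified archetype-ID pattern:
# optional reverse-domain namespace ("org.openehr::"),
# then an RM/class section, a concept name, and a version.
ARCHETYPE_ID = re.compile(
    r"^(?:(?P<namespace>[A-Za-z][\w.-]*)::)?"
    r"(?P<rm_class>[A-Za-z_][\w-]*)\.(?P<concept>[\w-]+)\.v(?P<version>\d+)$"
)

def parse_archetype_id(aid):
    """Split an archetype ID into its parts, or return None if malformed."""
    m = ARCHETYPE_ID.match(aid)
    return m.groupdict() if m else None

# A locally suffixed ID, as described above (namespace comes back as None):
print(parse_archetype_id("OBSERVATION.blood_pressure_simi.v1"))

# The same concept carried under a draft-style namespace:
print(parse_archetype_id("org.openehr::openEHR-EHR-OBSERVATION.blood_pressure.v1"))
```

Under a scheme like this, a name conflict between a local "blood_pressure" and the international one is resolved by the namespace part rather than by an ad hoc suffix.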
The last three projects I have worked on (all major) have successfully blended international, national and vendor archetypes without any technical or sociological impediments. If the international ones fit, use them; if they don't, don't. Over time, as the international ones improve and the need to interoperate more widely grows, I would expect to see the balance change, but it is up to the consumers to decide where the balance of benefit lies. You are conflating two quite different issues.

Regards,

Ian

On 4 April 2013 12:09, Tim Cook <tim at mlhim.org> wrote:
> Hi Tom,
>
> On Fri, Mar 29, 2013 at 1:14 PM, Thomas Beale
> <thomas.beale at oceaninformatics.com> wrote:
>
>>> However, you seem to promote the idea that object oriented modelling
>>> is the only information modelling approach[1].
>>> This is a critical failure. There are many ways to engineer software
>>> using many different modelling approaches.
>>>
>
>> I don't see any problem here. The extant open 'reference implementation' of
>> openEHR has been in Java for years now, and secondarily in Ruby (openEHR.jp)
>> and C# (codeplex.com). The original Eiffel prototype was from nearly 10
>> years ago and was simply how I prototyped things from the GEHR project,
>> while other OO languages matured.
>>
>> I am not sure that we have suffered any critical failure - can you point it
>> out?
>
> If you re-read the paragraph you will note that I said that the assumption
> that OO modelling is mandatory is a critical failure; it was not a point
> about any particular language.
>
>
>>> well, since the primary openEHR projects are in Java, Ruby, C#, PHP, etc, I
>>> don't see where the disconnect between the projects and the talent pool is.
>>> I think if you look at the 'who is using it' pages, and also the openEHR
>>> Github projects, you won't find much that doesn't connect to the mainstream.
>
> The discussion about talent pool is about the data representation and
> constraint languages: XML and ADL.
> The development languages are common across the application domain.
> I know that you believe that ADL is superior because it was designed
> specifically to support the openEHR information model. It is an
> impressive piece of work, but this is where its value falls off.
> XML has widespread industry acceptance and a plethora of development
> and validation tools against a global standard.
>
>
>>
>> <NB: in the below I am talking about the industry standard XSD 1.0, not the
>> 9-month old XML Schema 1.1 spec>
>
> The industry standard XML Schema language is 1.1. The first draft was
> published in April 2004, making it nine years old.
>
>
>> well I don't really have anything to add to any of that. For the moment,
>> industry (including openEHR, which publishes XSDs for all its models for
>> years now) is still using XML, although one has to wonder how long that will
>> go on.
>
> A curious prognostication indeed.
>
>
>> But XML schema as an information modelling language has been of no serious
>> use, primarily because its inheritance model is utterly broken. There are
>> two competing notions of specialisation - restriction and extension.
>
> Interesting. I believe that the broader industry sees them as
> complementary, not competing.
>
>> Restriction is not a tool you can use in object-land because the semantics
>> are additive down the inheritance hierarchy, but you can of course try and
>> use it for constraint modelling.
>
> Restriction, as its name implies, is exactly intended and very useful
> for constraint modelling. Constraint modelling by restriction is, as
> you know, the cornerstone of multi-level modelling, not OO modelling.
> Which is, of course, why openEHR has a reference model and a constraint
> model. They are used for the two complementary aspects of multi-level
> modelling.
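[Editor's note] A minimal XSD sketch of the two specialisation mechanisms under discussion: extension adds content in the additive, OO-style direction, while restriction narrows the base type in the subtractive, constraint-modelling direction. Type and element names here are invented for illustration, not taken from any openEHR or MLHIM schema:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Base type: a measured quantity -->
  <xs:complexType name="Quantity">
    <xs:sequence>
      <xs:element name="magnitude" type="xs:decimal"/>
      <xs:element name="units" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Extension: adds content (the additive, OO-style direction) -->
  <xs:complexType name="QuantityWithAccuracy">
    <xs:complexContent>
      <xs:extension base="Quantity">
        <xs:sequence>
          <xs:element name="accuracy" type="xs:decimal"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Restriction: narrows the base (the subtractive, constraint direction) -->
  <xs:complexType name="Percentage">
    <xs:complexContent>
      <xs:restriction base="Quantity">
        <xs:sequence>
          <xs:element name="magnitude" type="xs:decimal"/>
          <xs:element name="units" type="xs:string" fixed="%"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

A multi-level approach leans on the restriction direction: the reference model plays the role of the base type, and constraint models progressively narrow it.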
>
>> Although it is generally too weak for
>> anything serious, and most projects I have seen going this route eventually
>> give in and build tools to interpolate Schematron statements to do the job
>> properly. Now you have two languages, plus you are mixing object (additive)
>> and constraint (subtractive) modelling.
>>
>
> The examples you are referring to are not using XML Schema 1.1, or at
> least not in its specified capacity. There is no longer a need for
> RelaxNG or Schematron to be mixed in. Your information on XML
> technologies seems to be quite a bit out of date.
>
>
>> Add to this the fact that the inheritance rules for XML attributes and
>> Elements are different, and you have a modelling disaster area.
>>
>
> I will confess that XML attributes are, IMHO, over-used and inject
> semantics into a model that shouldn't be there. For example, HL7v3 and
> FHIR use them extensively.
>
>
>> James Clark, designer of Relax NG, sees inheritance in XML as a design flaw
>> (from http://www.thaiopensource.com/relaxng/design.html#section:15 ):
>
> Of course! But then you are referencing an undated document by the
> author of a competing/complementary tool, one that announces RelaxNG
> as new AND whose most recent reference is 2001. So my guess is that it
> is at least a decade old. Hardly a valid opinion today.
>
>
>>
>> Difficulties in using type restriction (i.e. subtyping) in XSD seem
>> well-known - here. Not to mention the inability to deal with generic types
>> of any kind, e.g. Interval<Date>, necessitating the creation of numerous
>> fake types.
>>
>
> Hmmmm, what is wrong with xs:duration?
> I don't think I understand what you mean by "fake types".
>
>
>> And of course, the underlying inefficiency and messiness of the data are
>> serious problems as well. Google and Facebook (and I think Amazon) don't
>> move data around internally in XML for this reason.
>
> That is kind of vague. Can you expand on this?
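[Editor's note] The XSD 1.1 feature being alluded to here is xs:assert: co-occurrence constraints of the kind that previously required Schematron can be written as XPath 2.0 assertions directly in the schema. A minimal sketch, with invented element and type names:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="BloodPressure">
    <xs:sequence>
      <xs:element name="systolic" type="xs:decimal"/>
      <xs:element name="diastolic" type="xs:decimal"/>
    </xs:sequence>
    <!-- XSD 1.1 assertion: an XPath 2.0 test evaluated against
         the element instance being validated -->
    <xs:assert test="systolic gt diastolic"/>
  </xs:complexType>
</xs:schema>
```

In XSD 1.0 a cross-field rule like "systolic must exceed diastolic" could not be expressed at all, which is why projects bolted Schematron on; in 1.1 it lives in the same schema document.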
> The fact that any other domain does or doesn't use XML is really pretty
> irrelevant to multi-level modelling in healthcare. I am comfortable in
> assuming that none of them use ADL for anything, so the comparison is
> quite the red herring. I think that limiting the conversation to
> multi-level modelling in healthcare is an appropriate approach.
> Otherwise, it is kind of pointless.
>
>
>> None of this is to say that XML or XML-schema can't be 'used' - I don't know
>> of any product or project in openEHR space that doesn't use it somewhere,
>> and of course it's completely ubiquitous in the general IT world. What I am
>> saying here is that the minute you try to express your information model
>> primarily in XSD, you are in a world of pain.
>
>
> I will admit that expressing the MLHIM information model in XML Schema
> 1.1 terms and then developing the actual implementation was a challenge
> at first. But if you take a look today you will see that it is quite
> easy to understand, standards-compliant and fully functional. The
> original challenge was to overcome my prejudice against XML.
>
>
>> My lessons from projects using XSD are:
>>
>> XSDs are good for one thing: describing the contents of XML documents.
>> That's it.
>
> That seems to be a pretty useful goal if you have XML documents that
> contain your data.
>
>
>>
>> but what we need are models that can describe data, software, documents,
>> documentation, interfaces, etc
>>
>
> But these are all VERY different artifacts and require different
> models, tools and languages.
>
>
>
>> get imported data out of XML as soon as possible, and into a tractable
>> computational formalism
>
> Very much like "get data out of ADL as soon as possible", once you
> build some tools to do that.
>
>> treat XSDs as interface specifications, to be generated from the underlying
>> primary information models, not as any kind of primary expression in their
>> own right
>> Define XSDs with as little inheritance as possible, avoid subtyping, i.e.
>> define types as standalone, regardless of the duplication.
>
> I am not sure you understand XML Schema 1.1. Again, you seem to be
> approaching multi-level modelling in healthcare as if OO modelling were
> the only choice. This is the "I have a hammer, everything is a nail"
> approach. It isn't very effective in the real world, where various
> tools are needed to solve various problems.
>
>
>> Maximise the space optimisation of the data, no matter what it takes. It
>> usually requires all kinds of tricks, heavy use of XML attributes, structure
>> flattening from the object model and so on. If you don't do this, any XML
>> data storage will cost twice what it should, and web services using XML
>> will be horribly slow.
>
> So, in your opinion, should you build your APIs in ADL? Of course not.
> I fail to see your arguments against using XML for what it is designed
> for: data representation and constraint modelling. Of course, you then
> have all of the related tools, such as XSLT, SOAP, WSDL, XPath, XQuery,
> etc. for other tasks. A fairly complete suite. There isn't a real,
> practical reason to re-invent them. They make the interactions smoother
> and more easily understood by the IT industry, as compared to using a
> domain-specific language.
>
>
>> I know there are all kinds of tricks to mitigate these problems, I've seen a
>> lot of them. The fact that there is a mini-tech sector around XSD problem
>> mitigation / optimisation testifies to the difficulty of this technology.
>>
>
>
> I do not consider anything inside the specification to be a "trick".
> It seems pretty straightforward to me. Use cases were presented,
> solutions were specified and documented, and industry adopted them.
> What is tricky about that?
>
>
>> XML Schema 1.1 introduces useful things that may reduce some of the above
>> problems (good overview here), however as far as I can tell, its inheritance
>> model is not much better than XSD 1.0 (although you can now inherit
>> attributes properly, so that's good).
>
> It is not an OO language. If you are judging it based on those
> characteristics, please see my critical failure comment above. OO is
> not the be-all and end-all solution in computer science, much less in
> multi-level modelling.
>
>
>> well I guess the main thing is seamlessness between your information model
>> and your programming model view. I am not saying it's the only way, but the
>> approach in openEHR was oriented towards making sure that expressions of the
>> information model, including all its semantics, are as close as possible to
>> the software developer's programming model. If we had done the primary
>> specifications in XML, there would always be a significant disconnect
>> between the models and the software (actually, the specs would have been
>> nearly impossible to write). Not to mention, life would be hard for working
>> with all the other data formats now in use, including JSON, and various
>> binary formats.
>
> At the point in time when you developed ADL, you had no choice. In the
> late 1990s, XML Schema 1.0 was broken; it was only slightly better than
> using DTDs. But the IT industry advances very rapidly, and keeping in
> touch with technology changes is crucial.
>
>
>> An approach that has emerged in industrial openEHR systems in the last few
>> years is to generate message XSDs from templates - 1 XSD per template, and
>> write a generic XML <=> canonical data conversion gateway. This means we can
>> do all modelling in powerful formalisms like UML 2, EMF/Ecore (for the
>> information models) and all constraint modelling in ADL / AOM 1.5, and treat
>> XML as one possible data transport.
>
> EMF/Ecore will be nice when they can finally generate
> standards-compliant XSDs without Ecore cruft in them. At this point,
> once you use Ecore, you are infected and can't leave. I really wanted
> to use EMF for MDD. Maybe, someday.
>
>> From what I can see, the major direction in information modelling for the
>> future will be Eclipse Modelling Framework, using Ecore-based models. This
>> is where I think the computational expression of openEHR's Reference Model
>> will move to. The OHT Model Driven Health Tools (MDHT) project is already
>> showing the way on this, at the same time adopting ADL 1.5 concepts for
>> constraint modelling.
>
> (see above comment)
>
>>
>> I have no experience with XSD 1.1, and I think it will be years before
>> mainstream industry catches up with it. But it may be that it does what is
>> needed.
>
> Well, I can't predict how long it will take to be used on a broader
> basis. Probably, like most things, as people need the capability.
> Sometimes people resist change; it takes them outside their comfort
> zone and they don't like it.
>
> I can tell you that XML Schema 1.1 is very functional. It is supported
> by open source and proprietary tools, and it is working quite well,
> without tricks, in MLHIM.
>
>
>> we'll obviously differ on our analysis of what is the best modelling
>> formalism. The above are the conclusions I have come to over the years.
>
> I am not looking for "the best" modelling formalism. I am looking for
> what works and is simple to implement, in order to move forward the
> main and necessary concept of multi-level modelling so that we can
> solve the semantic interoperability issue between healthcare
> applications, from purpose-specific mobile apps to enterprise systems.
>
>
>> Others may have other, better ideas, and it may be that an XSD 1.1 modelling
>> effort in openEHR could make sense.
>>
>> I think the key thing would have been to ensure that the archetypes could be
>> shared across openEHR and MLHIM. Archetypes are pretty widely used these
>> days, and there are many projects now creating them. I don't know if this is
>> still possible; if not, it presents clinicians with the dilemma: model in
>> ADL/AOM, or model in MLHIM? Replicated models aren't fun to maintain...
>>
>
> I am not sure that there is any requirement for mapping.
>
> While there are a number of people producing openEHR archetypes, as far
> as I can tell there are only a dozen or so that are in compliance with
> the openEHR specifications, specifically the "Knowledge Artefact
> Identification" document. To address a couple of issues I have with the
> current openEHR eco-system:
>
> Section 2.2 says:
> "It is possible to define an identification scheme in which either or
> both ontological and machine identifiers are used. If machine
> identification only is used, all human artefact 'identification' is
> relegated to meta-data description, such as names, purpose, and so on.
> One problem with such schemes is that meta-data characteristics are
> informal, and therefore can clash - preventing any formalisation of the
> ontological space occupied by the artefacts. Discovery of overlaps and
> in fact any comparative feature of artefacts cannot be formalised, and
> therefore cannot be made properly computable."
>
>
> I will argue that UUIDs are very definitely "computable", without
> ambiguity. Metadata characteristics are very definitely formalized and
> have been since at least 1995 (DCMI), and have been an ISO standard
> since at least 2003. Therefore this paragraph is inaccurate in its
> description of the usefulness of machine-processable identifiers and
> of using metadata for formal descriptions.
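[Editor's note] On the computability point, a small illustrative sketch: name-based (version 5) UUIDs are deterministic, so any two parties hashing the same governed name under the same namespace derive the same identifier independently. The namespace URL and concept names below are invented for the example; MLHIM's actual identification scheme may differ:

```python
import uuid

# A hypothetical governance namespace. Any fixed UUID works as the root;
# here one is derived from an (invented) URL under the DNS/URL convention.
ns = uuid.uuid5(uuid.NAMESPACE_URL, "http://www.mlhim.org/ns")

# Version-5 UUIDs hash namespace + name, so they are fully reproducible.
a = uuid.uuid5(ns, "blood_pressure_ccd")
b = uuid.uuid5(ns, "blood_pressure_ccd")
c = uuid.uuid5(ns, "heart_rate_ccd")

print(a == b)  # True: same name always yields the same UUID, on any machine
print(a != c)  # True: distinct names yield distinct UUIDs
```

Randomly generated (version 4) UUIDs give uniqueness without any central registry at all, at the cost of carrying no ontological meaning; either way, comparing two identifiers for equality is a purely mechanical operation.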
>
>
> Section 3.1 says:
> "The general approach for identifying source artefacts is with an
> ontological identifier, prefixed by a namespace identifier if the
> artefact is managed within a Publishing Organisation or in some other
> production environment. The lack of a namespace in the identifier
> indicates an ad hoc, uncontrolled artefact, but its presence does not
> guarantee any particular kind of 'control' - quality can only be
> inferred if the PO is accredited with a central governance body as
> using a minimum quality process."
>
> As far as I can find, there is no reference as to what this
> accreditation process looks like, or who manages it outside of the
> openEHR CKM (which may be deployed locally). However ...
>
> Section 3.2 says:
> "Note that the name_space_id is constructed from a publisher
> organisation identifier plus at least one level of library/package
> identification. The latter condition ensures that a PO that starts
> with only one 'library' can always evolve to having more than one.
> All archetypes and templates should be identified with this style of
> identifier. Any archetype or template missing the name_space_id part
> is deemed to be an uncontrolled artefact of unknown quality."
>
>
> Archetypes on the NEHTA CKM http://dcm.nehta.org.au/ckm/ carry only
> the openEHR RM namespace, and so are, by the openEHR definition,
> uncontrolled and of unknown quality.
>
> There have been, in the past, archetypes that carried an nhs-dev
> designator. I can't find them now, but they were obviously in
> development and not deployed. If you browse the internet looking for
> openEHR archetypes you can find hundreds, maybe thousands. This shows
> that there are people interested in building knowledge models.
>
> They just aren't interested in the top-down, consensus-controlled,
> openEHR approach. This creates a chaotic, dangerous environment for
> healthcare data.
> There can easily be multiple archetypes with the same ID that have
> different structures and therefore different instance data. A given
> instance of data cannot determine which of the competing archetypes it
> is supposed to be validated against without unambiguous identification.
> I believe that Dr. Dipak Kalra used the word "unacceptable" when Bert
> Verhees confronted him with this issue on "Healthcare IT Live!" in
> December 2012: http://goo.gl/UP2Z1
>
> The openEHR eco-system is well engineered. It just isn't sociologically
> acceptable. People want to be free to design their concept models
> without top-down consensus. MLHIM allows that with industry-standard,
> off-the-shelf tooling.
>
>
> XML Schema 1.1, Concept Constraint Definitions and MLHIM: "Try it,
> you'll like it."
> See http://gplus.to/MLHIM and http://gplus.to/MLHIMComm for more
> information. You may also enjoy the website at www.mlhim.org and the
> GitHub page at https://github.com/mlhim
>
> Also, be sure to enjoy Healthcare IT Live! on YouTube
> https://www.youtube.com/watch?v=HG7rRPT9KY0&list=PL5BDmBjSV7CsBYbzNBw-D03WEqSJcWxbP
> where our guest today is Mr. Alex Fair, CEO of MedStartr.com,
> "Crowd-funding for healthcare."
> https://plus.google.com/events/cof6sdrpjll3ca3stp0440k6ihc
> DATE: 2013-04-04 1900 BRT, 2200 UTC, 1800 EDT
> Local time and date finder: http://goo.gl/orcJU
> You must RSVP "Yes" to receive a panel invitation to the hangout.
>
>
> Regards,
> Tim
>
>
> ============================================
> Timothy Cook, MSc +55 21 94711995
> MLHIM http://www.mlhim.org
> Like Us on FB: https://www.facebook.com/mlhim2
> Circle us on G+: http://goo.gl/44EV5
> Google Scholar: http://goo.gl/MMZ1o
> LinkedIn Profile: http://www.linkedin.com/in/timothywaynecook
>
> _______________________________________________
> openEHR-clinical mailing list
> openEHR-clinical at lists.openehr.org
> http://lists.openehr.org/mailman/listinfo/openehr-clinical_lists.openehr.org

--
Dr Ian McNicoll
office +44 (0)1536 414 994
fax +44 (0)1536 516317
mobile +44 (0)775 209 7859
skype ianmcnicoll
ian.mcnicoll at oceaninformatics.com

Clinical Modelling Consultant, Ocean Informatics, UK
Director, openEHR Foundation www.openehr.org/knowledge
Honorary Senior Research Associate, CHIME, UCL
SCIMP Working Group, NHS Scotland
BCS Primary Health Care www.phcsg.org

