Could YAML replace dADL as human readable AOM serialization format?

Seref Arikan Tue, 6 Dec 2011 11:44:25 +0000

A bunch of responses, most of which should actually go to a wiki page for
Bosphorus.

I've used binary serialization for AOM because although Eiffel is a very
impressive language, I am not happy about its libraries. Some of them are
mature, but for XML, I could not find anything that'd be guaranteed to be
maintained. Protocol buffers is a technology that is used very heavily at
Google, and has a large community.
Performance is the key aspect of protocol buffers. It is very, very fast.
When I'm exchanging simple messages over ZeroMQ (a very fast queue
framework that is used in Bosphorus) I can achieve microsecond level
performance (not even millisecond!) for Java to Eiffel communication. For
desktop tooling purposes, this is much faster than XML.

You need to instantiate concrete instances of abstract types every time you
use single or multiple attributes in AOM. Both classes descend from
CAttribute. So AOM specification gives you a field with type CAttribute
(abstract), and instances of this type always have either a single or
multiple attribute object assigned to this field. The Eiffel parser creates
an AOM object when it parses an archetype. On the other side of the bridge,
a Java object awaits to be filled with the data in the Eiffel object. Both
Java and Eiffel know the relationship between these types but protocol
buffers does not have inheritance. So when you're defining a protocol
buffer message with its language, you have a problem: What should be the
type of the field that represents CAttribute? I've had to come up with a
method of handling this case. Someone may use another method and that is my
point: when we have to do these things, they become a source of bugs and
obstacles to implementation. So we may benefit from format and readability
of JSON, but the type of issues I've been describing would introduce a lot
more problems than bandwidth efficiency or human friendliness. Hence, my
priorities are slightly different when it comes to what makes a formalism
convenient in openEHR implementation.
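
One possible workaround for the missing inheritance, sketched here in Python
rather than in the protocol buffers language, is a flat wrapper message with
one optional slot per concrete subtype. This is an illustration only and not
necessarily the method used in Bosphorus; the class and field names
(CAttributeMessage, rm_attribute_name, cardinality, wrap, unwrap) are
assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CAttribute:
    """Stand-in for the abstract AOM type (illustrative, simplified)."""
    rm_attribute_name: str


@dataclass
class CSingleAttribute(CAttribute):
    pass


@dataclass
class CMultipleAttribute(CAttribute):
    cardinality: str = "0..*"


@dataclass
class CAttributeMessage:
    """Flat wire message: no inheritance, one optional slot per subtype."""
    single: Optional[CSingleAttribute] = None
    multiple: Optional[CMultipleAttribute] = None


def wrap(attr: CAttribute) -> CAttributeMessage:
    """Sender side: put the concrete object into the matching slot."""
    if isinstance(attr, CSingleAttribute):
        return CAttributeMessage(single=attr)
    if isinstance(attr, CMultipleAttribute):
        return CAttributeMessage(multiple=attr)
    raise TypeError(f"unsupported CAttribute subtype: {type(attr).__name__}")


def unwrap(msg: CAttributeMessage) -> CAttribute:
    """Receiver side: exactly one slot must be filled."""
    filled = [a for a in (msg.single, msg.multiple) if a is not None]
    if len(filled) != 1:
        raise ValueError("exactly one of 'single' or 'multiple' must be set")
    return filled[0]
```

Both sides have to agree on this convention by hand, which is exactly the
kind of non-standard step that can become a source of bugs.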

With this view: I find XML seriously crippled for OO support, but at least
there is some inheritance support and there is huge tooling and framework
support. My job would be to find ways of walking around issues using these
frameworks. I'd prefer this to having less tooling and less OO support (as
with JSON). I can't speak for YAML, but in terms of maturity and support for
mechanisms such as schemas, I'd be surprised if it ends up better than XML.
For XML, I have JAXB, support in Java, Python, .NET, you name it...

dADL has the advantage of being designed in a strong openEHR context. I
guess both YAML (based on the feature you've mentioned) and XML can match
dADL to the extent that any required workarounds could be justified based
on industry adoption. I do not know YAML well enough to compare it in
detail, but I'd love to hear from someone the type of things I've been
sharing here, only with YAML this time instead of JSON and XML.

Given this, if you or someone else thinks that YAML can be an alternative
to dADL, there is nothing stopping anyone from implementing it and using
it. Absolutely nothing. This is what I do. If I think that an XML form of
ADL would help, then I take what is out there (Tom's Eiffel code), use it,
and move on.

I have a feeling that all these discussions about whether this or that could
replace dADL are too hypothetical. Most of the time they are academic
discussions. There is nothing wrong with academic discussions, I am doing a
PhD here, but if the openEHR community is spending its time and resources
for academic discussions which do not necessarily connect to real life
implementations in the near term, then I think we have a problem.

Thomas is heroically responding to all queries without judgement, and he is
even implementing a lot of code, to give grounded answers, to provide
proofs. I guess I am not as mature and as dedicated as he is. I'd rather
have him working on ADL 1.5 XSD schemas than proving to people that openEHR
can do JSON if necessary. Because having XSDs for ADL 1.5 is going to
increase adoption of openEHR a lot more than having JSON output. If anybody
out there does not agree, please come forward and talk about your JSON
usage in your project which is about an actual information system that is
running, or is supposed to run in a clinical setting.

Please do not get me wrong, all the discussion we are having here is
useful, it is just that in my humble opinion, some discussions are more
useful than others if this standard into which I am heavily investing is to
go forward.

Best regards
Seref

On Mon, Dec 5, 2011 at 2:52 PM, Erik Sundvall <erik.sundvall at liu.se> wrote:

> Hi Seref!
>
> On Mon, Dec 5, 2011 at 13:32, Seref Arikan <
> serefarikan at kurumsalteknoloji.com> wrote:
>
>> I'll repeat a point I've tried to make before, since it is relevant in
>> the context of binary serialization.
>> I've used protocol buffers serialization of AOM in Bosphorus
>
>
> Why do you use binary serialization for AOM? (Just curious, I thought text
> formats would cater for most AOM use cases.)
>
> I have not looked deeply into protobuf so I'll take your word on the lack
> of OO support. Looking at http://wiki.apache.org/thrift/ThriftTypes their
> "Structs" also seem to lack inheritance. So I'll try to keep quiet about
> cross-platform binary formats at least until I have tried applying any of
> them to openEHR for real.
>
> ... you'll have to find non standard ways of dealing with the simplicity
>> of the formalism.
>
>
> For JSON I would agree that the formalism is sometimes too simple and one
> may need to make an openEHR specification for how to convey object type
> when needed, perhaps inspired by something like
> - http://flexjson.sourceforge.net/ that adds a "class" attribute or
> - by exploring if introspection of the target object type like
> http://code.google.com/p/google-gson/ does is enough for openEHR data.
>
> Here is the simplest example from Bosphorus: Eiffel is an object oriented
>> language, Java is also an object oriented language. openEHR specs use
>> inheritance, which is reflected into type hierarchies of both Eiffel and
>> Java classes. You have the protocol buffers language which does not support
>> inheritance. How do you represent instances of abstract types in protocol
>> buffers?
>
>
> Sorry if I'm dense, but when do you need to instantiate abstract types in
> RM data?
>
> In a way, it is a conceptual distance from OO. Every alternative mentioned
>> here is at a particular position to a particular level of OO support (take
>> it as a point in a multidimensional space). Every alternative has values
>> higher than the rest in a particular dimension, but none of them is
>> absolutely closer to the OO support point (represented by
>> Java/Eiffel/C#/Python etc) In my opinion, without this evaluation of OO
>> support, which is what we use in the actual languages of system
>> development, other discussions are not really relevant. What if protocol
>> buffers are fast? What if YAML, ADL, or JSON are easier to read, space
>> efficient?
>>
>
> Do you bundle YAML and XML into that opinion (lacking OO support in the
> same way as protobuf)?
>
> Do you think that dADL can carry everything needed for openEHR (both AM
> and RM)? If so why wouldn't YAML? What in basic dADL semantics is missing
> in YAML? YAML (using a !-prefixed syntax) and partly XML (using e.g.
> xsi:type) have ways of conveying object type in the case it cannot be
> inferred from data.
>
> Maybe I'm being too rigid about this particular issue, but the programming
>> language, its tools and frameworks built on it is what determines industry
>> adoption more than everything else today. I don't think this is being
>> considered in these discussions, but that is just me.
>>
>
> I guess language-specific binary formats (like serialized java objects)
> may be better for binary representation then. Thanks for the word of
> warning regarding protobuf.
>
> Do you think that all openEHR instance serializations really need to be
> "object oriented" themselves or is it enough that the classes of
> the receiving application are object oriented and that the deserialization
> code (or the transfer format) is clever enough to put the data into the
> right objects?
>
> There are some cases where different openEHR datatypes may have the same
> attribute signature and for those cases even transport formats aiming to
> reduce verbosity will need to explicitly declare class type since it
> cannot be safely inferred.
>
> Best regards,
> Erik Sundvall
> erik.sundvall at liu.se http://www.imt.liu.se/~erisu/  Tel: +46-13-286733
>
>
>
>
>> On Mon, Dec 5, 2011 at 11:36 AM, Erik Sundvall <erik.sundvall at liu.se> wrote:
>>
>>> Hi!
>>>
>>> On Mon, Dec 5, 2011 at 00:10, Heath Frankel <
>>> heath.frankel at oceaninformatics.com> wrote:
>>>
>>>> I think previously I had indicated I had no problem with the
>>>> stringified interval approach in XML, but I am reverting my thinking on
>>>> this and feel that it would be counter intuitive for those who want to use
>>>> the XML schemas for code generation purposes.  I think in this case the
>>>> computable requirement outweighs the human readable requirement.
>>>>
>>>
>>> You are probably right regarding XML, and maybe this is valid also for
>>> most JSON use-cases where the desire for an as simple as possible
>>> object-serialization-mapping outweighs human readability.
>>>
>>> I think the openEHR community is best served by having different
>>> archetype serialization format categories with different priorities for
>>> different purposes. E.g.:
>>>
>>> 1a. An XML format optimized for mapping to XML-schema generated code.
>>> 1b. A JSON format optimized for mapping to AOM object models handcrafted
>>> or generated from AOM-specifications.
>>>
>>> 2. A cADL-variant wrapped in YAML optimized for human readability. It
>>> could be used for archetype files stored in version control systems (making
>>> version diffs readable) and as textual format when you need textual
>>> examples in documentation, teaching etc.
>>>
>>> In 1a & 1b easy implementation should be prioritized over readability
>>> but in #2 human readability should be prioritized. Prioritizing both in the
>>> same format would likely fail. Things like default ordering of nodes and
>>> attributes could be recommended but optional for #1 but should be mandatory
>>> for #2 (otherwise readability suffers and diffs get messed up).
>>>
>>> I think we can come up with a much more concise representation of these
>>>> intervals without compromising the computable requirement, something
>>>> similar to XML schema maxOccurs/minOccurs.
>>>>
>>>
>>> Probably, but for #1 maybe being close to the AOM should be prioritized
>>> over being concise. After all, archetypes will not be sent over the wire at
>>> the same scale as patient data (RM instances).
>>>
>>> By the way, is the AOM open for changes (like renaming attributes) if
>>> that would increase clarity?
>>>
>>> If we would change subject and discuss RM instance serialization, then
>>> binary formats (like Protobuf and Thrift) could form a third category where
>>> message size and speed of conversion would be prioritized over ease of
>>> implementation or readability. XML and JSON would likely be good to have
>>> also for interoperability and debugging purposes. YAML for the RM would not
>>> be an obvious "over the wire"-format, but can be very useful for compact
>>> human readable long term EHR archiving storage as plain text files and for
>>> documentation examples.
>>>
>>> Best regards,
>>> Erik Sundvall
>>> erik.sundvall at liu.se http://www.imt.liu.se/~erisu/  Tel: +46-13-286733
>>>
>>>
>>>> please everyone remember that the dADL, JSON and XML generated from AWB
>>>> all currently use the stringified expression of cardinality / occurrences /
>>>> existence. Now, these are usually the most numerous constraints in an
>>>> archetype and if expressed in the orthodox way, take up 6 lines of text,
>>>> hence the giant files (e.g. AOM 1.4 based XML we currently use) - and thus
>>>> the much reduced files you see on Erik's page, because we are using ADL 1.5
>>>> flavoured serialisations not the ADL 1.4 one.
>>>>
>>>>
>>>> Now, I think we should probably go with the stringified form in all of
>>>> these formalisms. The cost of doing this is a small micro-parser, but it is
>>>> the same microparser for everyone, which seems attractive to me.
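
A micro-parser of the kind described could be as small as the Python sketch
below. It assumes the stringified form is "lower..upper" with '*' standing
for an unbounded upper limit, as in the '0..*' example; the exact syntax
emitted by AWB may differ, and the names IntInterval, parse_interval and
format_interval are made up for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntInterval:
    lower: int
    upper: Optional[int]  # None means unbounded, written as '*'

    def contains(self, n: int) -> bool:
        return n >= self.lower and (self.upper is None or n <= self.upper)


def parse_interval(text: str) -> IntInterval:
    """Parse strings such as '0..*', '1..1' or '0..1'."""
    lower_s, sep, upper_s = text.strip().partition("..")
    if not sep:
        raise ValueError(f"expected 'lower..upper', got {text!r}")
    lower = int(lower_s)
    upper = None if upper_s.strip() == "*" else int(upper_s)
    if upper is not None and upper < lower:
        raise ValueError(f"upper limit below lower limit in {text!r}")
    return IntInterval(lower, upper)


def format_interval(iv: IntInterval) -> str:
    """Inverse of parse_interval."""
    return f"{iv.lower}..{'*' if iv.upper is None else iv.upper}"
```

The same few lines would be needed in every implementation language, which
is the cost being weighed against the much smaller serialised files.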
>>>>
>>>> The alternative that Erik mentioned was more native, but still
>>>> efficient interval expressions, e.g. dADL has it built in (0..* is |>=0| in
>>>> dADL), and YAML and JSON could probably be persuaded to make some sort of
>>>> array of integer-like things be used. XML still doesn't have any such
>>>> support. In theory this approach would be the best if each syntax supported
>>>> it properly, but XML doesn't at all, and the others don't support Intervals
>>>> with unbounded upper limit (i.e. the '*' in '0..*').
>>>>
>>>> But Erik's exercise certainly proved that efficient representation of
>>>> the humble Interval <Integer> is actually worthwhile. (Once again thanks
>>>> for that page, it's quite a good way to get a good feel for these syntaxes
>>>> very quickly).
>>>>
>>>> - thomas
>>>>
>>>
>