Could YAML replace dADL as human readable AOM serialization format?

Erik Sundvall Mon, 5 Dec 2011 15:52:34 +0100

Hi Seref!

On Mon, Dec 5, 2011 at 13:32, Seref Arikan <
serefarikan at kurumsalteknoloji.com> wrote:
>
> I'll repeat a point I've tried to make before, since it is relevant in the
> context of binary serialization.
> I've used protocol buffers serialization of AOM in Bosphorus



Why do you use binary serialization for AOM? (Just curious, I thought text
formats would cater for most AOM use cases.)

I have not looked deeply into protobuf so I'll take your word on the lack
of OO support. Looking at http://wiki.apache.org/thrift/ThriftTypes their
"Structs" also seem to lack inheritance. So I'll try to keep quiet about
cross-platform binary formats at least until I have tried applying any of
them to openEHR for real.

> ... you'll have to find non-standard ways of dealing with the simplicity of
> the formalism.


For JSON I would agree that the formalism is sometimes too simple, and one
may need an openEHR specification for how to convey object type when
needed, perhaps by
- taking inspiration from something like http://flexjson.sourceforge.net/,
  which adds a "class" attribute, or
- exploring whether introspection of the target object type, as done by
  http://code.google.com/p/google-gson/, is enough for openEHR data.
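
To make the "class" attribute idea concrete, here is a minimal Python sketch. The two classes are hypothetical, heavily simplified stand-ins for RM data types, and the member name "class" and the registry are assumptions for illustration, not an agreed openEHR JSON convention.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical, heavily simplified stand-ins for openEHR RM data types.
@dataclass
class DvText:
    value: str

@dataclass
class DvCodedText(DvText):
    code_string: str = ""

# Registry mapping the type name carried in the JSON to a Python class.
TYPES = {"DV_TEXT": DvText, "DV_CODED_TEXT": DvCodedText}
NAMES = {cls: name for name, cls in TYPES.items()}

def to_json(obj) -> str:
    """Serialize obj, adding a "class" member that names its concrete type."""
    payload = {"class": NAMES[type(obj)], **asdict(obj)}
    return json.dumps(payload, sort_keys=True)

def from_json(text: str):
    """Rebuild the object by looking up the "class" member in the registry."""
    data = json.loads(text)
    cls = TYPES[data.pop("class")]
    return cls(**data)

encoded = to_json(DvCodedText(value="Diabetes", code_string="at0001"))
decoded = from_json(encoded)
```

The receiver never guesses: the concrete subtype comes back even where the declared attribute type would only be the parent.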

> Here is the simplest example from Bosphorus: Eiffel is an object oriented
> language, Java is also an object oriented language. openEHR specs use
> inheritance, which is reflected into type hierarchies of both Eiffel and
> Java classes. You have the protocol buffers language which does not support
> inheritance. How do you represent instances of abstract types in protocol
> buffers?


Sorry if I'm dense, but when do you need to instantiate abstract types in
RM data?

> In a way, it is a conceptual distance from OO. Every alternative mentioned
> here is at a particular position to a particular level of OO support (take
> it as a point in a multidimensional space). Every alternative has values
> higher than the rest in a particular dimension, but none of them is
> absolutely closer to the OO support point (represented by
> Java/Eiffel/C#/Python etc.). In my opinion, without this evaluation of OO
> support, which is what we use in the actual languages of system
> development, other discussions are not really relevant. What if protocol
> buffers are fast? What if YAML, ADL, or JSON are easier to read, space
> efficient?
>

Do you bundle YAML and XML into that opinion (as lacking OO support in the
same way as protobuf)?

Do you think that dADL can carry everything needed for openEHR (both AM and
RM)? If so why wouldn't YAML? What in basic dADL semantics is missing in
YAML? YAML (using a !-prefixed syntax) and partly XML (using e.g. xsi:type)
have ways of conveying object type in the case it cannot be inferred from
data.
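
As a rough illustration of the XML half of that point, the sketch below reads xsi:type with Python's standard library. The XML fragment and the type names in it are made up for the example, not taken from the openEHR schemas. In YAML the counterpart would be a local tag such as !DV_CODED_TEXT written in front of the node.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical fragment: the element is declared with a general type, and
# xsi:type names the concrete type actually used in this instance.
doc = f"""
<value xmlns:xsi="{XSI}" xsi:type="DV_CODED_TEXT">
  <value>Diabetes</value>
</value>
""".strip()

def concrete_type(element: ET.Element, default: str) -> str:
    """Return the xsi:type of element, or default if none is given."""
    return element.get(f"{{{XSI}}}type", default)

root = ET.fromstring(doc)
type_name = concrete_type(root, default="DV_TEXT")
```

When the attribute is absent, the deserializer falls back to the declared type, which is the "can be inferred from data" case.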

> Maybe I'm being too rigid about this particular issue, but the programming
> language, its tools and the frameworks built on it are what determine
> industry adoption more than everything else today. I don't think this is being
> considered in these discussions, but that is just me.
>

I guess language-specific binary formats (like serialized java objects) may
be better for binary representation then. Thanks for the word of warning
regarding protobuf.

Do you think that all openEHR instance serializations really need to be
"object oriented" themselves or is it enough that the classes of
the receiving application are object oriented and that the deserialization
code (or the transfer format) is clever enough to put the data into the
right objects?

There are some cases where different openEHR datatypes have the same
attribute signature, and for those cases even transport formats aiming to
reduce verbosity will need to declare the class type explicitly, since it
cannot be safely inferred.
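
A small Python sketch of why inference by signature breaks down. The two classes are simplified, hypothetical stand-ins that each carry a single string "value"; I am assuming that is a fair picture of types like DV_URI and DV_EHR_URI.

```python
from dataclasses import dataclass, fields

# Hypothetical, simplified stand-ins: both types carry a single string "value".
@dataclass
class DvUri:
    value: str

@dataclass
class DvEhrUri:
    value: str

CANDIDATES = [DvUri, DvEhrUri]

def infer_types(data: dict) -> list:
    """Return every candidate class whose field names match the keys of data."""
    keys = set(data)
    return [cls for cls in CANDIDATES if {f.name for f in fields(cls)} == keys]

matches = infer_types({"value": "ehr://example/some/path"})
# More than one candidate matches, so the attribute signature alone
# cannot decide the type and an explicit type marker is needed.
ambiguous = len(matches) > 1
```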

Best regards,
Erik Sundvall
erik.sundvall at liu.se http://www.imt.liu.se/~erisu/  Tel: +46-13-286733




> On Mon, Dec 5, 2011 at 11:36 AM, Erik Sundvall <erik.sundvall at liu.se> wrote:
>
>> Hi!
>>
>> On Mon, Dec 5, 2011 at 00:10, Heath Frankel <
>> heath.frankel at oceaninformatics.com> wrote:
>>
>>> I think previously I had indicated I had no problem with the stringified
>>> interval approach in XML, but I am reverting my thinking on this and feel
>>> that it would be counterintuitive for those who want to use the XML
>>> schemas for code generation purposes.  I think in this case the computable
>>> requirement outweighs the human readable requirement.
>>>
>>
>> You are probably right regarding XML, and maybe this is valid also for
>> most JSON use-cases where the desire for an as simple as possible
>> object-serialization-mapping outweighs human readability.
>>
>> I think the openEHR community is best served by having different
>> archetype serialization format categories with different priorities for
>> different purposes. E.g.:
>>
>> 1a. An XML format optimized for mapping to XML-schema generated code.
>> 1b. A JSON format optimized for mapping to AOM object models handcrafted
>> or generated from AOM-specifications.
>>
>> 2. A cADL-variant wrapped in YAML optimized for human readability. It
>> could be used for archetype files stored in version control systems (making
>> version diffs readable) and as textual format when you need textual
>> examples in documentation, teaching etc.
>>
>> In 1a & 1b easy implementation should be prioritized over readability but
>> in #2 human readability should be prioritized. Prioritizing both in the
>> same format would likely fail. Things like default ordering of nodes and
>> attributes could be recommended but optional for #1 but should be mandatory
>> for #2 (otherwise readability suffers and diffs get messed up).
>>
>>> I think we can come up with a much more concise representation of these
>>> intervals without compromising the computable requirement, something
>>> similar to XML schema maxOccurs/minOccurs.
>>>
>>
>> Probably, but for #1 maybe being close to the AOM should be prioritized
>> over being concise. After all, archetypes will not be sent over the wire at
>> the same scale as patient data (RM instances).
>>
>> By the way, is the AOM open for changes (like renaming attributes) if
>> that would increase clarity?
>>
>> If we would change subject and discuss RM instance serialization, then
>> binary formats (like Protobuf and Thrift) could form a third category where
>> message size and speed of conversion would be prioritized over ease of
>> implementation or readability. XML and JSON would likely be good to have
>> also for interoperability and debugging purposes. YAML for the RM would not
>> be an obvious "over the wire"-format, but can be very useful for compact
>> human readable long term EHR archiving storage as plain text files and for
>> documentation examples.
>>
>> Best regards,
>> Erik Sundvall
>> erik.sundvall at liu.se http://www.imt.liu.se/~erisu/  Tel: +46-13-286733
>>
>>
>>> please everyone remember that the dADL, JSON and XML generated from AWB
>>> all currently use the stringified expression of cardinality / occurrences /
>>> existence. Now, these are usually the most numerous constraints in an
>>> archetype and if expressed in the orthodox way, take up 6 lines of text,
>>> hence the giant files (e.g. AOM 1.4 based XML we currently use) - and thus
>>> the much reduced files you see on Erik's page, because we are using ADL 1.5
>>> flavoured serialisations not the ADL 1.4 one.
>>>
>>>
>>> Now, I think we should probably go with the stringified form in all of
>>> these formalisms. The cost of doing this is a small micro-parser, but it is
>>> the same microparser for everyone, which seems attractive to me.
>>>
>>> The alternative that Erik mentioned was more native, but still efficient
>>> interval expressions, e.g. dADL has it built in (0..* is |>=0| in dADL),
>>> and YAML and JSON could probably be persuaded to make some sort of array of
>>> integer-like things be used. XML still doesn't have any such support. In
>>> theory this approach would be the best if each syntax supported it
>>> properly, but XML doesn't at all, and the others don't support Intervals
>>> with unbounded upper limit (i.e. the '*' in '0..*').
>>>
>>> But Erik's exercise certainly proved that efficient representation of
>>> the humble Interval <Integer> is actually worthwhile. (Once again thanks
>>> for that page, it's quite a good way to get a good feel for these syntaxes
>>> very quickly).
>>>
>>> - thomas
>>>
>>
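
For reference, the "small micro-parser" for stringified intervals mentioned in Thomas's quoted message could look roughly like this in Python. It assumes the string form is "lower..upper" with '*' for an unbounded upper limit, as in '0..*'; any other variants AWB may emit are not covered, and the names Interval, parse_interval and format_interval are made up for the sketch.

```python
import re
from typing import NamedTuple, Optional

class Interval(NamedTuple):
    lower: int
    upper: Optional[int]  # None means unbounded, written '*'

# Assumed string form: "<lower>..<upper>", where upper is digits or '*'.
_PATTERN = re.compile(r"^(\d+)\.\.(\d+|\*)$")

def parse_interval(text: str) -> Interval:
    """Parse a stringified interval such as '0..*' or '1..1'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an interval: {text!r}")
    lower = int(match.group(1))
    upper = None if match.group(2) == "*" else int(match.group(2))
    if upper is not None and upper < lower:
        raise ValueError(f"upper limit below lower limit: {text!r}")
    return Interval(lower, upper)

def format_interval(interval: Interval) -> str:
    """Write an Interval back in the same stringified form."""
    upper = "*" if interval.upper is None else str(interval.upper)
    return f"{interval.lower}..{upper}"
```

The same few lines would be needed in every implementation language, which is the cost being weighed against six lines of text per constraint.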