Could YAML replace dADL as human readable AOM serialization format?

Seref Arikan Mon, 5 Dec 2011 12:32:25 +0000

Hi Erik,
I'll repeat a point I've tried to make before, since it is relevant in the
context of binary serialization.
I've used Protocol Buffers serialization of the AOM in Bosphorus (I'll put the
source code under Opereffa's SVN soon; it appears I don't even have time to
clean it up).


Binary formats like these are very fast, but they are much more simplistic
formalisms for representing data. You can use them to improve the
performance of many things, but you'll be writing a lot of code, and you'll
have to find non-standard ways of dealing with the simplicity of the
formalism. Here is the simplest example from Bosphorus: Eiffel is an
object-oriented language, and so is Java. The openEHR specs use
inheritance, which is reflected in the type hierarchies of both the Eiffel
and the Java classes. The Protocol Buffers language, however, does not
support inheritance. How do you represent instances of abstract types in
Protocol Buffers? How do you read/write them from/to Eiffel/Java? I've
solved these problems in my own way, but they will come up every time
someone uses formalisms that were not designed for OO
languages and frameworks.
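
One common workaround, sketched here in Python purely as an illustration (it is not necessarily how Bosphorus does it), is to flatten an abstract type into a message with a discriminator field plus one sub-message per concrete subtype, roughly what a Protocol Buffers `oneof` gives you. Plain dicts stand in for generated message classes, and `CObject`, `CComplexObject` and `CPrimitiveObject` are hypothetical, heavily simplified stand-ins loosely named after AOM classes.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, heavily simplified stand-ins loosely named after AOM classes.
@dataclass
class CObject:                      # abstract in the real model
    rm_type_name: str
    node_id: str


@dataclass
class CComplexObject(CObject):
    attribute_names: List[str]


@dataclass
class CPrimitiveObject(CObject):
    primitive_value: str


def to_flat(obj: CObject) -> dict:
    """Encode as a flat message: shared fields, a 'kind' discriminator and
    one subtype-specific sub-message (similar in spirit to a oneof)."""
    msg = {"rm_type_name": obj.rm_type_name, "node_id": obj.node_id}
    if isinstance(obj, CComplexObject):
        msg["kind"] = "C_COMPLEX_OBJECT"
        msg["complex"] = {"attribute_names": list(obj.attribute_names)}
    elif isinstance(obj, CPrimitiveObject):
        msg["kind"] = "C_PRIMITIVE_OBJECT"
        msg["primitive"] = {"primitive_value": obj.primitive_value}
    else:
        raise TypeError(f"no flat encoding for {type(obj).__name__}")
    return msg


def from_flat(msg: dict) -> CObject:
    """Rebuild the concrete subtype by dispatching on the discriminator."""
    common = {"rm_type_name": msg["rm_type_name"], "node_id": msg["node_id"]}
    kind = msg["kind"]
    if kind == "C_COMPLEX_OBJECT":
        return CComplexObject(
            **common, attribute_names=list(msg["complex"]["attribute_names"]))
    if kind == "C_PRIMITIVE_OBJECT":
        return CPrimitiveObject(
            **common, primitive_value=msg["primitive"]["primitive_value"])
    raise ValueError(f"unknown kind {kind!r}")
```

Note the hand-written type dispatch on both the writing and the reading side: every new subclass needs its own branch, which is the kind of extra, non-standard code described above.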

In a way, this is a matter of conceptual distance from OO. Each alternative
mentioned here sits at a particular position relative to OO support (think
of it as a point in a multidimensional space). Each alternative scores
higher than the rest in some dimension, but none of them is absolutely
closer to the OO support point (represented by Java/Eiffel/C#/Python etc.).
In my opinion, without evaluating OO support, which is what we rely on in
the actual languages of system development, the other discussions are not
really relevant. So what if Protocol Buffers are fast? So what if YAML,
ADL, or JSON are easier to read or more space efficient?

Maybe I'm being too rigid about this particular issue, but the programming
language, its tools, and the frameworks built on it are what determine
industry adoption more than anything else today. I don't think this is
being considered in these discussions, but that is just me.

Kind regards
Seref


On Mon, Dec 5, 2011 at 11:36 AM, Erik Sundvall <erik.sundvall at liu.se> wrote:

> Hi!
>
> On Mon, Dec 5, 2011 at 00:10, Heath Frankel <heath.frankel at oceaninformatics.com> wrote:
>
>> I think previously I had indicated I had no problem with the stringified
>> interval approach in XML, but I am reversing my thinking on this and feel
>> that it would be counterintuitive for those who want to use the XML
>> schemas for code generation purposes.  I think in this case the computable
>> requirement outweighs the human-readable requirement.
>>
>
> You are probably right regarding XML, and maybe this also holds for
> most JSON use cases, where the desire for the simplest possible
> object-serialization mapping outweighs human readability.
>
> I think the openEHR community is best served by having different archetype
> serialization format categories with different priorities for different
> purposes. E.g.:
>
> 1a. An XML format optimized for mapping to XML-schema generated code.
> 1b. A JSON format optimized for mapping to AOM object models, handcrafted
> or generated from the AOM specifications.
>
> 2. A cADL-variant wrapped in YAML optimized for human readability. It
> could be used for archetype files stored in version control systems (making
> version diffs readable) and as textual format when you need textual
> examples in documentation, teaching etc.
>
> In 1a & 1b, ease of implementation should be prioritized over readability,
> but in #2 human readability should be prioritized. Prioritizing both in the
> same format would likely fail. Things like default ordering of nodes and
> attributes could be recommended but optional for #1, whereas they should be
> mandatory for #2 (otherwise readability suffers and diffs get messed up).
>
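A minimal sketch of what a mandatory default ordering could mean in practice, assuming an archetype node is held as a nested dict. JSON from the Python standard library stands in for YAML here, and `CANONICAL_ORDER` is a hypothetical attribute order, not one taken from any openEHR specification. Two nodes built with keys in different orders serialize to identical text, so a version-control diff only shows real changes.

```python
import json

# Hypothetical mandated order for node attributes (not from any openEHR spec);
# keys not listed here follow in alphabetical order.
CANONICAL_ORDER = ["rm_type_name", "node_id", "occurrences", "attributes"]


def canonicalize(node):
    """Return a copy of node with dict keys in canonical order, recursively."""
    if isinstance(node, dict):
        known = [k for k in CANONICAL_ORDER if k in node]
        extra = sorted(k for k in node if k not in CANONICAL_ORDER)
        return {k: canonicalize(node[k]) for k in known + extra}
    if isinstance(node, list):
        return [canonicalize(item) for item in node]
    return node


def serialize(node) -> str:
    # json.dumps keeps dict insertion order, so the canonical order survives.
    return json.dumps(canonicalize(node), indent=2)
```

List order is left untouched on purpose: in a sketch like this, the order of child nodes is treated as content, and only attribute order is normalized.
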
>> I think we can come up with a much more concise representation of these
>> intervals without compromising the computable requirement, something
>> similar to XML schema maxOccurs/minOccurs.
>>
>
> Probably, but for #1 maybe being close to the AOM should be prioritized
> over being concise. After all, archetypes will not be sent over the wire at
> the same scale as patient data (RM instances).
>
> By the way, is the AOM open for changes (like renaming attributes) if that
> would increase clarity?
>
> If we were to change the subject and discuss RM instance serialization,
> then binary formats (like Protobuf and Thrift) could form a third category
> where message size and speed of conversion would be prioritized over ease
> of implementation or readability. XML and JSON would likely also be good to
> have for interoperability and debugging purposes. YAML for the RM would not
> be an obvious "over the wire" format, but could be very useful for compact,
> human-readable, long-term EHR archiving as plain text files and for
> documentation examples.
>
> Best regards,
> Erik Sundvall
> erik.sundvall at liu.se http://www.imt.liu.se/~erisu/  Tel: +46-13-286733
>
>
>> Please, everyone, remember that the dADL, JSON and XML generated from AWB
>> all currently use the stringified expression of cardinality / occurrences /
>> existence. These are usually the most numerous constraints in an
>> archetype and, if expressed in the orthodox way, take up 6 lines of text,
>> hence the giant files (e.g. the AOM 1.4-based XML we currently use). That
>> is also why the files you see on Erik's page are much smaller: we are using
>> ADL 1.5-flavoured serialisations, not the ADL 1.4 one.
>>
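For comparison, here is a sketch of reading an interval written in the verbose, element-per-field style, parsed with Python's `xml.etree.ElementTree`. The element names are modelled loosely on the openEHR 1.4 XML interval types and should be treated as an assumption; the point is only that a single `0..*` occurrences constraint expands into several elements that each have to be read and reconciled.

```python
import xml.etree.ElementTree as ET

# Hypothetical "orthodox" occurrences element; names loosely modelled on the
# openEHR 1.4 XML interval types (an assumption, not checked against the XSD).
ORTHODOX_0_TO_MANY = """
<occurrences>
  <lower_included>true</lower_included>
  <upper_included>false</upper_included>
  <lower_unbounded>false</lower_unbounded>
  <upper_unbounded>true</upper_unbounded>
  <lower>0</lower>
</occurrences>
"""


def _flag(root, name, default=False):
    text = root.findtext(name)
    return default if text is None else text.strip() == "true"


def parse_orthodox(xml_text):
    """Return (lower, upper) for an integer interval; None means unbounded."""
    root = ET.fromstring(xml_text.strip())
    lower = upper = None
    if not _flag(root, "lower_unbounded"):
        lower = int(root.findtext("lower"))
        if not _flag(root, "lower_included", default=True):
            lower += 1          # exclusive integer bound -> inclusive
    if not _flag(root, "upper_unbounded"):
        upper = int(root.findtext("upper"))
        if not _flag(root, "upper_included", default=True):
            upper -= 1
    return lower, upper
```
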
>>
>> Now, I think we should probably go with the stringified form in all of
>> these formalisms. The cost of doing this is a small micro-parser, but it is
>> the same micro-parser for everyone, which seems attractive to me.
>>
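The micro-parser mentioned above can be very small. A sketch, assuming only the `lower..upper` and `lower..*` forms need to be handled (a real grammar may accept more); `parse_interval` and `format_interval` are hypothetical names, and `None` represents the unbounded upper limit.

```python
import re
from typing import Optional, Tuple

# Hypothetical micro-parser for the stringified form "lower..upper", where
# upper may be "*" (unbounded). Returns upper=None for "*".
_INTERVAL = re.compile(r"^\s*(\d+)\s*\.\.\s*(\d+|\*)\s*$")


def parse_interval(text: str) -> Tuple[int, Optional[int]]:
    m = _INTERVAL.match(text)
    if m is None:
        raise ValueError(f"not an interval: {text!r}")
    lower = int(m.group(1))
    if m.group(2) == "*":
        return lower, None
    upper = int(m.group(2))
    if upper < lower:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return lower, upper


def format_interval(lower: int, upper: Optional[int]) -> str:
    # Inverse of parse_interval: None is written back as "*".
    return f"{lower}..{'*' if upper is None else upper}"
```

Because writing is just the inverse of reading, `format_interval(*parse_interval(s))` round-trips any well-formed string in this sketch.
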
>> The alternative that Erik mentioned was more native, but still efficient,
>> interval expressions: dADL has them built in (0..* is |>=0| in dADL),
>> and YAML and JSON could probably be persuaded to use some sort of array of
>> integer-like values. XML still doesn't have any such support. In
>> theory this approach would be the best if each syntax supported it
>> properly, but XML doesn't at all, and the others don't support intervals
>> with an unbounded upper limit (i.e. the '*' in '0..*').
>>
>> But Erik's exercise certainly proved that efficient representation of the
>> humble Interval<Integer> is actually worthwhile. (Once again, thanks for
>> that page; it's quite a good way to get a feel for these syntaxes very
>> quickly.)
>>
>> - thomas
>>
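To make the "native" alternative concrete, one hypothetical JSON encoding is a two-element array `[lower, upper]`. JSON has no way to say "unbounded", so this sketch uses `null` for the `*` upper limit; that is a convention every reader and writer would have to agree on, which is the gap described above. The function names are illustrative only.

```python
import json
from typing import Optional, Tuple

# Hypothetical native JSON encoding: [lower, upper], with null for "*".


def _is_int(value) -> bool:
    # bool is a subclass of int in Python; reject it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def interval_to_json(lower: int, upper: Optional[int]) -> str:
    return json.dumps([lower, upper])


def interval_from_json(text: str) -> Tuple[int, Optional[int]]:
    value = json.loads(text)
    if not (isinstance(value, list) and len(value) == 2):
        raise ValueError(f"expected [lower, upper], got {text!r}")
    lower, upper = value
    if not _is_int(lower):
        raise ValueError(f"bad lower bound in {text!r}")
    if upper is not None and (not _is_int(upper) or upper < lower):
        raise ValueError(f"bad upper bound in {text!r}")
    return lower, upper
```

Compared with the stringified `"0..*"`, this keeps both bounds as typed JSON values that generic tooling can read, at the cost of the `null` convention and of validation code that a schema alone would not express.
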
>>
> _______________________________________________
> openEHR-technical mailing list
> openEHR-technical at openehr.org
> http://lists.chime.ucl.ac.uk/mailman/listinfo/openehr-technical
>
>