Could YAML replace dADL as human readable AOM serialization format?

Thomas Beale Tue, 22 Nov 2011 12:24:19 +0000

On 22/11/2011 11:51, Erik Sundvall wrote:
> Hi!
>
> A little suggestion/thought (that might be of value also for 
> CIMI-folks and others looking at "archetyping" using ADL and AOM and 
> wondering if a specific language is needed).
>
> *Limitations:*
> For efficient handling of RM (Reference Model) instances (patient 
> data) flying back and forth between systems you'd probably want some 
> binary format (protobuf <http://code.google.com/p/protobuf/>, thrift 
> datatypes <http://thrift.apache.org/>, serialized Java objects or 
> whatever); this is NOT what this suggestion is about. For development 
> and debugging of RM-instance exchange you may also want some fairly 
> human-readable serialization that is supported by many platforms (like 
> JSON <http://www.json.org/>, YAML <http://www.yaml.org/>, XML or 
> whatever); this is NOT what the suggestion is about either. Also note 
> that the current suggestion only aims at finding a replacement for 
> dADL, not cADL, and that the AOM and the XML serialisations of the 
> AOM are not affected by this suggestion.
>
> *Background:*
> cADL (Constraint ADL) is a compact DSL 
> <http://en.wikipedia.org/wiki/Domain-specific_language> that is aimed 
> at defining constraints on an object model, while dADL (Data ADL) on 
> the other hand is mainly a general object-graph serialization format.
> If I understand section 1.7.5 in the ADL 1.5 spec 
> <http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf> 
> correctly, ADL 2.0 will allow the option to define *all* parts of an 
> archetype (including what is now done in cADL) as a dADL serialization 
> of the AOM 
> <http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/aom1.5.pdf> 
> (Archetype Object Model). Is that correct, Tom?


Actually, ADL 2.0 as described in that document is now obsolete. The ADL 
1.5 compiler already does this, and will use it as a fast save/retrieve 
format. See below for an example, or download the current release of the 
ADL Workbench to experiment. I intend to document the 'P_' classes on 
which this serialisation is based, and on which I think any JSON / YAML 
/ XML serialisation should be based, once we can agree on it. It is in 
these classes that things like occurrences are changed from 
MULTIPLICITY_INTERVAL to String.
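
To illustrate that last point, here is a minimal Python sketch of 
flattening an interval to a string for persistence and rebuilding it on 
load. The class name MultiplicityInterval, the two helper functions and 
the "lower..upper" / "*" string syntax are assumptions made for this 
illustration only; they are not the actual P_ class definitions. The only 
form taken from the example further down is the single value "1".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MultiplicityInterval:
    """Hypothetical stand-in for the AOM interval type."""
    lower: int
    upper: Optional[int]  # None means unbounded


def to_p_string(interval: MultiplicityInterval) -> str:
    """Flatten an interval to the string stored in the persisted form."""
    if interval.upper is None:
        return f"{interval.lower}..*"
    if interval.lower == interval.upper:
        return str(interval.lower)          # e.g. occurrences = <"1">
    return f"{interval.lower}..{interval.upper}"


def from_p_string(text: str) -> MultiplicityInterval:
    """Rebuild the interval object from its persisted string."""
    text = text.strip()
    if ".." not in text:
        if text == "*":
            return MultiplicityInterval(0, None)
        value = int(text)
        return MultiplicityInterval(value, value)
    lower, upper = text.split("..", 1)
    upper = upper.strip()
    return MultiplicityInterval(int(lower), None if upper == "*" else int(upper))


if __name__ == "__main__":
    for sample in ("1", "0..1", "0..*"):
        print(sample, "->", from_p_string(sample))
```

The point of the flattening is that the persisted graph then contains 
only strings, numbers, booleans, lists and keyed maps, which any generic 
JSON / YAML / dADL parser can read without knowing about interval types.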

>
> *Suggestion:*
> Investigate if YAML can replace or complement dADL 
> as object-graph serialization format for archetypes. (Perhaps there is 
> interest from people using an openEHR AOM implementation in a language 
> that already has YAML serializers to make a quick experiment?)

My motivation for making pure dADL archetypes is to have a fast, 
efficient serialisation of the object graph of an archetype, so that 
when an archetype compiles successfully, it can be saved in this form 
and later retrieved, bypassing the ADL compiler. The value in this is 
that formats like dADL / JSON / YAML are low-level graph serialisations, 
and that really fast parsers can be written for them for use on 
persisted files *known to be correct* (i.e. generated by a serialiser 
in a previous save). My own dADL parser is not such a fast parser, but 
that's only a matter of time ;-)

So the same arguments would apply to JSON or YAML in my view. At least 
for this purpose (fast save & retrieve of previously compiled 
archetypes), any such format could be used.
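
The save/retrieve idea can be sketched roughly as follows, here in 
Python with JSON as the persisted format simply because a parser ships 
with the language. The function load_archetype, its parameters and the 
compile_fn callback (standing in for a real ADL compiler) are 
hypothetical names for this sketch, not part of any openEHR tool.

```python
import json
from pathlib import Path


def load_archetype(source_path, cache_path, compile_fn):
    """Return the archetype object graph, skipping compilation when possible.

    compile_fn is a hypothetical callback: it takes the ADL source text and
    returns a plain dict/list graph, raising an exception if validation fails.
    """
    source_path = Path(source_path)
    cache_path = Path(cache_path)

    # Fast path: a saved graph that is at least as new as the source was
    # written by a previous successful compile, so it is known to be correct.
    if (cache_path.exists()
            and cache_path.stat().st_mtime >= source_path.stat().st_mtime):
        with cache_path.open(encoding="utf-8") as handle:
            return json.load(handle)

    # Slow path: full compile and validation, then persist the result.
    graph = compile_fn(source_path.read_text(encoding="utf-8"))
    temp_path = cache_path.with_suffix(cache_path.suffix + ".tmp")
    temp_path.write_text(
        json.dumps(graph, ensure_ascii=False, indent=2), encoding="utf-8")
    temp_path.replace(cache_path)  # avoid leaving a half-written cache file
    return graph
```

Only graphs that came out of a successful compile are ever written, 
which is what lets the reader on the fast path skip validation.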

>
> *Motivation:*
>
>   * YAML parsers converting YAML documents to native object graphs
>     already exist for a number of languages
>     <http://www.yaml.org/> (C/C++, Ruby, Python, Java, Perl, C#/.NET,
>     PHP, OCaml, Javascript, Actionscript, Haskell) so there would be
>     less work creating and maintaining archetype parsers that turn
>     archetype files into in-memory object graphs. (If you write an
>     archetype authoring tool and need to validate archetypes, not
>     just instantiate already validated archetypes, then the "Validity
>     Rules" (such as the ones in blue under 4.3.1.1 in the AOM spec.)
>     will of course still need to be implemented in software.)
>   * Having an archetype specific object-serialization language like
>     dADL might make "archetyping" look more mysterious and suspect and
>     might hide the fact that the semantics expressed in the AOM is the
>     interesting thing that can be serialised in many different ways.
>   * And (admittedly subjective) YAML lists and objects look slightly
>     better and more readable than dADL. A notable exception is
>     probably intervals/ranges that have a compact representation in
>     dADL (see section 4.5.2 of the ADL 1.5 spec
>     <http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf>)
>     but not natively in YAML.
>
> *Observations:*
> YAML is extensible, so data types for intervals etc can be added like 
> in http://yaml.org/YAML_for_ruby.html#ranges, also see discussion at 
> http://stackoverflow.com/questions/3337020/how-to-specify-ranges-in-yaml. 
> A similar approach could be taken to dADL's "Plug-in Syntaxes" (see 
> section 4.6 
> <http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf>) 
> using YAML. A number of language-independent extra YAML datatypes 
> (timestamp <http://yaml.org/type/timestamp.html> for example) are 
> listed at http://yaml.org/type/index.html and you can define your own 
> if you need more.
>
>

One area where dADL beats JSON and YAML (I think) is its better support 
for Xpath-like paths. Plus it's much more compact than JSON. Personally I 
find YAML hard to read because there are so many syntax elements (triple 
'-', triple '.' etc.), but that might just be me.
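
To make the path point concrete: with JSON or YAML, path lookup has to 
be layered on top of the parsed graph. A rough Python sketch of such a 
resolver over nested maps follows. The path syntax used here (attribute 
names separated by '/', quoted keys in square brackets) is only modelled 
loosely on the keyed containers in the example below and is an 
assumption, not the dADL path grammar; resolve is a hypothetical helper.

```python
import re

# One path segment: an attribute name followed by zero or more ["key"] parts.
_SEGMENT = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)((?:\["[^"]*"\])*)')
_KEY = re.compile(r'\["([^"]*)"\]')


def resolve(graph, path):
    """Follow a path such as /description/details["en"]/purpose.

    Simplification: keys containing '/' are not supported, because the
    path is split on '/' before the segments are parsed.
    """
    node = graph
    for segment in path.strip("/").split("/"):
        match = _SEGMENT.fullmatch(segment)
        if match is None:
            raise ValueError(f"bad path segment: {segment!r}")
        node = node[match.group(1)]            # step into the attribute
        for key in _KEY.findall(match.group(2)):
            node = node[key]                   # step into a keyed container
    return node


if __name__ == "__main__":
    graph = {
        "description": {
            "details": {
                "en": {"purpose": "Representation of a person's demographic data."}
            }
        }
    }
    print(resolve(graph, '/description/details["en"]/purpose'))
```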

- thomas


~~~~~~~~~~~~~~~~~~~ openEHR-DEMOGRAPHIC-PERSON.person.v1 as dADL via P_XX classes ~~~~~~~~~~~~~~~~~

(P_ARCHETYPE) <
     original_language = <[ISO_639-1::pt-br]>
     translations = <
         ["en"] = <
             language = <[ISO_639-1::en]>
             author = <
                 ["name"] = <"Sergio Miranda Freire">
                 ["organisation"] = <"Universidade do Estado do Rio de Janeiro - UERJ">
                 ["email"] = <"sergio at lampada.uerj.br">
             >
         >
     >
     description = <
         original_author = <
             ["name"] = <"Sergio Miranda Freire & Rigoleta Dutra Mediano Dias">
             ["organisation"] = <"Universidade do Estado do Rio de Janeiro - UERJ">
             ["email"] = <"sergio at lampada.uerj.br">
             ["date"] = <"22/05/2009">
         >
         details = <
             ["en"] = <
                 language = <[ISO_639-1::en]>
                 purpose = <"Representation of a person's demographic data.">
                 use = <"Used in demographic service to collect a person's data.">
                 keywords = <"demographic service", "person's data">
                 misuse = <"">
                 copyright = <"© 2011 openEHR Foundation">
             >
             ["pt-br"] = <
                 language = <[ISO_639-1::pt-br]>
                 purpose = <"Representação dos dados demográficos de uma pessoa.">
                 use = <"Usado em serviço demográficos para coletar os dados de uma pessoa.">
                 keywords = <"serviço demográfico", "dados de uma pessoa">
                 misuse = <"">
                 copyright = <"© 2011 openEHR Foundation">
             >
         >
         lifecycle_state = <"Authordraft">
         other_contributors = <"Sebastian Garde, Ocean Informatics, Germany (Editor)", "Omer Hotomaroglu, Turkey (Editor)", "Heather Leslie, Ocean Informatics, Australia (Editor)">
         other_details = <
             ["references"] = <"ISO/TS 22220:2008(E) - Identification of Subject of Care - Technical Specification - International Organization for Standardization.">
         >
     >
     artefact_object_type = <"DIFFERENTIAL_ARCHETYPE">
     archetype_id = <"openEHR-DEMOGRAPHIC-PERSON.person.v1">
     adl_version = <"1.5">
     artefact_type = <"archetype">
     definition = <
         rm_type_name = <"PERSON">
         node_id = <"at0000">
         attributes = <
             ["1"] = <
                 rm_attribute_name = <"details">
                 children = <
                     ["1"] = (P_ARCHETYPE_SLOT) <
                         rm_type_name = <"ITEM_TREE">
                         node_id = <"at0001">
                         occurrences = <"1">
                         includes = <
                             ["1"] = <
                                 expression = (EXPR_BINARY_OPERATOR) <
                                     type = <"Boolean">
                                     operator = <
                                         value = <2007>
                                     >
                                     left_operand = (EXPR_LEAF) <
                                         type = <"String">
                                         reference_type = <"attibute">
                                         item = <"archetype_id/value">
                                     >
                                     right_operand = (EXPR_LEAF) <
                                         type = <"C_STRING">
                                         reference_type = <"constraint">
                                         item = (C_STRING) <
                                             regexp = <"(person_details)[a-zA-Z0-9_-]*\\.v1">
                                             is_open = <False>
                                             regexp_default_delimiter = <True>
                                         >
                                     >
                                     precedence_overridden = <False>
                                 >
                             >
                         >
                         is_closed = <False>
                     >
                 >
                 is_multiple = <False>
             >
             ["2"] = <
                 rm_attribute_name = <"identities">
                 children = <
                     ["1"] = (P_ARCHETYPE_SLOT) <
                         rm_type_name = <"PARTY_IDENTITY">
                         node_id = <"at0002">
                         occurrences = <"1">
                         includes = <
                             ["1"] = <
                                 expression = (EXPR_BINARY_OPERATOR) <
                                     type = <"Boolean">
                                     operator = <
                                         value = <2007>
                                     >
                                     left_operand = (EXPR_LEAF) <
                                         type = <"String">
                                         reference_type = <"attibute">
                                         item = <"archetype_id/value">
                                     >
                                     right_operand = (EXPR_LEAF) <
                                         type = <"C_STRING">
                                         reference_type = <"constraint">
                                         item = (C_STRING) <
                                             regexp = <"(person_name)[a-zA-Z0-9_-]*\\.v1">
                                             is_open = <False>
                                             regexp_default_delimiter = <True>
                                         >
                                     >
                                     precedence_overridden = <False>
                                 >
                             >
                         >
                         is_closed = <False>
                     >
                 >
                 is_multiple = <True>
             >
             ["3"] = <
                 rm_attribute_name = <"contacts">
                 children = <
                     ["1"] = (P_C_COMPLEX_OBJECT) <
                         rm_type_name = <"CONTACT">
                         node_id = <"at0003">
                         occurrences = <"1">
                         attributes = <
                             ["1"] = <
                                 rm_attribute_name = <"addresses">
                                 children = <
                                     ["1"] = (P_ARCHETYPE_SLOT) <
                                         rm_type_name = <"ADDRESS">
                                         node_id = <"at0030">
                                         occurrences = <"1">
                                         includes = <
                                             ["1"] = <
                                                 expression = (EXPR_BINARY_OPERATOR) <
                                                     type = <"Boolean">
                                                     operator = <
                                                         value = <2007>
                                                     >
                                                     left_operand = (EXPR_LEAF) <
                                                         type = <"String">
                                                         reference_type = <"attibute">
                                                         item = <"archetype_id/value">
                                                     >
                                                     right_operand = (EXPR_LEAF) <
                                                         type = <"C_STRING">
                                                         reference_type = <"constraint">
                                                         item = (C_STRING) <
                                                             regexp = <"(address)([a-zA-Z0-9_-]+)*\\.v1">
                                                             is_open = <False>
                                                             regexp_default_delimiter = <True>
                                                         >
                                                     >
                                                     precedence_overridden = <False>
                                                 >
                                             >
                                             ["2"] = <
                                                 expression = (EXPR_BINARY_OPERATOR) <
                                                     type = <"Boolean">
                                                     operator = <
                                                         value = <2007>
                                                     >
                                                     left_operand = (EXPR_LEAF) <
                                                         type = <"String">
                                                         reference_type = <"attibute">
                                                         item = <"archetype_id/value">
                                                     >
                                                     right_operand = (EXPR_LEAF) <
                                                         type = <"C_STRING">
                                                         reference_type = <"constraint">
                                                         item = (C_STRING) <
                                                             regexp = <"(electronic_communication)[a-zA-Z0-9_-]*\\.v1">
                                                             is_open = <False>
                                                             regexp_default_delimiter = <True>
                                                         >
                                                     >
                                                     precedence_overridden = <False>
                                                 >
                                             >
                                         >
                                         is_closed = <False>
                                     >
                                 >
                                 is_multiple = <True>
                             >
                         >
                     >
                 >
                 is_multiple = <True>
             >
             ["4"] = <
                 rm_attribute_name = <"relationships">
                 children = <
                     ["1"] = (P_C_COMPLEX_OBJECT) <
                         rm_type_name = <"PARTY_RELATIONSHIP">
                         node_id = <"at0004">
                         attributes = <
                             ["1"] = <
                                 rm_attribute_name = <"details">
                                 children = <
                                     ["1"] = (P_C_COMPLEX_OBJECT) <
                                         rm_type_name = <"ITEM_TREE">
                                         attributes = <
                                             ["1"] = <
                                                 rm_attribute_name = <"items">
                                                 children = <
                                                     ["1"] = (P_C_COMPLEX_OBJECT) <
                                                         rm_type_name = <"ELEMENT">
                                                         node_id = <"at0040">
                                                         attributes = <
                                                             ["1"] = <
                                                                 rm_attribute_name = <"value">
                                                                 children = <
                                                                     ["1"] = (P_C_COMPLEX_OBJECT) <
                                                                         rm_type_name = <"DV_TEXT">
                                                                     >
                                                                     ["2"] = (P_C_COMPLEX_OBJECT) <
                                                                         rm_type_name = <"DV_CODED_TEXT">
                                                                         attributes = <
                                                                             ["1"] = <
                                                                                 rm_attribute_name = <"defining_code">
                                                                                 children = <
                                                                                     ["1"] = (P_CONSTRAINT_REF) <
                                                                                         rm_type_name = <"CODE_PHRASE">
                                                                                         target = <"ac0000">
                                                                                     >
                                                                                 >
                                                                                 is_multiple = <False>
                                                                             >
                                                                         >
                                                                     >
                                                                 >
                                                                 is_multiple = <False>
                                                             >
                                                         >
                                                     >
                                                 >
                                                 is_multiple = <True>
                                             >
                                         >
                                     >
                                 >
                                 is_multiple = <False>
                             >
                         >
                     >
                 >
                 is_multiple = <True>
             >
 >
 >
     ontology = <
         term_definitions = <
             ["pt-br"] = <
                 ["at0000"] = <
                     text = <"Dados da pessoa">
                     description = <"Dados da pessoa.">
                 >
                 ["at0001"] = <
                     text = <"Detalhes">
                     description = <"Detalhes demográficos da pessoa.">
                 >
                 ["at0002"] = <
                     text = <"Nome">
                     description = <"Conjunto de dados que especificam o nome da pessoa.">
                 >
                 ["at0003"] = <
                     text = <"Contatos">
                     description = <"Contatos da pessoa.">
                 >
                 ["at0004"] = <
                     text = <"Relacionamentos">
                     description = <"Relacionamentos de uma pessoa, especialmente laços familiares.">
                 >
                 ["at0030"] = <
                     text = <"Endereço">
                     description = <"Endereços vinculados a um único contato, ou seja, com o mesmo período de validade.">
                 >
                 ["at0040"] = <
                     text = <"Grau de parentesco">
                     description = <"Define o grau de parentesco entre as pessoas envolvidas.">
                 >
             >
             ["en"] = <
                 ["at0000"] = <
                     text = <"Person">
                     description = <"Personal demographic data.">
                 >
                 ["at0001"] = <
                     text = <"Demographic details">
                     description = <"A person's demographic details.">
                 >
                 ["at0002"] = <
                     text = <"Name">
                     description = <"A person's name.">
                 >
                 ["at0003"] = <
                     text = <"Contacts">
                     description = <"A person's contacts.">
                 >
                 ["at0004"] = <
                     text = <"Relationships">
                     description = <"A person's relationships, especially family ties.">
                 >
                 ["at0030"] = <
                     text = <"Addresses">
                     description = <"Addresses linked to a single contact, i.e. with the same time validity.">
                 >
                 ["at0040"] = <
                     text = <"Relationship type">
                     description = <"Defines the type of relationship between related persons.">
                 >
             >
         >
         constraint_definitions = <
             ["pt-br"] = <
                 ["ac0000"] = <
                     text = <"Códigos para tipo de parentesco">
                     description = <"códigos válidos para tipo de parentesco.">
                 >
             >
             ["en"] = <
                 ["ac0000"] = <
                     text = <"Codes for type of relationship">
                     description = <"Valid codes for type of relationship.">
                 >
             >
         >
     >
     is_controlled = <False>
     is_generated = <True>
     is_valid = <True>
 >
