Could YAML replace dADL as human readable AOM serialization format?
On 15/12/2011 11:31, Thomas Beale wrote:

I have to say, the more I look at YAML, the more I wonder what the designers were thinking. For example, in this section of the spec,

http://yaml.org/spec/current.html#id2532720

multi-line quoted strings are only allowed if the 'key' is also quoted (the strange looking JSON approach); if the key is not quoted (i.e. 'simple') then the value can't be quoted either. That's just nonsense!

I am glad I am only implementing a serialiser, not a parser...

- thomas
Could YAML replace dADL as human readable AOM serialization format?
Hi!

On Thu, Dec 15, 2011 at 12:44, Thomas Beale thomas.beale at oceaninformatics.com wrote:
> I have to say, the more I look at YAML, the more I wonder what the designers were thinking. For example, in this section of the spec, http://yaml.org/spec/current.html#id2532720 multi-line quoted strings are only allowed if the 'key' is also quoted (the strange looking JSON approach); if the key is not quoted (i.e. 'simple') then the value can't be quoted either. That's just nonsense!

Are you sure that is what it says?

  "Double quoted scalars are restricted to a single line when contained inside a simple key."

Is it not rather that you may not use a multi-line double quoted string as a KEY (at all)? It does NOT forbid you to use multi-line double quoted strings in the value, no matter if or how you quote your keys. I have certainly seen double quoted values for unquoted keys coming from serializers claiming to be specification conformant.

Are any of your keys so long and complicated that they would need multi-line quoted strings?

> I am glad I am only implementing a serialiser, not a parser...

In many less exotic languages they are already implemented :-) Then you configure them and then throw your object trees at them.

An example of very unfinished work in progress, using poorly readable ordering and based on the openEHR java-ref-impl (and probably exposing too many fields), is attached below.

Best regards,
Erik Sundvall
erik.sundvall at liu.se http://www.imt.liu.se/~erisu/
Tel: +46-13-286733

!http://www.openehr.org/releases/1.0.2/class/openehr.am.archetype.ARCHETYPE
adl_version: '1.4'
archetype_id: openEHR-DEMOGRAPHIC-PERSON.person.v1
concept: at
original_language: ISO_639-1::pt-br
translations:
  en:
    language: ISO_639-1::en
    author: {email: sergio at lampada.uerj.br, organisation: Universidade do Estado do Rio de Janeiro - UERJ, name: Sergio Miranda Freire}
description:
  original_author: {email: sergio at lampada.uerj.br, organisation: Universidade do Estado do Rio de Janeiro - UERJ, name: Sergio Miranda Freire Rigoleta Dutra Mediano Dias, date: 22/05/2009}
  other_contributors: ['Sebastian Garde, Ocean Informatics, Germany (Editor)', 'Omer Hotomaroglu, Turkey (Editor)', 'Heather Leslie, Ocean Informatics, Australia (Editor)']
  lifecycle_state: Authordraft
  details:
  - language: ISO_639-1::en
    purpose: Representation of a person's demographic data.
    keywords: [demographic service, person's data]
    use: Used in demographic service to collect a person's data.
    copyright: © openEHR Foundation
    original_resource_uri: {}
  - language: ISO_639-1::pt-br
    purpose: Representação dos dados demográficos de uma pessoa.
    keywords: [serviço demográfico, dados de uma pessoa]
    use: Usado em serviço demográficos para coletar os dados de uma pessoa.
    copyright: © openEHR Foundation
    original_resource_uri: {}
  other_details: {references: 'ISO/TS 0:2008(E) - Identification of Subject of Care - Technical Specification - International Organization for Standardization.'}
definition:
  attributes:
  - rm_attribute_name: details
    children:
    - includes:
      - expression:
          left_operand: {item: archetype_id/value, reference_type: CONSTANT, type: STRING}
          right_operand:
            item: {pattern: '(person_details)[a-zA-Z0-9_-]*\.v1'}
            reference_type: CONSTANT
            type: String
          operator: OP_MATCHES
          precedence_overridden: false
          type: BOOLEAN
      rm_type_name: ITEM_TREE
      occurrences: [1, 1]
      node_i_d: at0001
      any_allowed: false
      path: /details[at0001]
    any_allowed: false
    path: /details
  - rm_attribute_name: identities
    children:
    - includes:
      - expression:
          left_operand: {item: archetype_id/value, reference_type: CONSTANT, type: STRING}
          right_operand:
            item: {pattern: '(person_name)[a-zA-Z0-9_-]*\.v1'}
            reference_type: CONSTANT
            type: String
          operator: OP_MATCHES
          precedence_overridden: false
          type: BOOLEAN
      rm_type_name: PARTY_IDENTITY
      occurrences: [1, 1]
      node_i_d: at0002
      any_allowed: false
      path: /identities[at0002]
    any_allowed: false
    path: /identities
  - rm_attribute_name: contacts
    children:
    - attributes:
      - rm_attribute_name: addresses
        children:
        - includes:
          - expression:
              left_operand: {item: archetype_id/value, reference_type: CONSTANT, type: STRING}
              right_operand:
                item: {pattern: '(address)([a-zA-Z0-9_-]+)*\.v1'}
                reference_type: CONSTANT
                type: String
              operator: OP_MATCHES
              precedence_overridden: false
              type: BOOLEAN
          - expression:
              left_operand: {item: archetype_id/value, reference_type: CONSTANT, type: STRING}
              right_operand:
                item:
Could YAML replace dADL as human readable AOM serialization format?
On 15/12/2011 12:51, Erik Sundvall wrote:
> Hi!
>
> Are you sure that is what it says?
>
>   "Double quoted scalars are restricted to a single line when contained inside a simple key."

well I read this to say:

* if you double quote a long String containing line breaks (if you don't already get into different trouble) THEN
* this scalar cannot be the value of a 'simple key';
* a 'simple key' is defined as:

    A simple key (http://yaml.org/spec/current.html#index-entry-simple%20key) has no identifying mark. It is recognized as being a key either due to being inside a flow mapping, or by being followed by an explicit value. Hence, to avoid unbound lookahead in YAML processors (http://yaml.org/spec/current.html#processor/), simple keys are restricted to a single line and must not span more than 1024 stream (http://yaml.org/spec/current.html#stream/syntax) characters (hence the need for the flow-key context, http://yaml.org/spec/current.html#index-entry-flow-key%20context). Note the 1024 character limit is in terms of Unicode characters rather than stream octets, and that it includes the separation (http://yaml.org/spec/current.html#separation%20space/) following the key itself.

maybe I misunderstood that a 'simple key' can't have quotes, but in any case, the concept of a 'simple key', if the object of YAML is object data serialisation, is ... pretty strange (if they are hash keys, then they are normal strings, and there should be no problem). Not distinguishing between hash keys and attribute names seems to be a problem in YAML as for JSON. Very odd design IMO.

Why the syntactic structure of a 'value' should have any dependence on the syntactic structure of a 'key' is beyond me.

Anyway, for the moment I will stick with the format (for Strings):

    unquoted_key: "double quoted string"

this format passes the online parser tests, and handles multi-line strings better. Otherwise you have to use '|', '>' and/or '\' markers all over the place.

- thomas
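The `unquoted_key: "double quoted string"` convention described in this message can be sketched in a few lines of Python. This is only an illustration, not the serialiser discussed in the thread: the function name `yaml_string_entry` is hypothetical, the key is assumed to be a plain identifier that needs no quoting, and the sketch assumes that the escapes produced by `json.dumps` (`\"`, `\\`, `\n`, `\uXXXX`) are acceptable inside a YAML double-quoted scalar.

```python
import json


def yaml_string_entry(key: str, value: str) -> str:
    """Return one `key: "value"` line with a double-quoted scalar value.

    Assumption: `key` is a simple identifier and is emitted unquoted as-is.
    """
    # json.dumps escapes double quotes, backslashes and line breaks, so a
    # multi-line string ends up on a single physical line as "...\n...".
    quoted = json.dumps(value, ensure_ascii=False)
    return f"{key}: {quoted}"


purpose = 'Representation of a person\'s "demographic" data.\nSecond line.'
line = yaml_string_entry("purpose", purpose)
print(line)
```

Because the line break is written as an escape, no '|' or '>' block indicators are needed, at the cost of long lines for long strings.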
Could YAML replace dADL as human readable AOM serialization format?
Oh, just my personal thoughts without any sanity check - should have read the whole thread first! My reaction was just to what was written in the subject line of the thread and after reading Seref's comments about the need to focus on outstanding/high priority issues.

Sorry if I have offended - I can't possibly be against free discussions here - even the most blue sky ones which I seldom broadcast myself ;)

Cheers,
-koray

From: openehr-technical-bounces at openehr.org [mailto:openehr-technical-boun...@openehr.org] On Behalf Of Erik Sundvall
Sent: Wednesday, 7 December 2011 11:30 p.m.
To: For openEHR technical discussions
Subject: Re: Could YAML replace dADL as human readable AOM serialization format?

Oh sigh... Trying to be open minded, thinking a few steps ahead, sharing thoughts and regularly reevaluating design decisions does not seem to be appreciated by all on this list. Perhaps we need to mark some discussions or sections with...

[Warning: may contain new thoughts]

...so that those of us that enjoy such discussions may continue to have them and those that get distracted by them or can't stand them could filter out those parts.

On Tue, Dec 6, 2011 at 22:23, Koray Atalag k.atalag at auckland.ac.nz wrote:
> Yeah I was also wondering what is the driver/motivation/aspiration behind using JSON, YAML etc. instead of good old ADL?

Good old which ADL? Please go back in the thread and note the difference between dADL and cADL in the reasoning: dADL is a reinvention of the wheel (object tree serialization); cADL is an optimized DSL that I have not seen any obvious widespread alternative to if brevity and readability are sought.

Regarding the motivation you ask for, I would recommend going back in the thread again to the first message...
http://www.openehr.org/mailarchives/openehr-technical/msg06186.html
...under the boldface heading "Motivation:", that you may have missed, and read the three bullet points. You may not agree, but that and the rest of this current message might reduce your wondering about the discussion origins.

> I also think that we as a community should look at getting more organised and get our efforts in tune

Yes, a bit of diversity is good in order to best explore design space, but duplicating work is a waste of time. If we are allowed to discuss future-directed thoughts on this list (without people getting too upset) that may also help us tune our efforts. If we must implement first and then discuss, it will be a lot harder to avoid duplication of work.

> as I know that quite interesting and tough times are about to come...

Are you referring to the CIMI discussions or is it a general observation about how the future usually is :-)

Regarding CIMI I think it is valuable to try to look upon openEHR with the eyes of newcomers. If there is unnecessary legacy in models or formats that we don't easily see because we have gotten used to it, then now is a good time to try reducing it while the amount of patient data using openEHR is limited. It will be harder to change things later. Getting the template formalism integrated with the AOM 1.5 was great in this sense, and so is Tom's experimentation with RM 2.0 constructs that may reduce the ITEM_STRUCTURE hierarchy.

> From: ... On Behalf Of Stef Verlinden
> +1

+/- infinity
Yay, I love flame wars :-)

On Tue, Dec 6, 2011 at 12:44, Seref Arikan serefarikan at kurumsalteknoloji.com wrote:
> Given this, if you or someone else thinks that YAML can be an alternative to dADL, there is nothing stopping anyone from implementing it and using it. Absolutely nothing.

Do you assume that if somebody is talking about a subject, then they can't possibly be in the middle of implementing it and wanting to share thoughts at an early stage? Please try to be a bit more open minded, I did not ask you to be the first to implement YAML support. You are not the only one implementing openEHR stuff, but I will admit that you deserve credit for, and are great at, "release early, release often" and I am not (yet).

> Thomas is heroically responding to all queries without judgement...

I think that is an unfair description of Tom's judgment.

> I have a feeling that all these discussions about if this or that could replace dADL are too hypothetical. Most of the time they are academic discussions. There is nothing wrong with academic discussions, I am doing a PhD here, but if the openEHR community is spending its time and resources for academic discussions which do not necessarily connect to real life implementations in the near term, then I think we have a problem.

So if something is not on your personal implementation agenda in near time, then it is academic and a waste of resources since it can not possibly be on the implementation agenda of somebody else... :-)

The reason I started looking into both JSON and YAML is that they are part of our current
Could YAML replace dADL as human readable AOM serialization format?
After reading Pablo's post on domain types I am curious about how they should be represented in each of the different formats. I feel they should be 'expanded' before trying to represent them in any other format, but I might be wrong. Any ideas or opinions?
Could YAML replace dADL as human readable AOM serialization format?
I have no problem with having different representations. In fact, having different representations means more happy people, not fewer (for example, people have been using RDF to describe archetypes for some time). Anyway, I love this kind of thread, as they are great for seeing new perspectives and technologies.

P.S. I like your idea of a Git-based distributed concept repository. If you want to start an off-list discussion please count us in, as we are also working on a reference model independent concept repository :D
Could YAML replace dADL as human readable AOM serialization format?
Erik,

Add my sigh next to yours... Lots of misunderstandings; I will try to respond to the most obvious ones.

I have clearly expressed that all discussions here are useful. I've made no connection to my agenda. My academic work does not even require the things I've mentioned as high priority for openEHR. I've been enjoying the discussions, and will continue to do so.

Your comments about dADL below, as well as your original motivations, hint at what I'm opposing. Your own words:

*Having an archetype specific object-serialization language like dADL might make archetyping look more mysterious and suspect and might hide the fact that the semantics expressed in the AOM is the interesting thing that can be serialised in many different ways.*

This is a negative statement about ADL, right? Nothing wrong with negative statements about ADL, I have a bunch of them in my pocket. But if this is your motivation to discuss YAML, and if the thread you've started is about replacing ADL, you're talking about replacing something that has taken lots of time and effort to create.

This is where we have our difference. I agree with many of the criticisms of ADL, and it is exactly at this point I try to be open minded. I can see that there are also significant advantages of ADL, and rather than suggesting that it be replaced, I first hypothesize and then go ahead and prove that it can coexist with XML, JSON, YAML etc. My work is out there showing that ADL can coexist with these formalisms. From my point of view, this is quite an open minded approach, at least more open minded than replacing it without considering what that would actually mean in other contexts.

This is not the first time I'm having these types of discussions, and it won't be the last. I make my point whenever I see a discussion that seems to suggest switching horses midstream. I'm sorry if I'm being a buzz killer, but I'm in favor of discussing things in a larger context, including consequences for the openEHR standard and its adoption. Pointing out these consequences does not mean I'm ruling out other options. I have been discussing them in light of all the proof I have (through my work) and I've asked others to do so. I cannot know about your work in advance, can I?

Let us try to eliminate the misunderstanding at this point:

If this discussion concludes with the common view that YAML can be an alternative to dADL, do you think the openEHR specification should replace ADL?

If the answer to the previous question is yes, then do you realize that this would mean replacing all the software that uses ADL, both open source and proprietary?

If the answer to the previous question is yes, then do you have a suggestion for funding these changes?

I think this is the best I can do to explain what I'm trying to include in the discussions.

Best regards
Seref
Could YAML replace dADL as human readable AOM serialization format?
On 05/12/2011 12:36, Erik Sundvall wrote:
> Hi!
>
> On Mon, Dec 5, 2011 at 00:10, Heath Frankel heath.frankel at oceaninformatics.com wrote:
>> I think previously I had indicated I had no problem with the stringified interval approach in XML, but I am reverting my thinking on this and feel that it would be counter intuitive for those who want to use the XML schemas for code generation purposes. I think in this case the computable requirement outweighs the human readable requirement.
>
> You are probably right regarding XML, and maybe this is valid also for most JSON use-cases where the desire for an as simple as possible object-serialization-mapping outweighs human readability.
>
> I think the openEHR community is best served by having different archetype serialization format categories with different priorities for different purposes. E.g.:
>
> 1a. An XML format optimized for mapping to XML-schema generated code.
> 1b. A JSON format optimized for mapping to AOM object models handcrafted or generated from AOM-specifications.
> 2. A cADL-variant wrapped in YAML optimized for human readability. It could be used for archetype files stored in version control systems (making version diffs readable) and as textual format when you need textual examples in documentation, teaching etc.

I had never thought of that but the AWB has a multi-part serialiser component, so it would be possible. When I get a bit of time ;-)

> In 1a and 1b easy implementation should be prioritized over readability, but in #2 human readability should be prioritized.

Erik,

You didn't answer the question a while ago - who are the 'readers'? I am just asking to know if you are talking about some particular kind of educational usage, and what your criteria are for 'readability'.

> Prioritizing both in the same format would likely fail. Things like default ordering of nodes and attributes could be recommended but optional for #1 but should be mandatory for #2 (otherwise readability suffers and diffs get messed up).

good point, you reminded me I have to fix the order in the AWB serialisations.

>> I think we can come up with a much more concise representation of these intervals without compromising the computable requirement, something similar to XML schema maxOccurs/minOccurs.
>
> Probably, but for #1 maybe being close to the AOM should be prioritized over being concise. After all, archetypes will not be sent over the wire at the same scale as patient data (RM instances).

how can a string like "1" or "2..*" be more concise? I think this is the most concise possible format (or some slight variation, e.g. the dADL interval syntax).

> By the way, is the AOM open for changes (like renaming attributes) if that would increase clarity?

well the AOM 1.5 is a draft, so in principle yes. But we need to assess the impact. Breaking archetype authoring tools probably does not matter so much - there are not many, so we can deal with that. Impacts on EHR system software will have to be more closely evaluated before we agree to any such changes. But let us know your fantasies anyway ;-)

- thomas
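To make the trade-off between the stringified interval and a structured form concrete, here is a minimal Python sketch that converts between the two. The grammar is an assumption inferred only from the two examples in this message ("1" and "2..*"); it is not the dADL interval syntax or anything defined by the AOM, and the names `parse_interval` and `format_interval` are hypothetical.

```python
from typing import Optional, Tuple


def parse_interval(text: str) -> Tuple[int, Optional[int]]:
    """Parse '1', '0..1' or '2..*' into (lower, upper).

    An upper bound of None stands for '*' (unbounded).
    """
    text = text.strip()
    if ".." in text:
        lower_s, upper_s = text.split("..", 1)
    else:
        # A single number is read as an interval with equal bounds.
        lower_s = upper_s = text
    lower = int(lower_s)
    upper = None if upper_s.strip() == "*" else int(upper_s)
    if upper is not None and upper < lower:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return lower, upper


def format_interval(lower: int, upper: Optional[int]) -> str:
    """Inverse of parse_interval, using the shortest string form."""
    if upper is None:
        return f"{lower}..*"
    if lower == upper:
        return str(lower)
    return f"{lower}..{upper}"


print(parse_interval("2..*"))
print(format_interval(1, 1))
```

The string form is the shorter one to read and diff, while the parsed pair is what generated code would work with, which is roughly the tension between the computable and the human readable requirement discussed here.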
Could YAML replace dADL as human readable AOM serialization format?
On 06/12/2011 12:44, Seref Arikan wrote:
> A bunch of responses, most of which should actually go to a wiki page for Bosphorus
>
> I've used binary serialization for AOM because although Eiffel is a very impressive language, I am not happy about its libraries. Some of them are mature, but for XML, I could not find anything that'd be guaranteed to be maintained.

I don't think there is any problem with them being maintained, they are part of the main Eiffel tool. The choice of Protocol buffers (or maybe there is another better variant?) makes sense on the basis of performance

> Protocol buffers is a technology that is used very heavily in Google, and has a large community. Performance is the key aspect of protocol buffers. It is very, very fast. When I'm exchanging simple messages over ZeroMQ (a very fast queue framework that is used in Bosphorus) I can achieve microsecond level performance (not even millisecond!) for Java to Eiffel communication. For desktop tooling purposes, this is much faster than XML.

orders of magnitude...

> Thomas is heroically responding to all queries without judgement, and he is even implementing a lot of code, to give grounded answers, to provide proofs.

don't give me too much credit: my lightweight serialisation library allowed me to implement JSON output in about 4 hours, plus two days background debugging of the {[]}

> I guess I am not as mature and as dedicated as he is. I'd rather have him working on ADL 1.5 XSD schemas than proving to people that openEHR can do JSON if necessary. Because having XSDs for ADL 1.5 is going to increase adoption of openEHR a lot more than having JSON output. If anybody out there does not agree, please come forward and talk about your JSON usage in your project which is about an actual information system that is running, or is supposed to run in a clinical setting.

yes, I think it is about time we posted a proposed AOM 1.5 XSD...

- thomas
Could YAML replace dADL as human readable AOM serialization format?
On 07/12/2011 11:29, Erik Sundvall wrote: Good old which ADL? Please go back in the thread and note the difference between dADL and cADL in the reasoning, dADL is a reinvention of the wheel (object tree serialization). Erik, out of academic interest: was either YAML or JSON around in 2000, when I made a first version of dADL (I'm in a plane typing this, can't check)? If they were, I look silly ;-) If not... In any case, JSON is seriously semantically deficient for proper serialisation purposes and is in need of at least 2 basic enhancements to work correctly on any realistic data. I agree it is fairly readable, although why attribute names are in quotes is completely beyond me... I have not yet looked at YAML properly, but it looks like it probably does the job properly. Yes, a bit of diversity is good in order to best explore design space, but duplicating work is a waste of time. If we are allowed to discuss future-directed thoughts on this list (without people getting too upset) that may also help us tune our efforts. If we must implement first and then discuss it will be a lot harder to avoid duplication of work. I don't actually think there is any harm in messing around with variations on serialisation - it's not hard to implement (XML being the hardest), but at some point I think a wiki page with a summary of real world requirements behind each variant would be useful. Are you referring to the CIMI discussions or is it a general observation about how the future usually is :-) Regarding CIMI I think it is valuable to try to look upon openEHR with the eyes of newcomers. If there is unnecessary legacy in models or formats that we don't easily see because we have gotten used to it, then now is a good time to try reducing it while the amount of patient data using openEHR is limited. It will be harder to change things later.
Getting the template formalism integrated with the AOM 1.5 was great in this sense, and so is Tom's experimentation with RM 2.0 constructs that may reduce the ITEM_STRUCTURE hierarchy. I have to do a bit more work to get the first proposal defined properly - there is a half done wiki page on that. Should have it fixed in a couple of days, then we can discuss. (I'm not online but if others find the page, feel free to put your own RM 2.0 variations on there somewhere). +/- infinity Yay, I love flame wars :-) You can't win like that. Gödel or someone showed that there are different sizes of infinity :) The reason I started looking into both JSON and YAML is that they are part of our current implementation (partly using JSON, Javascript etc) (primarily for RM objects). In this process I happened to see that YAML might do the job of dADL and that we could then reuse parser/serializer work of others (for many programming languages) instead of maintaining dADL frameworks. I wanted to share this thought at an early stage and I do appreciate that some have at least responded with positive interest and curiosity. At some point I intend to finalise the ultimate dADL grammar and publish dADL as a standalone with at least C#, Java, Eiffel and possibly C/C++ fast full parsers + serialisers. This is less work than you might think, and it would make dADL just as available as YAML. Well, ok it won't be in Erlang or Haskell for a while, but I doubt if that will make much difference. - thomas
Could YAML replace dADL as human readable AOM serialization format?
A bunch of responses, most of which should actually go to a wiki page for Bosphorus. I've used binary serialization for AOM because although Eiffel is a very impressive language, I am not happy about its libraries. Some of them are mature, but for XML, I could not find anything that'd be guaranteed to be maintained. Protocol buffers is a technology that is used very heavily in Google, and has a large community. Performance is the key aspect of protocol buffers. It is very, very fast. When I'm exchanging simple messages over ZeroMQ (a very fast queue framework that is used in Bosphorus) I can achieve microsecond level performance (not even millisecond!) for Java to Eiffel communication. For desktop tooling purposes, this is much faster than XML. You need to instantiate concrete instances of abstract types every time you use single or multiple attributes in AOM. Both classes descend from CAttribute. So the AOM specification gives you a field with type CAttribute (abstract), and instances of this type always have either a single or multiple attribute object assigned to this field. The Eiffel parser creates an AOM object when it parses an archetype. On the other side of the bridge, a Java object awaits to be filled with the data in the Eiffel object. Both Java and Eiffel know the relationship between these types but protocol buffers does not have inheritance. So when you're defining a protocol buffer message with its language, you have a problem: What should be the type of the field that represents CAttribute? I've had to come up with a method of handling this case. Someone may use another method and that is my point: when we have to do these things, they become a source of bugs and obstacles to implementation. So we may benefit from the format and readability of JSON, but the type of issues I've been describing would introduce a lot more problems than bandwidth efficiency or human friendliness would solve.
Hence, my priorities are slightly different when it comes to what makes a formalism convenient in openEHR implementation. With this view: I find XML seriously crippled for OO support, but at least there is some inheritance support and there is huge tooling and framework support. My job would be to find ways of working around issues using these frameworks. I'd prefer this to having less tooling and less OO support (for JSON). I can't speak for YAML, but in terms of maturity and support for mechanisms such as schemas, I'd be surprised if it ends up better than XML. For XML, I have JAXB, support in Java, Python, .NET, you name it... dADL has the advantage of being designed in a strong openEHR context. I guess both YAML (based on the feature you've mentioned) and XML can match dADL to the extent that any required workarounds could be justified based on industry adoption. I do not know YAML well enough to compare it in detail, but I'd love to hear from someone the type of things I've been sharing here, only with YAML this time instead of JSON and XML. Given this, if you or someone else thinks that YAML can be an alternative to dADL, there is nothing stopping anyone from implementing it and using it. Absolutely nothing. This is what I do. If I think that an XML form of ADL would help, then I take what is out there (Tom's Eiffel code), use it, and move on. I have a feeling that all these discussions about if this or that could replace dADL are too hypothetical. Most of the time they are academic discussions. There is nothing wrong with academic discussions, I am doing a PhD here, but if the openEHR community is spending its time and resources on academic discussions which do not necessarily connect to real life implementations in the near term, then I think we have a problem. Thomas is heroically responding to all queries without judgement, and he is even implementing a lot of code, to give grounded answers, to provide proofs.
I guess I am not as mature and as dedicated as he is. I'd rather have him working on ADL 1.5 XSD schemas than proving to people that openEHR can do JSON if necessary. Because having XSDs for ADL 1.5 is going to increase adoption of openEHR a lot more than having JSON output. If anybody out there does not agree, please come forward and talk about your JSON usage in your project which is about an actual information system that is running, or is supposed to run in a clinical setting. Please do not get me wrong, all the discussion we are having here is useful, it is just that in my humble opinion, some discussions are more useful than others if this standard into which I am heavily investing is to go forward. Best regards Seref
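The CAttribute problem described above can be sketched without any protobuf tooling at all. What follows is only an illustration, not the Bosphorus code: the Python class names mirror the AOM classes loosely, the "kind" tag is a hypothetical convention, and plain dicts stand in for an inheritance-free message format. It shows one of several possible methods (a discriminator field), which is exactly the kind of non-standard choice the message warns about.

```python
from dataclasses import dataclass


@dataclass
class CAttribute:
    """Stand-in for the abstract AOM type (not enforced as abstract here)."""
    rm_attribute_name: str


@dataclass
class CSingleAttribute(CAttribute):
    pass


@dataclass
class CMultipleAttribute(CAttribute):
    cardinality: str = "0..*"  # kept as a string for brevity


def to_flat(attr: CAttribute) -> dict:
    """Flatten a concrete attribute into a record with no inheritance.

    The 'kind' key is a hypothetical discriminator; a format without
    inheritance needs some such convention to remember the concrete type.
    """
    record = {
        "kind": type(attr).__name__,
        "rm_attribute_name": attr.rm_attribute_name,
    }
    if isinstance(attr, CMultipleAttribute):
        record["cardinality"] = attr.cardinality
    return record


def from_flat(record: dict) -> CAttribute:
    """Rebuild the right concrete class from the 'kind' tag."""
    kind = record["kind"]
    if kind == "CSingleAttribute":
        return CSingleAttribute(record["rm_attribute_name"])
    if kind == "CMultipleAttribute":
        return CMultipleAttribute(record["rm_attribute_name"],
                                  record["cardinality"])
    raise ValueError(f"unknown kind: {kind!r}")
```

Both ends of the bridge have to agree on the tag values and on which fields belong to which kind; nothing in the flat format checks that for you.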
Could YAML replace dADL as human readable AOM serialization format?
+1 Cheers, Stef On 6 Dec 2011, at 12:44, Seref Arikan wrote: Please do not get me wrong, all the discussion we are having here is useful, it is just that in my humble opinion, some discussions are more useful than others if this standard into which I am heavily investing is to go forward.
Could YAML replace dADL as human readable AOM serialization format?
Yeah, I was also wondering what the driver/motivation/aspiration is behind using JSON, YAML etc. instead of good old ADL. Is this to do with making openEHR easier to digest for the 'traditional' IT community, because perhaps they don't want to let go of everything at once and want to leverage some existing skills like these? I also think that we as a community should look at getting more organised and get our efforts in tune, as I know that quite interesting and tough times are about to come... Cheers, -koray
Could YAML replace dADL as human readable AOM serialization format?
I think previously I had indicated I had no problem with the stringified interval approach in XML, but I am reverting my thinking on this and feel that it would be counter-intuitive for those who want to use the XML schemas for code generation purposes. I think in this case the computable requirement outweighs the human readable requirement. I think we can come up with a much more concise representation of these intervals without compromising the computable requirement, something similar to XML schema maxOccurs/minOccurs. Heath please everyone remember that the dADL, JSON and XML generated from AWB all currently use the stringified expression of cardinality / occurrences / existence. Now, these are usually the most numerous constraints in an archetype and if expressed in the orthodox way, take up 6 lines of text, hence the giant files (e.g. AOM 1.4 based XML we currently use) - and thus the much reduced files you see on Erik's page, because we are using ADL 1.5 flavoured serialisations not the ADL 1.4 one. Now, I think we should probably go with the stringified form in all of these formalisms. The cost of doing this is a small micro-parser, but it is the same micro-parser for everyone, which seems attractive to me. The alternative that Erik mentioned was more native, but still efficient interval expressions, e.g. dADL has it built in (0..* is |>=0| in dADL), and YAML and JSON could probably be persuaded to make some sort of array of integer-like things be used. XML still doesn't have any such support. In theory this approach would be the best if each syntax supported it properly, but XML doesn't at all, and the others don't support Intervals with unbounded upper limit (i.e. the '*' in '0..*'). But Erik's exercise certainly proved that efficient representation of the humble Interval Integer is actually worthwhile. (Once again thanks for that page, it's quite a good way to get a good feel for these syntaxes very quickly).
- thomas
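To give a feel for how small the micro-parser for the stringified form could be, here is a rough sketch. It is not the AWB implementation; the names Interval, parse_interval and format_interval are made up for illustration, and it only assumes the forms mentioned in this thread ('1', '0..1', '2..*'), with None standing in for the unbounded '*'.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    lower: int
    upper: Optional[int]  # None stands for the unbounded '*'


def parse_interval(text: str) -> Interval:
    """Parse strings such as '1', '0..1' or '2..*' into an Interval."""
    text = text.strip()
    if ".." not in text:
        value = int(text)  # a single value such as '1' is read as 1..1
        return Interval(value, value)
    low_text, high_text = text.split("..", 1)
    lower = int(low_text)
    upper = None if high_text.strip() == "*" else int(high_text)
    if upper is not None and upper < lower:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return Interval(lower, upper)


def format_interval(interval: Interval) -> str:
    """Inverse of parse_interval: emit the shortest stringified form."""
    if interval.upper is None:
        return f"{interval.lower}..*"
    if interval.upper == interval.lower:
        return str(interval.lower)
    return f"{interval.lower}..{interval.upper}"
```

The same few lines would be needed once per implementation language, whichever of dADL, JSON, YAML or XML carries the string.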
Could YAML replace dADL as human readable AOM serialization format?
Hi All I am going to say it once more: If there is an expression on occurrences of '0..*' anywhere in ADL then it is an error, for that is not a constraint - and can only be wrong (ie the RM may have a narrower constraint). We just need a max int and a min int - both optional. I won't say it again - but it does keep it simple and it is correct! Cheers, Sam
Could YAML replace dADL as human readable AOM serialization format?
And if you want to express something like 'a set with all the past test results for this patient' (that could have none)? It would be a constraint, as you are only allowing some kinds of entries (children of a certain Snomed code for example). 2011/12/5 Sam Heard sam.heard at oceaninformatics.com: Hi All I am going to say it once more: If there is an expression on occurrences of '0..*' anywhere in ADL then it is an error, for that is not a constraint - and can only be wrong (ie the RM may have a narrower constraint). We just need a max int and a min int - both optional. I won't say it again - but it does keep it simple and it is correct! Cheers, Sam
Could YAML replace dADL as human readable AOM serialization format?
On 05/12/2011 00:23, Sam Heard wrote: Hi All I am going to say it once more: If there is an expression on occurrences of '0..*' anywhere in ADL then it is an error, for that is not a constraint - and can only be wrong (ie the RM may have a narrower constraint). We just need a max int and a min int - both optional. I won't say it again - but it does keep it simple and it is correct! Sure, but constraints like 1..* or 2..* are perfectly valid (and do occur), and are of the same structure, so I don't think this changes anything. - thomas
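Both points in this exchange can be put side by side in a few lines. The sketch below is only one reading of them, with made-up names (Occurrences, is_within): it uses Sam's two optional integers, keeps Thomas's 1..* and 2..* expressible (upper left as None), and checks whether an archetype interval stays inside the one the RM allows, which is where a blanket 0..* would go wrong.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrences:
    lower: Optional[int] = None  # None: no minimum stated, treated as 0
    upper: Optional[int] = None  # None: no maximum stated, i.e. unbounded


def is_within(archetype: Occurrences, rm: Occurrences) -> bool:
    """True if the archetype interval lies inside the reference-model one."""
    a_low = archetype.lower or 0
    r_low = rm.lower or 0
    if a_low < r_low:
        return False
    if rm.upper is None:
        return True  # the RM is unbounded above, any upper limit fits
    return archetype.upper is not None and archetype.upper <= rm.upper
```

For example, Occurrences(1, None) fits inside an RM interval of 0..*, while Occurrences(0, None) does not fit inside an RM interval of 0..1.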
Could YAML replace dADL as human readable AOM serialization format?
Hi! On Mon, Dec 5, 2011 at 00:10, Heath Frankel heath.frankel at oceaninformatics.com wrote: I think previously I had indicated I had no problem with the stringified interval approach in XML, but I am reverting my thinking on this and feel that it would be counter-intuitive for those who want to use the XML schemas for code generation purposes. I think in this case the computable requirement outweighs the human readable requirement. You are probably right regarding XML, and maybe this is valid also for most JSON use-cases where the desire for an as simple as possible object-serialization-mapping outweighs human readability. I think the openEHR community is best served by having different archetype serialization format categories with different priorities for different purposes. E.g.: 1a. An XML format optimized for mapping to XML-schema generated code. 1b. A JSON format optimized for mapping to AOM object models handcrafted or generated from AOM-specifications. 2. A cADL-variant wrapped in YAML optimized for human readability. It could be used for archetype files stored in version control systems (making version diffs readable) and as textual format when you need textual examples in documentation, teaching etc. In 1a and 1b easy implementation should be prioritized over readability, but in #2 human readability should be prioritized. Prioritizing both in the same format would likely fail. Things like default ordering of nodes and attributes could be recommended but optional for #1 but should be mandatory for #2 (otherwise readability suffers and diffs get messed up). I think we can come up with a much more concise representation of these intervals without compromising the computable requirement, something similar to XML schema maxOccurs/minOccurs. Probably, but for #1 maybe being close to the AOM should be prioritized over being concise. After all, archetypes will not be sent over the wire at the same scale as patient data (RM instances).
By the way, is the AOM open for changes (like renaming attributes) if that would increase clarity? If we would change subject and discuss RM instance serialization, then binary formats (like Protobuf and Thrift) could form a third category where message size and speed of conversion would be prioritized over ease of implementation or readability. XML and JSON would likely be good to have also for interoperability and debugging purposes. YAML for the RM would not be an obvious over-the-wire format, but can be very useful for compact human readable long term EHR archiving storage as plain text files and for documentation examples. Best regards, Erik Sundvall erik.sundvall at liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733
Could YAML replace dADL as human readable AOM serialization format?
Hi Erik, I'll repeat a point I've tried to make before, since it is relevant in the context of binary serialization. I've used protocol buffers serialization of AOM in Bosphorus (I'll put the source code under Opereffa's svn soon, it appears I don't even have time to clean it up). These are very fast, but much more simplistic formalisms to represent data. You can use them to improve the performance of many things, but you'll be writing a lot of code, and you'll have to find non-standard ways of dealing with the simplicity of the formalism. Here is the simplest example from Bosphorus: Eiffel is an object oriented language, Java is also an object oriented language. openEHR specs use inheritance, which is reflected in the type hierarchies of both Eiffel and Java classes. You have the protocol buffers language which does not support inheritance. How do you represent instances of abstract types in protocol buffers? How do you read/write them from/to Eiffel/Java? I've done these in my own way, but it will be a problem every time someone uses formalisms which are not designed for OO languages and frameworks. In a way, it is a conceptual distance from OO. Every alternative mentioned here is at a particular position relative to a particular level of OO support (take it as a point in a multidimensional space). Every alternative has values higher than the rest in a particular dimension, but none of them is absolutely closer to the OO support point (represented by Java/Eiffel/C#/Python etc). In my opinion, without this evaluation of OO support, which is what we use in the actual languages of system development, other discussions are not really relevant. What if protocol buffers are fast? What if YAML, ADL, or JSON are easier to read, space efficient? Maybe I'm being too rigid about this particular issue, but the programming language, its tools and the frameworks built on it are what determine industry adoption more than everything else today.
I don't think this is being considered in these discussions, but that is just me. Kind regards Seref
Could YAML replace dADL as human readable AOM serialization format?
Hi Seref! On Mon, Dec 5, 2011 at 13:32, Seref Arikan serefarikan at kurumsalteknoloji.com wrote: I'll repeat a point I've tried to make before, since it is relevant in the context of binary serialization. I've used protocol buffers serialization of AOM in Bosphorus. Why do you use binary serialization for AOM? (Just curious, I thought text formats would cater for most AOM use cases.) I have not looked deeply into protobuf so I'll take your word on the lack of OO support. Looking at http://wiki.apache.org/thrift/ThriftTypes their Structs also seem to lack inheritance. So I'll try to keep quiet about cross-platform binary formats at least until I have tried applying any of them to openEHR for real. ... you'll have to find non standard ways of dealing with the simplicity of the formalism. For JSON I would agree that the formalism is sometimes too simple and one may need to make an openEHR specification for how to convey object type when needed, perhaps inspired by something like http://flexjson.sourceforge.net/ that adds a class attribute, or by exploring if introspection of the target object type, like http://code.google.com/p/google-gson/ does, is enough for openEHR data. Here is the simplest example from Bosphorus: Eiffel is an object oriented language, Java is also an object oriented language. openEHR specs use inheritance, which is reflected into type hierarchies of both Eiffel and Java classes. You have the protocol buffers language which does not support inheritance. How do you represent instances of abstract types in protocol buffers? Sorry if I'm dense, but when do you need to instantiate abstract types in RM data? In a way, it is a conceptual distance from OO. Every alternative mentioned here is at a particular position to a particular level of OO support (take it as a point in a multidimensional space).
Every alternative has values higher than the rest in a particular dimension, but none of them is absolutely closer to the OO support point (represented by Java/Eiffel/C#/Python etc) In my opinion, without this evaluation of OO support, which is what we use in the actual languages of system development, other discussions are not really relevant. What if protocol buffers are fast? What if YAML, ADL, or JSON are easier to read, space efficient? Do you bundle YAML and XML into that opinion (lacking OO support the same way as protobuf)? Do you think that dADL can carry everything needed for openEHR (both AM and RM)? If so why wouldn't YAML? What in basic dADL semantics is missing in YAML? YAML (using a !-prefixed syntax) and partly XML (using e.g. xsi:type) have ways of conveying object type in the case it cannot be inferred from data. Maybe I'm being too rigid about this particular issue, but the programming language, its tools and frameworks built on it is what determines industry adoption more than everything else today. I don't think this is being considered in these discussions, but that is just me. I guess language-specific binary formats (like serialized Java objects) may be better for binary representation then. Thanks for the word of warning regarding protobuf. Do you think that all openEHR instance serializations really need to be object oriented themselves or is it enough that the classes of the receiving application are object oriented and that the deserialization code (or the transfer format) is clever enough to put the data into the right objects? There are some cases where different openEHR datatypes may have the same attribute signature, and for those cases even transport formats aiming to reduce verbosity will need to explicitly declare class type since it cannot be safely inferred.
Best regards,
Erik Sundvall
erik.sundvall at liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

On Mon, Dec 5, 2011 at 11:36 AM, Erik Sundvall erik.sundvall at liu.se wrote:

Hi!

On Mon, Dec 5, 2011 at 00:10, Heath Frankel heath.frankel at oceaninformatics.com wrote:

I think previously I had indicated I had no problem with the stringified interval approach in XML, but I am reverting my thinking on this and feel that it would be counter intuitive for those who want to use the XML schemas for code generation purposes. I think in this case the computable requirement outweighs the human readable requirement.

You are probably right regarding XML, and maybe this is valid also for most JSON use-cases where the desire for an as simple as possible object-serialization-mapping outweighs human readability.

I think the openEHR community is best served by having different archetype serialization format categories with different priorities for different purposes. E.g.:

1a. An XML format optimized for mapping to XML-schema generated code.
1b. A JSON format optimized for mapping to AOM object models handcrafted or generated from AOM-specifications.
2. A cADL-variant wrapped in YAML optimized for human readability. It could be used for
Could YAML replace dADL as human readable AOM serialization format?
On 01/12/2011 21:37, Erik Sundvall wrote:

Hi! Let the battle begin :-) see: http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html

nice page - that's quite fun to see them all pasted up there. My question is: what's the/your purpose for human readability? Is it:

* education e.g. in some kind of class-room / training situation
* debugging
* self-learning
* something else

Just a question

On Tue, Nov 22, 2011 at 13:24, Thomas Beale thomas.beale at oceaninformatics.com wrote:

actually, ADL 2.0 as reported in this document is now obsolete. The ADL 1.5 compiler already does this, and will use it as a fast save/retrieve format.

Will cADL become optional or go away somehow?

it's not my intention. To be honest, I am not sure if a streaming cADL parser that knows it is parsing guaranteed correct cADL might not be faster than the equivalent dADL parser for the archetype definition. But either way, cADL is a notation that really gives you a direct feel for the implicated semantics, so for understanding what you are looking at it has to be better. dADL / XML / JSON etc don't give you a direct picture, they give you a serialised object picture from which your brain has to infer an object structure (but admittedly this is unambiguous, so your brain will probably get it right). In my view 'proper syntax' is nicer for direct comprehension and therefore learning.

One area where dADL beats JSON and YAML (I think) is its better support for Xpath-like paths.

Why would that be different? I guess most path queries will run on instantiated object trees rather than on documents and then there is no difference - and if paths were run directly on documents, then please explain why dADL would support them better.

Looking at the JSON again, I might have to eat my words... I guess if the attribute names / hash tags are turned into Xpath predicates the implied set of paths has to be the same.
Plus it's much more compact than JSON.

Much? Less noisy I would agree to though.

Personally I find YAML hard to read because there are so many syntax elements (triple '-', triple '.' etc) but that might just be me.

Have a look at... http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html ...again. The triple '-' and triple '.' are (mostly optional) start and end markers of documents that make life easier when concatenating streams/documents, see the YAML specification. Am I the only one that thinks YAML is more readable than dADL?

when I get a moment I will add YAML to the serialiser club in the tool and we can then see if proper YAML is or isn't better to read (I am assuming that it will be somewhat different from the inferred YAML you generated with that web tool). I think 'readability' is starting to come down to cognitive and linguistic / semiotic issues, which is very interesting. There may be no objective answer to this question; if there is it will be interesting to know what the criteria are.

Nice work on the contest!

- thomas
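The argument that path queries run on the instantiated object tree, so the implied set of paths is the same whatever the serialisation, can be sketched in a few lines of Python. This is an illustration under assumptions: the `paths` helper and the tiny example document are hypothetical, and the path notation is only Xpath-like, not the openEHR path syntax.

```python
import json


def paths(node, prefix=""):
    """Yield an Xpath-like path for every leaf in a tree of dicts and lists."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from paths(child, f"{prefix}/{key}")
    elif isinstance(node, list):
        # Use 1-based positions, as Xpath does.
        for index, child in enumerate(node, start=1):
            yield from paths(child, f"{prefix}[{index}]")
    else:
        yield prefix


# A made-up fragment, here loaded from JSON; any parser producing the same
# tree (from dADL, YAML, XML...) would give the same set of paths.
document = (
    '{"definition": {"rm_type_name": "ITEM_TREE",'
    ' "attributes": [{"rm_attribute_name": "items"}]}}'
)
tree = json.loads(document)
all_paths = list(paths(tree))
for path in all_paths:
    print(path)
```

Since the function only sees dicts, lists and leaves, the source syntax has no influence on the result once parsing is done.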
Could YAML replace dADL as human readable AOM serialization format?
Thanks Erik,

Interesting to see the line up. Can't believe that XML wasn't the longest file in the list, that kills one of the arguments for JSON vs XML. For someone that is not aware of YAML, is the white space significant? If so, this kind of kills it for me; otherwise for a human reader it's fairly natural to read without lots of brackets of various kinds.

Heath

From: openehr-technical-boun...@openehr.org [mailto:openehr-technical-bounces at openehr.org] On Behalf Of Erik Sundvall
Sent: Friday, 2 December 2011 8:07 AM
To: For openEHR technical discussions
Subject: Re: Could YAML replace dADL as human readable AOM serialization format?

Hi! Let the battle begin :-) see: http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html

On Tue, Nov 22, 2011 at 13:24, Thomas Beale thomas.beale at oceaninformatics.com wrote:

actually, ADL 2.0 as reported in this document is now obsolete. The ADL 1.5 compiler already does this, and will use it as a fast save/retrieve format.

Will cADL become optional or go away somehow?

One area where dADL beats JSON and YAML (I think) is its better support for Xpath-like paths.

Why would that be different? I guess most path queries will run on instantiated object trees rather than on documents and then there is no difference - and if paths were run directly on documents, then please explain why dADL would support them better.

Plus its much more compact than JSON.

Much? Less noisy I would agree to though.

Personally I find YAML hard to read because there are so many syntax elements (triple '-', triple '.' etc) but that might just be me.

Have a look at... http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html ...again. The triple '-' and triple '.' are (mostly optional) start and end markers of documents that make life easier when concatenating streams/documents, see the YAML specification. Am I the only one that thinks YAML is more readable than dADL?
Best regards,
Erik Sundvall
erik.sundvall at liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733
Could YAML replace dADL as human readable AOM serialization format?
On Thu, Dec 1, 2011 at 22:37, Erik Sundvall erik.sundvall at liu.se wrote: Hi! Let the battle begin :-) see: http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html Hi Erik, is the Javascript Object Dump missing regexps for 'address' and 'electronic_communications'? Or is that irrelevant? In the YAML, some comma separated key-value pairs are condensed into 1 line; it would be nicer if they could all be on their own line: makes it lengthier, but more readable and a fairer comparison to the other formats. Cheers, Roger
Could YAML replace dADL as human readable AOM serialization format?
On 02/12/2011 01:35, Heath Frankel wrote:

Thanks Erik, Interesting to see the line up. Can't believe that XML wasn't the longest file in the list, that kills one of the arguments for JSON vs XML. For someone that is not aware of YAML, are the white space significant. If so, this kinds of kills it for me, otherwise for a Human reader its fairly natural to read without lots of brackets of various kinds.

Heath

please everyone remember that the dADL, JSON and XML generated from AWB all currently use the stringified expression of cardinality / occurrences / existence. Now, these are usually the most numerous constraints in an archetype and if expressed in the orthodox way, take up 6 lines of text each, hence the giant files (e.g. the AOM 1.4 based XML we currently use) - and thus the much reduced files you see on Erik's page, because we are using ADL 1.5 flavoured serialisations, not the ADL 1.4 one.

Now, I think we should probably go with the stringified form in all of these formalisms. The cost of doing this is a small micro-parser, but it is the same micro-parser for everyone, which seems attractive to me.

The alternative that Erik mentioned was more native, but still efficient, interval expressions, e.g. dADL has it built in (0..* is |>=0| in dADL), and YAML and JSON could probably be persuaded to make some sort of array of integer-like things be used. XML still doesn't have any such support. In theory this approach would be the best if each syntax supported it properly, but XML doesn't at all, and the others don't support Intervals with unbounded upper limit (i.e. the '*' in '0..*').

But Erik's exercise certainly proved that efficient representation of the humble Interval of Integer is actually worthwhile. (Once again thanks for that page, it's quite a good way to get a good feel for these syntaxes very quickly.)

- thomas
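To give a feel for how small the micro-parser for stringified intervals could be, here is a rough Python sketch. It is written under assumptions: it accepts only the `lower..upper` form with `*` as the unbounded upper limit (e.g. `0..*`, `1..1`), and the names `Multiplicity` and `parse_multiplicity` are hypothetical, not taken from any openEHR tool or specification.

```python
import re
from typing import NamedTuple, Optional


class Multiplicity(NamedTuple):
    lower: int
    upper: Optional[int]  # None stands for the unbounded '*'


# Assumed grammar: digits, two dots, then digits or a single '*'.
_PATTERN = re.compile(r"^(\d+)\.\.(\d+|\*)$")


def parse_multiplicity(text: str) -> Multiplicity:
    """Parse a stringified interval such as '0..*' or '1..1'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a multiplicity interval: {text!r}")
    lower = int(match.group(1))
    upper = None if match.group(2) == "*" else int(match.group(2))
    if upper is not None and upper < lower:
        raise ValueError(f"upper limit below lower limit: {text!r}")
    return Multiplicity(lower, upper)


print(parse_multiplicity("0..*"))
print(parse_multiplicity("1..1"))
```

The same few rules would have to be reimplemented once per language, but they are the same rules for dADL, JSON, YAML and XML, which is the attraction of the stringified form.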
Could YAML replace dADL as human readable AOM serialization format?
Hi! Let the battle begin :-) see: http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html

On Tue, Nov 22, 2011 at 13:24, Thomas Beale thomas.beale at oceaninformatics.com wrote:

actually, ADL 2.0 as reported in this document is now obsolete. The ADL 1.5 compiler already does this, and will use it as a fast save/retrieve format.

Will cADL become optional or go away somehow?

One area where dADL beats JSON and YAML (I think) is its better support for Xpath-like paths.

Why would that be different? I guess most path queries will run on instantiated object trees rather than on documents and then there is no difference - and if paths were run directly on documents, then please explain why dADL would support them better.

Plus its much more compact than JSON.

Much? Less noisy I would agree to though.

Personally I find YAML hard to read because there are so many syntax elements (triple '-', triple '.' etc) but that might just be me.

Have a look at... http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html ...again. The triple '-' and triple '.' are (mostly optional) start and end markers of documents that make life easier when concatenating streams/documents, see the YAML specification. Am I the only one that thinks YAML is more readable than dADL?

Best regards,
Erik Sundvall
erik.sundvall at liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733
Could YAML replace dADL as human readable AOM serialization format?
Hi!

A little suggestion/thought (that might be of value also for CIMI-folks and others looking at archetyping using ADL and AOM and wondering if a specific language is needed).

*Limitations:*
For efficient handling of RM (Reference Model) instances (patient data) flying back and forth between systems you'd probably want some binary format (protobuf http://code.google.com/p/protobuf/, thrift datatypes http://thrift.apache.org/, serialized Java objects or whatever), this is NOT what this suggestion is about. For development and debugging RM-instance exchange you may also want some fairly human-readable serialization that is supported by many platforms (like JSON http://www.json.org/, YAML http://www.yaml.org/, XML or whatever), this is NOT what the suggestion is about either. Also note that the current suggestion only aims at looking for replacement of dADL, not cADL. Also note that the AOM and XML serialisations of the AOM are not affected by this suggestion.

*Background:*
cADL (Constraint ADL) is a compact DSL http://en.wikipedia.org/wiki/Domain-specific_language that is aimed at defining constraints on an object model, while dADL (Data ADL) on the other hand is mainly a general object-graph serialization format. If I understand section 1.7.5 in the ADL 1.5 spec http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf correctly, ADL 2.0 will allow the option to define *all* parts of an archetype (including what is now done in cADL) as a dADL serialization of the AOM http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/aom1.5.pdf (Archetype Object Model). Is that correct Tom?

*Suggestion:*
Investigate if YAML can replace or complement dADL as object-graph serialization format for archetypes. (Perhaps there is interest from people using an openEHR AOM implementation in a language that already has YAML serializers to make a quick experiment?)
*Motivation:*
- YAML parsers converting YAML documents to native object graphs already exist for a number of languages http://www.yaml.org/ (C/C++, Ruby, Python, Java, Perl, C#/.NET, PHP, OCaml, Javascript, Actionscript, Haskell) so there would be less work creating and maintaining archetype parsers that turn archetype files into in-memory object graphs. (If you write an archetype authoring tool and need to validate archetypes, not just instantiate already validated archetypes, then the Validity Rules (such as the ones in blue under 4.3.1.1 in the AOM spec.) will of course still need to be implemented in software.)
- Having an archetype specific object-serialization language like dADL might make archetyping look more mysterious and suspect and might hide the fact that the semantics expressed in the AOM is the interesting thing that can be serialised in many different ways.
- And (admittedly subjective) YAML lists and objects look slightly better and more readable than dADL. A notable exception is probably intervals/ranges that have a compact representation in dADL (see section 4.5.2 of the ADL 1.5 spec http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf) but not natively in YAML.

*Observations:*
YAML is extensible, so data types for intervals etc can be added like in http://yaml.org/YAML_for_ruby.html#ranges, also see discussion at http://stackoverflow.com/questions/3337020/how-to-specify-ranges-in-yaml. A similar approach could be taken to dADL's Plug-in Syntaxes (see section 4.6 http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf) using YAML. A number of language-independent extra YAML datatypes (timestamp http://yaml.org/type/timestamp.html for example) are listed at http://yaml.org/type/index.html and you can define your own if you need more.
It seems like specification 1.1 (http://yaml.org/spec/1.1/) is the most widely implemented, so any dADL comparisons should probably be made against that version to be fair.

Best regards,
Erik Sundvall
erik.sundvall at liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

P.s. Tom Beale and I sort of started a brief off-list discussion about YAML, here is now an attempt to get input from more people.
Could YAML replace dADL as human readable AOM serialization format?
On 22/11/2011 11:51, Erik Sundvall wrote:

Hi!

A little suggestion/thought (that might be of value also for CIMI-folks and others looking at archetyping using ADL and AOM and wondering if a specific language is needed).

*Limitations:*
For efficient handling of RM (Reference Model) instances (patient data) flying back and forth between systems you'd probably want some binary format (protobuf http://code.google.com/p/protobuf/, thrift datatypes http://thrift.apache.org/, serialized Java objects or whatever), this is NOT what this suggestion is about. For development and debugging RM-instance exchange you may also want some fairly human-readable serialization that is supported by many platforms (like JSON http://www.json.org/, YAML http://www.yaml.org/, XML or whatever), this is NOT what the suggestion is about either. Also note that the current suggestion only aims at looking for replacement of dADL, not cADL. Also note that the AOM and XML serialisations of the AOM are not affected by this suggestion.

*Background:*
cADL (Constraint ADL) is a compact DSL http://en.wikipedia.org/wiki/Domain-specific_language that is aimed at defining constraints on an object model, while dADL (Data ADL) on the other hand is mainly a general object-graph serialization format. If I understand section 1.7.5 in the ADL 1.5 spec http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf correctly, ADL 2.0 will allow the option to define *all* parts of an archetype (including what is now done in cADL) as a dADL serialization of the AOM http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/aom1.5.pdf (Archetype Object Model). Is that correct Tom?

actually, ADL 2.0 as reported in this document is now obsolete. The ADL 1.5 compiler already does this, and will use it as a fast save/retrieve format. See below for example, or download the current release of the ADL Workbench to play.
I am intending to document the 'P_' classes on which this serialisation is based, and on which I think any JSON / YAML / XML serialisation should be based - when we can agree on it. It is in these classes that things like occurrences are changed from MULTIPLICITY_INTERVAL to String.

*Suggestion:*
Investigate if YAML can replace or complement dADL as object-graph serialization format for archetypes. (Perhaps there is interest from people using an openEHR AOM implementation in a language that already has YAML serializers to make a quick experiment?)

My motivation for making pure dADL archetypes is to have a fast, efficient serialisation of the object graph of an archetype, so that when an archetype compiles successfully, it can be saved in this form and later retrieved, bypassing the ADL compiler. The value in this is that formats like dADL / JSON / YAML are low-level graph serialisations, and that really fast parsers can be written for them for use on persisted files *known to be correct* (i.e. generated by a serialiser in a previous save). My own dADL parser is not such a fast parser, but that's only a matter of time ;-) So the same arguments would apply to JSON or YAML in my view. At least for this purpose (fast save/retrieve of previously compiled archetypes), any such format could be used.

*Motivation:*
* YAML parsers converting YAML documents to native object graphs already exist for a number of languages http://www.yaml.org/ (C/C++, Ruby, Python, Java, Perl, C#/.NET, PHP, OCaml, Javascript, Actionscript, Haskell) so there would be less work creating and maintaining archetype parsers that turn archetype files into in-memory object graphs. (If you write an archetype authoring tool and need to validate archetypes, not just instantiate already validated archetypes, then the Validity Rules (such as the ones in blue under 4.3.1.1 in the AOM spec.) will of course still need to be implemented in software.)
* Having an archetype specific object-serialization language like dADL might make archetyping look more mysterious and suspect and might hide the fact that the semantics expressed in the AOM is the interesting thing that can be serialised in many different ways.
* And (admittedly subjective) YAML lists and objects look slightly better and more readable than dADL. A notable exception is probably intervals/ranges that have a compact representation in dADL (see section 4.5.2 of the ADL 1.5 spec http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf) but not natively in YAML.

*Observations:*
YAML is extensible, so data types for intervals etc can be added like in http://yaml.org/YAML_for_ruby.html#ranges, also see discussion at http://stackoverflow.com/questions/3337020/how-to-specify-ranges-in-yaml. A similar approach could be taken to dADL's Plug-in Syntaxes (see section 4.6