On 03/04/2014 09:21, Athanasios Anastasiou wrote:
>
> POINTS OF NOTICE REGARDING ADL 1.5:
>
> 1) Are there any ADL 1.5-specific files available out there for testing
> purposes?

Hopefully the starting point for all resources is now here
<http://www.openehr.org/wiki/pages/viewpage.action?pageId=196633>.
Regression test archetypes are here
<https://github.com/openEHR/adl-archetypes/tree/master/ADL15-reference>.

> 2) How significant is whitespace such as "[ \t\n\r]*" for ADL?
> If no assumptions are made (like, "New line marks the start of a new
> statement"), i would like to include a universal rule that skips such
> whitespace.

It isn't significant. However, the current tools in some places assume
that the outer level of keywords (the section names 'languages',
'description', 'definition' etc.) are all against the left-hand edge of
the file, and that no other content is. This makes it easy to detect the
separate sections in a simple way, so that the 'description' section
keyword is not mistaken for the word 'description' elsewhere. Of course
this is a hack, and no-one should repeat it, but I'm mentioning it here
since you will see it in the ADL lexical scanner spec
<https://github.com/openEHR/adl-tools/blob/master/components/adl_compiler/src/syntax/adl/parser/adl_15_scanner.l>.
At some point I'll remove it, hopefully copying some nicer production
rules that someone else comes up with.

> 3) Is ID_CODE_LEADER always going to be 'id'? Is it then considered a
> SYMbol?

Good question. Although the usual computer science thinking is to make
everything generic all the time, in this case the precise reason for
these code 'leaders' (i.e. 'id', 'ac', 'at') is to separate codes into
different semantic groups (i.e. ids, value set codes and value codes).
In the current conception of ADL I think they should be preserved,
because they make parsing (and error reporting) much easier, and make
archetypes much easier to read. In the long-term future, I think we
might move to a different system where all the codes are external, and
archetypes have a managed online terminology.
We are quite some way off from that technology, and I would therefore
assume that moving to it, if we ever get there, means a /conversion/ of
existing archetypes. Consequently, I think the current coding approach
should be treated as reliable for now (this is not to say that the
current system can't be improved).

> 4) There is something odd about the definition of
> V_ISO8601_DURATION_CONSTRAINT_PATTERN. The definition seems to include a
> trailing '}' but then the parser puts it back in the stream. In the
> ANTLR definition i have omitted the '}', would this be a problem? Can we
> clarify this rule a bit?

I guess you mean this rule:

----------/* V_ISO8601_DURATION_CONSTRAINT_PATTERN */ -------------------------------------------------
    -- the following is an erroneous form of the one below to cope with the AE bug that causes a duration
    -- constraint of the form {pattern/} to be written out, i.e. trailing '/'. For now we provide a dedicated
    -- syntax error on this, so at least modellers know what to fix
P[yY]?[mM]?[Ww]?[dD]?(T[hH]?[mM]?[sS]?)?\/\}    {
        last_token := V_ISO8601_DURATION_CONSTRAINT_PATTERN_ERR
        unread_character(last_string_value.item(last_string_value.count)) -- put back the last character
        last_string_value := text
    }

This is a scanner rule to get around a bug in the Archetype Editor,
which may or may not still exist. You should not replicate this one, or
any other rule that has similar comments!

> 5) Some rules contain comments such as: "rule to be removed once
> archetypes containing "T" are gone"...Are these archetypes gone by now?
> Can i clean up those rules?

Yes, I would not replicate any such rules. I have not had the time to
determine which tool errors have been fixed, but in any case, I think
you should go 'clean'.

> 6) Throughout the yacc definitions there are some conditionals whose
> purpose i do not entirely understand.
> For example, during the definition
> of V_REGEXP, there are conditional definitions for each constituent part
> of a regular expression. Why is this? Same goes for V_STRING,
> V_CADL_TEXT, V_RULES_TEXT, V_ODIN_TEXT. At the moment, i am matching
> these with a non-greedy operator. Would this be a problem?

For example, this:

----------/* V_REGEXP */ -------------------------------------------------
"{/" {
        last_token := SYM_START_CBLOCK
        set_start_condition (IN_REGEXP1)
        in_buffer.append_character ('/')
    }

<IN_REGEXP1> {
    [^/[]* {        -- match segment consisting of non / or [
        in_buffer.append_string (text)
    }
    "["[^]]*"]" {   -- match [] segment
        in_buffer.append_string (text)
    }
    [^/]*\\\/ {     -- match segment ending in quoted slashes '\/'
        in_buffer.append_string (text)
    }
    [^/[]*"/" {     -- match final segment
        in_buffer.append_string (text)
        create str_.make (in_buffer.count)
        str_.append_string (in_buffer)
        in_buffer.wipe_out
        last_string_value := str_
        last_token := V_REGEXP
        set_start_condition (INITIAL)
    }
}

\^[^^\n]*\^ {       -- regexp formed using '^' delimiters
        last_token := V_REGEXP
        last_string_value := text
    }

This kind of thing is a pretty standard approach for dealing with
strings, regexes and any other chunks of content that can easily contain
normal keywords and syntax inside, but where of course that syntax has
no meaning, so you don't want to hit any of the normal rules for {},
keywords etc. So you need to include a way to consume these chunks up to
the right end point (and don't forget that in Strings, that means going
past a quoted " to get to the real ").

I don't know how this is done in Antlr, but there must be a standard way
to replicate it. In general, don't be afraid to find a better scanner or
production rule approach than what you see in the current compiler. Some
of those rules are very old now.

- thomas
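
The idea of consuming a V_REGEXP or string chunk "to the right end point" can be illustrated with a small hand-written scanner. This is only a minimal sketch in Python, not code from the ADL tools or from any ANTLR grammar. The function names scan_regexp and scan_string are hypothetical. Unlike the lex rules above, the sketch assumes that any backslash-escaped character (not only '\/') belongs to the chunk, and like them it leaves the closing '}' for the normal rules.

```python
from typing import Tuple


def scan_regexp(src: str, start: int) -> Tuple[str, int]:
    """Scan a '{/.../' chunk that begins at src[start].

    Returns the regexp including its '/' delimiters, plus the index just
    after the closing '/'. The trailing '}' is NOT consumed.
    """
    if not src.startswith("{/", start):
        raise ValueError("expected '{/' at offset %d" % start)
    i = start + 2
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            i += 2  # escaped character such as '\/': never a terminator
        elif ch == "[":
            close = src.find("]", i + 1)
            if close == -1:
                raise ValueError("unterminated '[' segment in regexp")
            i = close + 1  # a '/' inside [...] does not end the regexp
        elif ch == "/":
            return src[start + 1 : i + 1], i + 1  # the real closing '/'
        else:
            i += 1
    raise ValueError("unterminated regexp")


def scan_string(src: str, start: int) -> Tuple[str, int]:
    """Scan a double-quoted string that begins at src[start].

    Returns the raw body (escapes left untouched) and the index just
    after the closing quote.
    """
    if start >= len(src) or src[start] != '"':
        raise ValueError("expected '\"' at offset %d" % start)
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            i += 2  # skip a quoted character such as \" and keep going
        elif ch == '"':
            return src[start + 1 : i], i + 1  # the real closing quote
        else:
            i += 1
    raise ValueError("unterminated string")


if __name__ == "__main__":
    print(scan_regexp(r"{/[a-z/]+\/x/}", 0))
    print(scan_string(r'"say \"definition\" here" matches', 0))
```

On the first example input, a plain non-greedy match up to the next '/' would stop at the '/' inside [a-z/], which is why the segments are consumed one kind at a time.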

