ADL 1.5 ANTLR definitions...and a few questions.

Thomas Beale Thu, 03 Apr 2014 10:05:09 +0100

On 03/04/2014 09:21, Athanasios Anastasiou wrote:
>
>
> POINTS OF NOTICE REGARDING ADL 1.5:
>
> 1) Are there any ADL 1.5-specific files available out there for testing
> purposes?


Hopefully the starting point for all resources is now here 
<http://www.openehr.org/wiki/pages/viewpage.action?pageId=196633>. 
Regression test archetypes are here 
<https://github.com/openEHR/adl-archetypes/tree/master/ADL15-reference>.

>
> 2) How significant is whitespace such as "[ \t\n\r]*" for ADL?
> If no assumptions are made (like, "New line marks the start of a new
> statement"), i would like to include a universal rule that skips such
> whitespace.

It isn't significant. However, the current tools assume in some places
that the outermost keywords, i.e. the section names 'languages',
'description', 'definition' etc, are all against the left-hand edge of
the file and that no other content is. This makes it easy to detect the
separate sections, so that the 'description' section keyword is not
mistaken for the word 'description' elsewhere. Of course this is a hack,
and no-one should repeat it, but I'm mentioning it here since you will
see it in the ADL lexical scanner spec
<https://github.com/openEHR/adl-tools/blob/master/components/adl_compiler/src/syntax/adl/parser/adl_15_scanner.l>.
 
At some point, I'll remove it, hopefully copying some nicer production 
rules that someone else comes up with.
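
To make the left-edge convention concrete, here is a minimal Python sketch of how a tool might locate the sections. It is not taken from the ADL compiler: the keyword list contains only the three names mentioned above, find_sections is a hypothetical helper, and the requirement that the keyword sits alone on its line is an added assumption.

```python
import re

# Section names taken from the message above; the real list is longer.
SECTION_KEYWORDS = ("languages", "description", "definition")

# Assumption: a keyword counts as a section header only when it starts in
# column 0 and is alone on its line (trailing blanks allowed).
SECTION_RE = re.compile(
    r"^(%s)[ \t]*$" % "|".join(SECTION_KEYWORDS), re.MULTILINE
)

def find_sections(text):
    """Return (keyword, line_number) pairs for left-edge section keywords."""
    return [
        (m.group(1), text.count("\n", 0, m.start()) + 1)
        for m in SECTION_RE.finditer(text)
    ]

sample = (
    "description\n"
    "    purpose = <\"the description of something\">\n"
    "definition\n"
    "    description matches {...}\n"
)

# Indented occurrences of 'description' are ignored.
print(find_sections(sample))
```

The indented uses of the word 'description' are skipped only because they do not start in column 0, which is exactly why the approach is fragile.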

>
> 3) Is ID_CODE_LEADER always going to be 'id'? Is it then considered a
> SYMbol?

good question. Although the usual computer science thinking is: make 
everything generic all the time, in this case, the precise reason for 
these code 'leaders' (i.e. 'id', 'ac', 'at') is to separate codes into 
different semantic groups (i.e. ids, value set codes and value codes). 
In the current conception of ADL I think they should be preserved, 
because they make parsing (and error reporting) much easier, and make 
archetypes much easier to read.
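
As an illustration of why the leaders help parsing and error reporting, here is a small Python sketch that sorts a code into its semantic group by its leader. The helper classify_code is hypothetical, and the assumed shape of the numeric part (digits, optionally dot-separated) is an assumption for the example, not a statement of the ADL 1.5 syntax.

```python
import re

# Leader -> semantic group, following the grouping described above.
LEADERS = {
    "id": "node identifier",
    "ac": "value set code",
    "at": "value code",
}

# Assumption: a code is a leader followed by dot-separated digit groups.
CODE_RE = re.compile(r"^(id|ac|at)(\d+(?:\.\d+)*)$")

def classify_code(code):
    """Return (semantic_group, numeric_part) or raise a targeted error."""
    m = CODE_RE.match(code)
    if m is None:
        raise ValueError("unrecognised code: %r" % code)
    return LEADERS[m.group(1)], m.group(2)

print(classify_code("id1.1"))
print(classify_code("ac3"))
```

Because the group is known from the first two characters, a tool can report a specific error (e.g. a value code used where a node identifier is expected) without consulting the terminology.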

In the long term future, I think we might move to a different system 
where all the codes are external, and archetypes have a managed online 
terminology. We are quite some way off from that technology, and I would 
therefore assume that moving to it, if we ever get there, means a 
/conversion/ of existing archetypes. Consequently, I think the current 
coding approach should be treated as reliable for now (this is not to 
say that the current system can't be improved).

>
> 4) There is something odd about the definition of
> V_ISO8601_DURATION_CONSTRAINT_PATTERN. The definition seems to include a
> trailing '}' but then the parser puts it back in the stream. In the
> ANTLR definition i have omitted the '}', would this be a problem? Can we
> clarify this rule a bit?

I guess you mean this rule:

----------/* V_ISO8601_DURATION_CONSTRAINT_PATTERN */ -------------------------------------------------
-- the following is an erroneous form of the one below to cope with the
-- AE bug that causes a duration constraint of the form {pattern/} to be
-- written out, i.e. trailing '/'. For now we provide a dedicated syntax
-- error on this, so at least modellers know what to fix
P[yY]?[mM]?[Ww]?[dD]?(T[hH]?[mM]?[sS]?)?\/\} {
    last_token := V_ISO8601_DURATION_CONSTRAINT_PATTERN_ERR
    unread_character(last_string_value.item(last_string_value.count)) -- put back the last character
    last_string_value := text
}


This is a scanner rule to get around a bug in the Archetype Editor which 
may or may not still exist. You should not replicate this one, or any 
other rule that has similar comments!


>
> 5) Some rules contain comments such as: "rule to be removed once
> archetypes containing "T" are gone"...Are these archetypes gone by now?
> Can i clean up those rules?

yes, I would not replicate any such rules. I have not had the time to 
determine which tool errors have been fixed, but in any case, I think 
you should go 'clean'.

>
> 6) Throughout the yacc definitions there are some conditionals whose
> purpose i do not entirely understand. For example, during the definition
> of V_REGEXP, there are conditional definitions for each constituent part
> of a regular expression. Why is this? Same goes for V_STRING,
> V_CADL_TEXT, V_RULES_TEXT, V_ODIN_TEXT. At the moment, i am matching
> these with a non-greedy operator. Would this be a problem?

for example this

----------/* V_REGEXP */ -------------------------------------------------
"{/" {
    last_token := SYM_START_CBLOCK
    set_start_condition (IN_REGEXP1)
    in_buffer.append_character ('/')
}
<IN_REGEXP1> {
    [^/[]* { -- match segment consisting of non / or [
        in_buffer.append_string (text)
    }
    "["[^]]*"]" { -- match [] segment
        in_buffer.append_string (text)
    }
    [^/]*\\\/ { -- match segment ending in quoted slashes '\/'
        in_buffer.append_string (text)
    }
    [^/[]*"/" { -- match final segment
        in_buffer.append_string (text)
        create str_.make (in_buffer.count)
        str_.append_string (in_buffer)
        in_buffer.wipe_out
        last_string_value := str_
        last_token := V_REGEXP
        set_start_condition (INITIAL)
    }
}
\^[^^\n]*\^ { -- regexp formed using '^' delimiters
    last_token := V_REGEXP
    last_string_value := text
}



This kind of thing is a pretty standard approach for dealing with 
strings, regexes and any other chunks of content that can easily contain 
normal keywords and syntax inside, but where of course the syntax has no 
meaning, so you don't want to hit any of the normal rules for {}, 
keywords etc. So you need to include a way to consume these chunks up to
the right end point (and don't forget that in strings, that means going
past any quoted " to get to the real "). I don't know how this is done
in ANTLR, but there must be a standard way to replicate it.
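
Independently of the scanner generator, the idea of consuming a chunk up to its real end point can be sketched in a few lines of Python. This is only an illustration, not the compiler's logic: read_delimited is a hypothetical helper, backslash is assumed to be the quoting character, and unlike the V_REGEXP rules above it does not treat [...] segments specially.

```python
def read_delimited(text, start, delimiter='"', escape="\\"):
    """Consume a delimited chunk starting at text[start].

    Returns (content, index_after_closing_delimiter). Escaped characters,
    including an escaped delimiter, are kept verbatim in the content.
    """
    if text[start] != delimiter:
        raise ValueError("expected %r at position %d" % (delimiter, start))
    i = start + 1
    chunk = []
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            # Keep the escape and the character it quotes; never stop here.
            chunk.append(text[i:i + 2])
            i += 2
        elif ch == delimiter:
            # The first unescaped delimiter is the real end point.
            return "".join(chunk), i + 1
        else:
            chunk.append(ch)
            i += 1
    raise ValueError("unterminated chunk starting at position %d" % start)

src = 'x = "a \\"quoted\\" {word}" rest'
content, end = read_delimited(src, src.index('"'))
print(content)
print(src[end:])
```

The braces and words inside the chunk never reach the normal token rules, which is the effect the start conditions achieve in the scanner spec. A single non-greedy match would stop at the first quoted delimiter unless the pattern also accounts for escapes.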

In general, don't be afraid to find a better scanner or production rule 
approach than what you see in the current compiler. Some of those rules 
are very old now.

- thomas
