Hi All,

Today was "Stress Test Day" for my Python ADL 1.4 parser. The goal was
to see how well I could parse ADL 1.4 in order to create a data
structure that can be used to build in-memory Python archetype
objects from the reference model, so that they can be persisted in an
archetype repository for local use.

I used almost all of the archetypes under SVN control on openEHR.org. The
exceptions were the test archetypes involving automotive information and the
duplicates between dev-uk-nhs, dev-uk-nhs-scotland and dev-nl-tno; where
duplicates existed, the dev-uk-nhs tree took precedence. I'll probably run
another test later on just the dev tree, depending on any feedback I get.

I think that my findings are interesting for the following groups of
people:
1) Archetype tools developers
2) Archetype tools users
3) Clinical governance members
4) Archetype consumers
5) Python openEHR developers

...hence the cross-posting.

Of the 755 ADL files tested, 216 failed to parse.
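For scale, the first run's counts work out as follows (plain arithmetic on the two numbers above, nothing more):

```python
total_files = 755
failed_files = 216

passed_files = total_files - failed_files        # files that parsed cleanly
failure_rate = 100 * failed_files / total_files  # percentage that failed

print(f"{passed_files} passed, {failure_rate:.1f}% failed")
# prints "539 passed, 28.6% failed"
```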

These failures are a combination of:

a) missing required sections such as 'archetype', 'definition',
'language', etc.

b) missing closing delimiters such as '}' and '>' (these could very well
be errors in the parser itself)

c) unexpected characters in certain locations (I still have to evaluate
these to see whether or not they are legal according to the spec)
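Categories a) and b) can often be caught before the full grammar runs. The following is a rough, hypothetical pre-check, not part of adl_1_4.py. It assumes that top-level section keywords start in column 0, and it counts only curly braces, because '<' and '>' also appear as comparison operators inside cADL intervals such as |>=0.0|, so a naive angle-bracket count would be unreliable.

```python
import re

# Assumed subset of ADL 1.4 top-level section keywords; others, such as
# 'concept' and 'ontology', can be added the same way.
REQUIRED_SECTIONS = ("archetype", "language", "definition")

def missing_sections(text):
    """Category a): return required keywords that never start a line in column 0."""
    return [kw for kw in REQUIRED_SECTIONS
            if not re.search(rf"^{kw}\b", text, re.MULTILINE)]

def unclosed_braces(text):
    """Category b): '{' count minus '}' count, ignoring strings and '--' comments.

    A positive result suggests missing '}' closings; a negative one suggests extras.
    """
    depth, i, in_string = 0, 0, False
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1          # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif text.startswith("--", i):
            newline = text.find("\n", i)
            i = len(text) if newline == -1 else newline
            continue            # resume at the newline (or stop at end of text)
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        i += 1
    return depth
```

Being a heuristic, it can be fooled by unusual literals; its only purpose is to separate "the file is visibly incomplete" from "the parser rejected something subtle".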

The main reason I think this is worth cross-posting is that several
archetypes give different parsing results from one version number to the
next. My question is: were they edited with different versions of the same
tool, or with different tools, or was the output just somehow different? I
will propose that the description section of the archetype carry an
attribute that identifies the tool and version number used for
creation/editing.
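One way to list the archetypes whose versions disagree is to group the test results by archetype ID with the .vN suffix removed. This is a hypothetical helper: it assumes filenames of the form <archetype-id>.v<N>.adl and a dict of pass/fail results collected from a test run (that dict format is also an assumption), and the filenames in the example are made up.

```python
import re
from collections import defaultdict

# Assumed filename layout: <archetype-id-without-version>.v<N>.adl
VERSIONED = re.compile(r"^(?P<base>.+)\.v(?P<ver>\d+)\.adl$")

def inconsistent_versions(results):
    """Return {base_id: {version: passed}} for archetypes whose versions disagree.

    `results` maps an ADL filename to True (parsed) or False (failed to parse).
    Filenames that don't match the assumed layout are ignored.
    """
    by_base = defaultdict(dict)
    for filename, passed in results.items():
        match = VERSIONED.match(filename)
        if match:
            by_base[match["base"]][int(match["ver"])] = passed
    return {base: dict(sorted(versions.items()))
            for base, versions in by_base.items()
            if len(set(versions.values())) > 1}

results = {
    "openEHR-EHR-OBSERVATION.example.v1.adl": True,
    "openEHR-EHR-OBSERVATION.example.v2.adl": False,
    "openEHR-EHR-CLUSTER.sample.v1.adl": True,
    "openEHR-EHR-CLUSTER.sample.v2.adl": True,
}
print(inconsistent_versions(results))
# prints {'openEHR-EHR-OBSERVATION.example': {1: True, 2: False}}
```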

Although all of these ADL files are listed as being in the development
tree, I hope that this information will inform the Clinical Governance
group about differences between files, and especially about major
omissions.

I also note that some of the ADL files are intended to be tests. My
feeling is that some of these have been around for some time and may not
be truly 'test' files. Should they be removed or updated? I vote for
updated. A set of really good test ADL files is essential and will
greatly simplify the work of openEHR developers.

I have uploaded (see link below) the following files for information:

ADLParserTest.txt -- shows the console output of all the files tested.

parser_errors.log-1st -- shows the files that failed, along with the reason
and the line and column numbers

The *.osh file is mostly for Python developers.  It shows the nested
list structure of the parsed ADL.

adl_1_4.py -- the parser
parsertester.py -- calls the parser and logs errors.

tested_adl_files.zip -- the 755 ADL files.
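For readers who want to repeat the run with their own parser, a test driver in the spirit of parsertester.py might look like this. It is not the uploaded script: `adl_files` and `run_parser_test` are hypothetical names, and the only assumption about the parser is that it is a callable that raises an exception on failure. pyparsing's ParseException exposes `lineno` and `col` attributes, so the sketch logs them when they are present.

```python
import logging
from pathlib import Path

log = logging.getLogger("parser_errors")

def adl_files(adl_dir):
    """Yield (filename, text) for every *.adl file below adl_dir, in sorted order."""
    for path in sorted(Path(adl_dir).rglob("*.adl")):
        yield path.name, path.read_text(encoding="utf-8", errors="replace")

def run_parser_test(files, parse):
    """Call parse(text) on each (filename, text) pair; return (passed, failed) name lists.

    Failures are logged with the exception message, plus line and column when
    the exception provides them (pyparsing's ParseException has .lineno and .col).
    """
    passed, failed = [], []
    for name, text in files:
        try:
            parse(text)
        except Exception as exc:  # any parser error counts as a failure
            failed.append(name)
            log.error("%s: %s (line %s, col %s)", name, exc,
                      getattr(exc, "lineno", "?"), getattr(exc, "col", "?"))
        else:
            passed.append(name)
    return passed, failed
```

A call would look like `run_parser_test(adl_files("tested_adl_files"), my_parse)`, where the directory name and `my_parse` stand in for wherever the archive was unpacked and whatever entry point the parser actually exposes. Keeping file discovery separate from the test loop makes it easy to feed in the dev-tree subset or an in-memory list of snippets.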

The dev-tree_adl_files.zip and parser_errors.log-2nd files are the input and
output for the ADL files in the dev tree (knowledge/archetypes/dev/*). In that
run, 69 failed and, at first glance, many are the same ones that failed
before, for the same reasons.
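The "same files, same reasons" impression can be checked mechanically once each run's failures are reduced to a filename-to-reason mapping. The helper below is a hypothetical sketch. How the reasons are extracted from parser_errors.log-1st and parser_errors.log-2nd is left open, since the log format isn't described here. Because the second run covers only the dev tree, a file missing from it is not necessarily fixed; it may simply not have been tested.

```python
def compare_runs(first, second):
    """Compare two runs given as {filename: failure_reason} dicts.

    Returns three sorted lists: files that failed in both runs for the same
    reason, files that failed in both for different reasons, and files that
    failed only in the second run.
    """
    common = first.keys() & second.keys()
    same = sorted(name for name in common if first[name] == second[name])
    changed = sorted(name for name in common if first[name] != second[name])
    new = sorted(second.keys() - first.keys())
    return same, changed, new
```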

I want to also thank Paul McGuire for writing Pyparsing and for working
so hard on this ADL parser.


**********IMPORTANT*****************************************************
The ADL files in this archive MUST NOT be used for any other purpose.
They are now outside of the openEHR knowledge management framework and
cannot be relied upon for any other use.
************************************************************************

Download the ParserTestResults package 
https://sourceforge.net/project/showfiles.php?group_id=152993



Sincerely,
Tim





-- 
Timothy Cook, MSc
Health Informatics Research & Development Services
LinkedIn Profile: http://www.linkedin.com/in/timothywaynecook
Skype ID == timothy.cook 
**************************************************************
*You may get my Public GPG key from  popular keyservers or   *
*from this link http://timothywayne.cook.googlepages.com/home*
**************************************************************