Hi Athanasios,
The problem is always about time. If someone is willing to model an
existing clinical data set, then for those who do not know about it, the
UCI machine learning repository has some interesting clinical data sets.
They're freely available for research, and I think it would be fairly easy
to use them for the type of test base we're discussing. Just google UCI
machine learning repository, and you should see what I'm talking about.
If the openEHR community has members who can put time into creating models
for any of these (or other) data sets, and then turning them to valid RM
serializations, I for one will not say no to that :)

Kind regards
Seref


On Tue, May 8, 2012 at 11:38 AM, Athanasios Anastasiou <
athanasios.anastasiou at plymouth.ac.uk> wrote:

> Dear Erik and all
>
> (This email might appear a bit long, but it actually makes just two points:
> a) Data Synthesizer Tool, b) Availability of Realistic Subject Data)
>
> A) Data Synthesizer Tool
> I absolutely agree on the "data synthesizer" tool.
>
> It is something I would like to do as a test case for parsing an
> archetype's definition node and generating a representative object,
> because in this case each and every node defined in the spec would have
> to be handled.
>
> It's not that time-consuming a task if you already have the RM builder.
> The AM provides everything that is needed (for example,
> http://postimage.org/image/mcytss26f/ : bounds for primitive types,
> cardinality / multiplicity for other data structures), so instead of
> just creating an object from the RM and attaching it in a hierarchy
> (perhaps just by calling its constructor), some values would have to be
> generated and attached to its fields as well.
>
> Once the RM object is constructed, it can be serialized to anything, XML
> included (and there goes a first "test base").
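[A minimal sketch of the synthesizer idea described above: walk constraint nodes, generate a representative value within each constraint's bounds, and serialize the result to XML. The constraint classes and field names below are illustrative stand-ins, not the real openEHR AM/RM classes.]

```python
import random
import xml.etree.ElementTree as ET

class IntervalConstraint:
    """Stand-in for a bounds constraint on a primitive type."""
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper

    def synthesize(self):
        # Pick any value within the archetype-defined bounds.
        return round(random.uniform(self.lower, self.upper), 1)

class CodedConstraint:
    """Stand-in for an allowed-value list (e.g. coded text)."""
    def __init__(self, codes):
        self.codes = codes

    def synthesize(self):
        return random.choice(self.codes)

def synthesize_composition(constraints):
    """Build a toy RM-like object tree from constraints, serialize to XML."""
    root = ET.Element("composition")
    for name, constraint in constraints.items():
        node = ET.SubElement(root, name)
        node.text = str(constraint.synthesize())
    return ET.tostring(root, encoding="unicode")

xml = synthesize_composition({
    "systolic": IntervalConstraint(90, 180),
    "position": CodedConstraint(["sitting", "standing", "lying"]),
})
```

A real implementation would recurse over the archetype's definition node and instantiate actual RM classes, but the shape of the work is the same: every constraint type in the spec needs a "synthesize" branch.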
>
> From this perspective, it is absolutely essential that the XSDs are
> valid (to ensure a valid structure) and also (Seref's got a very good
> point) that the archetypes are valid, to ensure valid content.
>
> B) Availability of Realistic Subject Data
> As far as clinically realistic datasets are concerned, I would like to
> suggest the following:
>
> The Alzheimer's Disease Neuroimaging Initiative (ADNI) in the US is a
> long-term project that collects, longitudinally, various clinical
> parameters from subjects at various stages of the disease
> (http://adni.loni.ucla.edu/).
>
> At the moment, the dataset contains about 800 subjects. Each subject has
> 4-5 sessions associated with them (usually at 6-month intervals), and for
> each session a number of parameters is collected, such as MMSE scores,
> ADAS-Cog scores, received medication, lab tests and others, as well as
> imaging biomarkers (mostly MRI). A basic "demographics" section is also
> available for each subject.
>
> (To put it in the context of a visualisation, the story that these data
> reveal is the progression of AD on a subject / population of subjects which
> is very interesting.)
>
> The data are made available as CSV files (about 12 MB just for the
> numerical data). An application must be made to ADNI to obtain the data.
> As redistribution of the data is prohibited
> (http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_DSP_Policy.pdf),
> we would be working towards a tool that would accept a set of ADNI CSV
> files and transform them into a local openEHR-enabled repository.
>
> The task here would be to create some archetypes / templates that reflect
> the structure of the data shared by ADNI and then scan the CSVs and
> populate the openEHR enabled repository.
>
> The CSV files are not in the best of condition: the structure has
> changed from version to version, certain fields (such as dates) appear
> in a number of different formats, the terminology is not exactly
> standardised, and so on.
>
> For us (ctmnd.org) to work on these files we have created an SQL database
> and a set of scripts that sanitize and import the CSVs.
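[A sketch of the sanitizing step, under the assumption that the main nuisance is mixed date formats across file versions. The column names and format list are hypothetical, not taken from the actual ADNI schema.]

```python
import csv
import io
from datetime import datetime

# Candidate formats seen across file versions (assumed list).
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y"]

def normalize_date(raw):
    """Try each known format and normalise to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")

def sanitize_rows(csv_text, date_column):
    """Yield CSV rows with the date column rewritten to ISO 8601."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        row[date_column] = normalize_date(row[date_column])
        yield row

# Toy input mixing two date formats, as the real files do.
raw = "SUBJECT,EXAMDATE\n011_S_0002,10/31/2006\n011_S_0003,2007-05-01\n"
rows = list(sanitize_rows(raw, "EXAMDATE"))
```

From rows cleaned like this, the import scripts can load the SQL database (or, later, populate the openEHR repository) without per-version special cases.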
>
> I would be interested in turning this database into an openEHR-enabled
> repository (whether a set of XML files or a "proper" openEHR database),
> because it could be used for a number of things (especially for testing
> AQL).
>
> If you think that this can be of help, let me know how we can progress
> with it.
>
> Obviously the tool can be made available to everybody, who can then
> apply to download the ADNI data locally.
>
> I am not so sure about the data themselves (even if they become totally
> anonymised); I will have to check. But in any case, going from "I have
> nothing" to "I have a database of multi-modal data from 800 subjects
> that is more realistic than test data" has got to be worth the trouble
> of converting the CSVs.
>
> Looking forward to hearing from you
> Athanasios Anastasiou
>
>
> _______________________________________________
> openEHR-technical mailing list
> openEHR-technical at lists.openehr.org
> http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
>