Hello Seref,

Many thanks for the UCI reference; I was personally not aware of it and it's a great resource.
Well, as it seems there are plenty of "dummy but realistic" (!) dataset opportunities out there for creating a "test base", it is indeed a matter of time. I am sorry not to have more experience with actually building archetypes; I can see the value in this and I'd definitely give it a try. Perhaps we can create drafts, though, and even if these are not entirely correct they could be edited by others (?)

All the best
Athanasios Anastasiou

On 08/05/2012 12:16, Seref Arikan wrote:
> Hi Athanasios,
> The problem is always about time. If someone is willing to model an
> existing clinical data set, then for those who do not know about it,
> the UCI machine learning repository has some interesting clinical
> data sets. They're freely available for research, and I think it
> would be fairly easy to use them for the type of test base we're
> discussing. Just google "UCI machine learning repository" and you
> should see what I'm talking about.
> If the openEHR community has members who can put time into creating
> models for any of these (or other) data sets, and then turning them
> into valid RM serializations, I for one will not say no to that :)
>
> Kind regards
> Seref
>
>
> On Tue, May 8, 2012 at 11:38 AM, Athanasios Anastasiou
> <athanasios.anastasiou at plymouth.ac.uk> wrote:
>
> Dear Erik and all
>
> (This email might appear a bit long but it actually makes just two
> points: a) Data Synthesizer Tool, b) Availability of Realistic
> Subject Data)
>
> A) Data Synthesizer Tool
> I absolutely agree on the "data synthesizer" tool.
>
> It is something I would like to do as a test case for parsing an
> archetype's definition node and generating a representative object,
> because in this case each and every node defined in the spec would
> have to be handled.
>
> It's not that much of a time-consuming task if you already have the
> RM builder.
> The AM provides everything that is needed (for example,
> http://postimage.org/image/mcytss26f/ shows bounds for primitive
> types and cardinality / multiplicity for other data structures), so
> instead of just creating an object from the RM and attaching it in a
> hierarchy (maybe just by calling its constructor), some values would
> also have to be generated and attached to its fields.
>
> Once the RM object is constructed it can be serialized to anything
> (XML included), and there goes a first "test base".
>
> From this perspective, it is absolutely essential that the XSDs are
> valid (to ensure a valid structure) and also (Seref's got a very
> good point) that the archetypes are valid, to ensure valid content.
>
> B) Availability of Realistic Subject Data
> As far as clinically realistic datasets are concerned, I would like
> to suggest the following:
>
> The Alzheimer's Disease Neuroimaging Initiative (ADNI) in the US is
> a long-term project that collects, longitudinally, various clinical
> parameters from subjects at various stages of the disease
> (http://adni.loni.ucla.edu/).
>
> At the moment, the dataset contains about 800 subjects. Each subject
> has 4-5 sessions associated with it (usually at 6-month intervals),
> and for each session a number of parameters is collected, such as
> MMSE scores, ADAS-Cog scores, received medication, lab tests and
> others, as well as imaging biomarkers (mostly MRI). A basic
> "demographics" section is also available for each subject.
>
> (To put it in the context of a visualisation, the story that these
> data reveal is the progression of AD in a subject / population of
> subjects, which is very interesting.)
>
> The data are made available as CSV files (about 12 MB just for the
> numerical data). An application must be made to ADNI to obtain the
> data.
> As redistribution of the data is prohibited
> (http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_DSP_Policy.pdf),
> we would be working towards a tool that accepts a set of ADNI CSV
> files and transforms them into a local openEHR-enabled repository.
>
> The task here would be to create some archetypes / templates that
> reflect the structure of the data shared by ADNI, and then scan the
> CSVs and populate the openEHR-enabled repository.
>
> The CSV files are not in the best of condition (the structure has
> changed from version to version, certain fields (such as dates)
> might be in a number of different formats, the terminology is not
> exactly standardised, etc.).
>
> For us (ctmnd.org) to work on these files we have created an SQL
> database and a set of scripts that sanitize and import the CSVs.
>
> I would be interested in turning this database into an openEHR
> enabled repository (whether a set of XML files or a "proper" openEHR
> database) because it can be used for a number of things (especially
> for testing AQL).
>
> If you think that this can be of help, let me know how we can
> progress with it.
>
> Obviously the tool can be made available to everybody, who can then
> apply to download the ADNI data locally.
>
> I am not so sure about the data themselves (even if they become
> totally anonymised), I will have to check, but in any case, going
> from "I have nothing" to "I have a database of multi-modal data from
> 800 subjects that is more realistic than test data" has got to be
> worth the trouble of converting the CSVs.
> Looking forward to hearing from you
> Athanasios Anastasiou
>
> _______________________________________________
> openEHR-technical mailing list
> openEHR-technical at lists.openehr.org
> http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
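P.S. As a minimal sketch of the "data synthesizer" idea quoted above: a generator can walk a tree of constraint nodes and emit representative values within each node's bounds and cardinality. The constraint classes and the toy "blood pressure" fragment below are hypothetical simplifications for illustration, not the real openEHR AM classes (the actual AOM defines C_PRIMITIVE_OBJECT, C_COMPLEX_OBJECT and friends), so treat this as a shape of the algorithm rather than an implementation.

```python
import random
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for AM constraint nodes.
@dataclass
class IntegerConstraint:
    lower: int
    upper: int

@dataclass
class StringConstraint:
    allowed: list

@dataclass
class ComplexConstraint:
    name: str
    # attribute name -> (child constraint, min occurrences, max occurrences)
    attributes: dict = field(default_factory=dict)

def synthesize(node):
    """Generate a representative value satisfying the constraint node."""
    if isinstance(node, IntegerConstraint):
        return random.randint(node.lower, node.upper)
    if isinstance(node, StringConstraint):
        return random.choice(node.allowed)
    if isinstance(node, ComplexConstraint):
        obj = {"_type": node.name}
        for attr, (child, lo, hi) in node.attributes.items():
            n = random.randint(lo, hi)  # honour cardinality bounds
            values = [synthesize(child) for _ in range(n)]
            # single-valued attribute -> scalar, multi-valued -> list
            obj[attr] = values[0] if hi == 1 and values else values
        return obj
    raise TypeError(f"unhandled constraint node: {node!r}")

# Example: a toy "blood pressure" archetype fragment (illustrative only).
bp = ComplexConstraint("BloodPressure", {
    "systolic": (IntegerConstraint(90, 180), 1, 1),
    "diastolic": (IntegerConstraint(60, 110), 1, 1),
    "position": (StringConstraint(["sitting", "standing", "lying"]), 1, 1),
})
print(synthesize(bp))
```

Once such an object is built from the RM builder instead of plain dicts, serializing it (to XML or anything else) gives the first "test base" described above.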
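P.P.S. On the CSV sanitisation point (dates appearing in several formats across ADNI file versions), a normalisation pass might look like the sketch below. The column names (RID, EXAMDATE, MMSE) and the format list are assumptions for illustration; a real tool would assemble its format list by inspecting the actual files.

```python
import csv
import io
from datetime import datetime

# Candidate input formats; purely illustrative, not ADNI's actual set.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%y"]

def normalise_date(raw):
    """Return an ISO-8601 date string, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def sanitise_rows(csv_text, date_column):
    """Yield CSV rows with the given date column rewritten to ISO-8601."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        row[date_column] = normalise_date(row[date_column])
        yield row

# Tiny made-up sample mixing two date formats in one column.
sample = "RID,EXAMDATE,MMSE\n1,05/08/2012,28\n2,2012-11-20,25\n"
for row in sanitise_rows(sample, "EXAMDATE"):
    print(row["RID"], row["EXAMDATE"], row["MMSE"])
```

Rows whose dates match no known format come out as None rather than a silently wrong value, so they can be flagged for manual review before populating the repository.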

