Hello Seref,

Many thanks for the UCI reference; I was personally not aware of it and it's a great resource.
Well, as it seems there are plenty of "dummy but realistic" (!) dataset opportunities out there for creating a "test base", it is indeed a matter of time. I am sorry not to have more experience with actually building archetypes; I can see the value in this and I'd definitely give it a try. Perhaps we can create drafts, though, and even if these are not entirely correct they could be edited by others (?)

All the best
Athanasios Anastasiou

On 08/05/2012 12:16, Seref Arikan wrote:
> Hi Athanasios,
> The problem is always about time. If someone is willing to model an
> existing clinical data set, then for those who do not know about it,
> the UCI machine learning repository has some interesting clinical
> data sets. They're freely available for research, and I think it
> would be fairly easy to use them for the type of test base we're
> discussing. Just google "UCI machine learning repository" and you
> should see what I'm talking about.
> If the openEHR community has members who can put time into creating
> models for any of these (or other) data sets, and then turning them
> into valid RM serializations, I for one will not say no to that :)
>
> Kind regards
> Seref
>
>
> On Tue, May 8, 2012 at 11:38 AM, Athanasios Anastasiou
> <athanasios.anastasiou at plymouth.ac.uk> wrote:
>
> Dear Erik and all
>
> (This email might appear a bit long but it actually makes just two
> points: a) Data Synthesizer Tool, b) Availability of Realistic
> Subject Data)
>
> A) Data Synthesizer Tool
> I absolutely agree on the "data synthesizer" tool.
>
> It is something I would like to do as a test case for parsing an
> archetype's definition node and generating a representative object,
> because in this case each and every node defined in the spec would
> have to be handled.
>
> It's not that much of a time-consuming task if you already have the
> RM builder.
> The AM provides everything that is needed (for example,
> http://postimage.org/image/mcytss26f/ shows bounds for primitive
> types and cardinality / multiplicity for other data structures), so
> instead of just creating an object from the RM and attaching it in a
> hierarchy (maybe just by calling its constructor), some values would
> also have to be generated and attached to its fields.
>
> Once the RM object is constructed it can be serialized to anything
> (XML included), and there goes a first "test base".
>
> From this perspective, it is absolutely essential that the XSDs are
> valid (to ensure a valid structure) and also (Seref's got a very
> good point) that the archetypes are valid, to ensure valid content.
>
> B) Availability of Realistic Subject Data
> As far as clinically realistic datasets are concerned, I would like
> to suggest the following:
>
> The Alzheimer's Disease Neuroimaging Initiative (ADNI) in the US is
> a long-term project that collects, longitudinally, various clinical
> parameters from subjects at various stages of the disease
> (http://adni.loni.ucla.edu/).
>
> At the moment, the dataset contains about 800 subjects. Each subject
> has 4-5 sessions associated with it (usually at 6-month intervals),
> and for each session a number of parameters is collected, such as
> MMSE scores, ADAS-Cog scores, received medication, lab tests and
> others, as well as imaging biomarkers (mostly MRI). A basic
> "demographics" section is also available for each subject.
>
> (To put it in the context of a visualisation, the story that these
> data reveal is the progression of AD in a subject / population of
> subjects, which is very interesting.)
>
> The data are made available as CSV files (about 12 MB just for the
> numerical data). An application must be made to ADNI to obtain the
> data.
> As redistribution of the data is prohibited
> (http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_DSP_Policy.pdf),
> we would be working towards a tool that accepts a set of ADNI CSV
> files and transforms them into a local openEHR-enabled repository.
>
> The task here would be to create some archetypes / templates that
> reflect the structure of the data shared by ADNI, and then scan the
> CSVs and populate the openEHR-enabled repository.
>
> The CSV files are not in the best of condition (the structure has
> changed from version to version, certain fields (such as dates)
> might be in a number of different formats, the terminology is not
> exactly standardised, etc.).
>
> For us (ctmnd.org) to work on these files we have created an SQL
> database and a set of scripts that sanitize and import the CSVs.
>
> I would be interested in turning this database into an openEHR
> enabled repository (whether a set of XML files or a "proper" openEHR
> database) because it can be used for a number of things (especially
> for testing AQL).
>
> If you think that this can be of help, let me know how we can
> progress with it.
>
> Obviously the tool can be made available to everybody, who can then
> apply to download the ADNI data locally.
>
> I am not so sure about the data themselves (even if they become
> totally anonymised), I will have to check, but in any case, going
> from "I have nothing" to "I have a database of multi-modal data from
> 800 subjects that is more realistic than test data" has got to be
> worth the trouble of converting the CSVs.
> Looking forward to hearing from you
> Athanasios Anastasiou
>
> _______________________________________________
> openEHR-technical mailing list
> openEHR-technical at lists.openehr.org
> http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
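P.S. As a minimal sketch of the "data synthesizer" idea quoted above: a generator can walk a tree of constraint nodes and emit representative values within each node's bounds and cardinality. The constraint classes and the toy "blood pressure" fragment below are hypothetical simplifications for illustration, not the real openEHR AM classes (the actual AOM defines C_PRIMITIVE_OBJECT, C_COMPLEX_OBJECT and friends), so treat this as a shape of the algorithm rather than an implementation.

```python
import random
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for AM constraint nodes.
@dataclass
class IntegerConstraint:
    lower: int
    upper: int

@dataclass
class StringConstraint:
    allowed: list

@dataclass
class ComplexConstraint:
    name: str
    # attribute name -> (child constraint, min occurrences, max occurrences)
    attributes: dict = field(default_factory=dict)

def synthesize(node):
    """Generate a representative value satisfying the constraint node."""
    if isinstance(node, IntegerConstraint):
        return random.randint(node.lower, node.upper)
    if isinstance(node, StringConstraint):
        return random.choice(node.allowed)
    if isinstance(node, ComplexConstraint):
        obj = {"_type": node.name}
        for attr, (child, lo, hi) in node.attributes.items():
            n = random.randint(lo, hi)  # honour cardinality bounds
            values = [synthesize(child) for _ in range(n)]
            # single-valued attribute -> scalar, multi-valued -> list
            obj[attr] = values[0] if hi == 1 and values else values
        return obj
    raise TypeError(f"unhandled constraint node: {node!r}")

# Example: a toy "blood pressure" archetype fragment (illustrative only).
bp = ComplexConstraint("BloodPressure", {
    "systolic": (IntegerConstraint(90, 180), 1, 1),
    "diastolic": (IntegerConstraint(60, 110), 1, 1),
    "position": (StringConstraint(["sitting", "standing", "lying"]), 1, 1),
})
print(synthesize(bp))
```

Once such an object is built from the RM builder instead of plain dicts, serializing it (to XML or anything else) gives the first "test base" described above.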
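P.P.S. On the CSV sanitisation point (dates appearing in several formats across ADNI file versions), a normalisation pass might look like the sketch below. The column names (RID, EXAMDATE, MMSE) and the format list are assumptions for illustration; a real tool would assemble its format list by inspecting the actual files.

```python
import csv
import io
from datetime import datetime

# Candidate input formats; purely illustrative, not ADNI's actual set.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%y"]

def normalise_date(raw):
    """Return an ISO-8601 date string, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def sanitise_rows(csv_text, date_column):
    """Yield CSV rows with the given date column rewritten to ISO-8601."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        row[date_column] = normalise_date(row[date_column])
        yield row

# Tiny made-up sample mixing two date formats in one column.
sample = "RID,EXAMDATE,MMSE\n1,05/08/2012,28\n2,2012-11-20,25\n"
for row in sanitise_rows(sample, "EXAMDATE"):
    print(row["RID"], row["EXAMDATE"], row["MMSE"])
```

Rows whose dates match no known format come out as None rather than a silently wrong value, so they can be flagged for manual review before populating the repository.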

