> It seems like a reasonable use case!
I'm actually working on a project that's doing basically this (indexing and
aggregating data abstracted from single-visit and multi-site epi studies), and
I agree that this is a great use case. Right now, we're using a relational data
model, but I am firmly convinced that doing this the "Right Way" would require
the flexibility and richness of an ontology, simply because of how complex the
data is, both in terms of how it should be represented and in terms of
the complexities involved in its aggregation. I'm working with a team of
systematic reviewers and epidemiologists, and so far, pretty much every data
element that we've added to the system has some sort of bearing on whether and
how one goes about aggregating data, which means that any system trying to do
this in a generalizable way has to have some way of encoding *that* knowledge,
as well (e.g., "data points with attributes X, Y, and Z are aggregated thusly,
whereas data points with X, Y, and A are aggregated in some other way").
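
To make the "X, Y, and Z are aggregated thusly" idea concrete, here is a minimal Python sketch of one way such rules could be encoded. It is not the system described here: the attribute names (`measure`, `design`), the rule table, and the two pooling functions are all invented for illustration, and the pooling math is a deliberately simplistic stand-in for what a real meta-analysis would do.

```python
# Hypothetical sketch: attribute-driven aggregation rules.
# Every name below is invented for illustration.

def pooled_proportion(points):
    """Pool proportions by summing events over summed sample sizes."""
    events = sum(p["events"] for p in points)
    total = sum(p["n"] for p in points)
    return events / total


def weighted_mean(points):
    """Combine means, weighting each one by its sample size."""
    total = sum(p["n"] for p in points)
    return sum(p["value"] * p["n"] for p in points) / total


# Each rule pairs the attribute values a data point must have with the
# function used to aggregate matching points. The first match wins.
RULES = [
    ({"measure": "proportion", "design": "cohort"}, pooled_proportion),
    ({"measure": "mean"}, weighted_mean),
]


def find_rule(point):
    """Return the aggregator for the first rule this point satisfies."""
    for required, aggregator in RULES:
        if all(point.get(key) == value for key, value in required.items()):
            return aggregator
    return None


def aggregate(points):
    """Group points by matching rule, then apply each rule to its group.

    Returns (results keyed by aggregator name, points no rule covered).
    """
    groups = {}
    unmatched = []
    for point in points:
        rule = find_rule(point)
        if rule is None:
            unmatched.append(point)
        else:
            groups.setdefault(rule, []).append(point)
    results = {rule.__name__: rule(group) for rule, group in groups.items()}
    return results, unmatched
```

Keeping the rules in a table rather than in nested `if` statements means the reviewers' knowledge about what can be combined lives in data that can be inspected and extended, and anything no rule covers is handed back instead of being silently pooled.
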
> It seems like you'd need to identify the factors of interest--e.g. disease,
> selection criteria, research questions--and aggregate on those. Someone who
> actually does metaanalyses would be more aware of what factors are
> relevant/important.
In case anybody out there is thinking of doing this, here's some of what we
store in our system (in addition to the data elements you've identified above,
all of which are also relevant and included in our system):
- study design (which we model using several different attributes:
prospective/retrospective, case-control/cohort, etc.),
- study setting (geographic location, hospital/outpatient, some data
about who the study population was (military, pediatric, etc.), plus a fair bit
of domain-specific stuff that's related to the particular medical topic with
which we're working)
- observation time points (both fixed ("we measured the prevalence of
symptom X at Y days") and date ranges ("we measured the incidence of
disease X between Y and Z days"), sometimes reported as means with standard
deviations or confidence intervals instead of explicit time points),
- whether a given observation was a mean, a proportion, or something else;
whether it came with or without confidence intervals; and sometimes its sample
size (sometimes broken out by treatment and control group status, often for
multiple treatment groups),
- etc. etc. etc... and down the rabbit hole we go, and thus far I've
only talked about the different kinds of metadata we have to store, not even
about the data itself that we want to aggregate! :-)
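
For anyone who wants a starting point, the elements in the list above could be sketched as plain Python dataclasses along the following lines. This is a guess at a reasonable shape, not the actual (relational) schema: every class and field name is hypothetical, and the domain-specific attributes are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class StudyDesign:
    # Design is modelled as several independent attributes, not one label.
    timing: str        # e.g. "prospective" or "retrospective"
    structure: str     # e.g. "case-control" or "cohort"


@dataclass
class StudySetting:
    location: str                     # geographic location
    care_setting: str                 # e.g. "hospital" or "outpatient"
    population: Optional[str] = None  # e.g. "military", "pediatric"


@dataclass
class TimePoint:
    # A fixed time point has only start_days; a date range also has end_days.
    start_days: float
    end_days: Optional[float] = None

    @property
    def is_range(self) -> bool:
        return self.end_days is not None


@dataclass
class Observation:
    kind: str                                  # "mean", "proportion", ...
    value: float
    time: TimePoint
    ci: Optional[Tuple[float, float]] = None   # confidence interval, if reported
    n_by_group: Dict[str, int] = field(default_factory=dict)

    @property
    def total_n(self) -> Optional[int]:
        # None when the study did not report any sample sizes.
        return sum(self.n_by_group.values()) or None


@dataclass
class Study:
    design: StudyDesign
    setting: StudySetting
    observations: list = field(default_factory=list)
```

Most of the optional fields exist because studies report things inconsistently: a confidence interval or sample size may simply be missing, and a time point may be either a single day or a range.
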
In spite of all of this, it's a really great domain to be working in: there's a
ton of low-hanging data management fruit out there for systematic reviewers.
The ones I work with (at a major evidence-based practice center that does tons
of AHRQ and USPSTF reviews) basically live in EndNote and Microsoft Word, and
use those as their data management platform. Coming up with tools to help them
work more effectively is really satisfying. I wish I'd had a camera running the
day I told them that they could all use the system to enter data simultaneously
(instead of having to keep track of who had the EndNote file open at any given
time). They were like kids at Christmas...
-SB