Hi Alan and Pablo,

You have mitigated some of the problems that emerge from trying to combine
relational data with structured document data. I'll be very interested to
see your progress.

The choices are, as you yourselves have just noted: (1) some sort of object
database, (2) regular relational database, (3) EAV, a subset of
relational, (4) structured documents, such as XML or JSON, and (5) some mix
of the preceding four. I'd only add that the more of these you end up mixing
together, the more complicated and difficult the result will be. Conversely,
if you could get it all into just one homogenous system, using just one or
two of the "preceding four" there would be huge advantages.

The problem we all face is that one cannot physically represent the
thousands of tables and columns and relations between them, as required by
medical information, within a traditional database design. So what you both
are understandably trying to do is to capture what you can within a
traditional database up to the point where the data diversity explodes, and
then, from there on, graft in one, two or maybe even three other data
containment systems to manage the diversity, trying, all the while, to build
pointers and indexes to keep it all taped and wired together. Every added
layer brings its own impedance barriers that only pain will allow you to
breach.

Properly done, EAV would allow you to avoid that expedient and thus keep
your entire system homogeneous and natively interconnected and indexed. Try
looking up the works of Prakash Nadkarni of Yale, a leading EAV and medical
informatics authority--and physician. In addition to articles reaching back
to the 1990s, he has recently authored an EAV book that may be on the
shelves soon, if not already (I'll have to check with him). I believe he
wrote the Wikipedia article cited earlier on this thread. I, too, have had
six years of experience with our own EAV system and can say that EAV can be
designed to be efficient, compact, queryable and maintainable. In fact, I
use hybrid EAV where it should (according to most experts) *not* be used,
including demographic, scheduling and transactional information typical of
medical billing. Medical records will be the next step. Some EAV designs are
foolish and unusable. But others aren't. One of our databases has 2.4
million EAV nodes. That, however, is modest, and I frankly do not know how
it would scale to an order of magnitude bigger than that. So far no
problems.

All real life data is non-sparse only at a small core. Something as "simple"
as a patient demographic table, for instance, will have a few non-sparse
columns, such as name, date of birth, address, etc., but then there are
dozens of other less-used columns that could also be included, and that are,
thanks to EAV, included in our patient demographic "virtual" table. This
allows the dozens of seldom-used fields to be included without having to put
up with a hugely wide table with no end of nullable columns. This means that
you are not caching on the DB tons of wide tables larded with little-used
columns. A simple insurance payment record, in our system, has over 40
fields, all but six of them sparse. Again, EAV works well for this,
especially if the six non-sparse columns are within a single physical table
integrated into the larger EAV system. You just have to master relations
between virtual tables and to automate query construction and execution.

Yes, my SQL statements are longer than they would be with traditional data
modeling; there are more joins and self-joins, but that hasn't exacted a
noticeable performance penalty--yet. Time may prove me wrong, but so far I'm
convinced that a regular relational database can support infinite data
diversity without forcing one to the expedient of XML, JSON, blobs, or
whatever else.

The EAV in use with our system is only partially motivated by the problem of
sparse data. Equally important has been the standardization of all querying,
over-the-wire serialization and deserialization, persistence management, and
UI management. The problem for me, at the moment, is not so much an issue
of performance as it is that our current metadata system, though containing
hundreds of elements, is still not rich enough to accomodate full-throated
EMR (working on this). I can only say that it is proof of concept.

There is no perfect answer. You have to pay some sort of performance price
for sparse data management no matter what option you choose.

Thanks,

Randy Neall
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://lists.openehr.org/mailman/private/openehr-technical_lists.openehr.org/attachments/20110607/5e1b81b1/attachment.html>

Reply via email to