Hi Alan and Pablo, You have mitigated some of the problems that emerge from trying to combine relational data with structured document data. I'll be very interested to see your progress.
The choices are, as you yourselves have just noted: (1) some sort of object database, (2) regular relational database, (3) EAV, a subset of relational, (4) structured documents, such as XML or JSON, and (5) some mix of the preceding four. I'd only add that the more of these you end up mixing together, the more complicated and difficult the result will be. Conversely, if you could get it all into just one homogenous system, using just one or two of the "preceding four" there would be huge advantages. The problem we all face is that one cannot physically represent the thousands of tables and columns and relations between them, as required by medical information, within a traditional database design. So what you both are understandably trying to do is to capture what you can within a traditional database up to the point where the data diversity explodes, and then, from there on, graft in one, two or maybe even three other data containment systems to manage the diversity, trying, all the while, to build pointers and indexes to keep it all taped and wired together. Every added layer brings its own impedance barriers that only pain will allow you to breach. Properly done, EAV would allow you to avoid that expedient and thus keep your entire system homogeneous and natively interconnected and indexed. Try looking up the works of Prakash Nadkarni of Yale, a leading EAV and medical informatics authority--and physician. In addition to articles reaching back to the 1990s, he has recently authored an EAV book that may be on the shelves soon, if not already (I'll have to check with him). I believe he wrote the Wikipedia article cited earlier on this thread. I, too, have had six years of experience with our own EAV system and can say that EAV can be designed to be efficient, compact, queryable and maintainable. In fact, I use hybrid EAV where it should (according to most experts) *not* be used, including demographic, scheduling and transactional information typical of medical billing. Medical records will be the next step. Some EAV designs are foolish and unusable. But others aren't. One of our databases has 2.4 million EAV nodes. That, however, is modest, and I frankly do not know how it would scale to an order of magnitude bigger than that. So far no problems. All real life data is non-sparse only at a small core. Something as "simple" as a patient demographic table, for instance, will have a few non-sparse columns, such as name, date of birth, address, etc., but then there are dozens of other less-used columns that could also be included, and that are, thanks to EAV, included in our patient demographic "virtual" table. This allows the dozens of seldom-used fields to be included without having to put up with a hugely wide table with no end of nullable columns. This means that you are not caching on the DB tons of wide tables larded with little-used columns. A simple insurance payment record, in our system, has over 40 fields, all but six of them sparse. Again, EAV works well for this, especially if the six non-sparse columns are within a single physical table integrated into the larger EAV system. You just have to master relations between virtual tables and to automate query construction and execution. Yes, my SQL statements are longer than they would be with traditional data modeling; there are more joins and self-joins, but that hasn't exacted a noticeable performance penalty--yet. Time may prove me wrong, but so far I'm convinced that a regular relational database can support infinite data diversity without forcing one to the expedient of XML, JSON, blobs, or whatever else. The EAV in use with our system is only partially motivated by the problem of sparse data. Equally important has been the standardization of all querying, over-the-wire serialization and deserialization, persistence management, and UI management. The problem for me, at the moment, is not so much an issue of performance as it is that our current metadata system, though containing hundreds of elements, is still not rich enough to accomodate full-throated EMR (working on this). I can only say that it is proof of concept. There is no perfect answer. You have to pay some sort of performance price for sparse data management no matter what option you choose. Thanks, Randy Neall -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.openehr.org/mailman/private/openehr-technical_lists.openehr.org/attachments/20110607/5e1b81b1/attachment.html>

