On 26/01/2016 09:51, Bert Verhees wrote:
On 26-01-16 10:38, Jan-Marc Verlinden wrote:
Our first version was Java based with a Postgres DB, everything
stored as path/values.
Every query would take about a second. We did not even try complex
queries :-). Also the GUI side did not know what to do with the
path/values.
Hi Jan-Marc,
There were some problems handling the path/values; most of them came
from giving a semantic meaning to the paths.
Storing a path and its corresponding value is very, very quick. I asked
database specialists, and they say this is the best way to go up to
billions of records.
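A minimal sketch of what such path/value storage might look like, assuming SQLite and a hypothetical three-column table (ehr_id, path, value). The table layout, the index and the function names are illustrative assumptions, not the schema of the system described above; the path string is the systolic BP example from this thread.

```python
import sqlite3

# Path copied from the systolic BP example in this thread.
SYSTOLIC_PATH = (
    "/data[id2|history|]/events[id7|any event|]"
    "/data[id4]/items[id5|Systolic|]/value"
)


def open_store():
    """Create an in-memory store with one row per (EHR, path, value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE path_value ("
        "ehr_id TEXT NOT NULL, path TEXT NOT NULL, value REAL NOT NULL)"
    )
    # Index on (path, value) so lookups by path do not scan the whole table.
    conn.execute("CREATE INDEX idx_path_value ON path_value (path, value)")
    return conn


def store(conn, ehr_id, path, value):
    """Insert a single path/value pair for one EHR."""
    conn.execute(
        "INSERT INTO path_value (ehr_id, path, value) VALUES (?, ?, ?)",
        (ehr_id, path, value),
    )


def ehrs_with_value_above(conn, path, threshold):
    """Return the ids of EHRs holding a value above threshold at path."""
    rows = conn.execute(
        "SELECT DISTINCT ehr_id FROM path_value "
        "WHERE path = ? AND value > ? ORDER BY ehr_id",
        (path, threshold),
    )
    return [row[0] for row in rows]
```

Writing is a single insert per leaf value. The hard part, as discussed in this thread, is translating a query into the right set of paths, which this sketch does not attempt.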
This is also what I would expect. Path-based storage does rely on very
smart ways to match terms in a query to paths, of course. There
are some tricks to use here. For example, the path to the systolic BP
DV_QUANTITY node from the archetype is
/data[id2|history|]/events[id7|any event|]/data[id4]/items[id5|Systolic|]/value
In the whole of CKM there are probably about 7,000 'interesting' leaf
paths (if you assume that you crunch DATA_VALUE subtypes into little
blobs). That's a tiny number. Assume that when they've modelled
everything in medicine (outside of genomics and proteomics) that we have
50,000 such 'paths of interest'. That's a very small number. These paths
can be mapped in smart ways to a 64-bit number space so that finding out
if a specific query term is in some EHR is very quick. When you include
a coded list of archetype ids in the mix, I think querying can be made
extremely quick.
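One way to picture the 64-bit mapping, as a rough sketch only: hash each leaf path to a 64-bit integer and keep, per EHR, the set of keys it contains. Hashing with BLAKE2b is an assumption made here for illustration; the "smart ways" mentioned above may well be something quite different (for example, assigned codes rather than hashes). The function names are hypothetical.

```python
import hashlib


def path_key(path: str) -> int:
    """Map a path string to an integer in the 64-bit space (0 .. 2**64 - 1)."""
    digest = hashlib.blake2b(path.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_index(paths):
    """Build the set of 64-bit keys for all leaf paths present in one EHR."""
    return {path_key(p) for p in paths}


def ehr_has_path(index, path):
    """Check whether a query path occurs in the EHR, via its 64-bit key."""
    return path_key(path) in index
```

With only tens of thousands of distinct paths in a 64-bit space, accidental collisions between two different paths should be very unlikely, but a real system would still need to decide how to handle them.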
The devil is in the details. Various large DBs used path-based approaches
in the past; Informix was one.
Path/value storage is also easy to migrate to another database, for
clustering or other reasons.
But there are some problems to solve, which were harder to solve five
years ago.
One problem is the GUI builders: they are looking at a database approach
that is difficult to understand, easy to create errors in, and hard
to debug.
They need JSON to write their datasets in.
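As a rough illustration of bridging the two views, flat path/values can be folded into a nested structure and serialised as JSON. This sketch assumes that "/" never occurs inside a predicate such as [id5|Systolic|] and that no stored leaf path is a prefix of another; both are simplifying assumptions, and to_nested is a hypothetical helper name.

```python
import json


def to_nested(path_values):
    """Fold a {path: value} mapping into nested dicts keyed by path segment."""
    root = {}
    for path, value in path_values.items():
        # Split on "/" and drop the empty segment before the leading slash.
        segments = [s for s in path.split("/") if s]
        node = root
        for segment in segments[:-1]:
            node = node.setdefault(segment, {})
        node[segments[-1]] = value
    return root


def to_json(path_values):
    """Serialise the nested form as indented JSON text."""
    return json.dumps(to_nested(path_values), indent=2)
```

Going the other way (flattening the JSON that a GUI posts back into path/values) is the same walk in reverse.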
The other problem is querying. As long as the queries are predefined,
you can do anything, but then you are no different from an old
monolithic system.
But writing new templates relies heavily on on-the-fly query building.
There has, however, been some technological progress, also in the open
source domain.
The path/value storage could get a new lease of life with the help of
ANTLR, which can help to interpret AQL for this purpose. I even think
this is promising.
Let engineers read The Definitive ANTLR 4 Reference by Terence Parr,
and read it with path/values in the back of their minds. Both the GUI
problem and the query problem can be solved.
It should be worth the time spent and the price of the book ;-)
It is.
- thomas
_______________________________________________
openEHR-technical mailing list
openEHR-technical@lists.openehr.org
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org