This discussion document is produced by UCL (CHIME) to highlight issues
that have arisen when designing a Demographics Service component. This
service component will need to conform to the openEHR demographics
package and be able to support the requirements of live demonstrator
sites in north London. Any feedback and comments are welcome.
________________________________________________________________________
Demographics server
Requirements for Demographics server
______________________________________
1. A demographics server is designed to serve requests regarding
demographics data related to a 'Person' entity. A 'Person' can be a
user or a patient. The specific set of data stored in a back-end
repository should be shareable. Access to demographics data is less
restricted than access to clinical data stored in the record server, in
the sense that it should be possible for more than one organisation to
edit the data, with those edits immediately visible to other users.
2. Because of the nature of demographics data, it should ideally be
stored in a container that readily facilitates a lookup or search.
3. The container technology should ideally facilitate:
- versioning (the change and history of demographics data);
- archetyping (the addition or edit of types of demographics data).
The Analysis
_____________
The two technologies most suitable for implementing a demographics
server are a Directory and a Relational Database. Both store data and
provide a means of retrieving it. As the two technologies are in many
ways rather similar, the following questions serve as general
guidelines for making a suitable choice between Directory and Database:
1. The nature of data: is this mostly static or very dynamic?
Demographics data by nature is more static. People do not change their
name, address or telephone number very often, at least in comparison to
a patient's clinical data.
2. The type of access to the data: predominantly read or predominantly
write?
Due to the nature of demographics data, a write access is less likely
to be required than a read access (a directory assumes a very high
ratio of read to write access).
3. The structure of the data: hierarchical or flat?
There is normally a tendency to store a person's data in a hierarchical
manner, that is, to organise persons in groups of user or patient, or
in containers based on which organisation they belong to. This is done
because it is a convenient and natural way to organise a person's data,
and facilitates authentication and authorisation decisions. It is not
necessary to organise demographics data in hierarchical manner but the
current implementation does make use of hierarchical structures to
narrow down the scope of searches for a person, as well as to
facilitate the authorisation of a user account.
4. The Location: stand-alone, or required to be replicated?
Ideally one healthcare community should have only one demographics data
service which is shared by each member of the community. This implies
the need to replicate the demographics data to avoid heavy traffic to a
particular point of the network. Replication and synchronisation of
data among organisations is typically managed by the directory server.
However, this advantage is paid for by the consequent data latency and
inconsistency.
5. The accuracy and consistency of data: can latency or inconsistency
be tolerated?
If it is impossible for a given recipient to accept slightly outdated
or out of sync demographics data, then the benefit of the replication
service provided by the directory server may not be gained.
6. The use of the data: will it be used in many disparate systems or
applications?
A directory service will typically be employed in more disparate
systems over a wider area, whereas a database would be more
restrictively employed in the local area.
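
As an illustration of point 3 above, the following minimal sketch shows
how a hierarchical layout can narrow the scope of a search. It does not
talk to a real directory server: the entries, the DN-style names (such
as ou=patients and o=hospital-a) and the search() helper are all
assumptions made up for this example.

```python
# Hypothetical directory entries keyed by DN-style names (illustrative only).
ENTRIES = {
    "cn=Mary Lea,ou=patients,o=hospital-a": {"cn": "Mary Lea"},
    "cn=John Smith,ou=patients,o=hospital-a": {"cn": "John Smith"},
    "cn=Mary Lea,ou=users,o=hospital-a": {"cn": "Mary Lea"},
    "cn=Mary Lea,ou=patients,o=clinic-b": {"cn": "Mary Lea"},
}


def search(base, **filters):
    """Return, sorted, the names of entries below `base` matching every filter."""
    suffix = "," + base
    return sorted(
        name
        for name, attrs in ENTRIES.items()
        if name.endswith(suffix)
        and all(attrs.get(key) == value for key, value in filters.items())
    )


# Searching the whole organisation finds both the patient and the user entry.
whole_organisation = search("o=hospital-a", cn="Mary Lea")

# Starting the search lower in the tree narrows it to patients only.
patients_only = search("ou=patients,o=hospital-a", cn="Mary Lea")
```

The same filter returns fewer candidates when the search base is moved
down the tree, which is the effect the current implementation relies on.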
Not all of the factors outlined above are critically important in the
choice of a suitable technology for demographics storage. In terms of
persistence alone, a directory seems to provide a better solution than
a relational database. However, this excludes consideration of the
other two important requirements cited above: the ability of these
technologies to cope with versioning and archetyping.
Versioning: this means the ability to keep track of the modifications
made to demographics data: for example, who performed a modification,
when it was performed, and what the changes and the old values were.
Not every directory server keeps a change record. In order to keep
track of the changes that happen directly to a demographics repository
(as opposed to a service simply using that repository as a storage
mechanism), we can make use of an "EventListener" to propagate changes
back from the server to our own code. There is a disadvantage to this
method: any changes made in the directory server while the listening
code is unavailable may not be captured without complex signalling.
This may include the use of a change log.
But how and where should the change log be stored? To make things
easier, the change log can be stored in the same directory used for
storing demographics data. Netscape Directory Server takes a similar
approach: each change is stored as an entry under a change log context,
and can be uniquely identified using a change log number.
The change log can either be organised in a similar manner, or stored
as a sub-entry for each 'Person.' That means all the demographics
changes related to a particular person can be found as a sub-entry
under the person entry.
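
The two ways of organising the change log can be sketched roughly as
follows. This is only an in-memory illustration, not the layout used by
any particular directory server: the Change and ChangeLog classes, their
field names, and the sample identifiers and timestamps are assumptions
invented for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Change:
    number: int        # unique, increasing change log number
    person_id: str
    attribute: str
    old_value: str
    new_value: str
    changed_by: str    # who performed the modification
    changed_at: str    # when it was performed (ISO 8601 text)


class ChangeLog:
    def __init__(self):
        self._entries = []
        self._numbers = count(1)

    def record(self, person_id, attribute, old_value, new_value,
               changed_by, changed_at):
        """Append one change and return it with its change log number."""
        change = Change(next(self._numbers), person_id, attribute,
                        old_value, new_value, changed_by, changed_at)
        self._entries.append(change)
        return change

    def since(self, last_seen_number):
        """Changes a listener missed while it was unavailable."""
        return [c for c in self._entries if c.number > last_seen_number]

    def for_person(self, person_id):
        """The per-person view, as if changes were sub-entries of the person."""
        return [c for c in self._entries if c.person_id == person_id]


log = ChangeLog()
log.record("p1", "surname", "Lea", "Leigh", "clerk-a", "2003-01-15T10:00:00")
log.record("p2", "telephone", "old-number", "new-number", "clerk-b",
           "2003-01-16T09:30:00")
log.record("p1", "address", "old-address", "new-address", "clerk-a",
           "2003-02-01T14:00:00")

# A listener that last saw change number 1 can catch up on the rest.
missed = log.since(1)
# All recorded changes for one person.
mary_history = log.for_person("p1")
```

A listener that remembers the last change number it processed can ask
for everything after that number when it comes back, which is one way
of avoiding the more complex signalling mentioned above.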
Archetyping: this means the ability to edit the demographics data
types, i.e. to add new demographics related attributes (for example,
carer information or religion) to a 'Patient.'
Re-archetyping an object in terms of directory objects means redefining
the directory schema. It is true that the directory schemas are
extensible. However, because a fundamental assumption of directory
servers is that there will be more lookups and fewer updates, the
ability to dynamically define and add new attributes to an object is
not standardised. If a schema cannot be modified dynamically, then the
schema file must be accessed statically, modified and the directory
server restarted to allow the changes to take effect. OpenLDAP does not
support dynamic modification of the schema yet, though SunOne, Netscape
4.1 (and later) and IBM directory server do support this feature.
This also means each participating organisation must use the same model
of demographics data. If it is decided that an address should be made
up of 6 strings, for example, then an organisation cannot choose to
have 5 or 7 lines instead. But since everyone is sharing the
demographics data, why would an organisation want a different
representation of demographics data anyway?
On the other hand, archetyping can be handled easily in record data
because the record already supports dynamic extension through use of an
indirection in the reference model. In other words, any new archetype
that represents a new type of data will have a unique archetype ID that
can be stored dynamically.
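
The indirection described here can be sketched as a store in which
values are keyed by an archetype ID rather than by a fixed set of
attributes. This is an illustration only and is not taken from the
openEHR reference model: the ArchetypedStore class and the archetype IDs
used below (for example "demographic.religion.v1") are hypothetical.

```python
class ArchetypedStore:
    """Values are stored against archetype IDs, not fixed schema attributes."""

    def __init__(self):
        self._archetypes = {}   # archetype ID -> human-readable description
        self._values = {}       # (person ID, archetype ID) -> list of values

    def register_archetype(self, archetype_id, description):
        # Adding a new type of data is just another dictionary entry;
        # nothing has to be redefined or restarted.
        self._archetypes[archetype_id] = description

    def set_value(self, person_id, archetype_id, value):
        if archetype_id not in self._archetypes:
            raise KeyError(f"unknown archetype: {archetype_id}")
        self._values.setdefault((person_id, archetype_id), []).append(value)

    def get_values(self, person_id, archetype_id):
        return list(self._values.get((person_id, archetype_id), []))


store = ArchetypedStore()
store.register_archetype("demographic.address.v1", "postal address")
store.set_value("p1", "demographic.address.v1", "1 Example Street")

# Later, a new kind of data is introduced while the store is in use.
store.register_archetype("demographic.religion.v1", "religion")
store.set_value("p1", "demographic.religion.v1", "not stated")
```

By contrast, a directory whose schema cannot be modified dynamically
would need the schema file edited and the server restarted before the
second kind of data could be stored.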
Possible solutions:
___________________
Directory...
So far, directories have seemed to be a good choice if the requirement
for archetyping is ignored. Not all available directory servers allow
archetyping (in terms of dynamic directory schema changes) to be
performed easily. Since demographics data is likely to need
re-archetyping less often than clinical data, should we compromise on
this feature for the moment?
Database...
On the other hand, though not an ideal storage medium for person
details, and with a lookup that is not as optimised for the purpose as
is the case in a directory, a database seems to be able to cope with
most of the requirements outlined in the requirements section. The
database is actually less suitable for storing details about a 'person'
compared to a directory, because of the need to have multiple values
for most of the attributes, and multi-valued attributes in a database
can mean replication of key data. A database also does not typically
provide for the distributed replication of data. If having distributed
copies of demographics data is not essential, then should we say the
database is a better option than the directory?
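
The point about multi-valued attributes can be seen in a small example
using an in-memory SQLite database. The table and column names and the
sample values are assumptions made for this sketch; a real schema would
differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Each extra value of a multi-valued attribute needs its own row,
-- and every such row repeats the person's key.
CREATE TABLE person_telephone (
    person_id TEXT NOT NULL REFERENCES person(person_id),
    telephone TEXT NOT NULL,
    PRIMARY KEY (person_id, telephone)
);
""")

conn.execute("INSERT INTO person VALUES (?, ?)", ("p1", "Mary Lea"))
conn.executemany(
    "INSERT INTO person_telephone VALUES (?, ?)",
    [("p1", "home-number"), ("p1", "mobile-number")],
)

rows = conn.execute(
    "SELECT person_id, telephone FROM person_telephone ORDER BY telephone"
).fetchall()

# In a directory the same data is a single entry with a multi-valued attribute.
directory_entry = {
    "cn": "Mary Lea",
    "telephoneNumber": ["home-number", "mobile-number"],
}
```

Here the key "p1" appears once per telephone number in the database,
whereas the directory-style entry holds both values under one attribute.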
Directory and Database...
Another possible solution is to make use of both technologies for what
they do best. Since a directory can allow lookups to be performed
quickly, critical attributes that are normally used for person-related
searches (for example, name, ID, phone number) could be stored in the
directory. The rest of the data (versions, re-archetyped data, and the
remaining demographics data not held in the directory, which we might
call "administrative data") is then stored in a local database in each
organisation. The benefit of this option is that each organisation can
retain control over its own demographics archetype (the administrative
data, at any rate).
Arguably, this may reduce the work load of the DBMS' search module but
whether or not the whole search and data retrieval process will
ultimately be quicker is another question. This method will no doubt
complicate the middleware component as a few changes in a person's
details may involve a trip to both the directory and the database. If
any changes involve the modification of administrative data, then the
changes potentially need to be propagated to all other databases in the
network. A retrieval of the full set of demographics data also means an
access to both the directory and the database. Another issue is that
the administrative data will not be used to perform a typical (fast,
directory-only) lookup or search. In particular, it will exclude
historical demographics data which is held in the database.
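
The extra work this places on the middleware can be sketched as follows.
The example uses two plain dictionaries in place of a real directory and
database, and the HybridDemographics class, its method names and the
sample attributes are all hypothetical.

```python
class HybridDemographics:
    """Critical attributes in a 'directory', the rest in a local 'database'."""

    def __init__(self):
        self.directory = {}   # person ID -> critical, searchable attributes
        self.database = {}    # person ID -> administrative data
        self.trips = []       # records which store each operation touched

    def add_person(self, person_id, critical, administrative):
        self.directory[person_id] = dict(critical)
        self.database[person_id] = dict(administrative)
        self.trips += ["directory", "database"]

    def search(self, **filters):
        """Fast lookup: only the directory is consulted."""
        self.trips.append("directory")
        return sorted(
            person_id
            for person_id, attrs in self.directory.items()
            if all(attrs.get(k) == v for k, v in filters.items())
        )

    def full_record(self, person_id):
        """Retrieving everything needs a trip to both stores."""
        self.trips += ["directory", "database"]
        return {**self.directory[person_id], **self.database[person_id]}


service = HybridDemographics()
service.add_person(
    "p1",
    critical={"name": "Mary Leigh"},
    administrative={"religion": "not stated", "previous_name": "Mary Lea"},
)
service.trips.clear()

found_by_name = service.search(name="Mary Leigh")
found_by_old_name = service.search(previous_name="Mary Lea")
record = service.full_record("p1")
```

The search on the previous name finds nothing, because that attribute
lives only in the database, while the full record costs one access to
each store.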
Also, if administrative data is stored in the local copy of a database
in each organisation, then the completeness of the demographics data
that an organisation has in their database actually depends on how
early they join the network. The change of surname for Ms. Mary Lea to
Ms. Mary Leigh will not be shown in the database of say, Superdrug's
pharmacy, if they became part of the network two years after the change
of name happened. It is always possible to take a copy of the old
administrative data from any of the existing databases on the network
to try to minimise the gap, if this bit of data is shareable amongst
network nodes from the database.
             Lookup   Storage       Versioning   Archetyping
Database:    Easy     Not so good   Possible     Easy
Directory:   Quick    Good          Possible     Difficult

Database for administrative data, Directory for critical data to
facilitate search:
Search may be improved, and the versioning and archetyping problems may
be solved, but the middleware becomes complicated and there may be a
synchronisation problem.
Others: No matter which technology is chosen, the demographics package
will most likely be implemented using EJB and deployed in JBoss. This
is to take advantage of what is readily offered by JBoss, since the
Record Server will be deployed in JBoss.