Demographics service

Yin Su Lim Wed, 2 Mar 2005 13:37:34 +0000

This discussion document is produced by UCL (CHIME) to highlight issues  
that have arisen when designing a Demographics Service component. This  
service component will need to conform to the openEHR demographics  
package and be able to support the requirements of live demonstrator  
sites in north London. Any feedback and comments are welcome.
________________________________________________________________________


Demographics server

Requirements for Demographics server
______________________________________

1. A demographics server is designed to serve requests regarding
demographics data related to a 'Person' entity. A 'Person' can be a
user or a patient. The specific set of data stored in a back-end
repository should be shareable. Access to demographics data is less
restricted than access to the clinical data stored in the record server,
in the sense that it should be possible for more than one organisation
to make edits to the data that are then immediately visible to other
users.

2. Because of the nature of demographics data, it should ideally be  
stored in a container that readily facilitates a lookup or search.

3. The container technology should ideally facilitate:
- versioning (the change and history of demographics data);
- archetyping (the addition or edit of types of demographics data).

The Analysis
_____________

The two technologies most suitable for the implementation of a
demographics server are the Directory and the Relational Database.
Both technologies store data and provide a means to retrieve it. As the
two are in many ways rather similar, the following guidelines serve as
general rules for making a suitable choice between Directory and
Database:

1. The nature of data: is this mostly static or very dynamic?
Demographics data by nature is more static. People do not change their  
name, address or telephone number very often, at least in comparison to  
a patient's clinical data.

2. The type of access to the data: predominantly read or predominantly  
write?
Due to the nature of demographics data, a write access is less likely  
to be required than a read access (a directory assumes a very high  
ratio of read to write access).

3. The structure of the data: hierarchical or flat?
There is normally a tendency to store a person's data in a hierarchical  
manner, that is, to organise persons in groups of user or patient, or  
in containers based on which organisation they belong to. This is done  
because it is a convenient and natural way to organise a person's data,  
and facilitates authentication and authorisation decisions. It is not
necessary to organise demographics data in a hierarchical manner, but the
current implementation does make use of hierarchical structures to  
narrow down the scope of searches for a person, as well as to  
facilitate the authorisation of a user account.

4. The Location: stand-alone, or required to be replicated?
Ideally one healthcare community should have only one demographics data  
service which is shared by each member of the community. This implies  
the need to replicate the demographics data to avoid heavy traffic to a  
particular point of the network. Replication and synchronisation of
data among organisations are typically managed by the directory server.
However, this advantage is paid for with the resulting data latency and
inconsistency.

5. The accuracy and consistency of data: can latency or inconsistency  
be tolerated?
If a given recipient cannot accept slightly outdated or out-of-sync
demographics data, then the benefit of the replication service provided
by the directory server may not be realised.

6. The use of the data: will it be used in many disparate systems or  
applications?
A directory service will typically be employed in more disparate
systems over a wider area, whereas a database would be more
restrictively employed in the local area.
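
The scope-narrowing idea in point 3 above can be illustrated with a
minimal Java sketch. It assumes LDAP-style distinguished names; the
organisation and container names (o=OrgA, ou=patients, ou=users) and the
class SubtreeSearchSketch are hypothetical and are not taken from the
current implementation. The entries are held in memory, so no directory
server is involved.

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;

// Hypothetical sketch: narrowing a person search to one branch of a
// hierarchical namespace. Entries are held in memory, not in a directory.
final class SubtreeSearchSketch {

    // Parses a distinguished name, converting the checked exception.
    static LdapName name(String dn) {
        try {
            return new LdapName(dn);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Invalid name: " + dn, e);
        }
    }

    // Returns every entry located at or beneath the given base name.
    static List<LdapName> searchUnder(LdapName base, List<LdapName> entries) {
        List<LdapName> hits = new ArrayList<>();
        for (LdapName entry : entries) {
            // LdapName indexes its components from the right, so startsWith
            // tests whether the entry lies inside the subtree of the base.
            if (entry.startsWith(base)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    // Builds a small example tree with two organisations (assumed names).
    static List<LdapName> sampleEntries() {
        List<LdapName> entries = new ArrayList<>();
        entries.add(name("uid=mlea,ou=patients,o=OrgA"));
        entries.add(name("uid=jsmith,ou=users,o=OrgA"));
        entries.add(name("uid=mlea,ou=patients,o=OrgB"));
        return entries;
    }

    public static void main(String[] args) {
        LdapName base = name("ou=patients,o=OrgA");
        System.out.println(searchUnder(base, sampleEntries()));
    }
}
```

Searching under ou=patients,o=OrgA ignores both the users of OrgA and the
patients of OrgB, which is the narrowing effect described above.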

Not all of the factors outlined above are critically important in the
choice of a suitable technology for demographics storage. In terms of
persistence alone, a directory seems to provide a better solution than
a relational database. However, this excludes consideration of the
other two important requirements cited above: the ability of these
technologies to cope with versioning and archetyping.

Versioning: this means the ability to keep track of the modifications
that happen to demographics data, for example, who performed a
modification, when it was performed, and what the old values were.

Not every directory server keeps a change record. In order to keep  
track of the changes that happen directly to a demographics repository  
(as opposed to a service simply using that repository as a storage  
mechanism), we can make use of an "EventListener" to propagate changes  
back from the server to our own code. There is a disadvantage to
this method: any changes performed in the directory server while the
listening code is unavailable may not be captured without more complex
signalling, which may include the use of a change log.

But how and where should the change log be stored? To make things
easier, the change log can be stored in the same directory used for
storing demographics data. Netscape Directory Server has a similar
implementation: each change is stored as an entry under a change log
context, and can be uniquely identified using a change log number. Our
change log can either be organised in the same manner, or stored as a
sub-entry for each 'Person'. In the latter case, all the demographics
changes related to a particular person can be found as sub-entries
under the person entry.
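
A numbered change log of this kind might look like the following minimal
Java sketch. It is an in-memory illustration only: the class
ChangeLogSketch, its field names and the sample values are all
hypothetical, and a real change log would be stored in the directory as
described above. The since() method shows how a listener that was
unavailable for a while could catch up from the last change log number
it saw.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a numbered change log for demographics data.
// Everything is in memory; a real change log would live in the directory.
final class ChangeLogSketch {

    // One recorded modification: who changed what, when, and the old value.
    static final class Change {
        final long number;
        final String personId;
        final String attribute;
        final String oldValue;
        final String newValue;
        final String changedBy;
        final Instant at;

        Change(long number, String personId, String attribute,
               String oldValue, String newValue, String changedBy) {
            this.number = number;
            this.personId = personId;
            this.attribute = attribute;
            this.oldValue = oldValue;
            this.newValue = newValue;
            this.changedBy = changedBy;
            this.at = Instant.now();
        }
    }

    private final List<Change> entries = new ArrayList<>();
    private long nextNumber = 1;

    // Appends a change and returns its unique, increasing change number.
    long record(String personId, String attribute, String oldValue,
                String newValue, String changedBy) {
        long number = nextNumber++;
        entries.add(new Change(number, personId, attribute,
                               oldValue, newValue, changedBy));
        return number;
    }

    // Changes numbered above lastSeen, in order, so that a listener that
    // was offline can replay what it missed.
    List<Change> since(long lastSeen) {
        List<Change> missed = new ArrayList<>();
        for (Change c : entries) {
            if (c.number > lastSeen) {
                missed.add(c);
            }
        }
        return missed;
    }

    // All changes for one person, as if stored beneath the person entry.
    List<Change> forPerson(String personId) {
        List<Change> own = new ArrayList<>();
        for (Change c : entries) {
            if (c.personId.equals(personId)) {
                own.add(c);
            }
        }
        return own;
    }

    public static void main(String[] args) {
        ChangeLogSketch log = new ChangeLogSketch();
        long first = log.record("p1", "surname", "Lea", "Leigh", "orgA-admin");
        log.record("p2", "telephone", "old-number", "new-number", "orgB-admin");
        System.out.println("Missed since " + first + ": " + log.since(first).size());
        System.out.println("Changes for p1: " + log.forPerson("p1").size());
    }
}
```

The two query methods correspond to the two layouts discussed above: a
single log ordered by change number, and a per-person view of the same
changes.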

Archetyping: this means the ability to edit the demographics data  
types, i.e. to add new demographics related attributes (for example,  
carer information or religion) to a 'Patient.'
Re-archetyping an object in terms of directory objects means redefining  
the directory schema. It is true that the directory schemas are  
extensible. However, because a fundamental assumption of directory  
servers is that there will be more lookups and fewer updates, the  
ability to dynamically define and add new attributes to an object is  
not standardised. If a schema cannot be modified dynamically, then the  
schema file must be accessed statically, modified and the directory  
server restarted to allow the changes to take effect. OpenLDAP does not  
support dynamic modification of the schema yet, though SunOne, Netscape  
4.1 (and later) and IBM directory server do support this feature.

This also means each participating organisation must use the same model
of demographics data. If it is decided that an address should be made
of 6 strings, for example, then an organisation cannot choose to have 5
or 7 lines instead. But since everyone is sharing the demographics data,
why would an organisation want a different representation of
demographics data anyway?

On the other hand, archetyping can be handled easily in record data  
because the record already supports dynamic extension through use of an  
indirection in the reference model. In other words, any new archetype  
that represents a new type of data will have a unique archetype ID that  
can be stored dynamically.
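
The indirection can be sketched in Java as a store that keys each data
item by an archetype ID instead of by a fixed schema attribute. This is
only an illustration of the principle, not the record server's actual
design; the class ArchetypedStoreSketch and the archetype IDs used in it
are invented for the example and are not real openEHR identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of indirection: data items are keyed by archetype
// ID, so a new type of data needs a new ID rather than a schema change.
final class ArchetypedStoreSketch {

    // personId -> (archetype ID -> value); no attribute names are fixed.
    private final Map<String, Map<String, String>> data = new LinkedHashMap<>();

    // Stores a value under the given archetype ID for the given person.
    void put(String personId, String archetypeId, String value) {
        Map<String, String> items = data.get(personId);
        if (items == null) {
            items = new LinkedHashMap<>();
            data.put(personId, items);
        }
        items.put(archetypeId, value);
    }

    // Returns the stored value, or null if the person has none for that ID.
    String get(String personId, String archetypeId) {
        Map<String, String> items = data.get(personId);
        return items == null ? null : items.get(archetypeId);
    }

    public static void main(String[] args) {
        ArchetypedStoreSketch store = new ArchetypedStoreSketch();
        store.put("p1", "example.demographic.address.v1", "1 Example Road");
        // A new kind of data is added later without touching the store.
        store.put("p1", "example.demographic.carer.v1", "Carer details");
        System.out.println(store.get("p1", "example.demographic.carer.v1"));
    }
}
```

A directory schema, by contrast, names each attribute in advance, which
is why the equivalent change there means redefining the schema.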

Possible solutions:
___________________

Directory...
So far, directories have seemed to be a good choice if the requirement
for archetyping is ignored. Not all available directory servers allow
archetyping (in terms of dynamic directory schema changes) to be
performed easily. Since demographics data is likely to need
re-archetyping less often than clinical data, should we compromise on
this feature for the moment?

Database...
On the other hand, though not an ideal storage medium for person  
details, and with a lookup that is not as optimised for the purpose as  
is the case in a directory, a database seems to be able to cope with  
most of the requirements outlined in the requirements section. The  
database is actually less suitable for storing details about a 'person'  
compared to a directory, because of the need to have multiple values  
for most of the attributes, and multi-valued attributes in a database  
can mean replication of key data. A database also does not typically  
provide for the distributed replication of data. If having distributed  
copies of demographics data is not essential, then should we say the  
database is a better option than the directory?
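
The point about multi-valued attributes can be made concrete with a
small Java sketch. It assumes one common relational layout, a child
table holding one row per value, which is an assumption made for
illustration and not a design proposed in this document. The class
MultiValueRowsSketch and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flattening multi-valued person attributes into
// relational-style rows, one row per value.
final class MultiValueRowsSketch {

    // Produces (personId, attribute, value) rows; the person key is
    // repeated once for every stored value.
    static List<String[]> toRows(String personId,
                                 Map<String, List<String>> attributes) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> attribute : attributes.entrySet()) {
            for (String value : attribute.getValue()) {
                rows.add(new String[] {personId, attribute.getKey(), value});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> phones = new ArrayList<>();
        phones.add("home-number");
        phones.add("mobile-number");
        List<String> names = new ArrayList<>();
        names.add("Mary Leigh");

        Map<String, List<String>> attributes = new LinkedHashMap<>();
        attributes.put("telephone", phones);
        attributes.put("name", names);

        for (String[] row : toRows("p1", attributes)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

A directory entry would hold both telephone values on the one entry,
whereas here the key p1 appears on every row.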

Directory and Database...
Another possible solution is to make use of both technologies for what  
they do best. Since a directory can allow lookups to be performed  
quickly, critical attributes that are normally used for person-related  
searches (for example, name, ID, phone number) could be stored in the  
directory. The rest of the data (versions, re-archetyped data, and the
rest of the demographics data not included in the directory, which we
might call "administrative data") is then stored in a local database in
each organisation. The benefit of this option is that each organisation
can retain control over its own demographics archetype (for the
administrative data, at any rate).

Arguably, this may reduce the workload of the DBMS's search module, but
whether or not the whole search and data retrieval process will
ultimately be quicker is another question. This method will no doubt
complicate the middleware component, as some changes to a person's
details may involve a trip to both the directory and the database. If
any change involves the modification of administrative data, then it
potentially needs to be propagated to all other databases in the
network. A retrieval of the full set of demographics data also means an
access to both the directory and the database. Another issue is that a
typical (fast, directory-only) lookup or search cannot make use of the
administrative data. In particular, such a search will exclude the
historical demographics data, which is held in the database.
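
The trade-offs of the combined option can be sketched as a small Java
middleware class. Both stores are plain in-memory maps standing in for a
directory and a local database; the class HybridDemographicsSketch, its
method names and the attribute names are hypothetical and only
illustrate the split described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical middleware sketch for the combined option: searchable
// attributes in a "directory" map, administrative data in a "database" map.
final class HybridDemographicsSketch {

    private final Map<String, Map<String, String>> directory = new LinkedHashMap<>();
    private final Map<String, Map<String, String>> database = new LinkedHashMap<>();

    // Stores a critical, searchable attribute in the directory.
    void putCritical(String personId, String attribute, String value) {
        put(directory, personId, attribute, value);
    }

    // Stores administrative data (for example, history) in the database.
    void putAdministrative(String personId, String attribute, String value) {
        put(database, personId, attribute, value);
    }

    private static void put(Map<String, Map<String, String>> store,
                            String personId, String attribute, String value) {
        Map<String, String> attributes = store.get(personId);
        if (attributes == null) {
            attributes = new LinkedHashMap<>();
            store.put(personId, attributes);
        }
        attributes.put(attribute, value);
    }

    // Fast path: consults the directory only, so administrative data
    // (including historical values) can never match.
    List<String> search(String attribute, String value) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : directory.entrySet()) {
            if (value.equals(e.getValue().get(attribute))) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Full retrieval: needs one trip to each store.
    Map<String, String> fullRecord(String personId) {
        Map<String, String> merged = new TreeMap<>();
        Map<String, String> critical = directory.get(personId);
        if (critical != null) {
            merged.putAll(critical);
        }
        Map<String, String> administrative = database.get(personId);
        if (administrative != null) {
            merged.putAll(administrative);
        }
        return merged;
    }

    public static void main(String[] args) {
        HybridDemographicsSketch service = new HybridDemographicsSketch();
        service.putCritical("p1", "surname", "Leigh");
        service.putAdministrative("p1", "previousSurname", "Lea");
        System.out.println(service.search("surname", "Leigh"));
        System.out.println(service.search("previousSurname", "Lea"));
        System.out.println(service.fullRecord("p1"));
    }
}
```

In this sketch a search on the previous surname finds nothing, because
that value is held only in the database, while fullRecord() has to read
from both stores.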

Also, if administrative data is stored in the local copy of a database
in each organisation, then the completeness of the demographics data
that an organisation has in its database actually depends on how
early it joined the network. The change of surname from Ms. Mary Lea to
Ms. Mary Leigh will not be shown in the database of, say, Superdrug's
pharmacy, if it became part of the network two years after the change
of name happened. It is always possible to take a copy of the old
administrative data from any of the existing databases on the network
to try to minimise the gap, provided that this data can be shared
between the databases of network nodes.



                Lookup    Storage        Versioning    Archetyping
Database:       Easy      Not so good    Possible      Easy
Directory:      Quick     Good           Possible      Difficult

Database for administrative data, Directory for critical data to  
facilitate search:
Search may be improved and the versioning and archetyping problems may
be solved, but the middleware becomes complicated and there may be a
synchronisation problem.

Others: No matter which technology is chosen, the demographics package  
will most likely be implemented using EJB and deployed in JBoss. This  
is to take advantage of what is readily offered by JBoss, since the  
Record Server will be deployed in JBoss.
