Each EMR needs a unique person identifier code (UPIC); I think nobody will object to
that. To allow exchange of EMRs (portability), these UPICs have to be not only unique
but also always allocated by the same algorithm. Even then, once the population is
large enough, the chance of duplicates remains high unless a clearinghouse issues
the UPICs.
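To get a feeling for how quickly duplicates become likely, here is a rough sketch
using the standard birthday-problem approximation. It assumes codes are spread
uniformly at random over the code space, which real identifiers are not, so treat the
result as an optimistic estimate. The function name and parameters are mine, for
illustration only.

```python
import math


def collision_probability(population, code_space):
    """Approximate chance of at least one duplicate code.

    Birthday-problem approximation: p ~ 1 - exp(-n(n-1) / (2N)),
    assuming every code is equally likely (an idealisation).
    """
    if population < 0 or code_space <= 0:
        raise ValueError("population must be >= 0 and code_space > 0")
    exponent = -population * (population - 1) / (2 * code_space)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities
    return -math.expm1(exponent)
```

For the classic case of 23 people and 365 possible birthdays this gives roughly 0.5,
which shows that the population can be far smaller than the code space and duplicates
are still likely.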
Now, some countries, such as the Scandinavian ones, already have unique person
identifier codes, which every single inhabitant, be it native, immigrant or temporary
resident, is allocated either at birth or on arrival in the country. Initially
I thought these national UPICs could be used as the basis for an international one (just
by appending the two-letter domain code, for example), but then what happens with those
who move, e.g., from Norway to Sweden? The problem is the lack of persistence. You could
keep using the first code a person was ever allocated, but people would forget their
older codes.
I searched the net, Usenet archives, statistical yearbooks, ... and have not found a
good system yet. (Hey, maybe I just did not stumble over it. Does anybody know a good
one?)
Ergo, we need a (P)UPIC, a persistent unique personal identification code. Maybe we
have to accept something less than perfect, something like a PPUPIC, a persistent
pseudo-unique p.i.c. This would be a code "as unique as possible" (= duplicates
unlikely but possible) that can be constructed out of information any patient in any
country would know. What information should we use? It should be information that most
patients know and that is at the same time discriminating enough to help build
up the "uniqueness".
* sex: (at date of birth, changes disregarded)
* date of birth: in some countries still a problem, but a good candidate
* country of birth: name of the country at time of birth
* city of birth: again, some may not know, but a good discriminator
* name initials: name given at birth, later changes disregarded
* initials of parents' first names: if known
An early proposal in the GNUMed project was the following:
Character (Unicode!) positions:
* 1..8 date of birth (yyyymmdd)
* 9 gender (c [m|f|?])
* 10..14 initials (ccccc [first2 + middle + last2])
* 15..16 mother's initials (cc [first+maiden | uu (unknown)])
* 17..18 country of birth (cc [country code])
* 19..20 city of birth (cc [first two letters])
My PPUPIC would then be "19630224mhophemsdeme": it looks long and ugly, but it can be
reconstructed anywhere at any time out of persistent information known to me.
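For illustration, the positional layout above could be assembled roughly like this.
This is only a sketch, not the GNUMed implementation: the function and helper names
are hypothetical, and padding every missing letter with "u" is my assumption (the
proposal only defines "uu" for unknown mother's initials). Lowercasing and dropping
non-letters are likewise assumptions.

```python
from datetime import date

UNKNOWN = "u"  # assumption: generalises the proposal's "uu" to any missing letter


def _prefix(text, n):
    """First n letters of text, lowercased, padded with UNKNOWN if too short."""
    letters = [ch for ch in (text or "").lower() if ch.isalpha()]
    return "".join(letters[:n]).ljust(n, UNKNOWN)


def build_ppupic(born, gender, first, middle, last,
                 mother_first, mother_maiden, country_code, city):
    """Assemble the 20-character code from the positional layout."""
    gender = (gender or "?").lower()
    if gender not in ("m", "f", "?"):
        raise ValueError("gender must be 'm', 'f' or '?'")
    parts = [
        f"{born.year:04d}{born.month:02d}{born.day:02d}",            # 1..8
        gender,                                                      # 9
        _prefix(first, 2) + _prefix(middle, 1) + _prefix(last, 2),   # 10..14
        _prefix(mother_first, 1) + _prefix(mother_maiden, 1),        # 15..16
        _prefix(country_code, 2),                                    # 17..18
        _prefix(city, 2),                                            # 19..20
    ]
    code = "".join(parts)
    assert len(code) == 20
    return code


# Placeholder inputs: only the letters visible in the example code are supplied.
example = build_ppupic(date(1963, 2, 24), "m", "Ho", "P", "He",
                       "M", "S", "de", "Me")
print(example)  # -> 19630224mhophemsdeme
```

Even this small sketch has to make decisions the proposal leaves open (case, accents,
missing middle names), which is exactly where two implementations could drift apart.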
I checked this against a database I have from a European study on
colorectal carcinoma, for which I happened to write the database & statistics
package. In the 289,000 entries, although fairly incomplete regarding
some of the details above, there was not a single duplicate using this
simple algorithm. It could be a starting point, but it is not good enough. One of the
main criticisms of this code concerned the mother's initials, as the maiden name is
unknown or cannot be disclosed often enough in some countries. Looking closer at the
proposal, there is still a nightmare of difficult definitions "under the hood", despite
the apparent simplicity.
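Anyone who wants to repeat the duplicate check on their own data could do something
like the following. This is a minimal sketch, not the code I ran on the study
database; the function name is hypothetical, and it assumes the codes have already
been generated as plain strings.

```python
from collections import Counter


def find_collisions(codes):
    """Return every code that occurs more than once, with its count."""
    counts = Counter(codes)
    return {code: n for code, n in counts.items() if n > 1}


# Tiny made-up sample: the first and third codes collide.
sample = [
    "19630224mhophemsdeme",
    "19700101fanubeuunoos",
    "19630224mhophemsdeme",
]
print(find_collisions(sample))
```

An empty result only means no collision in that particular data set; it says nothing
about two different people elsewhere who happen to share all the ingredients.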
Once again, I ask the list for help, proposals, and criticism.
Horst