On 12/15/2010 5:07 PM, Alan Millar wrote:

>>> Would it make a difference to you if, instead of re-direction, the previous
>>> identifiers were included in the record itself?
>>
>> No. For purposes of a relational database I need an identifier that is unique
>> and exclusive.
>
> You won't find it, except for single-point-in-time snapshots of data,
> which are not changed and do not interrelate with other datasets.

I /will/ find it if I create and manage it (on-going synchronization 
with other data sets is not part of my use case). My point, however, was 
that I cannot rely on OL ids to meet this requirement.

[snip]

> One could argue that Mark Twain and Samuel Clemens should not have
> been given different OL ID's because somebody should have known they
> were the same person.  But the basic scenario is still valid.   Even
> if you go with event-based identification, there will be situations
> where you epistemologically can't know whether John Smith, born 1793,
> is the same person as John Smith, flourished 1845, until/unless more
> information becomes available later. When it does, you are back to the
> same problem of needing to merge two entities, and somehow be able to
> refer to the new single entity.  How do you plan to do that?

My strategy is two-fold. First, as you suggest, the problem of 
incomplete data can be significantly mitigated by appropriate input 
validation. In OL's case, perhaps millions of records were loaded before
any de-duplication was attempted, and as of a year ago no effort was
being made to de-dupe the existing repository; only new records were
being checked. Second, when records must be merged, all referring 
records are updated rather than relying on a chain of redirection. In
my SQL database this is easily done with two simple statements: "UPDATE
works_contributors SET entity_idn = [merged entity identifier] WHERE
entity_idn = [obsolete entity identifier]; DELETE FROM entities WHERE
entity_idn = [obsolete entity identifier]". Updating the database when a
record needs to be split is slightly more complicated, but not 
significantly so.
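
For illustration, the merge described above might be wrapped in a single
transaction so that the re-pointing and the delete succeed or fail
together. This is only a sketch using Python's sqlite3 module. The table
names works_contributors and entities and the column entity_idn come
from the statements above; the work_idn and name columns, the composite
primary key, the sample rows, and the helper names are assumptions made
for the example. It also assumes a work may already list both entities,
in which case a plain UPDATE would violate the key, so the sketch uses
SQLite's UPDATE OR IGNORE and then removes the leftover rows.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entities (
            entity_idn INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE works_contributors (
            work_idn   INTEGER NOT NULL,
            entity_idn INTEGER NOT NULL REFERENCES entities(entity_idn),
            PRIMARY KEY (work_idn, entity_idn)
        );
        INSERT INTO entities VALUES (1, 'Mark Twain'), (2, 'Samuel Clemens');
        INSERT INTO works_contributors VALUES
            (100, 1), (101, 2), (102, 1), (102, 2);
    """)
    return conn


def merge_entities(conn, merged_idn, obsolete_idn):
    """Re-point all references to obsolete_idn at merged_idn, then delete it."""
    with conn:  # one transaction: commit on success, roll back on error
        # SQLite-specific: skip rows that would duplicate an existing
        # (work_idn, entity_idn) pair instead of raising an error.
        conn.execute(
            "UPDATE OR IGNORE works_contributors "
            "SET entity_idn = ? WHERE entity_idn = ?",
            (merged_idn, obsolete_idn),
        )
        # Remove any rows skipped above; the merged entity already
        # covers those works.
        conn.execute(
            "DELETE FROM works_contributors WHERE entity_idn = ?",
            (obsolete_idn,),
        )
        conn.execute(
            "DELETE FROM entities WHERE entity_idn = ?",
            (obsolete_idn,),
        )


if __name__ == "__main__":
    db = make_demo_db()
    merge_entities(db, merged_idn=1, obsolete_idn=2)
    print(db.execute(
        "SELECT work_idn, entity_idn FROM works_contributors "
        "ORDER BY work_idn"
    ).fetchall())
```

Because every referring row is rewritten at merge time, nothing in the
database points at the obsolete identifier afterwards, so no redirect
lookup is needed when reading.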
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]
