On 12/15/2010 5:07 PM, Alan Millar wrote:
>>> Would it make a difference to you if, instead of re-direction, the previous
>>> identifiers were included in the record itself?
>>
>> No. For purposes of a relational database I need an identifier that is unique
>> and exclusive.
>
> You won't find it, except for single-point-in-time snapshots of data,
> which are not changed and do not interrelate with other datasets.

I /will/ find it if I create and manage it (ongoing synchronization with other data sets is not part of my use case). My point, however, was that I cannot rely on OL IDs to meet this requirement.

[snip]

> One could argue that Mark Twain and Samuel Clemens should not have
> been given different OL ID's because somebody should have known they
> were the same person. But the basic scenario is still valid. Even
> if you go with event-based identification, there will be situations
> where you epistemologically can't know whether John Smith, born 1793,
> is the same person as John Smith, flourished 1845, until/unless more
> information becomes available later. When it does, you are back to the
> same problem of needing to merge two entities, and somehow be able to
> refer to the new single entity. How do you plan to do that?

My strategy is two-fold. First, as you suggest, the problem of incomplete data can be significantly mitigated by appropriate input validation. In OL's case, perhaps millions of records were loaded before any attempt was made to de-dupe them, and as of a year ago no effort was being made to de-dupe the existing repository; only new records were being checked.

Second, when records must be merged, all referring records are updated rather than relying on a chain of redirection. In my SQL database this is easily done with two statements:

    UPDATE works_contributors SET entity_idn = [merged entity identifier] WHERE entity_idn = [obsolete entity identifier];
    DELETE FROM entities WHERE entity_idn = [obsolete entity identifier];

Updating the database when a record needs to be split is slightly more complicated, but not significantly so.
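For illustration only, here is a minimal sketch of the merge step described above, run against an in-memory SQLite database through Python's standard sqlite3 module. The table and column names works_contributors, entities, and entity_idn come from the statements quoted in the message; the name and work_idn columns, the sample rows, and the merge_entities helper are hypothetical, and the actual schema may well differ. The one thing the sketch adds is a transaction around the two statements, so that a failure cannot leave references pointing at a deleted entity.

```python
import sqlite3


def merge_entities(conn, obsolete_idn, merged_idn):
    """Repoint every reference from obsolete_idn to merged_idn, then delete the obsolete entity.

    Hypothetical helper; the two SQL statements mirror the ones in the message.
    """
    # Using the connection as a context manager commits both statements
    # together on success and rolls both back if either one raises.
    with conn:
        conn.execute(
            "UPDATE works_contributors SET entity_idn = ? WHERE entity_idn = ?",
            (merged_idn, obsolete_idn),
        )
        conn.execute(
            "DELETE FROM entities WHERE entity_idn = ?",
            (obsolete_idn,),
        )


# Assumed toy schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (entity_idn INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE works_contributors (work_idn INTEGER, entity_idn INTEGER);
    INSERT INTO entities VALUES (1, 'Mark Twain'), (2, 'Samuel Clemens');
    INSERT INTO works_contributors VALUES (10, 1), (11, 2);
""")

# Merge entity 2 into entity 1.
merge_entities(conn, obsolete_idn=2, merged_idn=1)

rows = conn.execute(
    "SELECT work_idn, entity_idn FROM works_contributors ORDER BY work_idn"
).fetchall()
remaining = conn.execute("SELECT entity_idn FROM entities").fetchall()
```

After the call, both works refer to entity 1 and entity 2 no longer exists. If the real works_contributors table enforced uniqueness on (work_idn, entity_idn), a work credited to both entities would need its duplicate row removed before the UPDATE; that case is not handled here.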
