On Sat, May 19, 2012 at 11:01 AM, Ben Companjen <[email protected]> wrote:
> On 19 May 2012 16:11, Karen Coyle <[email protected]> wrote:
>> I took a look at some of these. Many are corporate authors, and I saw
>> some where the author names are identical. It isn't clear to me why
>> these didn't get auto-merged. Maybe it's worth having one of the
>> programmers take a look at this before we hand merge them all.
>
> Since most duplicates were last edited by ImportBot in 2008, I get the
> impression that maybe ImportBot wasn't capable of checking whether the
> author was already in the database. Or, but this is speculation, it's
> a long-running April Fool's joke (seeing that many were last edited
> April 1st, 2008). ;)

For corporate authors with very long structured names, it may be possible to merge them automatically across sources, but from a general processing perspective, it's dangerous to assume much about authors with the same name from different sources (e.g. two different libraries).

My "Acme Corp" and your "Acme Corp" might be entirely different entities. My "Smith, John" (b. 1936) might be your "Smith, John 1936-" while your "Smith, John" was born in 1950. Unless strong identifiers are used or the names are in a form from some central source (e.g. the Library of Congress Name Authority File), pretty much all bets are off.

Because of this, I think it's dangerous to be too aggressive about attempting to merge authors without additional supporting and corroborating information.

Tom

_______________________________________________
Ol-discuss mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to [email protected]
