On Wed, Jul 21, 2010 at 1:56 PM, Edward Betts <[email protected]> wrote:

> Library MARC records use birth and death dates to disambiguate authors
> with the same name. The problem is that some MARC records aren't that
> great, they contain mistakes, or are missing the dates. We also load
> data from non-MARC sources. We use some heuristics to try and guess if
> the author represents the same person or not. We're always trying to
> improve these heuristics. For example we should be looking at the type
> of subjects that an author writes about and see if the new book we're
> loading matches the profile of an existing author with that name.

You should never assume that authors are the same based on name alone.
That's the source of a huge number of errors.

It's a lot easier to merge duplicates than it is to tease apart bad
merges, so it is, in my opinion, much better to be very conservative
in any automatic matching process.
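
As a rough illustration of what "conservative" could mean here, the
sketch below only reports a match when the names agree, at least one
birth or death date is known on both sides, and no known date
conflicts. The Author class, its field names, and the find_match
function are all hypothetical, made up for this example; this is not
Open Library's actual code or data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Author:
    # Hypothetical record shape, for illustration only.
    name: str
    birth_date: Optional[str] = None
    death_date: Optional[str] = None


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def find_match(candidate: Author, existing: List[Author]) -> Optional[Author]:
    """Return an existing author only when the evidence goes beyond the name.

    A match requires the same normalized name, at least one date known on
    both records, and agreement on every date known on both records.
    Otherwise return None, so the caller creates a new author that can be
    merged later if it turns out to be a duplicate.
    """
    for author in existing:
        if normalize(author.name) != normalize(candidate.name):
            continue
        pairs = [
            (candidate.birth_date, author.birth_date),
            (candidate.death_date, author.death_date),
        ]
        # Keep only the dates that are present on both records.
        known = [(a, b) for a, b in pairs if a and b]
        if known and all(a == b for a, b in known):
            return author
    return None
```

With this rule, a record with a matching name but no dates is loaded
as a new author rather than merged. That produces some duplicates,
which are the cheaper mistake to fix.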

Tom
_______________________________________________
Ol-discuss mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to 
[email protected]