Re: [ol-discuss] Disambiguating authors

Colin Sloss Thu, 22 Jul 2010 01:26:09 -0700

Richard,

It seems that my copyright has been abused. I don't mind if I don't get money, 
but they should give details of publication.


Colin Sloss


Richard Light wrote:
>In message 
><[email protected]>, Tom 
>Morris <[email protected]> writes
>>On Wed, Jul 21, 2010 at 1:56 PM, Edward Betts <[email protected]> wrote:
>>
>>> Library MARC records use birth and death dates to disambiguate authors
>>> with the same name. The problem is that some MARC records aren't that
>>> great, they contain mistakes, or are missing the dates. We also load
>>> data from non-MARC sources. We use some heuristics to try and guess if
>>> the author represents the same person or not. We're always trying to
>>> improve these heuristics. For example we should be looking at the type
>>> of subjects that an author writes about and see if the new book we're
>>> loading matches the profile of an existing author with that name.
>>
>>You should never assume that authors are the same based on name alone.
>
>Agreed. A person's name is just a property of that person: it isn't an 
>identity. For a start it's (clearly) not unique; second it's not 
>immutable (women's married names; life peers; Charles Dodgson). To have 
>a reasonable guarantee of a person's identity you need to match on a 
>number of properties.
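
As an illustration only (this is not Open Library's actual matching code), the idea of matching on several properties rather than on name alone might be sketched as below. The AuthorRecord class, its field names, and the rule "names agree, at least one shared date agrees, no shared date conflicts" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthorRecord:
    # Hypothetical record shape; real catalogue records carry more fields.
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def normalize(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def same_person(a: AuthorRecord, b: AuthorRecord) -> bool:
    """Conservative test: never match on name alone.

    Returns True only if the names agree, no date known on both
    sides conflicts, and at least one such date agrees.
    """
    if normalize(a.name) != normalize(b.name):
        return False
    agreements = 0
    pairs = ((a.birth_year, b.birth_year), (a.death_year, b.death_year))
    for x, y in pairs:
        if x is None or y is None:
            continue  # missing data is neither evidence for nor against
        if x != y:
            return False  # a known conflict rules the match out
        agreements += 1
    return agreements >= 1
```

With this rule, two records that share only a name are left unmerged, which errs on the side of creating duplicates rather than bad merges.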
>
>The larger the pool of people you are dealing with, the more properties 
>- and the more precision - you need. Thus in a typical workplace you can 
>often get away with just using first names as identifiers (with a 
>surname initial as a disambiguator where you have two or more "Richard"s 
>etc.). For authors, the conventional wisdom is that name and dates are a 
>sufficient set of properties, but again there is the question of 
>precision: do you include all names, and do the dates go down to the 
>year, or the day, of birth and death?
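
One way to picture the precision question: if dates are stored as ISO-style strings of varying precision ("YYYY", "YYYY-MM" or "YYYY-MM-DD"), two dates can be compared only as far as the coarser one goes. The function below is a minimal sketch under that assumption; the string format and the function name are choices made for the example, not something the records are known to use.

```python
def dates_compatible(a: str, b: str) -> bool:
    """Compare two ISO-style dates at the coarsest shared precision.

    "1832" and "1832-01-27" are compatible (the year agrees and the
    first date says nothing more); "1832-01-27" and "1832-01-28"
    are not.
    """
    parts_a = a.split("-")
    parts_b = b.split("-")
    shared = min(len(parts_a), len(parts_b))
    return parts_a[:shared] == parts_b[:shared]
```

A year-only record can therefore never contradict a day-level record from the same year, which is exactly why year-level dates give weaker disambiguation in a large pool of people.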
>
>>It's a lot easier to merge duplicates than it is to tease apart bad
>>merges, so it is, in my opinion, much better to be very conservative
>>in any automatic matching process.
>
>How far you go with this is a matter of choice, but I would certainly 
>suggest that you don't limit yourself to MARC practices if you want to 
>offer a useful Linked Data resource.
>
>The first question to address is whether you want to create an author 
>authority file, or a person authority file which happens to contain lots 
>of authors. If it's just an author file, presumably you then require a 
>separate file for people who are the subjects of works?  So Winston 
>Churchill, for example, might then end up with two identifiers, because 
>he was both an author and the subject of books, e.g.:
>
>http://openlibrary.org/authors/OL123456A
>http://openlibrary.org/subject/OL456789B
>
>I would argue for designing a person authority framework which allows 
>you to record a number of properties (where they are known) about any 
>person, and assigns a unique, persistent identifier only when you have 
>enough properties to be "sure" of the person's unique identity. As 
>properties, I would certainly include name (repeatable, with some sort 
>of type qualifier), date _and_place_ of birth and death.  Then you can 
>choose whether to use this framework for a single person authority or 
>for separate author and "people as subject" authorities.
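
A rough sketch of such a framework, for illustration only: a person record with repeatable typed names plus date and place of birth and death, and an identifier that is assigned only once enough properties are known. The class names, the threshold of three known properties, and the "P000001" identifier format are all hypothetical and are not Open Library's scheme.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Name:
    value: str
    kind: str  # type qualifier, e.g. "preferred", "married", "pseudonym"


@dataclass
class Person:
    names: List[Name] = field(default_factory=list)  # repeatable
    birth_date: Optional[str] = None
    birth_place: Optional[str] = None
    death_date: Optional[str] = None
    death_place: Optional[str] = None
    identifier: Optional[str] = None  # set only when identity is "sure"


MIN_KNOWN_PROPERTIES = 3  # assumed threshold, chosen for the example
_next_number = itertools.count(1)


def known_properties(person: Person) -> int:
    """Count how many properties are recorded (all names count as one)."""
    dated = (person.birth_date, person.birth_place,
             person.death_date, person.death_place)
    return (1 if person.names else 0) + sum(v is not None for v in dated)


def assign_identifier(person: Person) -> Optional[str]:
    """Give the person a persistent identifier once enough is known."""
    if person.identifier is None and known_properties(person) >= MIN_KNOWN_PROPERTIES:
        person.identifier = f"P{next(_next_number):06d}"
    return person.identifier
```

A record holding only a name stays without an identifier, while the same record is still usable in a statement of the form "written by a person whose name was ...". Because the identifier lives on the person and not on a role, one record could serve both as author and as subject.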
>
>Remember that the author statement can be "this book was written by a 
>person whose name was Richard Light", which is true even when you aren't 
>sure of the identity of "Richard Light".
>
>Richard
>-- 
>Richard Light
>_______________________________________________
>Ol-discuss mailing list
>[email protected]
>http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
>To unsubscribe from this mailing list, send email to 
>[email protected]
>
>

----
Colin Sloss  [email protected]

