I think there needs to be some human review involved in this unless these
are super high confidence merges.

This is probably more appropriate for ol-tech, so I'm bcc'ing ol-discuss.

On Thu, Aug 29, 2013 at 6:44 AM, Richard Light <[email protected]> wrote:

>
> In a general spirit of exploration, I took the OL author dump, extracted
> authors with dates, converted them to XML and fed them into a Modes [1]
> database.  I have spent some time tidying up said dates so that they are,
> as far as possible, meaningful and indexable.  I have limited my attention
> to authors with a death date and/or a birth date of 1950 or earlier.
>
> One potential use of this work, I thought, might be to find duplicate OL
> author records which represent the same person.  I have discovered the
> de-duplication magic wand, and have done a few by hand.  However, I am
> rather puzzled.  For example, the last person I looked at was A. Hamon
> (1860-1939).  In my Modes data I have two records for him, both with dates:
>
> http://openlibrary.org/authors/OL5218117A
> and
> http://openlibrary.org/authors/OL5358432A
>
> Both of these URLs dereference to an actual page, with associated works.
> However, in the de-duplication listing only the first of these identifiers
> is present (though I did find another A. Hamon entry to merge).  So, two
> questions:
>
> 1. Is there a format in which I can express a set of instructions to merge
> authors programmatically, to avoid having to do this by hand?  The
> excitement of doing this manually has already worn off, but Modes could
> easily tell me where authors have the same name and same DoB/DoD and help
> me to generate a list of identifiers to merge.
>

For the format, look at the URLs produced by my app
http://ol-dupes.freebaseapps.com/authors (which needs to be updated with
more current data), or just look at the URL in your browser's address bar
when you reach the final stage of a dedupe.
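As a rough sketch of the grouping step Richard describes (same name, same date of birth and date of death), here is some Python. It assumes the cleaned-up records can be exported as simple (key, name, birth, death) tuples. That layout, the `candidate_groups` helper, and the third sample record (OL0000000A, "B. Example") are hypothetical. The output is only a list of candidate keys for a person to review, not a merge instruction.

```python
from collections import defaultdict

# Hypothetical export format: (key, name, birth_date, death_date).
# The first two rows are the A. Hamon records discussed above;
# the third is a made-up record for illustration only.
records = [
    ("OL5218117A", "A. Hamon", "1860", "1939"),
    ("OL5358432A", "A. Hamon", "1860", "1939"),
    ("OL0000000A", "B. Example", "1900", None),
]

def candidate_groups(records):
    """Group author keys that share a name, birth date and death date.

    Records with no dates at all are skipped (name alone is too weak a
    match), and groups containing a single key are dropped because there
    is nothing to merge.
    """
    groups = defaultdict(list)
    for key, name, birth, death in records:
        if birth is None and death is None:
            continue
        # Light normalisation of the name; real data may need more.
        groups[(name.strip().lower(), birth, death)].append(key)
    return [keys for keys in groups.values() if len(keys) > 1]

print(candidate_groups(records))
```

Each resulting list can then be turned into a merge URL and checked by hand before anything is merged.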


> 2. Why don't all the potential mergees appear in the merge listing,
> despite the fact that loads of clearly irrelevant entries do appear there?
>

Which dedupe listing?  Are you starting from search?  This search:
http://openlibrary.org/search/authors?q=a.+hamon produces three candidates
for me, and they all show up in the merge dialog.  The merge URL (which is
the same one you could generate programmatically) is
http://openlibrary.org/authors/merge?key=OL5218117A&key=OL3466239A&key=OL5358432A
The proposed merge target goes first, followed by all the other candidates.
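Generating that URL programmatically only requires repeating the `key` parameter, with the target first. Below is a minimal Python sketch. It assumes the URL pattern shown above is all the merge page needs, and the `merge_url` helper is hypothetical, not an Open Library API. It only opens the merge dialog, so a person still confirms each merge.

```python
from urllib.parse import urlencode

MERGE_BASE = "http://openlibrary.org/authors/merge"

def merge_url(target, others):
    """Build a merge-dialog URL: the proposed target key first, then the
    remaining candidates (any repeat of the target is dropped)."""
    keys = [target] + [k for k in others if k != target]
    return MERGE_BASE + "?" + urlencode([("key", k) for k in keys])

print(merge_url("OL5218117A", ["OL3466239A", "OL5358432A"]))
# http://openlibrary.org/authors/merge?key=OL5218117A&key=OL3466239A&key=OL5358432A
```

Passing a list of pairs to `urlencode` (rather than a dict) is what allows the same `key` parameter to appear several times in order.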

Tom
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]
