On 18 May 2012 23:12, Tom Morris <[email protected]> wrote:
> On Thu, May 17, 2012 at 9:14 PM, Ben Companjen <[email protected]> wrote:
>> So for those who like to take on a 'challenge': I just uploaded 1098
>> files containing 100 merge links each. These are the authors with "en"
>> somewhere in their names, sorted by number of possible duplicates. I
>> removed the Shirley conference (10046 duplicates), since the URL was
>> too long (~140kB).
>>
>> Since these files contain a lot more personal names than the file of
>> United States names, please note that these names are more likely to
>> belong to multiple people (i.e. "duplicate authors" may be different
>> authors). When I'm uncertain whether a name belongs to multiple
>> people, my strategy is not to merge those records. There is enough to
>> do anyway :)
>
> It's hugely dangerous to be proposing author merges based on name
> alone.  OpenLibrary has enough conflated author records without adding
> to the mess!

That's true, and it's the main reason I started with organizations,
such as parts of the US government, and conferences. I try my best to
be careful when reviewing proposed people merges, and I hope that the
warnings in my emails lead others to do the same.
>
> For example, this URL
> http://openlibrary.org/authors/merge?key=OL4313974A&key=OL4718276A&key=OL5123244A&key=OL5654080A&key=OL5757638A&key=OL6996482A
>
> proposes to merge six different authors, of whom five have distinct
> birth dates (and the last is undated).

I would back away from that one :)
>
> Birth and death dates should be used where they are available and
> authors without them shouldn't be merged automatically at all.

It's not all automatic: you choose a link
(unitedstatescongresssenatecommitteeoninteroceaniccanals is very
likely safe to merge, for example), review the proposed merge, tick
the boxes (or have them ticked using the bookmarklet), click "merge"
and finally click "yes".

That said, it makes sense to not propose obviously different authors.
I'll update my scripts, but don't expect new files in the next hour :)
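
A rough sketch of the kind of check such a script could apply before
writing out a merge link. It is an illustration only, not the actual
script: the record shape (dicts with "key" and "birth_date"), the
function names and the three labels are assumptions made for the
example. Birth dates are compared as plain strings, which is crude,
since "1950" and "12 May 1950" would count as different.

```python
from urllib.parse import urlencode

# Merge endpoint as it appears in the links discussed in this thread.
MERGE_BASE = "http://openlibrary.org/authors/merge"


def classify(authors):
    """Label a group of same-name author records (hypothetical helper).

    "conflict": at least two distinct birth dates, so do not propose.
    "match":    every record has a birth date and they all agree.
    "undated":  some or all dates are missing, so leave to manual review.
    """
    known = {a["birth_date"].strip() for a in authors if a.get("birth_date")}
    if len(known) > 1:
        return "conflict"
    if len(known) == 1 and all(a.get("birth_date") for a in authors):
        return "match"
    return "undated"


def merge_url(authors):
    """Build a merge link with one key parameter per author record."""
    return MERGE_BASE + "?" + urlencode([("key", a["key"]) for a in authors])


def proposed_links(groups):
    """Yield (label, url) for every group without conflicting birth dates."""
    for authors in groups:
        label = classify(authors)
        if label != "conflict":
            yield label, merge_url(authors)
```

With a check like this, a group such as the six-author example above
would be dropped as a "conflict", while undated groups (organizations
and conferences, for instance) would still get a link but stay subject
to manual review.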

Ben
>
> Tom
> _______________________________________________
> Ol-discuss mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
> To unsubscribe from this mailing list, send email to 
> [email protected]