On Thu, May 17, 2012 at 9:14 PM, Ben Companjen <[email protected]> wrote:
> So for those who like to take on a 'challenge': I just uploaded 1098
> files containing 100 merge links each. These are the authors with "en"
> somewhere in their names, sorted by number of possible duplicates. I
> removed the Shirley conference (10046 duplicates), since the URL was
> too long (~140kB).
>
> Since these files contain a lot more personal names than the file of
> United States names, please note that these names are more likely to
> belong to multiple people (i.e. "duplicate authors" may be different
> authors). My strategy for when I'm uncertain whether some name belongs
> to multiple people, is to not merge those. There is enough to do
> anyway :)

It's hugely dangerous to be proposing author merges based on name alone.
OpenLibrary has enough conflated author records without adding to the mess!

For example, this URL

http://openlibrary.org/authors/merge?key=OL4313974A&key=OL4718276A&key=OL5123244A&key=OL5654080A&key=OL5757638A&key=OL6996482A&

proposes to merge six different authors, of whom five have distinct birth
dates (and the last is undated).

Birth and death dates should be used where they are available, and authors
without them shouldn't be merged automatically at all.

Tom
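
A minimal sketch of the rule above, in Python, for anyone generating merge lists. Everything in it is an assumption made for illustration: the Author record, its field names, and the helper functions are hypothetical and are not OpenLibrary's actual data model or API. Dates are compared as plain strings, which is stricter than real records (with free-form dates) would need.

```python
from itertools import combinations
from typing import NamedTuple, Optional, Sequence


class Author(NamedTuple):
    """Hypothetical author record; field names are assumptions."""
    key: str
    name: str
    birth_date: Optional[str] = None
    death_date: Optional[str] = None


def dates_agree(a: Author, b: Author) -> bool:
    """True only if at least one date is present on both records
    and every date present on both records matches exactly."""
    compared = False
    for field in ("birth_date", "death_date"):
        x, y = getattr(a, field), getattr(b, field)
        if x and y:
            if x != y:
                return False  # conflicting dates: different people
            compared = True
    # No date in common means no evidence, so do not merge automatically.
    return compared


def safe_to_auto_merge(authors: Sequence[Author]) -> bool:
    """A group is proposed for automatic merging only if every pair
    in it has agreeing dates. A same-name match alone is never enough."""
    return len(authors) > 1 and all(
        dates_agree(a, b) for a, b in combinations(authors, 2)
    )
```

Under this sketch, a group that contains even one undated record, or any two records with different birth dates, is left out of the automatic list and kept for manual review.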
