On 18 May 2012 23:12, Tom Morris <[email protected]> wrote:
> On Thu, May 17, 2012 at 9:14 PM, Ben Companjen <[email protected]> wrote:
>> So for those who like to take on a 'challenge': I just uploaded 1098
>> files containing 100 merge links each. These are the authors with "en"
>> somewhere in their names, sorted by number of possible duplicates. I
>> removed the Shirley conference (10046 duplicates), since the URL was
>> too long (~140kB).
>>
>> Since these files contain a lot more personal names than the file of
>> United States names, please note that these names are more likely to
>> belong to multiple people (i.e. "duplicate authors" may be different
>> authors). My strategy for when I'm uncertain whether some name belongs
>> to multiple people, is to not merge those. There is enough to do
>> anyway :)
>
> It's hugely dangerous to be proposing author merges based on name
> alone. OpenLibrary has enough conflated author records without adding
> to the mess!

That's true, and it's the main reason for me to start with
organizations like parts of US government and conferences. I try my
best to watch out when reviewing proposed people merges and hope, by
issuing warnings in my emails, that others do so too.

>
> For example, this URL
> http://openlibrary.org/authors/merge?key=OL4313974A&key=OL4718276A&key=OL5123244A&key=OL5654080A&key=OL5757638A&key=OL6996482A&
>
> proposes to merge six different authors, of whom five have distinct
> birth dates (and the last is undated).

I would back away from that one :)

>
> Birth and death dates should be used where they are available and
> authors without them shouldn't be merged automatically at all.

It's not all automatic: you choose a link
(unitedstatescongresssenatecommitteeoninteroceaniccanals is very
likely safe to merge, for example), review the proposed merge, tick
the boxes (or have them ticked using the bookmarklet), click "merge"
and finally click "yes".

That said, it makes sense to not propose obviously different authors.
I'll update my scripts, but don't expect new files in the next hour :)

Ben

>
> Tom
> _______________________________________________
> Ol-discuss mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
> To unsubscribe from this mailing list, send email to
> [email protected]
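
For anyone who wants to try the date check discussed in this thread, here is a rough sketch of how a script could screen a group of same-name authors before writing out a merge link. It is not the script used to produce the uploaded files. The record layout (dicts with "key" and "birth_date"), the function names and the three labels are assumptions made for illustration, and birth dates are compared as plain strings, so they would need normalizing first. Only the merge URL pattern is taken from the example link quoted above.

```python
from urllib.parse import urlencode

# Base URL as it appears in the example merge link quoted in this thread.
MERGE_URL = "http://openlibrary.org/authors/merge"


def classify(group):
    """Label a group of same-name author records by their birth dates.

    `group` is a list of dicts with hypothetical fields "key" and
    "birth_date" (a string, or None when unknown).
    Returns "conflict", "undated" or "match".
    """
    dates = [author.get("birth_date") for author in group]
    known = {d for d in dates if d}
    if len(known) > 1:
        return "conflict"   # distinct birth dates: do not propose a merge
    if any(not d for d in dates):
        return "undated"    # at least one date missing: manual review only
    return "match"          # every record is dated and all dates agree


def merge_link(group):
    """Build a merge link with one `key` parameter per author record."""
    return MERGE_URL + "?" + urlencode([("key", a["key"]) for a in group])


if __name__ == "__main__":
    # Made-up records; the keys and dates are placeholders, not real data.
    group = [
        {"key": "OL1A", "birth_date": "1900"},
        {"key": "OL2A", "birth_date": "1900"},
    ]
    if classify(group) != "conflict":
        print(merge_link(group))
```

Groups labelled "conflict" would be left out of the files entirely. Groups labelled "undated" could still be listed, but only for the manual review steps described above, never for an automatic merge.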
