So for those who like to take on a 'challenge': I just uploaded 1098 files containing 100 merge links each. These are the authors with "en" somewhere in their names, sorted by number of possible duplicates. I removed the Shirley conference (10046 duplicates), since the URL was too long (~140kB).
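For anyone who wants to produce similar files, here is a minimal sketch of how a long list of merge links could be split into numbered pages of 100, each linking to the next. It is not the script actually used: `build_pages` and `page_name` are hypothetical helpers, the file naming pattern is modeled on `ol_merge_links_1.html`, and the merge URLs are assumed to be built elsewhere.

```python
from html import escape

LINKS_PER_FILE = 100  # the message mentions 100 merge links per file


def page_name(number):
    # Assumed naming pattern, modeled on ol_merge_links_1.html
    return f"ol_merge_links_{number}.html"


def build_pages(merge_urls, per_file=LINKS_PER_FILE):
    """Split merge_urls into numbered HTML pages; each page links to the next.

    merge_urls is a list of ready-made merge URLs (how they are built is
    outside this sketch). Returns a dict mapping file name to HTML text.
    """
    chunks = [merge_urls[i:i + per_file]
              for i in range(0, len(merge_urls), per_file)]
    pages = {}
    for number, chunk in enumerate(chunks, start=1):
        # Every link is simply labeled "Merge"
        items = "\n".join(
            f'<li><a href="{escape(url, quote=True)}">Merge</a></li>'
            for url in chunk
        )
        next_link = ""
        if number < len(chunks):
            # All pages except the last point to the following file
            next_link = f'<p><a href="{page_name(number + 1)}">Next file</a></p>'
        pages[page_name(number)] = (
            f"<html><body>\n<ol>\n{items}\n</ol>\n{next_link}\n</body></html>\n"
        )
    return pages
```

Writing the pages to disk is then a loop over `build_pages(urls).items()`. Sorting the input by number of possible duplicates before calling the function would keep the largest groups in the first files.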
Since these files contain a lot more personal names than the file of United States names, please note that these names are more likely to belong to multiple people (i.e. "duplicate authors" may be different authors). When I'm uncertain whether a name belongs to multiple people, my strategy is not to merge those. There is enough to do anyway :)

The first file is at http://companjen.name/ol/ol_merge_links_1.html and it contains a link to the next file. The files are numbered 1 through 1098, so you could pick a random number and start there.

Good luck and have fun ;)

Regards,

Ben

On 18 May 2012 00:41, Ben Companjen <[email protected]> wrote:
> Hi,
>
> Based on April's datadump I created a file [1] with links to merge
> duplicate (6 or more) authors with "United States" in their names.
> It's a long list of links, all labeled "Merge" for now, that take you
> to the form that ask which authors you want to merge and which is the
> "master".
>
> I got carried away a little, hence perhaps more than 100 links have
> been clicked already. You can tell if there is only one author left in
> the form and there aren't any duplicates to merge. If you do it the
> way I did (the number of authors whose names contain "United States"
> has gone down from 33500+ to under 28000), watch out for carpal tunnel
> syndrome... I guess merging a couple can't hurt though.
>
> I'm creating a list of many more authors to merge, including authors
> with just one (possible) duplicate. That list will contain over 100000
> author names, so well over 200000 authors. If you have ideas on how to
> share that work, I'd be interested to hear them :)
>
> Regards,
>
> Ben
>
> [1] http://companjen.name/ol/mergeurls.html

_______________________________________________
Ol-discuss mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to [email protected]
