I took a look at some of these. Many are corporate authors, and I saw some where the author names are identical. It isn't clear to me why these didn't get auto-merged. Maybe it's worth having one of the programmers take a look at this before we hand-merge them all.
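To show what an "identical names" check might look like, here is a minimal Python sketch. It groups author records by a normalized name and lists every name shared by two or more records. The record shape (dicts with "key" and "name"), the normalization rules, and the sample keys are all hypothetical. They are not how the Open Library loader or merge tool actually works. Also, as the thread notes further down, a shared name only marks a merge candidate, because different people can have the same personal name.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Hypothetical normalization: collapse runs of whitespace, drop
    trailing punctuation (e.g. the final period of a MARC heading),
    and case-fold so "PUNJAB. GOVERNMENT" matches "Punjab. Government"."""
    collapsed = re.sub(r"\s+", " ", name).strip()
    return collapsed.rstrip(" .,;:").casefold()


def find_duplicate_groups(authors):
    """Group author records by normalized name.

    `authors` is assumed to be an iterable of dicts with "key" and "name"
    fields (a simplified, hypothetical record shape). Returns a dict that
    maps each normalized name shared by two or more records to the list
    of their keys, largest groups first.
    """
    groups = defaultdict(list)
    for record in authors:
        groups[normalize_name(record["name"])].append(record["key"])
    duplicates = [(name, keys) for name, keys in groups.items() if len(keys) > 1]
    # Largest groups first; the sort is stable, so ties keep input order.
    duplicates.sort(key=lambda item: len(item[1]), reverse=True)
    return dict(duplicates)


if __name__ == "__main__":
    # Made-up keys and names, for illustration only.
    sample = [
        {"key": "/authors/OL1A", "name": "PUNJAB. GOVERNMENT"},
        {"key": "/authors/OL2A", "name": "Punjab.  Government"},
        {"key": "/authors/OL3A", "name": "West Sussex (England). County Planning Department."},
        {"key": "/authors/OL4A", "name": "West Sussex (England). County Planning Department"},
        {"key": "/authors/OL5A", "name": "Jane Smith"},
    ]
    for name, keys in find_duplicate_groups(sample).items():
        print(f"{len(keys)} records share {name!r}: {', '.join(keys)}")
```

This normalization is deliberately conservative. It will miss variants such as inverted personal names, and every group it reports still needs a human check before anything is merged.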
It's also entirely unclear to me why we ended up with these in authors when they are coded as corporate authors:

110 00 $aPUNJAB. GOVERNMENT
110 1 $aWest Sussex (England).$bCounty Planning Department.

I thought that corporate authors were moved to "collaborator" rather than the author field, so I'm thinking that something went wrong with the loading. The two above are from the Toronto library and Talis, so it's not just one source.

kc

On 5/17/12 6:14 PM, Ben Companjen wrote:
> So for those who like to take on a 'challenge': I just uploaded 1098
> files containing 100 merge links each. These are the authors with "en"
> somewhere in their names, sorted by number of possible duplicates. I
> removed the Shirley conference (10046 duplicates), since the URL was
> too long (~140kB).
>
> Since these files contain a lot more personal names than the file of
> United States names, please note that these names are more likely to
> belong to multiple people (i.e. "duplicate authors" may be different
> authors). My strategy when I'm uncertain whether a name belongs to
> multiple people is not to merge those. There is enough to do
> anyway :)
>
> The first file is at http://companjen.name/ol/ol_merge_links_1.html
> and it contains a link to the next file. The files are numbered 1
> through 1098, so you could pick a random number and start there.
>
> Good luck and have fun ;)
>
> Regards,
>
> Ben
>
> On 18 May 2012 00:41, Ben Companjen <[email protected]> wrote:
>> Hi,
>>
>> Based on April's datadump I created a file [1] with links to merge
>> duplicate (6 or more) authors with "United States" in their names.
>> It's a long list of links, all labeled "Merge" for now, that take you
>> to the form that asks which authors you want to merge and which is the
>> "master".
>>
>> I got carried away a little, so perhaps more than 100 links have
>> been clicked already. You can tell when that is the case: only one
>> author is left in the form and there aren't any duplicates to merge.
>> If you do it the way I did (the number of authors whose names
>> contain "United States" has gone down from 33500+ to under 28000),
>> watch out for carpal tunnel syndrome... I guess merging a couple
>> can't hurt, though.
>>
>> I'm creating a list of many more authors to merge, including authors
>> with just one (possible) duplicate. That list will contain over 100000
>> author names, so well over 200000 authors. If you have ideas on how to
>> share that work, I'd be interested to hear them :)
>>
>> Regards,
>>
>> Ben
>>
>> [1] http://companjen.name/ol/mergeurls.html
> _______________________________________________
> Ol-discuss mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
> To unsubscribe from this mailing list, send email to
> [email protected]
>

--
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
