Re: [ol-discuss] New power tool for author merges
Thanks, Tom. I took a quick look at a few pages of your list, and have 2 immediate observations:

1 - there are a LOT of corporate authors, and most of those will not have come in on library records (the ones on library records we moved to the contributor field). This makes me wonder if we don't want to do something consistent here -- moving these to contributor? Or are folks comfortable with corporate names in the author field?

2 - of the ones I looked at, a number have already been merged. (Yeah OL users!). You wouldn't by any chance want to update this list? (she asks sheepishly)

kc

Quoting Tom Morris tfmor...@gmail.com:

> I've been totally unsuccessful in getting any of the OpenLibrary staff interested in the list of duplicate authors that I generated last spring, so I've decided to open it up to the community.
>
> I've modified my duplicate listing program to automatically generate an OpenLibrary author merge URL with all the duplicate IDs. If you are logged in to OpenLibrary and you click on the URL, it will take you to the author merge dialog page where you can select which authors should be merged, which one should be the master, etc.
>
> Please note that this is a *power* tool and should be used with great care. There *are* errors in the listing of duplicates, so you should review carefully the set of authors that are being proposed for merger to make sure it's accurate.
>
> I've done the first 50 or so, so you'll want to skip ahead in the list to find some that still need work. I'll see if I can enhance the program to skip authors who have already been processed, but for now if you click a link and end up on a page with just one author (or zero authors), that means someone else already took care of this author.
>
> Don't worry about running out of work, there are over 7,000 sets of duplicates (with 20K total records), so there'll be plenty for everyone to work on.
>
> Here's the tool: http://ol-dupes.freebaseapps.com/
>
> Play carefully!
>
> Tom
> ___
> Ol-discuss mailing list
> Ol-discuss@archive.org
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
> To unsubscribe from this mailing list, send email to ol-discuss-unsubscr...@archive.org

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
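
Editorial sketch: the quoted message describes generating one merge URL per set of duplicate author IDs. The thread does not show how the tool builds that URL, so the following is only a minimal illustration under stated assumptions: the endpoint `https://openlibrary.org/authors/merge` with a repeated `key` parameter is assumed rather than taken from the thread, and `build_merge_url` and the sample keys are hypothetical names used for illustration.

```python
from urllib.parse import urlencode

# Assumption: merge endpoint and repeated "key" parameter are not stated in the thread.
MERGE_ENDPOINT = "https://openlibrary.org/authors/merge"


def build_merge_url(author_keys):
    """Return a merge URL carrying each distinct author key once, in input order."""
    distinct = []
    for key in author_keys:
        # Accept either a full key ("/authors/OL1A") or a bare ID ("OL1A").
        short = key.rsplit("/", 1)[-1]
        if short and short not in distinct:
            distinct.append(short)
    if len(distinct) < 2:
        # A single author (or none) leaves nothing to merge.
        raise ValueError("a merge needs at least two distinct author keys")
    return MERGE_ENDPOINT + "?" + urlencode([("key", k) for k in distinct])
```

Such a link would only pre-fill the merge dialog; as the message stresses, each proposed set still has to be reviewed by a person before merging.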
Re: [ol-discuss] New power tool for author merges
On Sun, Feb 13, 2011 at 11:39 AM, Karen Coyle kco...@kcoyle.net wrote:

> 2 - of the ones I looked at, a number have already been merged. (Yeah OL users!). You wouldn't by any chance want to update this list? (she asks sheepishly)

Yes, in addition to the ones the OL users had already done (mostly famous authors), I've also churned through probably a couple of hundred by now.

There are two updates in the works:

1. Removing merges which have already been processed. A perfectly reasonable request, so no need to be sheepish, and I already collected redirect data last night to do this. Because this is driven off the live Freebase data, I'm waiting on comments from the Freebase community about removing the dead OL author keys (i.e. the ones that redirect now). If that's not forthcoming, I'll use a separate list.

2. Removing the sorting by duplicate count. This causes performance problems that cause the app to time out after 4-5 pages.

I'm open to other suggestions if people have ideas for ways to make the process easier.

Tom
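
Editorial sketch: the first update above (dropping sets that have already been merged, using collected redirect data) could look roughly like the following. This is not Tom's program; it assumes the redirect data is available as a plain mapping from a dead author key to the key it now redirects to, and `resolve`, `pending_sets`, and the sample keys are hypothetical.

```python
def resolve(key, redirects):
    """Follow redirects until reaching a key that is not itself redirected."""
    visited = set()
    while key in redirects and key not in visited:
        visited.add(key)  # guard against accidental redirect loops
        key = redirects[key]
    return key


def pending_sets(duplicate_sets, redirects):
    """Keep only the sets that still contain two or more distinct live authors."""
    pending = []
    for keys in duplicate_sets:
        live = []
        for key in keys:
            target = resolve(key, redirects)
            if target not in live:
                live.append(target)
        if len(live) >= 2:
            pending.append(live)
    return pending


# Hypothetical data: OL2A was merged into OL1A, OL5A into OL3A.
sets = [["OL1A", "OL2A"], ["OL3A", "OL4A", "OL5A"]]
redirects = {"OL2A": "OL1A", "OL5A": "OL3A"}
remaining = pending_sets(sets, redirects)
```

With this sample data the first set collapses to a single live author and is dropped, while the second is reduced to the two authors that still need review.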
Re: [ol-discuss] New power tool for author merges
This list is helpful, but the work still has to be done by hand. For example:

http://www.freebase.com/view//m/05whw9w

Freebase: Ann Williams (b. 1938), aka Ann Williams Gascon

Merging: Ann Williams-Gascon (no year of birth, no bio) with Williams, Ann. (no year of birth, no bio)

You first have to check all (!) 13 works of Ann Williams.

On 13.02.2011 at 02:34, Tom Morris wrote:

> I've been totally unsuccessful in getting any of the OpenLibrary staff interested in the list of duplicate authors that I generated last spring, so I've decided to open it up to the community.
>
> I've modified my duplicate listing program to automatically generate an OpenLibrary author merge URL with all the duplicate IDs. If you are logged in to OpenLibrary and you click on the URL, it will take you to the author merge dialog page where you can select which authors should be merged, which one should be the master, etc.
>
> Please note that this is a *power* tool and should be used with great care. There *are* errors in the listing of duplicates, so you should review carefully the set of authors that are being proposed for merger to make sure it's accurate.
>
> I've done the first 50 or so, so you'll want to skip ahead in the list to find some that still need work. I'll see if I can enhance the program to skip authors who have already been processed, but for now if you click a link and end up on a page with just one author (or zero authors), that means someone else already took care of this author.
>
> Don't worry about running out of work, there are over 7,000 sets of duplicates (with 20K total records), so there'll be plenty for everyone to work on.
>
> Here's the tool: http://ol-dupes.freebaseapps.com/
>
> Play carefully!
>
> Tom