Re: [ol-discuss] New power tool for author merges

2011-02-13 Thread Karen Coyle

Thanks, Tom. I took a quick look at a few pages of your list, and have  
2 immediate observations:

1 - there are a LOT of corporate authors, and most of those will not  
have come in on library records (the ones on library records we moved  
to the contributor field). This makes me wonder if we don't want to  
do something consistent here -- moving these to contributor? Or are  
folks comfortable with corporate names in the author field?

2 - of the ones I looked at, a number have already been merged. (Yeah  
OL users!). You wouldn't by any chance want to update this list? (she  
asks sheepishly)

kc

Quoting Tom Morris tfmor...@gmail.com:

 I've been totally unsuccessful in getting any of the OpenLibrary staff
 interested in the list of duplicate authors that I generated last
 spring, so I've decided to open it up to the community.

 I've modified my duplicate listing program to automatically generate
 an OpenLibrary author merge URL with all the duplicate IDs.  If you
 are logged in to OpenLibrary and you click on the URL, it will take
 you to the author merge dialog page where you can select which authors
 should be merged, which one should be the master, etc.

 Please note that this is a *power* tool and should be used with great
 care.  There *are* errors in the listing of duplicates, so you should
 review carefully the set of authors that are being proposed for merger
 to make sure it's accurate.

 I've done the first 50 or so, so you'll want to skip ahead in the list
 to find some that still need work.  I'll see if I can enhance the
 program to skip authors who have already been processed, but for now
 if you click a link and end up on a page with just one author (or zero
 authors), that means someone else already took care of this author.
 Don't worry about running out of work, there are over 7,000 sets of
 duplicates (with 20K total records), so there'll be plenty for
 everyone to work on.

 Here's the tool: http://ol-dupes.freebaseapps.com/

 Play carefully!

 Tom
 ___
 Ol-discuss mailing list
 Ol-discuss@archive.org
 http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
 To unsubscribe from this mailing list, send email to  
 ol-discuss-unsubscr...@archive.org




-- 
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet



Re: [ol-discuss] New power tool for author merges

2011-02-13 Thread Tom Morris
On Sun, Feb 13, 2011 at 11:39 AM, Karen Coyle kco...@kcoyle.net wrote:

 2 - of the ones I looked at, a number have already been merged. (Yeah
 OL users!). You wouldn't by any chance want to update this list? (she
 asks sheepishly)

Yes, in addition to the ones the OL users had already done (mostly famous
authors), I've also churned through probably a couple of hundred by
now.

There are two updates in the works:

1. Removing merges which have already been processed.  A perfectly
reasonable request, so no need to be sheepish, and I already collected
redirect data last night to do this.  Because this is driven off the
live Freebase data, I'm waiting on comments from the Freebase
community about removing the dead OL author keys (i.e. the ones that
redirect now).  If that's not forthcoming, I'll use a separate list.

2. Removing the sorting by duplicate count. This causes performance
problems that make the app time out after 4-5 pages.
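A minimal Python sketch of the first update, assuming the redirect data is a dict mapping each redirected OL author key to the key it now points to, and that each duplicate set is a list of author keys. `MERGE_BASE`, the repeated `key` query parameter, `resolve`, and `pending_merges` are hypothetical names for illustration, not the ol-dupes app's actual code or the confirmed Open Library merge URL format.

```python
from urllib.parse import urlencode

# Hypothetical merge endpoint and parameter name; the real Open Library
# merge URL format may differ.
MERGE_BASE = "http://openlibrary.org/authors/merge"


def resolve(key, redirects):
    """Follow redirects from `key` to its final target (cycle-safe)."""
    seen = set()
    while key in redirects and key not in seen:
        seen.add(key)
        key = redirects[key]
    return key


def pending_merges(dupe_sets, redirects):
    """Yield (keys, merge_url) for duplicate sets that still need merging.

    Each key is replaced by its redirect target and repeats are dropped.
    A set that collapses to one (or zero) authors has already been
    handled, so no merge URL is generated for it.
    """
    for keys in dupe_sets:
        remaining = []
        for key in keys:
            target = resolve(key, redirects)
            if target not in remaining:  # keep first-seen order
                remaining.append(target)
        if len(remaining) < 2:
            continue  # nothing left to merge
        query = urlencode([("key", k) for k in remaining])
        yield remaining, f"{MERGE_BASE}?{query}"


if __name__ == "__main__":
    # Made-up example keys, not real duplicate records.
    sets = [["OL1A", "OL2A"], ["OL3A", "OL4A", "OL5A"]]
    redirects = {"OL2A": "OL1A", "OL5A": "OL3A"}
    for keys, url in pending_merges(sets, redirects):
        print(keys, url)
```

With the sample data, the first set collapses to `OL1A` and is skipped, while the second yields `['OL3A', 'OL4A']` and the URL `http://openlibrary.org/authors/merge?key=OL3A&key=OL4A`. Resolving redirects, rather than simply dropping redirected keys, keeps a set in the list when one of its keys was merged into an author outside the set.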

I'm open to other suggestions if people have ideas for ways to make
the process easier.

Tom


Re: [ol-discuss] New power tool for author merges

2011-02-12 Thread Patrick Conley
This list is helpful, but the work still has to be done by hand:

For example http://www.freebase.com/view//m/05whw9w

Freebase: Ann Williams (b. 1938), aka Ann Williams Gascon

Merging:
Ann Williams-Gascon (no year of birth, no bio)
with Williams, Ann. (no year of birth, no bio)


You first have to check all (!) 13 works of Ann Williams.


On 13.02.2011 02:34, Tom Morris wrote:
 I've been totally unsuccessful in getting any of the OpenLibrary staff
 interested in the list of duplicate authors that I generated last
 spring, so I've decided to open it up to the community.

 I've modified my duplicate listing program to automatically generate
 an OpenLibrary author merge URL with all the duplicate IDs.  If you
 are logged in to OpenLibrary and you click on the URL, it will take
 you to the author merge dialog page where you can select which authors
 should be merged, which one should be the master, etc.

 Please note that this is a *power* tool and should be used with great
 care.  There *are* errors in the listing of duplicates, so you should
 review carefully the set of authors that are being proposed for merger
 to make sure it's accurate.

 I've done the first 50 or so, so you'll want to skip ahead in the list
 to find some that still need work.  I'll see if I can enhance the
 program to skip authors who have already been processed, but for now
 if you click a link and end up on a page with just one author (or zero
 authors), that means someone else already took care of this author.
 Don't worry about running out of work, there are over 7,000 sets of
 duplicates (with 20K total records), so there'll be plenty for
 everyone to work on.

 Here's the tool: http://ol-dupes.freebaseapps.com/

 Play carefully!

 Tom
