Interesting - thank you again!

It would also be nice to note the Freebase ID/link in our author
description field...

g

Tom Morris wrote:
> On Thu, Apr 29, 2010 at 1:48 PM, George Oates <[email protected]> wrote:
>> Super fantastic, Tom!!! Thanks!
>> How many pages are there?
> 
> The web app seems to run into query timeouts around 5 or 6 pages,
> perhaps because of the way I'm sorting things, but the grand total is
> more than you want to be paging through anyway.  I count 7138 authors
> after de-duping (18,445 records on the OL side).
> 
> Here's the histogram of counts by number of duplicates:
> 
> Duplicates  Authors
>  2          6496
>  3           494
>  4            89
>  5            32
>  6            10
>  7             9
>  8             3
>  9             3
> 11             1
> 12             1
> 
> It's trivial to generate a file of these dupes, but I'd also like to
> figure out how this evolves going forward (i.e., as the Freebase
> community identifies additional merges).
> 
> Tom
>>
>> Tom Morris wrote:
>>> Rather than just complain about the data quality, here's a small
>>> contribution to help improve it.  I put together a little application
>>> which shows all authors who have multiple Open Library author records,
>>> as identified by the Freebase community.
>>>
>>> You can find it at http://ol-dupes.freebaseapps.com/authors
>>>
>>> The list is sorted from most to fewest duplicates and each
>>> entry is linked to all OL records as well as the Freebase record.
>>> Freebase uses a slightly different schema, so the authors are linked
>>> to Books ("works" in FRBR lingo) and those are linked to Book Editions
>>> which equate to the Open Library book records.
>>>
>>> I also included all the known names for the authors.  Most of these
>>> will have come from the merger of multiple records.  I haven't looked
>>> in detail, but it wouldn't surprise me if some of the bad names are
>>> from munging on the Freebase side of things.  You can see what the
>>> name associated with each OL record is by clicking on the ID link.
>>>
>>> The app is better for browsing than actual data cleanup, but I'd be
>>> happy to show someone how to extract the data in a form that could be
>>> used in the OL processes (or do it for you).  The app is BSD licensed
>>> so anyone's free to hack on it as well.
>>>
>>> Tom
>>> _______________________________________________
>>> Ol-discuss mailing list
>>> [email protected]
>>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
>>> To unsubscribe from this mailing list, send email to 
>>> [email protected]
