On Thu, Apr 29, 2010 at 1:48 PM, George Oates <[email protected]> wrote:
> Super fantastic, Tom!!! Thanks!
> How many pages are there?
The web app seems to run into query timeouts around 5 or 6 pages,
perhaps because of the way I'm sorting things, but the grand total is
more than you want to be paging through anyway.

I count 7,138 authors after de-duping (18,445 records on the OL side).
Here's the histogram of counts by number of duplicates:

  Duplicates  Authors
           2     6496
           3      494
           4       89
           5       32
           6       10
           7        9
           8        3
           9        3
          11        1
          12        1

It's trivial to generate a file of these dupes, but I'd also like to
figure out how this evolves going forward (i.e. as the Freebase
community identifies additional merges).

Tom

>
> Tom Morris wrote:
>> Rather than just complain about the data quality, here's a small
>> contribution to help improve it. I put together a little application
>> which shows all authors who have multiple Open Library author records,
>> as identified by the Freebase community.
>>
>> You can find it at http://ol-dupes.freebaseapps.com/authors
>>
>> The list is sorted from most to least number of duplicates and each
>> entry is linked to all OL records as well as the Freebase record.
>> Freebase uses a slightly different schema, so the authors are linked
>> to Books ("works" in FRBR lingo) and those are linked to Book Editions
>> which equate to the Open Library book records.
>>
>> I also included all the known names for the authors. Most of these
>> will have come from the merger of multiple records. I haven't looked
>> in detail, but it wouldn't surprise me if some of the bad names are
>> from munging on the Freebase side of things. You can see what the
>> name associated with each OL record is by clicking on the ID link.
>>
>> The app is better for browsing than actual data cleanup, but I'd be
>> happy to show someone how to extract the data in a form that could be
>> used in the OL processes (or do it for you). The app is BSD licensed
>> so anyone's free to hack on it as well.
>>
>> Tom
>> _______________________________________________
>> Ol-discuss mailing list
>> [email protected]
>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
>> To unsubscribe from this mailing list, send email to
>> [email protected]
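
A minimal sketch of how a histogram like the one in the reply above could be tallied, assuming the duplicates are available as (Freebase topic ID, Open Library author key) pairs. The sample IDs and the function names group_duplicates and duplicate_histogram are hypothetical and not taken from the ol-dupes app. The last lines only re-add the posted counts to confirm that they total 7138 authors.

```python
from collections import Counter, defaultdict

# Hypothetical sample input: (Freebase topic ID, OL author key) pairs.
# These IDs are made up for illustration; real pairs would come from Freebase.
pairs = [
    ("/m/fb001", "/a/OL1A"),
    ("/m/fb001", "/a/OL2A"),
    ("/m/fb002", "/a/OL3A"),
    ("/m/fb002", "/a/OL4A"),
    ("/m/fb002", "/a/OL5A"),
    ("/m/fb003", "/a/OL6A"),  # only one OL record, so not a duplicate
    ("/m/fb004", "/a/OL7A"),
    ("/m/fb004", "/a/OL8A"),
]


def group_duplicates(pairs):
    """Map each Freebase topic to its OL keys, keeping topics with 2+ keys."""
    groups = defaultdict(set)
    for topic, ol_key in pairs:
        groups[topic].add(ol_key)
    return {topic: sorted(keys) for topic, keys in groups.items() if len(keys) > 1}


def duplicate_histogram(groups):
    """Count how many authors have 2, 3, 4, ... OL records."""
    return Counter(len(keys) for keys in groups.values())


dupes = group_duplicates(pairs)
hist = duplicate_histogram(dupes)
for n in sorted(hist):
    print(n, hist[n])

# Counts copied from the histogram in the message above.
posted = {2: 6496, 3: 494, 4: 89, 5: 32, 6: 10, 7: 9, 8: 3, 9: 3, 11: 1, 12: 1}
print(sum(posted.values()))  # 7138
```

Writing out the dupes mapping one topic per line would give the kind of file of dupes mentioned above, and re-running the same tally on a later extract is one way to see how the counts change as more merges are identified.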
