Rather than just complain about the data quality, here's a small contribution to help improve it. I put together a little application which shows all authors who have multiple Open Library author records, as identified by the Freebase community.
You can find it at http://ol-dupes.freebaseapps.com/authors

The list is sorted from most to fewest duplicates, and each entry is linked to all of its OL records as well as the Freebase record. Freebase uses a slightly different schema, so the authors are linked to Books ("works" in FRBR lingo), and those are linked to Book Editions, which equate to the Open Library book records.

I also included all the known names for the authors. Most of these will have come from the merger of multiple records. I haven't looked in detail, but it wouldn't surprise me if some of the bad names are from munging on the Freebase side of things. You can see which name is associated with each OL record by clicking on the ID link.

The app is better for browsing than actual data cleanup, but I'd be happy to show someone how to extract the data in a form that could be used in the OL processes (or do it for you). The app is BSD licensed so anyone's free to hack on it as well.

Tom
_______________________________________________
Ol-discuss mailing list
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
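
For anyone who wants a feel for the kind of extract mentioned above, here is a minimal sketch in Python of how author records might be grouped and ranked by duplicate count. It assumes you already have (Freebase ID, OL author key) pairs from some export; the function name, the sample identifiers, and the input shape are all made up for illustration and are not taken from the app itself.

```python
from collections import defaultdict


def group_duplicates(pairs):
    """Group Open Library author keys by the Freebase topic they map to.

    `pairs` is an iterable of (freebase_id, ol_author_key) tuples.
    Returns a list of (freebase_id, [ol_author_key, ...]) entries for
    topics with more than one OL record, most duplicates first.
    """
    groups = defaultdict(set)
    for freebase_id, ol_key in pairs:
        # A set drops repeated (topic, key) pairs in the input.
        groups[freebase_id].add(ol_key)

    # Keep only topics that point at two or more OL author records.
    dupes = {fb: sorted(keys) for fb, keys in groups.items() if len(keys) > 1}

    # Largest groups first; ties broken by Freebase ID for stable output.
    return sorted(dupes.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical sample data: these identifiers are invented placeholders.
SAMPLE_PAIRS = [
    ("/en/author_one", "OL_EXAMPLE_1A"),
    ("/en/author_one", "OL_EXAMPLE_2A"),
    ("/en/author_one", "OL_EXAMPLE_3A"),
    ("/en/author_one", "OL_EXAMPLE_3A"),  # repeated pair, counted once
    ("/en/author_two", "OL_EXAMPLE_4A"),
    ("/en/author_two", "OL_EXAMPLE_5A"),
    ("/en/author_three", "OL_EXAMPLE_6A"),  # single record, not a duplicate
]

if __name__ == "__main__":
    for freebase_id, ol_keys in group_duplicates(SAMPLE_PAIRS):
        print(freebase_id, len(ol_keys), " ".join(ol_keys))
```

Each output row is one author with the OL keys that would need merging, which is roughly the shape a cleanup process would want as input.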
