Just had a chance to get back to this. Sorry for the long delay.

On Sat, May 1, 2010 at 1:14 AM, Michael Engel <[email protected]> wrote:
> Freebase id: /m/05wk45p
> Author name: Don Dinkmeyer
> Aliases:
> Don Dinkmeyer Jr.,Don Dinkmeyer Sr.,Don Sr Dinkmeyer,
> Open Library records:
> OL2624799A,OL302305A,OL2757673A,OL2757574A,OL2686700A,
>
> Looks like the Junior and the Senior are two different authors, see one
> example:
Good catch. I certainly didn't mean to imply that Freebase is error-free. I think it's generally higher quality than what's in Open Library, but not in this case. I think it also provides a nice combination of machine-powered and human-powered reconciliation processes. At a minimum, though, the listing can be used to identify areas that need cleanup.

There were actually two Freebase records and six Open Library records for what is, most likely, two authors:

Freebase name: Don Dinkmeyer http://www.freebase.com/view/m/05wk45p
  Don Dinkmeyer http://openlibrary.org/a/OL2624799A
  Dinkmeyer, Don C. http://openlibrary.org/a/OL302305A
  Don Dinkmeyer Jr. http://openlibrary.org/a/OL2757673A (0 books)
  Don Dinkmeyer Sr. http://openlibrary.org/a/OL2757574A
  Don Sr Dinkeyer http://openlibrary.org/a/OL2686700A

Freebase name: Don C Dinkmeyer http://www.freebase.com/view/m/05wyhcb
  Don C Dinkmeyer http://openlibrary.org/a/OL3821345A

The Don Dinkmeyer Jr. author record on Open Library has no books associated with it, so I'm not even sure why it was created. Some of the other OL records (e.g. Don Sr Dinkeyer) were obviously munged at some stage in the processing pipeline before reaching Freebase (perhaps before reaching Open Library too). It doesn't look like anyone in the Freebase community edited the conflated record, so it all apparently results from overly aggressive machine-based merging.

I flagged the two separate records for merger, which has since been voted on and completed, but now comes the hard part: teasing apart the two authors. I looked at the LoC and WorldCat, and they do not appear to use Jr. and Sr. at all. They use "Don Dinkmeyer" for the father, presumably because he was the first and only one at the time, and "Don Dinkmeyer, 1958-" for the son. This is apparently a variation on the bizarre cataloging practices that librarians use, which Karen discussed a while back. (Why not birth years for both? Why not Sr./Jr.? Why not ...?)
Here are the LoC authority records:

Dinkmeyer, Don C. [They know the birth date and the fact that he's Sr., but don't include it in the main heading]
http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?AuthRecID=2362233&v1=1&HC=1&SEQ=20100501103657&PID=W1x3SwNKrlJizonRsJ0SQ7NKGR91

Dinkmeyer, Don C., 1952-
http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?AuthRecID=946372&v1=1&HC=1&SEQ=20100501103549&PID=DvTGLGauLzedNKB8tuqgiZXm6K7Y

There's more strangeness in the Open Library records for one of the books co-authored with Gary McKay, STET (http://openlibrary.org/books/OL11407090M/Stet). The database lists the wrong Gary McKay (the combat author, http://openlibrary.org/authors/OL370554A/Gary_McKay) on the book, but if you click through to the author page, the book isn't listed, so the database is internally inconsistent.

I'm sure if you continued to browse around you could find other problems, but I'm less concerned with how bad the data is than with a) how it got that way and, more importantly, b) how it can be cleaned up. Unfortunately, there doesn't appear to be much forthcoming in the way of concrete plans.

Tom

p.s. Only tangentially related, but one of the cool things about Freebase is that it's not limited to books and authors, so you can now see the father and son linked to each other and to the other son, James S. Dinkmeyer, and the book series article from Wikipedia is linked in as well. Over time this mesh of data should get denser. It's most interesting for people for whom writing isn't their primary profession, e.g. naval architects who mainly design fast sailboats but also write about how to do it.

_______________________________________________
Ol-discuss mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-discuss
To unsubscribe from this mailing list, send email to [email protected]
