Hi all,

I was cross-linking some files from the latest editions and works data dumps, and I noticed that a number of frequently reprinted works seem to have an internally inconsistent 'first_publish_date' entry in the dump files. For example, the works web page for The Wings of the Dove <http://openlibrary.org/works/OL276397W/The_wings_of_the_dove> correctly gives 1902 as the first publication, but the JSON file <http://openlibrary.org/works/OL276397W.json>, the RDF, and the data dump all say 1999. Similar problems (with different years, obviously) seem to exist for a lot of James' works, and more rarely for other authors. (I count about 1,800 editions, out of the 800,000 entries I've pulled from the data dump, where the first_publish_date is later than the edition publish date; it looks like the works field, not the edition one, is usually the culprit.) This was true back in the January dump too.
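For anyone who wants to reproduce the check, here is a minimal sketch in Python. It assumes the work and edition records have already been parsed into dicts, that editions carry a free-form 'publish_date' string and a 'works' list of {"key": ...} references, and that works carry 'first_publish_date'. The function names are hypothetical, and the sample records are made up for illustration (the edition keys are placeholders); only the work key and the 1902/1999 years come from the example above.

```python
import re

# Matches a plausible four-digit year (1000-2099) inside a free-form date string.
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def extract_year(date_text):
    """Return the first four-digit year found in date_text, or None."""
    if not date_text:
        return None
    match = YEAR_RE.search(date_text)
    return int(match.group(1)) if match else None


def earliest_edition_years(editions):
    """Map each work key to the earliest publish year among its editions."""
    earliest = {}
    for edition in editions:
        year = extract_year(edition.get("publish_date"))
        if year is None:
            continue  # skip editions with no usable date
        for work_ref in edition.get("works", []):
            key = work_ref["key"]
            if key not in earliest or year < earliest[key]:
                earliest[key] = year
    return earliest


def find_inconsistent_works(works, editions):
    """List (work_key, first_publish_year, earliest_edition_year) for works
    whose first_publish_date is later than their earliest dated edition."""
    earliest = earliest_edition_years(editions)
    flagged = []
    for work in works:
        first_year = extract_year(work.get("first_publish_date"))
        edition_year = earliest.get(work["key"])
        if first_year is None or edition_year is None:
            continue
        if first_year > edition_year:
            flagged.append((work["key"], first_year, edition_year))
    return flagged


# Made-up sample records; the edition keys are placeholders.
sample_works = [
    {"key": "/works/OL276397W", "first_publish_date": "1999"},
]
sample_editions = [
    {"key": "/books/EXAMPLE1M", "publish_date": "1902",
     "works": [{"key": "/works/OL276397W"}]},
    {"key": "/books/EXAMPLE2M", "publish_date": "March 1999",
     "works": [{"key": "/works/OL276397W"}]},
]

print(find_inconsistent_works(sample_works, sample_editions))
```

Pulling only a four-digit year out of each date string is a deliberate simplification, since the date fields are free text; records without a recognizable year are skipped rather than guessed at.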
Is there a reason for this? If not, could all the files be updated to reflect whatever data the web pages use to determine the earliest date, if that's actually of higher quality? (I assume in most cases it's just pulling the earliest publish date from the associated edition records?)

Thanks to everyone for their contributions; I've been really impressed by everything so far. Apologies if I'm missing something obvious here about what these fields are supposed to mean, etc.

Ben
_______________________________________________
Ol-tech mailing list
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
