Hi all,

I was cross-linking some files from the latest editions and works data
dumps, and I noticed that there are a number of frequently-reprinted works
that seem to have an internally inconsistent 'first_publish_date' entry in
the dump files. For example, the works web page for The Wings of the Dove
<http://openlibrary.org/works/OL276397W/The_wings_of_the_dove> correctly
gives 1902 as the first publication, but the JSON file
<http://openlibrary.org/works/OL276397W.json>, the RDF, and the data
dump all say 1999. Similar problems (with different years, obviously) seem
to exist for a lot of James' works, and more rarely for other authors. (Of
the 800,000 entries I've pulled from the data dump, I count about 1,800
editions whose work has a first_publish_date later than the edition's own
publish date; it looks like the works field, not the edition one, is usually
the culprit.) This was true back in the January dump too.
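
A minimal sketch of this kind of consistency check, in Python. It assumes
each work and edition record has already been parsed into a dict, that the
edition's date lives in a field named publish_date (an assumption; only
first_publish_date is named above), and that dates are free text containing
a four-digit year. The helper names and the sample edition record are
hypothetical.

```python
import re

# Matches a standalone four-digit year between 1000 and 2099.
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def extract_year(date_string):
    """Return the first four-digit year found in a free-text date, or None."""
    if not date_string:
        return None
    match = YEAR_RE.search(date_string)
    return int(match.group(1)) if match else None


def is_inconsistent(work, edition):
    """True if the work's first_publish_date is later than the edition's date.

    Records with a missing or unparseable date are not flagged.
    """
    work_year = extract_year(work.get("first_publish_date"))
    edition_year = extract_year(edition.get("publish_date"))  # assumed field name
    if work_year is None or edition_year is None:
        return False
    return work_year > edition_year


# Values taken from the example above; the edition record is hypothetical.
work = {"key": "/works/OL276397W", "first_publish_date": "1999"}
edition = {"publish_date": "1902"}
print(is_inconsistent(work, edition))
```

Records whose dates cannot be parsed are skipped rather than flagged, so a
count produced this way would be a lower bound.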

Is there a reason for this? If not, could all the files be updated to reflect
whatever data the web pages use to determine the earliest date, if that's
actually of higher quality? (I assume in most cases it's just pulling the
earliest published date from the associated edition records?)

Thanks to everyone for their contributions; I've been really impressed by
everything so far. Apologies if I'm missing something obvious here about
what these fields are supposed to mean, etc.

Ben
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]
