Hi Dave,

The dump file formats are partially explained on http://openlibrary.org/developers/dumps (i.e. each line is tab-separated, with the record itself at the end), partially documented per record type on http://openlibrary.org/type, and partially undocumented (there is an issue on GitHub for this: https://github.com/internetarchive/openlibrary/issues/100).

Over the past months I have created some statistics for each dump (the dumps containing only the most recent revision of each record, not the complete dumps with all revisions). The statistics for September's data dump are here: https://gist.github.com/3863520 (with a short readme). They don't give a full file structure, but they do give an impression of what you can actually find in the dumps, as opposed to the "ideal" record format defined by http://openlibrary.org/type.
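Since you mentioned Python, here is a minimal sketch of reading such a file and counting which top-level fields occur in the records. It assumes only what is said above: columns are tab-separated and the last column is the record, which I take to be JSON. The function names and the sample lines are made up for illustration and are not taken from a real dump.

```python
import json
from collections import Counter


def parse_dump_line(line):
    """Split one dump line into (metadata columns, decoded record)."""
    columns = line.rstrip("\n").split("\t")
    # Assumption: the last column holds the JSON record; everything
    # before it is treated as opaque metadata.
    return columns[:-1], json.loads(columns[-1])


def field_counts(lines):
    """Count how often each top-level field name appears across records."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, record = parse_dump_line(line)
        counts.update(record.keys())
    return counts


# Hypothetical sample lines, invented for this example.
sample = [
    '/type/edition\t/books/OL1M\t1\t2012-09-30\t{"key": "/books/OL1M", "title": "Example"}\n',
    '/type/work\t/works/OL1W\t2\t2012-09-30\t{"key": "/works/OL1W", "title": "Example", "subjects": ["Test"]}\n',
]

if __name__ == "__main__":
    for field, count in field_counts(sample).most_common():
        print(field, count)
```

For a real dump you would pass an open file object instead of `sample`, so the file is read line by line rather than loaded into memory at once.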
Regards,
Ben

On 8 October 2012 18:45, Dave Holmes-Kinsella <[email protected]> wrote:
> Hi:
>
> I've gotten hold of the 6GB or so of tarred goodness that is the OL dump
> from July of this year. Trying to parse my way through it with Python and
> MySQL.
>
> Questions are
> a) what's the difference between Works and Editions?
> b) can anyone elaborate the file formats for me?
>
> Am willing/happy to turn into documents or class/libraries in return for
> wisdom of same.
>
> If it helps, I'm 4 blocks from Internet Archive and am willing to bribe
> with coffee and/or chicken wings from that place on 16th and Clement…
>
> --- dhk
> [email protected]
>
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to
> [email protected]
