Hi Dave,

The dump file formats are partially explained on
http://openlibrary.org/developers/dumps (i.e. each line is tab
separated, with the record at the end of the line), partially
documented by record type on http://openlibrary.org/type, and
partially undocumented (there is an issue on GitHub for this:
https://github.com/internetarchive/openlibrary/issues/100).
Over the past few months I have generated some statistics for each dump
(the dumps with only the most recent revisions, not the complete dumps
with all revisions). For September's data dump they are here:
https://gist.github.com/3863520 (with a short readme). They don't give
a full file structure, but they do give an impression of what you can
find in the dumps, alongside the "ideal" record format defined by
http://openlibrary.org/type.
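
Since you are working in Python, here is a minimal sketch of reading
such lines. It assumes that the record at the end of each line is JSON
and that its "type" field looks like {"key": "/type/edition"}; please
check both against the dumps page and your own file. The function names
and the sample lines are made up for illustration, they are not taken
from a real dump.

```python
import json
from collections import Counter


def parse_dump_line(line):
    """Split one dump line into its leading columns and the trailing record.

    Assumption: the last tab-separated field is a JSON document.
    """
    *columns, raw_record = line.rstrip("\n").split("\t")
    return columns, json.loads(raw_record)


def count_record_types(lines):
    """Count how many records of each type occur in an iterable of lines."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, record = parse_dump_line(line)
        record_type = record.get("type")
        # Assumption: "type" is usually {"key": "/type/..."}; fall back to
        # the raw value if it turns out to be a plain string.
        if isinstance(record_type, dict):
            record_type = record_type.get("key")
        counts[record_type] += 1
    return counts


# Hypothetical sample lines, only to show the shape of the data.
sample_lines = [
    '/type/edition\t/books/OL1M\t1\t2012-07-01T00:00:00\t'
    '{"type": {"key": "/type/edition"}, "title": "Example A"}\n',
    '/type/edition\t/books/OL2M\t3\t2012-07-02T00:00:00\t'
    '{"type": {"key": "/type/edition"}, "title": "Example B"}\n',
    '/type/work\t/works/OL1W\t2\t2012-07-03T00:00:00\t'
    '{"type": {"key": "/type/work"}, "title": "Example A"}\n',
]

type_counts = count_record_types(sample_lines)
print(type_counts)
```

For a real dump you would pass an open file object instead of
sample_lines, so the file is read line by line and never loaded into
memory as a whole.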

Regards,

Ben

On 8 October 2012 18:45, Dave Holmes-Kinsella <[email protected]> wrote:
> Hi:
>
> I've gotten hold of the 6GB or so of tarred goodness that is the OL dump
> from July of this year. Trying to parse my way through it with Python and
> MySQL.
>
> Questions are
> a) what's the difference between Works and Editions?
> b) can anyone elaborate the file formats for me?
>
> Am willing/happy to turn into documents or class/libraries in return for
> wisdom of same.
>
> If it helps, I'm 4 blocks from Internet Archive and am willing to bribe
> with coffee and/or chicken wings from that place on 16th and Clement…
>
> --- dhk
> [email protected]
>
>
>
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to
> [email protected]
>