p.s. Depending on what type of analysis you're trying to do, you may find
that you can get what you need with jq, cut, sort, etc. from the command
line more quickly than by messing around with a database.

For example, here's a one-liner that will give you the breakdown between
paperbacks and hardcovers for the editions (the fifth tab-separated field
of each dump row is the record's JSON):

$ zgrep -E '^/type/edition' ol_dump_2015-10-31.txt.gz | cut -f 5 \
    | jq '.physical_format' | sort | uniq -c | sort -r -n | head
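
Along the same lines, the first tab-separated field is the record type, so
you can get a quick census of everything in the dump without jq at all:

$ zcat ol_dump_2015-10-31.txt.gz | cut -f 1 | sort | uniq -c | sort -r -n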

Alternatively, you could dump the JSON into Elasticsearch or some other
"NoSQL" database that has JSON documents as its native format.
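
If you try Elasticsearch, its bulk API wants a small action line in front
of each document, which awk can synthesize. A rough, untested sketch (the
openlibrary/edition index and type names are invented, it assumes
Elasticsearch is listening on localhost:9200, and the even batch size
keeps the action/document pairs together):

$ zgrep -E '^/type/edition' ol_dump_2015-10-31.txt.gz | cut -f 5 \
    | awk '{print "{\"index\":{}}"; print}' | split -l 20000 - bulk_
$ for f in bulk_*; do curl -s -XPOST \
    'localhost:9200/openlibrary/edition/_bulk' --data-binary @"$f"; done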

Tom

On Fri, Feb 5, 2016 at 12:31 PM, Tom Morris <[email protected]> wrote:

> [this just arrived despite being postmarked 3 days ago - held for
> moderation perhaps?]
>
> On Tue, Feb 2, 2016 at 2:15 PM, jason buckner <[email protected]>
> wrote:
>
>> I have downloaded the latest complete openlibrary data dump from
>> http://openlibrary.org/data/ol_dump_latest.txt.gz, and I am hoping to
>> import this data to a local postgres db instance for some raw SQL
>> analysis.  I have created the postgres db instance and then created the
>> schema using the schema.sql from the github developer instance bootstrap.
>> The data dump, however, does not seem to import cleanly without some
>> type of transform being applied. Is there a standard procedure for
>> such a transform and import process?
>>
>
> I see several schema.sql files, but none of them match the dump.  The
> format of the dump is documented here:
> https://openlibrary.org/developers/dumps
>
> Did you follow a link to the dump from somewhere else that doesn't have
> the documentation?  If so, we should fix it up.
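>
> For what it's worth, since each row of the dump is just five
> tab-separated fields (type, key, revision, last_modified, JSON), you
> may not need much of a transform at all. Here's a rough, untested
> sketch of loading it into a single table with psql; the table and
> column names are my own invention, and the csv/quote trick is there so
> COPY doesn't eat the backslash escapes inside the JSON column:
>
> CREATE TABLE ol_dump (
>     type text,
>     key text,
>     revision integer,
>     last_modified timestamp,
>     data json
> );
>
> \copy ol_dump from program 'zcat ol_dump_latest.txt.gz' with (format csv, delimiter E'\t', quote E'\b')
>
> Once it's loaded, a breakdown by physical format becomes plain SQL:
>
> SELECT data->>'physical_format' AS format, count(*)
>   FROM ol_dump
>  WHERE type = '/type/edition'
>  GROUP BY 1
>  ORDER BY 2 DESC
>  LIMIT 10;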
>
> Tom
>
>