Thanks, guys! This is all really helpful. I have the update method written,
so now I'm just refining the tests and adding some utility methods to turn
Ruby pseudo-objects back into JSON objects.
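
A simplified sketch of the kind of helper I mean, assuming plain
pseudo-objects whose state lives in instance variables (names are
illustrative and nested objects are not handled):

    require 'json'

    # Walk a pseudo-object's instance variables and emit the JSON
    # hash that Open Library expects on save.
    def to_ol_json(record)
      pairs = record.instance_variables.map do |ivar|
        [ivar.to_s.delete('@'), record.instance_variable_get(ivar)]
      end
      JSON.generate(Hash[pairs])
    end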

On Fri, Mar 15, 2013 at 7:43 AM, Lee Passey <[email protected]> wrote:

> > On 3/14/2013 7:17 PM, John Shutt wrote:
> >  From Ben's bot, I think I have the answer to my main question: you
> > need to send back /complete/ Open Library objects when saving, not
> > just partial objects with the modified fields. Is that correct?
>
> It might be helpful to understand how the OL archive is actually
> implemented.
>
> While OL data is technically stored in a relational database, in
> practice it is not used as one. The JSON object serialization you get
> as the result of a query is what is actually stored in the database.
> When a "record" is updated, no modifications are made to the existing
> record; instead, a new record is created with the new data serialized
> as a JSON object and stored as a BLOB (more accurately a TLOB) in a
> single field of the database record. The new record has the same OLID
> but a new time/date stamp, so if you collect all the records sharing
> an OLID you can determine the "current" one by the most recent
> timestamp.
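>
> In code, resolving the "current" version of an OLID is conceptually
> just a max-by-timestamp over the stored versions. A minimal Ruby
> sketch, with column names invented for illustration:
>
>     require 'json'
>
>     # rows: every stored version sharing one OLID, each carrying its
>     # serialized JSON blob and the timestamp it was written at.
>     def current_record(rows)
>       latest = rows.max_by { |row| row[:last_modified] }
>       JSON.parse(latest[:json_blob])
>     end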
>
> As a consequence of this design there is no defined database schema;
> or, more accurately, each record carries its own schema, which may or
> may not resemble the schema of any other record. When OL decides to
> change what data is stored for a particular kind of record, newly
> written JSON objects reflect that change, but no previously stored
> object is ever modified. Thus the OL archive is full of all sorts of
> deprecated data, and some newer records contain data that older
> records do not. This is not a problem if your only goal is to present
> "one web page per book," but it does make reuse of the data
> problematic for anything beyond a single presentation intended for
> human viewing.
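>
> In practice this means any consumer has to probe each record for the
> fields it wants rather than trust a single layout. A hypothetical
> illustration (the field names are made up):
>
>     # Different vintages of record may carry the same information
>     # under different keys, or not at all.
>     def display_name(record)
>       record['name'] || record['personal_name'] || 'unknown'
>     end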
>
> This also explains why searching for changes can fail if performed too
> soon after an update: the design requires an indexing method external to
> the DBMS implementation. OL uses SOLR for this purpose. To completely
> reindex the archive you must read each record in the archive, parse the
> JSON object to create name/value pairs, then add each of these values to
> the stand-alone index. My experiments a few years ago demonstrated that
> on older hardware this process required a couple of days to complete. Of
> course, the process can be optimized with incremental updates that
> index only records added since the last indexing run; but this can
> also produce false positives when the "current" record no longer
> contains a term that the index had previously recorded.
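>
> That full-reindex step amounts to flattening every stored JSON object
> into name/value pairs for the stand-alone index. A rough sketch of
> the flattening, assuming nothing about OL's actual field names:
>
>     # Recursively flatten a parsed JSON object into [name, value]
>     # pairs, joining nested keys with dots.
>     def flatten(obj, prefix = nil)
>       case obj
>       when Hash
>         obj.flat_map { |k, v| flatten(v, prefix ? "#{prefix}.#{k}" : k.to_s) }
>       when Array
>         obj.flat_map { |v| flatten(v, prefix) }
>       else
>         [[prefix, obj]]
>       end
>     end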
>
> My experiments also demonstrated that archive performance was just as
> good, if not better, when the JSON TLOBs were simply stored as files
> in the file system instead of as records in a database.