It doesn't look like the source_records field is reliable.  Only about 22%
of the editions (5.6M of 25M) have one.

For example, this record came in from Amazon:
http://openlibrary.org/books/OL10000135M.json?v=1
but has no source_records field, although the source is noted in the
machine comment in the change history.
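
A rough sketch of how one might fall back to the machine comment when
source_records is missing. It assumes the history JSON (?m=history) is a
list of revision records, some of which carry a "machine_comment" key, as
Ben describes further down the thread. The function names are hypothetical
and the sample data is made up for illustration.

```python
def machine_comments(history):
    """Collect the non-empty machine comments from a list of revision records."""
    return [rev["machine_comment"] for rev in history if rev.get("machine_comment")]


def edition_sources(edition, history):
    """Prefer the edition's source_records; otherwise fall back to the
    machine comments found in its change history (assumed layout)."""
    records = edition.get("source_records") or []
    if records:
        return list(records)
    return machine_comments(history)


# Made-up example: an edition without source_records whose history
# still names the record it was imported from.
sample_edition = {"key": "/books/OL0M", "title": "Example"}
sample_history = [
    {"comment": "found a matching MARC record"},
    {"comment": "initial import",
     "machine_comment": "bpl_marc/bpl114.mrc:36416736:831"},
]
sources = edition_sources(sample_edition, sample_history)
```

This only helps where the history is available, which today means one API
call per edition.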

For the editions that do have source_records, 6.2M reference MARC records,
1.8M came from IA, and 468K from Amazon (these add up to more than 5.6M
because some editions have multiple source records).
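
For anyone who wants to run a similar tally, here is a minimal sketch, not
the script that produced the numbers above. It assumes each dump line is
tab-separated with the record type in the first column and the JSON record
in the fifth, and that the text before the first colon of a source record
(e.g. "marc") identifies its origin. The "ia" prefix in the sample data is
an assumption, and the function names are hypothetical.

```python
import json
from collections import Counter


def source_kind(source_record):
    """Return the text before the first colon, e.g. 'marc'."""
    return source_record.split(":", 1)[0]


def tally_sources(dump_lines):
    """Count editions, editions with source_records, and editions per source kind.

    Each kind is counted at most once per edition, so an edition with both
    a MARC and an IA source record is counted under both kinds.
    """
    total = 0
    with_sources = 0
    kinds = Counter()
    for line in dump_lines:
        fields = line.rstrip("\n").split("\t", 4)
        if len(fields) < 5 or fields[0] != "/type/edition":
            continue  # skip authors, works, and malformed lines
        total += 1
        sources = json.loads(fields[4]).get("source_records") or []
        if sources:
            with_sources += 1
        kinds.update({source_kind(s) for s in sources})
    return total, with_sources, kinds


# Made-up sample lines in the assumed dump layout.
sample_lines = [
    "/type/edition\t/books/OL1M\t2\t2013-03-06T00:00:00\t"
    + json.dumps({"source_records": ["marc:a/part1.dat:10:20", "ia:example"]}),
    "/type/edition\t/books/OL2M\t1\t2013-03-06T00:00:00\t"
    + json.dumps({"title": "No sources"}),
    "/type/author\t/authors/OL1A\t1\t2013-03-06T00:00:00\t"
    + json.dumps({"name": "Someone"}),
]
total, with_sources, kinds = tally_sources(sample_lines)
```

On the real dump one would pass an open file object instead of the sample
list; the function reads it line by line.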

If there's not a dump which includes the machine comments, perhaps someone
could create one so that we don't have to try and suck all this information
through the API.

Tom

On Wed, Mar 6, 2013 at 3:02 PM, Ben Companjen <[email protected]> wrote:

>
> http://openlibrary.org/books/OL100043M/EX-IM_Bank_oversight_hearing?b=2&a=1&_compare=Compare&m=diff
> shows that the ImportBot only removed periods (.) from subjects and
> added the source record in revision 2. Indeed, the comment with the
> change is "found a matching MARC record", but it's not added as a
> machine comment.
>
> It looks like the source_records field may be more useful in this case.
>
> On 6 March 2013 20:36, Tom Morris <[email protected]> wrote:
> > Thanks Ben!  I tried ?rev=1, but it didn't occur to me that the
> > history mode would work.
> >
> > On closer inspection, I actually did find this information in the dump,
> > in the source_records field, e.g.
> >
> > "source_records": [
> >   "marc:marc_records_scriblio_net/part28.dat:62741961:1376",
> >   "marc:marc_loc_updates/v36.i33.records.utf8:3020091:1375"
> > ]
> >
> > for http://openlibrary.org/books/OL100043M.json
> >
> > I think that will provide enough information to do a tally.  The first
> > source record matches the source listed in the initial import.  I'm not
> > sure what the second source record is since I don't see it referenced in
> > the change history http://openlibrary.org/books/OL100043M.json?m=history
> >
> > Thanks for setting me on the right path!
> >
> > Tom
> >
> > On Wed, Mar 6, 2013 at 2:22 PM, Ben Companjen <[email protected]>
> > wrote:
> >>
> >> Hi Tom,
> >>
> >> By replacing the title in the URL with ".json", which gives [0], I got
> >> the JSON view of the history. This is the "History API" [1]. It looks
> >> like there is a record for each revision, that stores modification
> >> date and comment, amongst other things. The first revision has
> >> "machine_comment": "bpl_marc/bpl114.mrc:36416736:831". I assume this
> >> is the (only) place where source record links are stored (i.e. in
> >> metarecords linked to normal records).
> >>
> >> I haven't looked at a complete dump with all revisions, but if you say
> >> there is no history information included, I trust that that
> >> information is not in the dump. I'm unaware of other dumps that could
> >> include it.
> >>
> >> Perhaps the information can more easily be extracted from the database
> >> underneath Infogami?
> >>
> >> Ben
> >>
> >> [0] http://openlibrary.org/books/OL13446224M.json?m=history
> >> [1] http://openlibrary.org/dev/docs/restful_api#history
> >>
> >> On 6 March 2013 19:45, Tom Morris <[email protected]> wrote:
> >> > If I look at the history for an edition, I can see the source MARC
> >> > record and the contributor of that record associated with rev 1 of the
> >> > history.
> >> >
> >> > http://openlibrary.org/books/OL13446224M/The_history_of_the_Yorubas?m=history
> >> >
> >> > For the above record, the links are to:
> >> >
> >> > http://openlibrary.org/show-records/bpl_marc/bpl114.mrc:36416736:831
> >> > http://archive.org/details/bpl_marc
> >> >
> >> > Where is this information stored?  Is it available in the dump? (not
> >> > as far as I can see)
> >> >
> >> > Also, while investigating this, I thought perhaps the recentchanges API
> >> > might have the info (can't really tell), but it doesn't seem that the
> >> > data that backs that API is available in dump form.  Is that correct?
> >> > Is it intentional?
> >> >
> >> > I'd like to use this information to figure out which are the largest
> >> > sources of data to help prioritize vetting efforts.
> >> >
> >> > Tom
> >> >
> >> > _______________________________________________
> >> > Ol-tech mailing list
> >> > [email protected]
> >> > http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> >> > To unsubscribe from this mailing list, send email to
> >> > [email protected]