Yes, we use both machine_comment and source_records fields for obtaining that 
info.
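
One way to read "both" in practice, as a rough sketch only: prefer an edition's source_records list when it is present, and fall back to the machine_comment of the import revision otherwise. The helper name edition_sources and the fallback order are assumptions for illustration, not the actual Open Library code.

```python
def edition_sources(edition, machine_comment=None):
    """Return the source identifiers for one edition.

    `edition` is a dict parsed from an edition's .json view.
    `machine_comment` is the comment from the import revision, if known.
    Hypothetical helper: the fallback order is an assumption.
    """
    # Prefer the explicit source_records list when the edition has one.
    sources = list(edition.get("source_records") or [])
    if sources:
        return sources
    # Otherwise fall back to the machine comment recorded in the history.
    if machine_comment:
        return [machine_comment]
    return []
```

An edition with several source_records returns several entries, which is why per-source totals can add up to more than the number of editions.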

I took a dump of machine comments from the database and put it at:

http://home.us.archive.org/~anand/openlibrary/2013-03-07-machine-comment.txt.gz

It contains edition key and machine_comment columns. 
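
A minimal sketch of tallying that dump by source collection might look like the following. It assumes the two columns are tab-separated and that the collection name is the part of the machine_comment before the first "/" (as in bpl_marc/bpl114.mrc:36416736:831 from earlier in this thread). The function names are hypothetical, and the actual file layout should be checked before relying on it.

```python
import gzip
from collections import Counter


def source_collection(machine_comment):
    """Extract the collection name, e.g. 'bpl_marc' from
    'bpl_marc/bpl114.mrc:36416736:831' (assumed format)."""
    return machine_comment.split("/", 1)[0].split(":", 1)[0]


def tally_dump(path):
    """Count editions per source collection in the gzipped dump.

    Assumes each line is '<edition key>\\t<machine_comment>'.
    Lines without a tab or with an empty comment are skipped.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t", 1)
            if len(parts) != 2 or not parts[1].strip():
                continue
            counts[source_collection(parts[1].strip())] += 1
    return counts
```

Calling tally_dump("2013-03-07-machine-comment.txt.gz").most_common(20) on a downloaded copy would then list the largest sources first.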

Hope that helps.

Anand


On 07-Mar-2013, at 3:58 AM, Tom Morris wrote:

> It doesn't look like the source_records field is reliable.  Only about 20% of 
> the editions (5.6M of 25M) have it.
> 
> For example, this record came in from Amazon: 
> http://openlibrary.org/books/OL10000135M.json?v=1
> but has no source_records field, although the source is noted in the machine
> comment for the change history.
> 
> For the editions that do have source_records, 6.2M reference MARC records, 
> 1.8M came from IA, and 468K from Amazon (the overcount is due to editions 
> having multiple source records).
> 
> If there's not a dump which includes the machine comments, perhaps someone 
> could create one so that we don't have to try and suck all this information 
> through the API.
> 
> Tom
> 
> On Wed, Mar 6, 2013 at 3:02 PM, Ben Companjen <[email protected]> wrote:
> http://openlibrary.org/books/OL100043M/EX-IM_Bank_oversight_hearing?b=2&a=1&_compare=Compare&m=diff
> shows that the ImportBot only removed periods (.) from subjects and
> added the source record in revision 2. Indeed, the comment with the
> change is "found a matching MARC record", but it's not added as a
> machine comment.
> 
> It looks like the source_records field may be more useful in this case.
> 
> On 6 March 2013 20:36, Tom Morris <[email protected]> wrote:
> > Thanks Ben!  I tried ?rev=1, but it didn't occur to me that the
> > history mode would work.
> >
> > On closer inspection, I actually did find this information in the dump in
> > the source_records field e.g.
> >
> > "source_records": [
> >   "marc:marc_records_scriblio_net/part28.dat:62741961:1376",
> >   "marc:marc_loc_updates/v36.i33.records.utf8:3020091:1375"
> > ]
> >
> > for http://openlibrary.org/books/OL100043M.json
> >
> > I think that will provide enough information to do a tally.  The first
> > source record matches the source listed in the initial import.  I'm not sure
> > what the second source record is since I don't see it referenced in the
> > change history http://openlibrary.org/books/OL100043M.json?m=history
> >
> > Thanks for setting me on the right path!
> >
> > Tom
> >
> > On Wed, Mar 6, 2013 at 2:22 PM, Ben Companjen <[email protected]> wrote:
> >>
> >> Hi Tom,
> >>
> >> By replacing the title in the URL with ".json", which gives [0], I got
> >> the JSON view of the history. This is the "History API" [1]. It looks
> >> like there is a record for each revision, that stores modification
> >> date and comment, amongst other things. The first revision has
> >> "machine_comment": "bpl_marc/bpl114.mrc:36416736:831". I assume this
> >> is the (only) place where source record links are stored (i.e. in
> >> metarecords linked to normal records).
> >>
> >> I haven't looked at a complete dump with all revisions, but if you say
> >> there is no history information included, I trust that that
> >> information is not in the dump. I'm unaware of other dumps that could
> >> include it.
> >>
> >> Perhaps the information can more easily be extracted from the database
> >> underneath Infogami?
> >>
> >> Ben
> >>
> >> [0] http://openlibrary.org/books/OL13446224M.json?m=history
> >> [1] http://openlibrary.org/dev/docs/restful_api#history
> >>
> >> On 6 March 2013 19:45, Tom Morris <[email protected]> wrote:
> >> > If I look at the history for an edition, I can see the source MARC record
> >> > and the contributor of that record associated with rev 1 of the history.
> >> >
> >> >
> >> > http://openlibrary.org/books/OL13446224M/The_history_of_the_Yorubas?m=history
> >> >
> >> > For the above record, the links are to:
> >> >
> >> > http://openlibrary.org/show-records/bpl_marc/bpl114.mrc:36416736:831
> >> > http://archive.org/details/bpl_marc
> >> >
> >> > Where is this information stored?  Is it available in the dump? (not as far
> >> > as I can see)
> >> >
> >> > Also, while investigating this, I thought perhaps the recentchanges API
> >> > might have the info (can't really tell), but it doesn't seem that the data
> >> > that backs that API is available in dump form.  Is that correct?  Is it
> >> > intentional?
> >> >
> >> > I'd like to use this information to figure out which are the largest sources
> >> > of data to help prioritize vetting efforts.
> >> >
> >> > Tom
> >> >
> >> > _______________________________________________
> >> > Ol-tech mailing list
> >> > [email protected]
> >> > http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> >> > To unsubscribe from this mailing list, send email to
> >> > [email protected]
> >> >
> >
> >
> 

