Yes, we use both machine_comment and source_records fields for obtaining that info.
I took a dump of machine comments from the database and put it at:

http://home.us.archive.org/~anand/openlibrary/2013-03-07-machine-comment.txt.gz

It contains edition key and machine_comment columns.

Hope that helps.

Anand

On 07-Mar-2013, at 3:58 AM, Tom Morris wrote:

> It doesn't look like the source_records field is reliable. Only about 20% of
> the editions (5.6M of 25M) have them.
>
> For example, this record came in from Amazon:
> http://openlibrary.org/books/OL10000135M.json?v=1
> but has no source_records field, although it is noted in the machine comment
> for the change history.
>
> For the editions that do have source_records, 6.2M reference MARC records,
> 1.8M came from IA, and 468K from Amazon (the overcount is due to editions
> having multiple source records).
>
> If there's not a dump which includes the machine comments, perhaps someone
> could create one so that we don't have to try and suck all this information
> through the API.
>
> Tom
>
> On Wed, Mar 6, 2013 at 3:02 PM, Ben Companjen <[email protected]> wrote:
> http://openlibrary.org/books/OL100043M/EX-IM_Bank_oversight_hearing?b=2&a=1&_compare=Compare&m=diff
> shows that the ImportBot only removed periods (.) from subjects and
> added the source record in revision 2. Indeed, the comment with the
> change is "found a matching MARC record", but it's not added as a
> machine comment.
>
> It looks like the source_records field may be more useful in this case.
>
> On 6 March 2013 20:36, Tom Morris <[email protected]> wrote:
> > Thanks Ben! I tried ?rev=1, but it didn't occur to me that the
> > history mode would work.
> >
> > On closer inspection, I actually did find this information in the dump in
> > the source_records field, e.g.
> >
> > "source_records": [
> >     "marc:marc_records_scriblio_net/part28.dat:62741961:1376",
> >     "marc:marc_loc_updates/v36.i33.records.utf8:3020091:1375"
> > ]
> >
> > for http://openlibrary.org/books/OL100043M.json
> >
> > I think that will provide enough information to do a tally. The first
> > source record matches the source listed in the initial import. I'm not sure
> > what the second source record is since I don't see it referenced in the
> > change history http://openlibrary.org/books/OL100043M.json?m=history
> >
> > Thanks for setting me on the right path!
> >
> > Tom
> >
> > On Wed, Mar 6, 2013 at 2:22 PM, Ben Companjen <[email protected]>
> > wrote:
> >>
> >> Hi Tom,
> >>
> >> By replacing the title in the URL with ".json", which gives [0], I got
> >> the JSON view of the history. This is the "History API" [1]. It looks
> >> like there is a record for each revision, that stores modification
> >> date and comment, amongst other things. The first revision has
> >> "machine_comment": "bpl_marc/bpl114.mrc:36416736:831". I assume this
> >> is the (only) place where source record links are stored (i.e. in
> >> metarecords linked to normal records).
> >>
> >> I haven't looked at a complete dump with all revisions, but if you say
> >> there is no history information included, I trust that that
> >> information is not in the dump. I'm unaware of other dumps that could
> >> include it.
> >>
> >> Perhaps the information can more easily be extracted from the database
> >> underneath Infogami?
> >>
> >> Ben
> >>
> >> [0] http://openlibrary.org/books/OL13446224M.json?m=history
> >> [1] http://openlibrary.org/dev/docs/restful_api#history
> >>
> >> On 6 March 2013 19:45, Tom Morris <[email protected]> wrote:
> >> > If I look at the history for an edition, I can see the source MARC
> >> > record and the contributor of that record associated with rev 1 of the
> >> > history.
> >> >
> >> > http://openlibrary.org/books/OL13446224M/The_history_of_the_Yorubas?m=history
> >> >
> >> > For the above record, the links are to:
> >> >
> >> > http://openlibrary.org/show-records/bpl_marc/bpl114.mrc:36416736:831
> >> > http://archive.org/details/bpl_marc
> >> >
> >> > Where is this information stored? Is it available in the dump? (not as
> >> > far as I can see)
> >> >
> >> > Also, while investigating this, I thought perhaps the recentchanges API
> >> > might have the info (can't really tell), but it doesn't seem that the
> >> > data that backs that API is available in dump form. Is that correct? Is
> >> > it intentional?
> >> >
> >> > I'd like to use this information to figure out which are the largest
> >> > sources of data to help prioritize vetting efforts.
> >> >
> >> > Tom
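
For the tally Tom describes, a minimal sketch in Python might look like the following. It assumes each edition is already available as a parsed JSON object, and that the part of a source_records entry before the first colon (for example "marc") identifies the kind of source. The function names and the sample list are hypothetical and not part of Open Library's code. An edition with several kinds of source is counted once per kind, which is why the totals can exceed the number of editions.

```python
import json
from collections import Counter


def source_kind(source_record):
    """Return the text before the first colon, e.g. 'marc' for 'marc:marc_loc_updates/...'."""
    return source_record.split(":", 1)[0]


def tally_sources(editions):
    """Count editions per source kind; an edition with several kinds counts once per kind."""
    counts = Counter()
    for edition in editions:
        records = edition.get("source_records") or []
        if not records:
            # Editions without the field are tracked separately.
            counts["(none)"] += 1
            continue
        for kind in {source_kind(r) for r in records}:
            counts[kind] += 1
    return counts


# Hypothetical sample; only the source_records values of the first entry
# are taken from this thread.
sample = [
    json.loads(
        '{"key": "/books/OL100043M", "source_records": ['
        '"marc:marc_records_scriblio_net/part28.dat:62741961:1376", '
        '"marc:marc_loc_updates/v36.i33.records.utf8:3020091:1375"]}'
    ),
    {"key": "/books/OL10000135M"},  # no source_records field
]

counts = tally_sources(sample)
print(counts)
```

Editions that fall into the "(none)" bucket are the ones for which the machine_comment column of the dump would have to be consulted instead.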
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to [email protected]
