On 21 May 2012 21:18, Tom Morris <[email protected]> wrote:
> On Mon, May 21, 2012 at 1:36 PM, Karen Coyle <[email protected]> wrote:
>> On 5/21/12 8:29 AM, Tom Morris wrote:
>>>
>>> These problems may have been introduced upstream and, if so, won't be
>>> fixable because there's been too much information lost, but if they
>>> were introduced on import to OpenLibrary, they could be fixed by
>>> re-examining/converting the source record.
>>
>> It's unfortunately not unusual to receive bib data where the character
>> set is garbled, so this is a good avenue of exploration.
>
> It looks like all 34 records came from Talis. The ones with spaces
> instead of accents (32 of them) are correctly encoded in the source
> but were corrupted on import to OpenLibrary. (Cataloging source 040
> $aSK$cSK$dUK-BiTAL)
>
> The record with the funky accents is misencoded in the source:
> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451
>
> =LDR 00451nam a22001692a 4500
> =001 6313e0175b35468e80d05404b2983fce
> =003 UK-BiTAL
> =005 20050705231026.0
> =008 750219s1957\\\\xxk\\\\\\\\\\\000\||eng|d
> =015 \\$aGB5705334$2bnb
> =035 \\$a()b5705334
> =040 \\$aUK-BiTAL$cUK-BiTAL$dUK-BiTAL
> =082 04$a823.91$218
> =100 1\$aBarcynska, Hel̇e`ne,$cCountess.
> =245 10$aAngel's eyes.
> =260 \\$bHurst & Blackett,$c1957.
> =300 \\$a192p.,19cm
>
> One record is missing an accent, but otherwise appears correctly
> encoded:
> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2640855886:441
> (cataloging source 040 $dBSS$dUK-BiTAL)
>
>>> Is there any way of finding the source for the alternate names (or
>>> what other author records Ben merged) to figure out where the problem
>>> was introduced?
>>
>> If you find the OL edition record, click on "history" in the upper
>> right, and the bottom row of history usually has a link to the original
>> MARC record.
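
When checking a record like the Talis one quoted above, it may help to compare what the record claims about its own encoding with what its headings actually contain. In MARC 21, Leader/09 is the character coding scheme (blank = MARC-8, `a` = UCS/Unicode), and the leader shown above has `a` in that position. The following is only a minimal sketch using the Python standard library, not an OpenLibrary or pymarc tool; the helper names `coding_scheme` and `combining_marks` are hypothetical. It reads Leader/09 and lists the combining characters in a string so that stray or misplaced accents stand out.

```python
import unicodedata

# MARC 21 Leader/09: character coding scheme declared by the record.
CODING_SCHEMES = {" ": "MARC-8", "a": "UCS/Unicode"}


def coding_scheme(leader: str) -> str:
    """Return the character coding scheme declared in Leader/09."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader must be 24 characters long")
    return CODING_SCHEMES.get(leader[9], "unknown")


def combining_marks(text: str) -> list[tuple[int, str]]:
    """List (position, Unicode name) for each character in text that has
    a nonzero canonical combining class (e.g. combining accents)."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.combining(ch)
    ]


if __name__ == "__main__":
    print(coding_scheme("00451nam a22001692a 4500"))
    # A correctly decomposed "Hélène": each accent follows its base letter.
    print(combining_marks("He\u0301le\u0300ne"))
```

For example, `coding_scheme("00451nam a22001692a 4500")` returns `"UCS/Unicode"`, so that record declares itself Unicode; running `combining_marks` over the 100 $a value then shows which marks sit on which letters.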
>
> Unfortunately there's no link to the history for merged author
> records. It's kind of a convoluted manual process, but you can get to
> this information by doing the following:
>
> 1. Append ?m=history to the merged author record to get a list of edits:
>
> http://openlibrary.org/authors/OL5264776A.json?m=history
>
> 2. Extract the list of merged author records from .data.duplicates[]
>
> 3. Fetch the change histories for each of those
>
> 4. Parse the JSON and get first change record (last in the array)
>
> 5. Look at .data.machine_comment to get the reference to the original
> MARC record and look at it using a URL of the form:
>
> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451
>
> 6. If you're a doubter like me, download the raw MARC using a URL of the form:
>
> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451?format=raw
> and parse it using pymarc or other known software
>
> Except for the last step with pymarc, I did all of this using Google
> Refine so that I could quickly fetch and process all the JSON for all
> 34 records without having to break out Python.
>
> Hope that helps someone who next needs to explore this or a similar path!
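
The manual steps above can be sketched in Python with only the standard library. This is a minimal sketch under several assumptions not confirmed in this thread beyond what the steps describe: that the `?m=history` JSON is an array of changes, newest first, each with a `data` object; that `.data.duplicates[]` holds author keys of the form `/authors/OL…A`; and that `.data.machine_comment` is exactly the path that follows `/show-records/` in the URLs above. All function names are hypothetical.

```python
import json
import urllib.request

OL_BASE = "http://openlibrary.org"  # host used in the URLs above


def fetch_json(url: str):
    """Download and decode a JSON document."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def history_url(key: str) -> str:
    """Step 1: edit-history URL for a key such as '/authors/OL5264776A'."""
    return f"{OL_BASE}{key}.json?m=history"


def merged_duplicates(history: list) -> list:
    """Step 2: gather .data.duplicates[] from every change that has it."""
    dups = []
    for change in history:
        dups.extend((change.get("data") or {}).get("duplicates", []))
    return dups


def source_record_url(history: list, raw: bool = False):
    """Steps 4-5 (and the step 6 URL): the oldest change is assumed to be
    the last element; its .data.machine_comment names the MARC record."""
    if not history:
        return None
    comment = (history[-1].get("data") or {}).get("machine_comment")
    if not comment:
        return None
    url = f"{OL_BASE}/show-records/{comment}"
    return url + "?format=raw" if raw else url


def trace_sources(author_key: str, raw: bool = False) -> dict:
    """Steps 1-5 end to end; step 3 fetches each duplicate's history."""
    history = fetch_json(history_url(author_key))
    return {
        dup: source_record_url(fetch_json(history_url(dup)), raw=raw)
        for dup in merged_duplicates(history)
    }
```

Calling `trace_sources("/authors/OL5264776A")` would map each merged author key to its show-records URL, or to `None` if no machine comment is found. With `raw=True` the URLs take the `?format=raw` form from step 6, and the downloaded bytes could then be handed to pymarc's `MARCReader` for parsing.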

This information should be in the documentation section of the website. Not that many people need this on a daily basis, but together with some parsing instructions it would make useful reference material.

Thanks for explaining! It should of course be possible to script these steps, or better yet, to show this information on the author history web page. Perhaps we should open an issue for this?

>
> Tom
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to
> [email protected]
