On Mon, May 21, 2012 at 1:36 PM, Karen Coyle <[email protected]> wrote:
> On 5/21/12 8:29 AM, Tom Morris wrote:
>>
>> These problems may have been introduced upstream and, if so, won't be
>> fixable because there's been too much information lost, but if they
>> were introduced on import to OpenLibrary, they could be fixed by
>> re-examining/converting the source record.
>
> It's unfortunately not unusual to receive bib data where the character
> set is garbled, so this is a good avenue of exploration.
It looks like all 34 records came from Talis. The ones with spaces
instead of accents (32 of them) are correctly encoded in the source but
were corrupted on import to OpenLibrary. (Cataloging source 040
$aSK$cSK$dUK-BiTAL)

The record with the funky accents is misencoded in the source:

http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451

=LDR  00451nam a22001692a 4500
=001  6313e0175b35468e80d05404b2983fce
=003  UK-BiTAL
=005  20050705231026.0
=008  750219s1957\\\\xxk\\\\\\\\\\\000\||eng|d
=015  \\$aGB5705334$2bnb
=035  \\$a()b5705334
=040  \\$aUK-BiTAL$cUK-BiTAL$dUK-BiTAL
=082  04$a823.91$218
=100  1\$aBarcynska, Hel̇e`ne,$cCountess.
=245  10$aAngel's eyes.
=260  \\$bHurst & Blackett,$c1957.
=300  \\$a192p.,19cm

One record is missing an accent, but otherwise appears correctly
encoded:

http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2640855886:441
(cataloging source 040 $dBSS$dUK-BiTAL)

>> Is there any way of finding the source for the alternate names (or
>> what other author records Ben merged) to figure out where the problem
>> was introduced?
>
> If you find the OL edition record, click on "history" in the upper
> right, and the bottom row of history usually has a link to the original
> MARC record.

Unfortunately there's no link to the history for merged author records.
It's kind of a convoluted manual process, but you can get to this
information by doing the following:

1. Append ?m=history to the merged author record to get a list of edits:
   http://openlibrary.org/authors/OL5264776A.json?m=history
2. Extract the list of merged author records from .data.duplicates[]
3. Fetch the change histories for each of those
4. Parse the JSON and get the first change record (the last one in the
   array)
5. Look at .data.machine_comment to get the reference to the original
   MARC record, and look at it using a URL of the form:
   http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451
6. If you're a doubter like me, download the raw MARC using a URL of the
   form:
   http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451?format=raw
   and parse it using pymarc or other known software

Except for the last step with pymarc, I did all of this using Google
Refine so that I could quickly fetch and process all the JSON for all 34
records without having to break out Python.

Hope that helps someone who next needs to explore this or a similar path!

Tom
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to [email protected]
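For anyone who would rather script steps 1-5 in Python than use Google
Refine, here is a rough sketch. It relies only on the steps described in
this message plus some assumptions that have not been checked against
the live site: that ?m=history returns a JSON array of change objects
with the oldest change last, each holding a "data" object; that
.data.duplicates holds author keys like "/authors/OL5264776A"; and that
.data.machine_comment is exactly the path that follows /show-records/.
All function names are made up for illustration.

```python
import json
from urllib.request import urlopen

OL = "http://openlibrary.org"


def fetch_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def history_url(key):
    # Step 1: "/authors/OL5264776A" -> .../authors/OL5264776A.json?m=history
    return f"{OL}{key}.json?m=history"


def merged_author_keys(history):
    """Step 2: collect .data.duplicates[] from every change, keeping order, no repeats."""
    keys = []
    for change in history:
        for key in (change.get("data") or {}).get("duplicates", []):
            if key not in keys:
                keys.append(key)
    return keys


def original_source(history):
    """Steps 4-5: per the steps above, the first change is the LAST element."""
    if not history:
        return None
    return (history[-1].get("data") or {}).get("machine_comment")


def show_records_url(machine_comment, raw=False):
    # Assumption: machine_comment is the path that follows /show-records/.
    url = f"{OL}/show-records/{machine_comment}"
    return url + "?format=raw" if raw else url


def trace_sources(author_key, fetch=fetch_json):
    """Map each author merged into author_key to its original MARC reference (step 3 does the fetches)."""
    merged = merged_author_keys(fetch(history_url(author_key)))
    return {key: original_source(fetch(history_url(key))) for key in merged}
```

Calling trace_sources("/authors/OL5264776A") would return a dict from
each merged author key to its original MARC reference (None where no
machine_comment turns up), and show_records_url(ref, raw=True) gives the
download link for step 6. The fetch function is a parameter so the logic
can be tested offline or wrapped with a cache.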

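For step 6, a quick first sanity check on a record downloaded with
?format=raw is whether Leader/09 (character coding scheme: "a" for
UCS/Unicode, blank for MARC-8) agrees with the actual bytes; the
Barcynska record, for example, has "a" at that position in its leader
(00451nam a22001692a 4500). The sketch below uses only the Python
standard library instead of pymarc, assumes one ISO 2709 record per
input, and is a heuristic: it does not decode MARC-8, and the function
name is hypothetical.

```python
import unicodedata


def check_encoding(raw: bytes):
    """Heuristic check of one raw ISO 2709 record; hypothetical helper, not pymarc."""
    # Leader/09 is the character coding scheme: 'a' = UCS/Unicode, ' ' = MARC-8.
    leader = raw[:24].decode("ascii", errors="replace")
    declared = {"a": "UTF-8", " ": "MARC-8"}.get(leader[9:10], "unknown")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = None

    problems = []
    if declared == "UTF-8" and text is None:
        problems.append("leader says UTF-8 but the bytes are not valid UTF-8")
    if declared == "MARC-8" and text is not None and not raw.isascii():
        # Heuristic: real MARC-8 diacritics rarely form valid UTF-8 sequences.
        problems.append("leader says MARC-8 but non-ASCII bytes decode as UTF-8")

    # Combining marks are legal in (decomposed) UTF-8; list them so a reader
    # can check whether each one sits on the right base letter.
    marks = []
    if text is not None:
        marks = [(i, unicodedata.name(ch, "?"))
                 for i, ch in enumerate(text) if unicodedata.combining(ch)]
    return declared, problems, marks
```

A non-empty problems list, or a combining mark attached to the wrong
letter, is a hint about where to look rather than proof of where the
damage happened; pymarc or another MARC-8-aware tool is still the way to
confirm.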