Re: [ol-tech] Accents in author names

Tom Morris Mon, 21 May 2012 12:19:01 -0700

On Mon, May 21, 2012 at 1:36 PM, Karen Coyle <[email protected]> wrote:
> On 5/21/12 8:29 AM, Tom Morris wrote:
>>
>> These problems may have been introduced upstream and, if so, won't be
>> fixable because there's been too much information lost, but if they
>> were introduced on import to OpenLibrary, they could be fixed by
>> re-examining/converting the source record.
>
> It's unfortunately not unusual to receive bib data where the character
> set is garbled, so this is a good avenue of exploration.


It looks like all 34 records came from Talis.  The ones with spaces
instead of accents (32 of them) are correctly encoded in the source
but were corrupted on import to OpenLibrary.
(Cataloging source: 040 $aSK$cSK$dUK-BiTAL)

The record with the funky accents is misencoded in the source:
http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451

=LDR  00451nam a22001692a 4500
=001  6313e0175b35468e80d05404b2983fce
=003  UK-BiTAL
=005  20050705231026.0
=008  750219s1957\\\\xxk\\\\\\\\\\\000\||eng|d
=015  \\$aGB5705334$2bnb
=035  \\$a()b5705334
=040  \\$aUK-BiTAL$cUK-BiTAL$dUK-BiTAL
=082  04$a823.91$218
=100  1\$aBarcynska, Hel̇e`ne,$cCountess.
=245  10$aAngel's eyes.
=260  \\$bHurst & Blackett,$c1957.
=300  \\$a192p.,19cm
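
A quick sanity check on a record like this is Leader position 09, which
in MARC 21 declares the character coding scheme (blank for MARC-8, "a"
for UCS/Unicode).  The sketch below only reads that one byte; the helper
name is made up for illustration, and a declared scheme says what the
record claims, not whether the bytes actually match it.

```python
# Leader/09 values defined by MARC 21 for the character coding scheme.
CODING_SCHEMES = {" ": "MARC-8", "a": "UCS/Unicode"}


def declared_coding_scheme(leader):
    """Return the coding scheme a 24-character MARC leader declares.

    Hypothetical helper: it reports the declaration only and does not
    inspect the record's field data.
    """
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters long")
    return CODING_SCHEMES.get(leader[9], "unknown")


if __name__ == "__main__":
    # Leader copied from the =LDR line of the record above.
    print(declared_coding_scheme("00451nam a22001692a 4500"))
```

If the declared scheme and the actual bytes disagree, that would point
at the source data rather than at the import.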

One record is missing an accent, but otherwise appears correctly
encoded: 
http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2640855886:441
(cataloging source 040    $dBSS$dUK-BiTAL)

>> Is there any way of finding the source for the alternate names (or
>> what other author records Ben merged) to figure out where the problem
>> was introduced?
>
> If you find the OL edition record, click on "history" in the upper
> right, and the bottom row of history usually has a link to the original
> MARC record.

Unfortunately there's no link to the history for merged author
records.  It's kind of a convoluted manual process, but you can get to
this information by doing the following:

1. Append ?m=history to the .json URL of the merged author record to get a list of edits:

    http://openlibrary.org/authors/OL5264776A.json?m=history

2. Extract the list of merged author records from .data.duplicates[]

3. Fetch the change histories for each of those

4. Parse the JSON and get the first change record (the last one in the array)

5. Look at .data.machine_comment to get the reference to the original
MARC record and look at it using a URL of the form:

    http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451

6. If you're a doubter like me, download the raw MARC using a URL of the form:

    http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451?format=raw

and parse it using pymarc or other known software.
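
For anyone who would rather script steps 1 to 5, a rough Python sketch
follows.  The JSON field names (.data.duplicates, .data.machine_comment)
and the URL forms come from the steps above.  Everything else is an
assumption: that the history response is a list of change records with
the newest first, that entries in duplicates are keys such as
"/authors/OL...A", that machine_comment holds the show-records path
as-is, and all of the helper names.

```python
import json
from urllib.request import urlopen

BASE = "http://openlibrary.org"


def history_url(key):
    """Step 1: history URL for a key like '/authors/OL5264776A' (assumed form)."""
    return f"{BASE}{key}.json?m=history"


def fetch_history(key):
    """Fetch and decode the change history (assumed to be a JSON list)."""
    with urlopen(history_url(key)) as resp:
        return json.load(resp)


def merged_keys(history):
    """Step 2: collect every entry of .data.duplicates[], without repeats."""
    keys = []
    for change in history:
        data = change.get("data") or {}
        for dup in data.get("duplicates") or []:
            if dup not in keys:
                keys.append(dup)
    return keys


def first_change(history):
    """Step 4: the first change record is the last element of the array."""
    return history[-1] if history else None


def source_record_url(change):
    """Step 5: build a show-records URL from .data.machine_comment."""
    comment = ((change or {}).get("data") or {}).get("machine_comment")
    return f"{BASE}/show-records/{comment}" if comment else None


def trace_sources(merged_author_key):
    """Steps 1-5 end to end: map each merged-away author to its source URL."""
    sources = {}
    for key in merged_keys(fetch_history(merged_author_key)):
        # Step 3: fetch the change history of each merged record.
        sources[key] = source_record_url(first_change(fetch_history(key)))
    return sources
```

Calling trace_sources("/authors/OL5264776A") would make one request for
the merged author plus one per duplicate; records whose first change has
no machine_comment come back as None.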

Except for the last step with pymarc, I did all of this using Google
Refine so that I could quickly fetch and process all the JSON for all
34 records without having to break out Python.
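
The pymarc step might look roughly like this.  It assumes the
?format=raw URL returns nothing but the binary MARC record, and the
helper names are made up for illustration; pymarc is a third-party
package and is imported lazily so the URL helper works without it.

```python
from urllib.request import urlopen


def raw_record_url(show_records_url):
    """Turn a show-records URL into its raw-download form (step 6)."""
    return show_records_url + "?format=raw"


def fetch_raw(show_records_url):
    """Download the raw bytes of one record."""
    with urlopen(raw_record_url(show_records_url)) as resp:
        return resp.read()


def parse_raw_marc(raw_bytes):
    """Parse raw MARC bytes with pymarc and return the readable records."""
    from pymarc import MARCReader  # third-party: pip install pymarc

    # to_unicode=True asks pymarc to decode field data to Unicode strings.
    reader = MARCReader(raw_bytes, to_unicode=True)
    # Recent pymarc versions yield None for records they cannot parse.
    return [record for record in reader if record is not None]


def author_name(record):
    """Return the text of the first 100 field, or None if there is none."""
    fields = record.get_fields("100")
    return fields[0].value() if fields else None
```

Comparing author_name() for the raw record against the name shown in
OpenLibrary is one way to see on which side of the import the accents
went wrong.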

Hope that helps someone who next needs to explore this or a similar path!

Tom
