On 21 May 2012 21:18, Tom Morris <[email protected]> wrote:
> On Mon, May 21, 2012 at 1:36 PM, Karen Coyle <[email protected]> wrote:
>> On 5/21/12 8:29 AM, Tom Morris wrote:
>>>
>>> These problems may have been introduced upstream and, if so, won't be
>>> fixable because there's been too much information lost, but if they
>>> were introduced on import to OpenLibrary, they could be fixed by
>>> re-examining/converting the source record.
>>
>> It's unfortunately not unusual to receive bib data where the character
>> set is garbled, so this is a good avenue of exploration.
>
> It looks like all 34 records came from Talis.  The ones with spaces
> instead of accents (32 of them) are correctly encoded in the source
> but were corrupted on import to OpenLibrary.  (Cataloging source 040
>  $aSK$cSK$dUK-BiTAL)
>
> The record with the funky accents is misencoded in the source:
> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451
>
> =LDR  00451nam a22001692a 4500
> =001  6313e0175b35468e80d05404b2983fce
> =003  UK-BiTAL
> =005  20050705231026.0
> =008  750219s1957\\\\xxk\\\\\\\\\\\000\||eng|d
> =015  \\$aGB5705334$2bnb
> =035  \\$a()b5705334
> =040  \\$aUK-BiTAL$cUK-BiTAL$dUK-BiTAL
> =082  04$a823.91$218
> =100  1\$aBarcynska, Hel̇e`ne,$cCountess.
> =245  10$aAngel's eyes.
> =260  \\$bHurst & Blackett,$c1957.
> =300  \\$a192p.,19cm
>
> One record is missing an accent, but otherwise appears correctly
> encoded: 
> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2640855886:441
> (cataloging source 040    $dBSS$dUK-BiTAL)
>
>>> Is there any way of finding the source for the alternate names (or
>>> what other author records Ben merged) to figure out where the problem
>>> was introduced?
>>
>> If you find the OL edition record, click on "history" in the upper
>> right, and the bottom row of history usually has a link to the original
>> MARC record.
>
> Unfortunately there's no link to the history for merged author
> records.  It's kind of a convoluted manual process, but you can get to
> this information by doing the following:
>
> 1. Append ?m=history to the merged author record to get a list of edits:
>
>    http://openlibrary.org/authors/OL5264776A.json?m=history
>
> 2. Extract the list of merged author records from .data.duplicates[]
>
> 3. Fetch the change histories for each of those
>
> 4. Parse the JSON and get the first change record (last in the array)
>
> 5. Look at .data.machine_comment to get the reference to the original
> MARC record and look at it using a URL of the form:
>    
> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451
>
> 6. If you're a doubter like me, download the raw MARC using a URL of the form:
>      
> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:2622438813:451?format=raw
>    and parse it using pymarc or other known software
>
> Except for the last step with pymarc, I did all of this using Google
> Refine so that I could quickly fetch and process all the JSON for all
> 34 records without having to break out Python.
>
> Hope that helps someone who next needs to explore this or a similar path!

This information should be in the documentation section of the
website. Not that many people need this on a daily basis, but together
with some parsing instructions this would make useful reference
information. Thanks for explaining!

It should be possible to script these steps, of course, or better yet,
to show this information directly on the author history web page.
Perhaps we should open an issue for this?
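
As a starting point, here is a rough Python sketch of how Tom's steps 1
to 5 might be scripted. It is untested against the live site and rests
on assumptions that are not confirmed anywhere in this thread: that the
?m=history response is a JSON array of change records, that each record
has a "data" object, that .data.duplicates[] holds author keys such as
"/authors/OL5264776A", and that .data.machine_comment holds the part of
the show-records URL after "show-records/". All function names are made
up for illustration.

```python
import json
from urllib.request import urlopen

BASE = "http://openlibrary.org"


def history_url(author_key):
    """Step 1: URL of the change history for a key like '/authors/OL5264776A'."""
    return f"{BASE}{author_key}.json?m=history"


def merged_duplicates(history):
    """Step 2: collect .data.duplicates[] across all change records (assumed layout)."""
    dupes = []
    for change in history:
        for key in (change.get("data") or {}).get("duplicates") or []:
            if key not in dupes:
                dupes.append(key)
    return dupes


def source_record(history):
    """Steps 4-5: machine_comment of the first change (last in the array), or None."""
    if not history:
        return None
    return (history[-1].get("data") or {}).get("machine_comment")


def show_record_url(machine_comment, raw=False):
    """Steps 5-6: build a show-records URL; assumes machine_comment is the path part."""
    url = f"{BASE}/show-records/{machine_comment}"
    return url + "?format=raw" if raw else url


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def trace_sources(author_key, fetch=fetch_json):
    """Step 3 plus the rest: map each merged author key to its original MARC reference."""
    merged = merged_duplicates(fetch(history_url(author_key)))
    return {key: source_record(fetch(history_url(key))) for key in merged}
```

Calling trace_sources("/authors/OL5264776A") would then give one MARC
reference (or None) per merged author record, and show_record_url(ref,
raw=True) gives the URL for step 6. The fetch parameter exists only so
the logic can be tried against canned JSON without hitting the site.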

>
> Tom
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to 
> [email protected]