Brent,

The records are most likely not MARC-8 or UTF-8. The example you shared looks like a Windows-1252 "smart" quote. I would not be surprised if the records have characters from multiple character sets in them. I've seen that before.
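One quick way to test this against the raw bytes of a field is sketched below in Python. It assumes you can get at the undecoded field bytes, for example from a saved raw record. `classify_bytes` and `classify_fields` are hypothetical helpers, not part of Evergreen or MARC::Charset. Python's standard library has no MARC-8 codec, so the sketch can only tell ASCII, valid UTF-8, and Windows-1252 (cp1252) apart. For reference, the curly apostrophe ’ (U+2019) is the three bytes E2 80 99 in UTF-8 and the single byte 0x92 in Windows-1252. It checks each field separately because one record may mix character sets.

```python
from __future__ import annotations  # allows dict[str, bytes] hints on older Pythons


def classify_bytes(raw: bytes) -> str:
    """Report the first encoding the raw bytes of one field decode under.

    Heuristic only: tries ASCII, then UTF-8, then Windows-1252 (cp1252).
    MARC-8 cannot be detected here (no stdlib codec for it).
    """
    for codec in ("ascii", "utf-8", "cp1252"):
        try:
            raw.decode(codec)
            return codec
        except UnicodeDecodeError:
            continue
    return "unknown"


def classify_fields(fields: dict[str, bytes]) -> dict[str, str]:
    """Classify each field on its own, since one record may mix encodings."""
    return {tag: classify_bytes(value) for tag, value in fields.items()}


if __name__ == "__main__":
    smart = "they\u2019ve"               # curly apostrophe, U+2019
    as_utf8 = smart.encode("utf-8")      # b"they\xe2\x80\x99ve"
    as_cp1252 = smart.encode("cp1252")   # b"they\x92ve"

    print(classify_bytes(as_utf8))       # utf-8
    print(classify_bytes(as_cp1252))     # cp1252
    # Reading UTF-8 bytes as Windows-1252 produces the familiar mojibake.
    print(as_utf8.decode("cp1252"))      # theyâ€™ve

    # Hypothetical record with one plain field and one cp1252 field.
    print(classify_fields({"245": b"Pharaoh", "520": as_cp1252}))
```

A field that comes back as `cp1252` (or a mix across fields) would support the Windows-1252 theory; a field that decodes cleanly as UTF-8 but still carries a MARC-8 leader would point at a mislabeled record instead.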
I don't have any useful suggestions for you, other than suggesting that staff not import records from those sources.

Jason

On 12/02/2016 04:52 PM, Brent Mills wrote:
> Hello,
>
> I’ve recently noticed some issues with imported MARC records from a
> specific set of Z39.50 servers.
>
> A noticeable number of records that are imported through
> Prospector/MaineCat targets have mangled characters when diacritics,
> symbols, etc. are present in the record.
>
> Does anyone have some ideas on what could be causing the character
> encoding problems from these particular targets? Or has anyone run
> into this at their own site?
>
> - dgo.conf has <charset>marc-8</charset>. Changing that to usmarc or
>   utf8 has had no effect.
> - xml2marc-yaz.cfg is set up as described in
>   https://wiki.evergreen-ils.org/doku.php?id=evergreen-admin:sru_and_z39.50
>   Changing the charset options hasn’t had any effect either.
> - The encoding/translation problems do not happen with OCLC and Library
>   of Congress targets; it seems to mainly affect servers with the INNOPAC
>   db type. I’m not sure if that’s related.
>
> Going through the logs I can see things like:
>
> open-ils.search.z3950.search_class: no mapping found for [0x80] at
> position 56 in Kurt and Joe tangle with the most
> determined enemy theyâve ever encountered when a ruthless
> powerbroker schemes to build a new Egyptian empire as glorious as
> those of the Pharaohs. Part of his plan rests on the manipulation of
> a newly discovered aquifer beneath the Sahara, but an even
> more devastating weapon at his disposal may threaten the entire
> world: a plant extract known as the black mist, discovered in the
> City of the Dead and rumored to have the power to take life from the
> living and restore it to the dead.
> With the balance of power
> in Africa and Europe on the verge of tipping, Kurt, Joe, and the
> rest of the NUMA team will have to fight to discover the
> truth behind the legendsâbut to do that, they have to confront in
> person the greatest legend of them all: Osiris, the ruler of
> the Egyptian underworld. g0=ASCII_DEFAULT g1=EXTENDED_LATIN at
> /usr/share/perl5/MARC/Charset.pm line 308.
>
> So I’m thinking something is happening in the MARC8 to UTF8 conversion?
>
> Attaching a screenshot of what it looks like in the Z39.50 Import
> screen. The 264s have been the most obvious place to see the issue, but
> it happens in any field with special characters.
>
> Been banging my head trying to figure out what’s causing this. Any help
> would be appreciated!
>
> Thank you,
>
> -Brent
>
> -----------------------------
>
> Brent Mills
> Systems Librarian | Sage Library System
>
> email: [email protected]
> tickets: https://sagelib.org/support
