Jason and Mike,

Thanks so much for the help! Glad to know that it’s a remote issue and not something set up incorrectly on our side.

-Brent
-----------------------------
Brent Mills
Systems Librarian | Sage Library System
email: br...@hoodriverlibrary.org
tickets: https://sagelib.org/support
phone: 541.610.8384

> On Dec 2, 2016, at 2:30 PM, Mike Rylander <mrylan...@gmail.com> wrote:
>
> Jason hit on (almost certainly) the answer: bad records from sources that
> don't restrict cataloging to valid character sets. I'll add a couple
> comments below for general clarification, as well...
>
> On Fri, Dec 2, 2016 at 4:52 PM, Brent Mills <br...@hoodriverlibrary.org> wrote:
> Hello,
>
> I’ve recently noticed some issues with imported MARC records from a specific
> set of Z39.50 servers.
>
> A noticeable number of records imported through the Prospector/MaineCat
> targets have mangled characters when diacritics, symbols, etc. are present in
> the record.
>
> Does anyone have some ideas on what could be causing the character encoding
> problems from these particular targets? Or has anyone run into this at their own site?
>
> - dgo.conf has <charset>marc-8</charset>. Changing that to usmarc or utf8 has
> had no effect.
> - xml2marc-yaz.cfg is set up as described in
> https://wiki.evergreen-ils.org/doku.php?id=evergreen-admin:sru_and_z39.50
> and changing the charset options hasn’t had any effect either.
>
> The reason this doesn't change anything is that it's only used to describe
> how Evergreen will serve records to /others/ as a Z39.50 server. Those are
> not client settings.
>
> - The encoding/translation problems do not happen with OCLC and Library of
> Congress targets; it seems to mainly affect servers with the INNOPAC db type.
> I’m not sure if that’s related.
>
>
> This and the log message below are the smoking guns. OCLC and LoC are
> generally very good about making sure records really are in the character set
> they advertise, and that that character set is one of only MARC-8 or UTF8.
>
> So, Jason nailed it -- there are non-UTF8, non-MARC-8 characters in those
> records, as served by the INNOPAC sources. That's a (remote) cataloging
> issue.
>
> HTH,
>
> --Mike
>
> Going through the logs I can see things like:
>
> open-ils.search.z3950.search_class: no mapping found for [0x80] at position
> 56 in Kurt and Joe tangle with the most determined enemy they’ve ever
> encountered when a ruthless powerbroker schemes to build a new Egyptian
> empire as glorious as those of the Pharaohs. Part of his plan rests on the
> manipulation of a newly discovered aquifer beneath the Sahara, but an even
> more devastating weapon at his disposal may threaten the entire world: a
> plant extract known as the black mist, discovered in the City of the Dead and
> rumored to have the power to take life from the living and restore it to the
> dead. With the balance of power in Africa and Europe on the verge of tipping,
> Kurt, Joe, and the rest of the NUMA team will have to fight to discover the
> truth behind the legends—but to do that, they have to confront in person
> the greatest legend of them all: Osiris, the ruler of the Egyptian
> underworld. g0=ASCII_DEFAULT g1=EXTENDED_LATIN at
> /usr/share/perl5/MARC/Charset.pm line 308.
>
> So I’m thinking something is happening in the MARC8 to UTF8 conversion?
>
> Attaching a screenshot of what it looks like in the Z39.50 Import screen. The
> 264s have been the most obvious place to see the issue, but it happens in any
> field with special characters.
>
> Been banging my head trying to figure out what’s causing this. Any help would
> be appreciated!
>
> Thank you,
>
> -Brent
>
> <bad264.jpg>
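Below is a minimal Python sketch for spotting records like these before import. It is not part of Evergreen, YAZ, or MARC::Charset. The function names (`declared_charset`, `suspicious_bytes`, `decodes_as_utf8`, `diagnose`) and the result labels are hypothetical.

The sketch rests on two MARC 21 conventions:

- Leader position 09 is blank for MARC-8 and `a` for UCS/Unicode.
- In the 0x80 to 0x9F range, MARC-8 defines only four control characters (0x88, 0x89, 0x8D, 0x8E). Any other byte there has no mapping, like the [0x80] in the log above.

One possible source of such bytes, not confirmed in this thread, is UTF-8 text in a record whose leader says MARC-8. The apostrophe in "they’ve" is E2 80 99 in UTF-8. A MARC-8 decoder would accept 0xE2, which is a valid EXTENDED_LATIN code, and then fail on 0x80. Windows-1252 punctuation, such as 0x92 for a right single quote, would trip the same check.

```python
"""Hypothetical charset sanity check for raw ISO 2709 MARC records.

Assumes MARC 21 conventions: leader/09 ' ' = MARC-8, 'a' = UCS/Unicode.
Not part of Evergreen, YAZ, or MARC::Charset.
"""

MARC8 = " "
UNICODE = "a"

# Only C1 bytes (0x80-0x9F) MARC-8 defines: non-sort begin/end (0x88, 0x89)
# and zero-width joiner/non-joiner (0x8D, 0x8E).
MARC8_C1_CONTROLS = {0x88, 0x89, 0x8D, 0x8E}


def declared_charset(record: bytes) -> str:
    """Return the coding scheme declared at leader position 09."""
    if len(record) < 24:
        raise ValueError("record is shorter than a 24-byte MARC leader")
    return chr(record[9])


def suspicious_bytes(record: bytes) -> list[tuple[int, int]]:
    """List (offset, byte) pairs in 0x80-0x9F that MARC-8 cannot map."""
    return [
        (offset, byte)
        for offset, byte in enumerate(record)
        if 0x80 <= byte <= 0x9F and byte not in MARC8_C1_CONTROLS
    ]


def decodes_as_utf8(record: bytes) -> bool:
    """True if the whole byte string is well-formed UTF-8."""
    try:
        record.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def diagnose(record: bytes) -> str:
    """Classify a record with an illustrative (non-standard) label."""
    charset = declared_charset(record)
    if charset == UNICODE:
        return "ok" if decodes_as_utf8(record) else "invalid-utf8"
    if charset == MARC8:
        # "ok" here only means no unmapped C1 bytes were found.
        if not suspicious_bytes(record):
            return "ok"
        # Unmappable bytes that still form valid UTF-8 suggest the record
        # is really UTF-8 despite what its leader says.
        return "utf8-labelled-marc8" if decodes_as_utf8(record) else "invalid-bytes"
    return "unknown-charset"


if __name__ == "__main__":
    leader = b"00000nam  2200000   4500"  # leader/09 is blank: MARC-8
    print(diagnose(leader + "they\u2019ve".encode("utf-8")))
```

You could run something like this over a handful of records from the INNOPAC targets and from OCLC or LoC, then compare the labels. That would show whether the unmappable bytes arrive with the remote records. The check does not validate escape sequences or unassigned EXTENDED_LATIN positions, so "ok" is a weaker claim than "valid MARC-8".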