Hello,

I’ve recently noticed some issues with imported MARC records from a specific 
set of Z39.50 servers.

A noticeable number of records imported through Prospector/MaineCat 
targets have mangled characters when diacritics, symbols, etc. are present in 
the record.

Does anyone have ideas on what could be causing the character encoding 
problems from these particular targets? Or has anyone run into this at their own site?

- dgo.conf has <charset>marc-8</charset>. Changing that to usmarc or utf8 has had 
no effect.
- xml2marc-yaz.cfg is set up as described in 
https://wiki.evergreen-ils.org/doku.php?id=evergreen-admin:sru_and_z39.50; 
changing the charset options hasn’t had any effect either.
- The encoding/translation problems do not happen with OCLC and Library of 
Congress targets; they seem to mainly affect servers with the INNOPAC database type. 
I’m not sure whether that’s related.
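One way to compare the INNOPAC targets against OCLC/LoC, assuming a raw ISO 2709 record can be saved from each target, is to check whether the record's Leader/09 (blank = MARC-8, `a` = UCS/Unicode) agrees with the bytes actually sent. This is a rough Python sketch, not anything Evergreen does itself; the function names are hypothetical and the UTF-8 test is only a heuristic.

```python
"""Hypothetical diagnostic: does a raw MARC record's Leader/09 match its bytes?"""

LEADER_LEN = 24  # an ISO 2709 leader is always 24 bytes


def declared_scheme(raw: bytes) -> str:
    """Read Leader/09: blank means MARC-8, 'a' means UCS/Unicode (UTF-8)."""
    flag = raw[9:10]
    if flag == b" ":
        return "MARC-8"
    if flag == b"a":
        return "UTF-8"
    return "unknown"


def guess_actual(raw: bytes) -> str:
    """Heuristic only: classify the bytes that follow the leader."""
    body = raw[LEADER_LEN:]
    if all(b < 0x80 for b in body):
        return "ASCII"  # plain ASCII is valid under either scheme
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8"  # e.g. MARC-8 combining diacritics (0xE0-0xFE)
    return "UTF-8"


def encoding_mismatch(raw: bytes) -> bool:
    """True when the declared scheme disagrees with the guessed one."""
    declared, actual = declared_scheme(raw), guess_actual(raw)
    if actual == "ASCII":
        return False
    if declared == "MARC-8":
        return actual == "UTF-8"
    if declared == "UTF-8":
        return actual != "UTF-8"
    return False


if __name__ == "__main__":
    # Made-up record: the leader claims MARC-8 but the body is UTF-8.
    sample = b"00000nam  2200000   4500" + "they\u2019ve".encode("utf-8")
    print(declared_scheme(sample), guess_actual(sample), encoding_mismatch(sample))
```

If records from the INNOPAC targets report a mismatch while the OCLC/LoC ones don't, that would point at the target's declared encoding rather than the local config.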

Going through the logs I can see things like:

open-ils.search.z3950.search_class: no mapping found for [0x80] at position 56 
in Kurt and Joe tangle with the most determined enemy they’ve ever 
encountered when a ruthless powerbroker schemes to build a new Egyptian empire 
as glorious as those of the Pharaohs. Part of his plan rests on the 
manipulation of a newly discovered aquifer beneath the Sahara, but an even more 
devastating weapon at his disposal may threaten the entire world: a plant 
extract known as the black mist, discovered in the City of the Dead and rumored 
to have the power to take life from the living and restore it to the dead. With 
the balance of power in Africa and Europe on the verge of tipping, Kurt, Joe, 
and the rest of the NUMA team will have to fight to discover the truth behind 
the legends—but to do that, they have to confront in person the greatest 
legend of them all: Osiris, the ruler of the Egyptian underworld. 
g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/share/perl5/MARC/Charset.pm line 308.

So I’m thinking something is going wrong in the MARC-8 to UTF-8 conversion?
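A quick sanity check of that theory, assuming the "position 56" in the log is a 0-based byte offset into the summary text (an assumption; the log doesn't say): if the target sent UTF-8 and it was read as MARC-8, the curly apostrophe in "they’ve" (U+2019, encoded in UTF-8 as E2 80 99) would leave a bare 0x80 byte at exactly that offset. A minimal Python sketch; `first_offset` is a hypothetical helper.

```python
# Hypothesis being tested (an assumption, not confirmed): the target sends
# UTF-8 text, but the record is decoded as MARC-8.
SNIPPET = "Kurt and Joe tangle with the most determined enemy they\u2019ve ever"


def first_offset(data: bytes, byte: int) -> int:
    """Return the 0-based offset of the first occurrence of `byte`, or -1."""
    return data.find(bytes([byte]))


utf8_bytes = SNIPPET.encode("utf-8")
offset = first_offset(utf8_bytes, 0x80)

# U+2019 encodes as E2 80 99 in UTF-8, so its middle byte is a bare 0x80,
# which has no mapping in MARC-8's EXTENDED_LATIN (G1) set.
print(offset, utf8_bytes[offset - 1 : offset + 2].hex(" "))  # → 56 e2 80 99
```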

Attaching a screenshot of what it looks like in the Z39.50 Import screen. The 
264s have been the most obvious place to see the issue, but it happens in any 
field with special characters.

Been banging my head trying to figure out what’s causing this. Any help would 
be appreciated!

Thank you,

-Brent


-----------------------------

Brent Mills
Systems Librarian | Sage Library System

email: br...@hoodriverlibrary.org
tickets: https://sagelib.org/support
