I'm trying to import a number of bib records with "special" characters
in the MARC fields. I've gotten as far as running direct_ingest.pl, but
I'm noticing that biblio_fingerprint.js chokes on a few of them.

Looking a bit closer, I noticed that biblio_fingerprint.js chops
character codes down to their two least significant hex digits. For example,
biblio_fingerprint.js turns "&#x010c;" (Č) into "&#x0c" ("form feed"),
which causes direct_ingest.pl to skip the record and output the
following error:

  "Couldn't process record: invalid character encountered while
parsing JSON string"
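
To make the suspected failure mode concrete, here is a minimal Python sketch. It is not the actual biblio_fingerprint.js or direct_ingest.pl code: the helper truncate_to_low_byte is hypothetical and only mimics the "keep the two least significant hex digits" behaviour described above, and Python's json module stands in for whatever JSON parser the ingest script uses.

```python
import json


def truncate_to_low_byte(ch: str) -> str:
    """Hypothetical stand-in for the observed behaviour:
    keep only the low 8 bits (two hex digits) of a code point."""
    return chr(ord(ch) & 0xFF)


original = "\u010c"  # Č, LATIN CAPITAL LETTER C WITH CARON
mangled = truncate_to_low_byte(original)

# U+010C loses its high byte and becomes U+000C (form feed).
print(f"U+{ord(original):04X} -> U+{ord(mangled):04X}")

# A raw control character inside a JSON string is invalid JSON,
# so a strict parser rejects the whole document.
raw_json = '"' + mangled + '"'
try:
    json.loads(raw_json)
    parse_ok = True
except json.JSONDecodeError:
    parse_ok = False

print("parsed:", parse_ok)

# For comparison, the untruncated character round-trips cleanly.
round_trip = json.loads(json.dumps(original))
```

If the parser in use is similarly strict, one truncated character would be enough to make the whole record fail, which matches the error above.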

Attached is a sample record that causes this problem for me (the
tarball includes both the original MARCXML and the BRE file generated
from it by marc2bre.pl). Any help would be appreciated! I can open a
bug on Launchpad, too, if needed.

Cheers,
 Warren

Attachment: iii_prob_record.tar.gz
Description: GNU Zip compressed data
