[OPEN-ILS-DEV] direct_ingest.pl, biblio_fingerprint.js and Unicode chars

Warren Layton Fri, 27 Nov 2009 12:41:23 -0800

I'm trying to import a number of bib records with "special" characters
in the MARC fields. I've gotten as far as running direct_ingest.pl, but
I'm noticing that biblio_fingerprint.js chokes on a few of them.


Looking a bit closer, I noticed that biblio_fingerprint.js chops
character codes down to their two least significant hex digits. For
example, it turns "&#x10C;" (Č) into "&#x0c" ("form feed"), which
causes direct_ingest.pl to skip the record and output the following
error:

  "Couldn't process record: invalid character encountered while
  parsing JSON string"
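
A minimal sketch of the truncation described above, in case it helps
to reproduce the symptom outside the ingest scripts. This is not code
from biblio_fingerprint.js: the function names truncatedEntity,
fullEntity and parsesAsJson are hypothetical, and masking with 0xff is
only an assumption about how the low two hex digits end up being the
only ones kept. The last helper shows why a raw form feed inside a
JSON string is rejected by a strict parser (here the JavaScript one,
not the parser direct_ingest.pl uses).

```javascript
// Hypothetical helpers for illustration; not taken from biblio_fingerprint.js.

// Builds a numeric character reference but keeps only the low byte of
// the code point, which reproduces the reported symptom.
function truncatedEntity(ch) {
  var low = ch.charCodeAt(0) & 0xff;      // 0x10C & 0xff === 0x0c
  var hex = low.toString(16);
  if (hex.length < 2) hex = '0' + hex;    // pad to two hex digits
  return '&#x' + hex + ';';
}

// Builds the reference from the full code point instead.
function fullEntity(ch) {
  return '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';';
}

// Returns true if the text is accepted as the body of a JSON string.
// Unescaped control characters (U+0000 to U+001F) are not allowed there.
function parsesAsJson(text) {
  try {
    JSON.parse('"' + text + '"');
    return true;
  } catch (e) {
    return false;
  }
}

console.log(truncatedEntity('\u010C')); // &#x0c;
console.log(fullEntity('\u010C'));      // &#x10C;
console.log(parsesAsJson('\u010C'));    // true
console.log(parsesAsJson('\u000C'));    // false
```

If the character really is reduced to U+000C before the record is
serialized, a failure like the one above would be expected from any
strict JSON parser.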

Attached is a sample record that causes this problem for me (the
tarball includes both the original MARCXML and the BRE file generated
from it by marc2bre.pl). Any help would be appreciated! I can open a
bug on Launchpad, too, if needed.

Cheers,
 Warren

iii_prob_record.tar.gz
Description: GNU Zip compressed data
